I was discussing with my friend on the issue I have discussed in my post Referential Integrity (https://mjawaid.wordpress.com/2009/04/01/referential-integrity/) and Mapping tables (https://mjawaid.wordpress.com/2009/04/02/mapping-tables/). My friend told me the scenario that when we create multiple indexes on the table we will get deadlocks. The reason he told me was the bucket lock, or in other words gap lock. Actually what server does is that due to multiple indexes when you lock any one record it locks multiple records, since it searches indexes and all the records it encounters during search, it locks them. That what I understood he was trying to say. I wasn’t convinced and thought of trying to create few tables with indexes and test them. For test I used MySQL4 and InnoDB engine, since the issue we were facing was on it.
So I created five tables indxtest1, indxtest2, indxtest3, indxtest4, and indxtest5.
CREATE TABLE `indxtest1` (
`id` bigint(20) NOT NULL auto_increment,
`name` varchar(255) default NULL,
`fk` bigint(20) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
CREATE TABLE `indxtest2` (
`id` bigint(20) NOT NULL auto_increment,
`name` varchar(255) default NULL,
`fk` bigint(20) default NULL,
PRIMARY KEY (`id`),
KEY `NewIndex1` (`fk`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
CREATE TABLE `indxtest3` (
`id` bigint(20) NOT NULL auto_increment,
`name` varchar(255) default NULL,
`fk` bigint(20) default NULL,
PRIMARY KEY (`id`),
KEY `NewIndex1` (`id`,`fk`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
CREATE TABLE `indxtest4` (
`id` bigint(20) NOT NULL auto_increment,
`name` varchar(255) default NULL,
`fk` bigint(20) NOT NULL default ‘0’,
PRIMARY KEY (`id`,`fk`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
CREATE TABLE `indxtest5` (
`id` bigint(20) NOT NULL auto_increment,
`name` varchar(255) default NULL,
`fk` bigint(20) default NULL,
PRIMARY KEY (`id`),
KEY `FK_indxtest4` (`fk`),
CONSTRAINT `FK_indxtest4` FOREIGN KEY (`fk`) REFERENCES `indxtest5parent` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
And another table indxtest5parent as parent for indxtest5.
CREATE TABLE `indxtest5parent` (
`id` bigint(20) NOT NULL auto_increment,
`test` varchar(255) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
Let me explain what is the difference between the tables. All tables except the indxtest5parent contain three fields: id, name, and fk. Don’t confuse fk with a foreign key. Id is the primary key in all tables. The main difference between the tables is the indexing on the fk field.
– The indxtest1 has no index on fk.
– The indxtest2 has an index on fk, so fk is indexed.
– The indxtest3 has a multi-field-index on id and fk combined, in addition to primary key index on id.
– The indxtest4 has a composite primary key id, fk. Therefore there is a primary key index on id and fk i.e multi-field-index.
– The indxtest5 has a foreign key fk mapping to id field of indxtest5parent. So it has foreign key index on fk.
After that I inserted some data in these tables, around just 10 records. I inserted that few records since I wasn’t testing performance with huge data, instead I was just testing that how records are searched in table with indexes, multiple indexes, and without indexes, which is useful in knowing how records are locked, implicitly when updating or explicitly.
So all tables look almost like this after inserting data:
id
|
name
|
fk
|
1
|
One
|
10
|
2
|
Two
|
12
|
3
|
three
|
13
|
4
|
Four
|
14
|
5
|
Five
|
15
|
6
|
Six
|
15
|
7
|
seven
|
14
|
8
|
eight
|
13
|
9
|
Nine
|
12
|
10
|
Ten
|
10
|
Now run the following queries on all tables:
select * from indxtest1 where id = 1;
select * from indxtest2 where id = 1;
select * from indxtest3 where id = 1;
select * from indxtest4 where id = 1;
select * from indxtest5 where id = 1;
These all queries will result in same output, i.e the first record of the table. This is very simple, and since result was filtered using the primary key in the where clause so it scanned only one record during search. We can see this by running following queries:
explain select * from indxtest1 where id = 1;
explain select * from indxtest2 where id = 1;
explain select * from indxtest3 where id = 1;
explain select * from indxtest4 where id = 1;
explain select * from indxtest5 where id = 1;
You will notice that in result the rows column will show 1, that means only one record was scanned during the search.
Now let’s filter the result using the fk field in the where clause:
select * from indxtest1 where fk = 10;
select * from indxtest2 where fk = 10;
select * from indxtest3 where fk = 10;
select * from indxtest4 where fk = 10;
select * from indxtest5 where fk = 10;
All these queries will return the same result, two records with id in 1 and 10. But how many records were scanned during search? To find out the answer run the following queries:
explain select * from indxtest1 where fk = 10;
explain select * from indxtest2 where fk = 10;
explain select * from indxtest3 where fk = 10;
explain select * from indxtest4 where fk = 10;
explain select * from indxtest5 where fk = 10;
In the rows column you will notice that for the indxtest1 table it scanned 10 records. That is reasonable since there was no indexing. Now let’s see for indxtest2 table, 2 records were scanned. This is also reasonable since fk was indexed. So far so good. Now for indxtest3 table 10 records were scanned. Hmm… ok we will discuss this in a moment. Let’s check other queries first. For indxtest4 table 10 records were scanned, and for indxtest5 table only 2 records were scanned. Result for indxtest5 table is also reasonable since it has a foreign key index on it.
Now what are the cases with indxtest3 and indxtest4? If you notice both tables have a multi-column index. According to MySQL documentation:
MySQL uses multiple-column indexes in such a way that queries are fast when you specify a known quantity for the first column of the index in a WHERE
clause, even if you do not specify values for the other columns. (http://dev.mysql.com/doc/refman/4.1/en/multiple-column-indexes.html)
It is clearly stated in the documentation that when second column is specified, MySQL will not use the index, or even if second column is used with first column with OR condition, it will not use the index. That’s why queries on indxtest3 and indxtest4 scanned all 10 records during search/select.
Now what is the effect of indexing on locking? According to MySQL documentation:
A locking read, an UPDATE
, or a DELETE
generally set record locks on every index record that is scanned in the processing of the SQL statement. (http://dev.mysql.com/doc/refman/4.1/en/innodb-locks-set.html)
And what is record lock?
Record lock: This is a lock on an index record. (http://dev.mysql.com/doc/refman/4.1/en/innodb-record-level-locks.html)
Few more points from MySQL documentation (http://dev.mysql.com/doc/refman/4.1/en/innodb-locks-set.html):
1 – For SELECT ... FOR UPDATE
or SELECT ... IN SHARE MODE
, locks are acquired for scanned rows.
2 – SELECT ... FROM ... FOR UPDATE
sets exclusive next-key locks on all index records the search encounters.
3 – UPDATE ... WHERE ...
sets an exclusive next-key lock on every record the search encounters.
4 – DELETE FROM ... WHERE ...
sets an exclusive next-key lock on every record the search encounters.
Conclusion
So, according to MySQL documentation, either we are explicitly locking records (first two points) or locks are implicit (last two points), locks will be acquired on records the search encounters. So if indexing is proper no locks will be acquired on rows on which we don’t want to. Even MySQL documentation says that:
It is important to create good indexes so that your queries do not unnecessarily need to scan many rows. (http://dev.mysql.com/doc/refman/4.1/en/innodb-locks-set.html)
Scanning many records will result in lock on those records due to which deadlocks can occur, performance can be degraded, or anything bad can happen.
If you are going to test the above scenario then you will also notice the performance differences between the queries if you have reasonable number of records in tables.