Bug #11745929 CHANGE LOCK PRIORITY SO THAT THE TRANSACTION HOLDING S-…

…LOCK GETS X-LOCK F This is a fix for 15y.o. bug, which had been reported over and over again, for example: Bug #21356 Change lock priority so that the transaction holding S-lock gets X-lock first Bug #105655 Wait for the lock that has already been acquired. Bug #101695 Deadlock behavior The issue is related to our starvation-preventing mechanism, in which new lock requests have to avoid overtaking transactions which already have a conflicting waiting request. The motiviating scenario is that there is a constant flood of S-locking transactions, and one poor transaction waiting for an X-lock, and now there's a new transaction which also wants an S-lock, what do you do? If you handle such cases by granting the S-lock (as it doesn't conflict with already *granted* locks), then you risk starving the X-locker indefinitely. So, InnoDB, used to have a strict rule, that it doesn't matter if a lock is waiting or granted, when we check for conflicts, and thus the new-comming transaction's S-lock had to wait for the (still waiting) X-lock to be granted and released. This prevented starvation, and as a very simple FIFO model was easy to analyze. Then this become a bit more complicated with CATS algorithm, which follows the above rule only when adding the transaction to the queue, but later, whenever some transaction releases a lock and we rescan the queue to see if one of the waiting transactions can be granted the lock, we use a relaxed rule in which we do not care about FIFO order, nor about conflicts with waiting transactions. The starvation of X-waiters by S-lockers was prevented by other means in the CATS algorithm: if a newcomming transaction trx1 conflicted with some existing (even waiting) lock of trx2 then an edge "trx1 --waits-for--> trx2" was added to the wait for graph, and trx1 could not be granted a lock until trx2 released its lock. In particular a newcomming S-locker would see the (oldest) waiting X-latcher and stored the edge to it, and not overtake it. But all of these approaches had a difficulty in handling "lock escalation": a situation where trx1 already had an S-lock, but needed to upgrade it to an X-lock - in such case it often lead to deadlocks with other X-lock waiters, as follows: 1. trx1 gets an S-lock granted 2. trx2 wants an X-lock, but has to wait for trx1 3. trx1 now wants an X-lock, but the starvation-prevention mechanism forces it to wait for trx2 This lead to a cycle in the waits-for graph - a deadlock. You'll find this scenario in avoid_deadlock_with_blocked.test The solution proposed in this patch is based on the following realisation: When trx1's lock request conflicts with trx2's waiting request AND we see that trx2 is already blocked by trx1, then we know we have only two options: 1. trx1 should wait for trx2 2. trx1 should ignore trx2's waiting request Option 1 leads to a deadlock immediatelly, and deadlocks are costly to handle by InnoDB and apps, and are annoying to users. Option 2. avoids a deadlock in this simple case (to solve this in general we'd have to look at the full transitive closure of the waits-for relation) and is justifiable: - trx2 already had to wait for trx1, - in theory, trx1 could have taken that stronger lock earlier which would lead to equivalent result - our code correctness doesn't really depend on honouring waiting lock requests - for example High Priority transactions overtake them, and CATS ignores them during lock granting logic - this will not lead to starvation: trx2 already waited for trx1, so trx2 is not "a new comming transaction", it was in the queue before trx2, and such old transactions should drain eventually This patch uses Option 2. in the specific case where the trx1 is requesting an `X` or `X|REC_NOT_GAP` lock and trx2 is WAITING for `X` or `X|REC_NOT_GAP` lock and trx1 already holds a GRANTED `S`, `S|REC_NOT_GAP`, `X` or `X|REC_NOT_GAP` lock on the same heap_no. (Note: in reality it's unlikely that the trx1 would request `X` or `X|REC_NOT_GAP` lock again when already holding `X` lock, because the code checks for stronger or equal lock before requesting new one) It is important that this decission rule depends solely on locks held by trx1 and trx2 (in the sense, that nothing a third transaction trx3 can do, can influence the outcome of this rule) - thanks to that Lock System doesn't have to revaluate this rule every time some other transaction releases or requests a lock. It is important that this decission rule doesn't allow INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`, `X`, `X|GAP`), as inserting a record into a gap would split such WAITING lock, violating the invariant that each transaction can have at most single WAITING lock at any time. This patch also optimizes a few existing functions, as they will be used more often now: - `lock_rec_find_set_bit` which searches for the first bit set in a bitmap used bit-by-bit loop. Now it uses 13x times faster implementation which tries to skip 64,then 32,16, or 8 bits at a time. This is important for WAITING locks which have just a single bit set, in a bitmap with number of bits equal to the number of records on a page (which can be ~500). - `lock_rec_has_expl` checked if a transaction already had a GRANTED equal or stronger lock then the currently requested, but it didn't use the fact that the queue is ordered so that all GRANTED locks are before all WAITING locks. The new implementation stops as soon as it finds a WAITING lock This patch also adapts existing testcases: The innodb.innodb_replace is very old and not documented, so I am not sure if I understood its spirit. It looks like the original bug for which it was created complained about deadlocks, and the MTR did produce some deadlock, so I am not sure if the test was supposed to demonstrate the problem can't be fixed or some other kind of deadlock than the one from report. Anyway this code no longer causes a deadlock, so I've documented that. The main.xa test doesn't really care how we create a deadlock, because it wants to test how deadlocks are handled by XA, not what causes them. So, I've changed the scenario to one which is guaranteed to cause a deadlock in any conceivable algorithm: we have two resources, and trx1 and trx2 grab one of them, then try to grab the other. The only difficulty is that the test requires the specific of the two transaction to be chosen as the victim, and this depends on the implementation which currently favors to keep "heavier" transactions alive, so I've made sure one of them is heavier by letting it modify (DELETE) some rows. Reviewed by: Marcin Babij <[email protected]> RB:27456 Change-Id: I3f4cb5af7133e75e006d3c5ec1a9b0c7ee5987ad
mysql · Jan 31, 2022 · 7037a0b · 7037a0b
1 parent 4a3a056
commit 7037a0b
Show file tree

Hide file tree

Showing 23 changed files with 1,958 additions and 310 deletions.
diff --git a/mysql-test/r/xa_default.result b/mysql-test/r/xa_default.result
@@ -238,20 +238,23 @@ DROP TABLE t1;
 #                ASSERTION THD->TRANSACTION.XID_STATE.XID.IS_NULL()
 #                FAILED
 #
-CREATE TABLE t1 (a INT);
+CREATE TABLE t1 (a INT) ENGINE=InnoDB;
 INSERT INTO t1 VALUES (1);
+CREATE TABLE t2 (a INT) ENGINE=InnoDB;
+INSERT INTO t2 VALUES (1);
 START TRANSACTION;
-SELECT a FROM t1 WHERE a=1 FOR SHARE;
-a
-1
+DELETE FROM t1;
 connect  con2,localhost,root;
 XA START 'xid1';
 # Sending:
+SELECT a FROM t2 WHERE a=1 FOR SHARE;
+a
+1
 SELECT a FROM t1 WHERE a=1 FOR UPDATE;
 connection default;
 # Waiting for until a transaction with 'SELECT...FOR UPDATE'
 # will be locked inside innodb subsystem.
-SELECT a FROM t1 WHERE a=1 FOR UPDATE;
+SELECT a FROM t2 WHERE a=1 FOR UPDATE;
 a
 1
 connection con2;
@@ -268,6 +271,7 @@ XA PREPARE 'xid1';
 XA ROLLBACK 'xid1';
 connection default;
 DROP TABLE t1;
+DROP TABLE t2;
 disconnect con2;
 #
 # Bug#14670465 PLEASE PRINT HUMAN READABLE, ESCAPED

diff --git a/mysql-test/r/xa_detach_on_prepare_off.result b/mysql-test/r/xa_detach_on_prepare_off.result
@@ -215,21 +215,23 @@ DROP TABLE t1;
 #                ASSERTION THD->TRANSACTION.XID_STATE.XID.IS_NULL()
 #                FAILED
 #
-DROP TABLE IF EXISTS t1;
 CREATE TABLE t1 (a INT) ENGINE=InnoDB;
 INSERT INTO t1 VALUES (1);
+CREATE TABLE t2 (a INT) ENGINE=InnoDB;
+INSERT INTO t2 VALUES (1);
 START TRANSACTION;
-SELECT a FROM t1 WHERE a=1 FOR SHARE;
-a
-1
+DELETE FROM t1;
 # Connection con2
 XA START 'xid1';
 # Sending:
+SELECT a FROM t2 WHERE a=1 FOR SHARE;
+a
+1
 SELECT a FROM t1 WHERE a=1 FOR UPDATE;
 # Connection default
 # Waiting for until a transaction with 'SELECT...FOR UPDATE'
 # will be locked inside innodb subsystem.
-SELECT a FROM t1 WHERE a=1 FOR UPDATE;
+SELECT a FROM t2 WHERE a=1 FOR UPDATE;
 a
 1
 # Connection con2
@@ -246,6 +248,7 @@ XA PREPARE 'xid1';
 XA ROLLBACK 'xid1';
 # Connection default
 DROP TABLE t1;
+DROP TABLE t2;
 #
 # Bug#14670465 PLEASE PRINT HUMAN READABLE, ESCAPED
 #              XID DATA IN XA RECOVER OUTPUT

diff --git a/mysql-test/suite/innodb/include/cleanup_show_locks.inc b/mysql-test/suite/innodb/include/cleanup_show_locks.inc
@@ -0,0 +1,3 @@
+DROP PROCEDURE show_locks;
+DROP PROCEDURE register_connection;
+DROP TABLE connections;
diff --git a/mysql-test/suite/innodb/include/prepare_show_locks.inc b/mysql-test/suite/innodb/include/prepare_show_locks.inc
@@ -0,0 +1,24 @@
+CREATE TABLE connections (
+  trx_mysql_thread_id BIGINT PRIMARY KEY,
+  name VARCHAR(200)
+) ENGINE=InnoDB;
+
+DELIMITER //;
+CREATE PROCEDURE show_locks ()
+BEGIN
+  # this is needed because INFORMATION_SCHEMA.INNODB_TRX is refreshed with debounce 100ms
+  SELECT SLEEP(1);
+  SELECT
+    connections.name, object_name, index_name, lock_type, lock_mode, lock_status, lock_data
+  FROM connections
+  JOIN INFORMATION_SCHEMA.INNODB_TRX USING (trx_mysql_thread_id)
+  JOIN performance_schema.data_locks ON (trx_id=engine_transaction_id)
+  ORDER BY 1,2,3,4,5,6,7;
+END //
+
+CREATE PROCEDURE register_connection (name VARCHAR(200))
+BEGIN
+  INSERT INTO connections VALUES (CONNECTION_ID(), name);
+END //
+
+DELIMITER ;//
diff --git a/mysql-test/suite/innodb/r/avoid_deadlock_with_blocked.result b/mysql-test/suite/innodb/r/avoid_deadlock_with_blocked.result
@@ -0,0 +1,236 @@
+CREATE TABLE connections (
+trx_mysql_thread_id BIGINT PRIMARY KEY,
+name VARCHAR(200)
+) ENGINE=InnoDB;
+CREATE PROCEDURE show_locks ()
+BEGIN
+# this is needed because INFORMATION_SCHEMA.INNODB_TRX is refreshed with debounce 100ms
+SELECT SLEEP(1);
+SELECT
+connections.name, object_name, index_name, lock_type, lock_mode, lock_status, lock_data
+FROM connections
+JOIN INFORMATION_SCHEMA.INNODB_TRX USING (trx_mysql_thread_id)
+JOIN performance_schema.data_locks ON (trx_id=engine_transaction_id)
+ORDER BY 1,2,3,4,5,6,7;
+END //
+CREATE PROCEDURE register_connection (name VARCHAR(200))
+BEGIN
+INSERT INTO connections VALUES (CONNECTION_ID(), name);
+END //
+CALL register_connection("con1");
+CALL register_connection("con2");
+CALL register_connection("con3");
+CREATE TABLE t1 (id INT PRIMARY KEY);
+INSERT INTO t1 (id) VALUES (1);
+BEGIN;
+SELECT * FROM t1 FOR SHARE;
+id
+1
+BEGIN;
+SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con2_will_wait';
+SELECT * FROM t1 FOR UPDATE;
+SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
+SELECT * FROM t1 FOR UPDATE;
+id
+1
+COMMIT;
+id
+1
+COMMIT;
+BEGIN;
+SELECT * FROM t1 WHERE id=1 FOR UPDATE;
+id
+1
+BEGIN;
+SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con2_will_wait';
+SELECT * FROM t1 FOR SHARE;
+SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IX	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	X,REC_NOT_GAP	GRANTED	1
+con2	t1	NULL	TABLE	IS	GRANTED	NULL
+con2	t1	PRIMARY	RECORD	S	WAITING	1
+INSERT INTO t1 VALUES (0);
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IX	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	X,GAP,INSERT_INTENTION	GRANTED	1
+con1	t1	PRIMARY	RECORD	X,REC_NOT_GAP	GRANTED	1
+ROLLBACK;
+ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
+COMMIT;
+BEGIN;
+SELECT * FROM t1 FOR SHARE;
+id
+1
+BEGIN;
+SELECT * FROM t1 WHERE id=1 FOR SHARE;
+id
+1
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con2	t1	NULL	TABLE	IS	GRANTED	NULL
+con2	t1	PRIMARY	RECORD	S,REC_NOT_GAP	GRANTED	1
+SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con3_will_wait';
+SELECT * FROM t1 FOR UPDATE;
+SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con2	t1	NULL	TABLE	IS	GRANTED	NULL
+con2	t1	PRIMARY	RECORD	S,REC_NOT_GAP	GRANTED	1
+con3	t1	NULL	TABLE	IX	GRANTED	NULL
+con3	t1	PRIMARY	RECORD	X	WAITING	1
+SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con1_will_wait';
+INSERT INTO t1 VALUES (0);
+SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	NULL	TABLE	IX	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con1	t1	PRIMARY	RECORD	X,GAP,INSERT_INTENTION	WAITING	1
+con2	t1	NULL	TABLE	IS	GRANTED	NULL
+con2	t1	PRIMARY	RECORD	S,REC_NOT_GAP	GRANTED	1
+con3	t1	NULL	TABLE	IX	GRANTED	NULL
+con3	t1	PRIMARY	RECORD	X	WAITING	1
+COMMIT;
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	NULL	TABLE	IX	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con1	t1	PRIMARY	RECORD	S,GAP	GRANTED	0
+con1	t1	PRIMARY	RECORD	X,GAP,INSERT_INTENTION	GRANTED	1
+ROLLBACK;
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
+BEGIN;
+SELECT * FROM t1 FOR SHARE;
+id
+1
+BEGIN;
+SELECT * FROM t1 WHERE id=1 FOR SHARE;
+id
+1
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con2	t1	NULL	TABLE	IS	GRANTED	NULL
+con2	t1	PRIMARY	RECORD	S,REC_NOT_GAP	GRANTED	1
+SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con3_will_wait';
+SELECT * FROM t1 FOR UPDATE;
+SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
+SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con1_will_wait';
+SELECT * FROM t1 WHERE id=1 FOR UPDATE;
+SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	NULL	TABLE	IX	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con1	t1	PRIMARY	RECORD	X,REC_NOT_GAP	WAITING	1
+con2	t1	NULL	TABLE	IS	GRANTED	NULL
+con2	t1	PRIMARY	RECORD	S,REC_NOT_GAP	GRANTED	1
+con3	t1	NULL	TABLE	IX	GRANTED	NULL
+con3	t1	PRIMARY	RECORD	X	WAITING	1
+COMMIT;
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	NULL	TABLE	IX	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con1	t1	PRIMARY	RECORD	X,REC_NOT_GAP	GRANTED	1
+con3	t1	NULL	TABLE	IX	GRANTED	NULL
+con3	t1	PRIMARY	RECORD	X	WAITING	1
+id
+1
+COMMIT;
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+id
+1
+COMMIT;
+BEGIN;
+SELECT * FROM t1 FOR SHARE;
+id
+1
+SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con2_will_wait';
+SELECT * FROM t1 FOR UPDATE;
+SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
+SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con3_will_wait';
+SELECT * FROM t1 FOR UPDATE;
+SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con2	t1	NULL	TABLE	IX	GRANTED	NULL
+con2	t1	PRIMARY	RECORD	X	WAITING	1
+con3	t1	NULL	TABLE	IX	GRANTED	NULL
+con3	t1	PRIMARY	RECORD	X	WAITING	1
+SELECT * FROM t1 WHERE id=1 FOR UPDATE;
+id
+1
+CALL show_locks();
+SLEEP(1)
+0
+name	object_name	index_name	lock_type	lock_mode	lock_status	lock_data
+con1	t1	NULL	TABLE	IS	GRANTED	NULL
+con1	t1	NULL	TABLE	IX	GRANTED	NULL
+con1	t1	PRIMARY	RECORD	S	GRANTED	1
+con1	t1	PRIMARY	RECORD	S	GRANTED	supremum pseudo-record
+con1	t1	PRIMARY	RECORD	X,REC_NOT_GAP	GRANTED	1
+con2	t1	NULL	TABLE	IX	GRANTED	NULL
+con2	t1	PRIMARY	RECORD	X	WAITING	1
+con3	t1	NULL	TABLE	IX	GRANTED	NULL
+con3	t1	PRIMARY	RECORD	X	WAITING	1
+COMMIT;
+id
+1
+COMMIT;
+id
+1
+COMMIT;
+DROP TABLE t1;
+DROP PROCEDURE show_locks;
+DROP PROCEDURE register_connection;
+DROP TABLE connections;
diff --git a/mysql-test/suite/innodb/r/innodb_replace.result b/mysql-test/suite/innodb/r/innodb_replace.result
@@ -18,8 +18,9 @@ SET DEBUG_SYNC='now WAIT_FOR default_will_wait';
 SET DEBUG_SYNC='now SIGNAL select1';
 ERROR 23000: Duplicate entry '3' for key 't1.PRIMARY'
 INSERT INTO t1 VALUES(3,3) ON DUPLICATE KEY UPDATE b=b+10;
-ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
 COMMIT;
+a	b
+3	11
 SET DEBUG_SYNC='write_row_replace SIGNAL insert2 WAIT_FOR select2';
 REPLACE INTO t1 VALUES(3,4);
 SET DEBUG_SYNC='now WAIT_FOR insert2';