Skip to content

Commit

Permalink
Bug #11745929 CHANGE LOCK PRIORITY SO THAT THE TRANSACTION HOLDING S-…
Browse files Browse the repository at this point in the history
…LOCK GETS X-LOCK F

This is a fix for 15y.o. bug, which had been reported over and over again, for example:
Bug #21356  Change lock priority so that the transaction holding S-lock gets X-lock first
Bug #105655 Wait for the lock that has already been acquired.
Bug #101695 Deadlock behavior

The issue is related to our starvation-preventing mechanism, in which new lock requests have to avoid overtaking transactions which already have a conflicting waiting request.

The motiviating scenario is that there is a constant flood of S-locking transactions, and one poor transaction waiting for an X-lock, and now there's a new transaction which also wants an S-lock, what do you do?

If you handle such cases by granting the S-lock (as it doesn't conflict with already *granted* locks), then you risk starving the X-locker indefinitely.
So, InnoDB, used to have a strict rule, that it doesn't matter if a lock is waiting or granted, when we check for conflicts, and thus the new-comming transaction's S-lock had to wait for the (still waiting) X-lock to be granted and released.
This prevented starvation, and as a very simple FIFO model was easy to analyze.

Then this become a bit more complicated with CATS algorithm, which follows the above rule only when adding the transaction to the queue, but later, whenever some transaction releases a lock and we rescan the queue to see if one of the waiting transactions can be granted the lock, we use a relaxed rule in which we do not care about FIFO order, nor about conflicts with waiting transactions.
The starvation of X-waiters by S-lockers was prevented by other means in the CATS algorithm: if a newcomming transaction trx1 conflicted with some existing (even waiting) lock of trx2 then an edge "trx1 --waits-for--> trx2" was added to the wait for graph, and trx1 could not be granted a lock until trx2 released its lock. In particular a newcomming S-locker would see the (oldest) waiting X-latcher and stored the edge to it, and not overtake it.

But all of these approaches had a difficulty in handling "lock escalation": a situation where trx1 already had an S-lock, but needed to upgrade it to an X-lock - in such case it often lead to deadlocks with other X-lock waiters, as follows:
1. trx1 gets an S-lock granted
2. trx2 wants an X-lock, but has to wait for trx1
3. trx1 now wants an X-lock, but the starvation-prevention mechanism forces it to wait for trx2
This lead to a cycle in the waits-for graph - a deadlock.
You'll find this scenario in avoid_deadlock_with_blocked.test

The solution proposed in this patch is based on the following realisation:
When trx1's lock request conflicts with trx2's waiting request AND we see that trx2 is already blocked by trx1, then we know we have only two options:
1. trx1 should wait for trx2
2. trx1 should ignore trx2's waiting request

Option 1 leads to a deadlock immediatelly, and deadlocks are costly to handle by InnoDB and apps, and are annoying to users.
Option 2. avoids a deadlock in this simple case (to solve this in general we'd have to look at the full transitive closure of the waits-for relation) and is justifiable:
- trx2 already had to wait for trx1,
- in theory, trx1 could have taken that stronger lock earlier which would lead to equivalent result
- our code correctness doesn't really depend on honouring waiting lock requests - for example High Priority transactions overtake them, and CATS ignores them during lock granting logic
- this will not lead to starvation: trx2 already waited for trx1, so trx2 is not "a new comming transaction", it was in the queue before trx2, and such old transactions should drain eventually

This patch uses Option 2. in the specific case where the trx1 is requesting an `X` or `X|REC_NOT_GAP` lock and trx2 is WAITING for `X` or `X|REC_NOT_GAP` lock and trx1 already holds a GRANTED `S`, `S|REC_NOT_GAP`, `X` or `X|REC_NOT_GAP` lock on the same heap_no.
(Note: in reality it's unlikely that the trx1 would request `X` or `X|REC_NOT_GAP` lock again when already holding `X` lock, because the code checks for stronger or equal lock before requesting new one)
It is important that this decission rule depends solely on locks held by trx1 and trx2 (in the sense, that nothing a third transaction trx3 can do, can influence the outcome of this rule) - thanks to that Lock System doesn't have to revaluate this rule every time some other transaction releases or requests a lock.
It is important that this decission rule doesn't allow INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`, `X`, `X|GAP`), as inserting a record into a gap would split such WAITING lock, violating the invariant that each transaction can have at most single WAITING lock at any time.

This patch also optimizes a few existing functions, as they will be used more often now:
- `lock_rec_find_set_bit` which searches for the first bit set in a bitmap used bit-by-bit loop. Now it uses 13x times faster implementation which tries to skip 64,then 32,16, or 8 bits at a time. This is important for WAITING locks which have just a single bit set, in a bitmap with number of bits equal to the number of records on a page (which can be ~500).
- `lock_rec_has_expl` checked if a transaction already had a GRANTED equal or stronger lock then the currently requested, but it didn't use the fact that the queue is ordered so that all GRANTED locks are before all WAITING locks. The new implementation stops as soon as it finds a WAITING lock

This patch also adapts existing testcases:

The innodb.innodb_replace is very old and not documented, so I am not sure if I understood its spirit. It looks like the original bug for which it was created complained about deadlocks, and the MTR did produce some deadlock, so I am not sure if the test was supposed to demonstrate the problem can't be fixed or some other kind of deadlock than the one from report. Anyway this code no longer causes a deadlock, so I've documented that.

The main.xa test doesn't really care how we create a deadlock, because it wants to test how deadlocks are handled by XA, not what causes them.
So, I've changed the scenario to one which is guaranteed to cause a deadlock in any conceivable algorithm: we have two resources, and trx1 and trx2 grab one of them, then try to grab the other.
The only difficulty is that the test requires the specific of the two transaction to be chosen as the victim, and this depends on the implementation which currently favors to keep "heavier" transactions alive, so I've made sure one of them is heavier by letting it modify (DELETE) some rows.

Reviewed by: Marcin Babij <[email protected]>
RB:27456

Change-Id: I3f4cb5af7133e75e006d3c5ec1a9b0c7ee5987ad
  • Loading branch information
Jakub Łopuszański committed Jan 31, 2022
1 parent 4a3a056 commit 7037a0b
Show file tree
Hide file tree
Showing 23 changed files with 1,958 additions and 310 deletions.
14 changes: 9 additions & 5 deletions mysql-test/r/xa_default.result
Original file line number Diff line number Diff line change
Expand Up @@ -238,20 +238,23 @@ DROP TABLE t1;
# ASSERTION THD->TRANSACTION.XID_STATE.XID.IS_NULL()
# FAILED
#
CREATE TABLE t1 (a INT);
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
CREATE TABLE t2 (a INT) ENGINE=InnoDB;
INSERT INTO t2 VALUES (1);
START TRANSACTION;
SELECT a FROM t1 WHERE a=1 FOR SHARE;
a
1
DELETE FROM t1;
connect con2,localhost,root;
XA START 'xid1';
# Sending:
SELECT a FROM t2 WHERE a=1 FOR SHARE;
a
1
SELECT a FROM t1 WHERE a=1 FOR UPDATE;
connection default;
# Waiting for until a transaction with 'SELECT...FOR UPDATE'
# will be locked inside innodb subsystem.
SELECT a FROM t1 WHERE a=1 FOR UPDATE;
SELECT a FROM t2 WHERE a=1 FOR UPDATE;
a
1
connection con2;
Expand All @@ -268,6 +271,7 @@ XA PREPARE 'xid1';
XA ROLLBACK 'xid1';
connection default;
DROP TABLE t1;
DROP TABLE t2;
disconnect con2;
#
# Bug#14670465 PLEASE PRINT HUMAN READABLE, ESCAPED
Expand Down
13 changes: 8 additions & 5 deletions mysql-test/r/xa_detach_on_prepare_off.result
Original file line number Diff line number Diff line change
Expand Up @@ -215,21 +215,23 @@ DROP TABLE t1;
# ASSERTION THD->TRANSACTION.XID_STATE.XID.IS_NULL()
# FAILED
#
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
CREATE TABLE t2 (a INT) ENGINE=InnoDB;
INSERT INTO t2 VALUES (1);
START TRANSACTION;
SELECT a FROM t1 WHERE a=1 FOR SHARE;
a
1
DELETE FROM t1;
# Connection con2
XA START 'xid1';
# Sending:
SELECT a FROM t2 WHERE a=1 FOR SHARE;
a
1
SELECT a FROM t1 WHERE a=1 FOR UPDATE;
# Connection default
# Waiting for until a transaction with 'SELECT...FOR UPDATE'
# will be locked inside innodb subsystem.
SELECT a FROM t1 WHERE a=1 FOR UPDATE;
SELECT a FROM t2 WHERE a=1 FOR UPDATE;
a
1
# Connection con2
Expand All @@ -246,6 +248,7 @@ XA PREPARE 'xid1';
XA ROLLBACK 'xid1';
# Connection default
DROP TABLE t1;
DROP TABLE t2;
#
# Bug#14670465 PLEASE PRINT HUMAN READABLE, ESCAPED
# XID DATA IN XA RECOVER OUTPUT
Expand Down
3 changes: 3 additions & 0 deletions mysql-test/suite/innodb/include/cleanup_show_locks.inc
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
DROP PROCEDURE show_locks;
DROP PROCEDURE register_connection;
DROP TABLE connections;
24 changes: 24 additions & 0 deletions mysql-test/suite/innodb/include/prepare_show_locks.inc
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
CREATE TABLE connections (
trx_mysql_thread_id BIGINT PRIMARY KEY,
name VARCHAR(200)
) ENGINE=InnoDB;

DELIMITER //;
CREATE PROCEDURE show_locks ()
BEGIN
# this is needed because INFORMATION_SCHEMA.INNODB_TRX is refreshed with debounce 100ms
SELECT SLEEP(1);
SELECT
connections.name, object_name, index_name, lock_type, lock_mode, lock_status, lock_data
FROM connections
JOIN INFORMATION_SCHEMA.INNODB_TRX USING (trx_mysql_thread_id)
JOIN performance_schema.data_locks ON (trx_id=engine_transaction_id)
ORDER BY 1,2,3,4,5,6,7;
END //

CREATE PROCEDURE register_connection (name VARCHAR(200))
BEGIN
INSERT INTO connections VALUES (CONNECTION_ID(), name);
END //

DELIMITER ;//
236 changes: 236 additions & 0 deletions mysql-test/suite/innodb/r/avoid_deadlock_with_blocked.result
Original file line number Diff line number Diff line change
@@ -0,0 +1,236 @@
CREATE TABLE connections (
trx_mysql_thread_id BIGINT PRIMARY KEY,
name VARCHAR(200)
) ENGINE=InnoDB;
CREATE PROCEDURE show_locks ()
BEGIN
# this is needed because INFORMATION_SCHEMA.INNODB_TRX is refreshed with debounce 100ms
SELECT SLEEP(1);
SELECT
connections.name, object_name, index_name, lock_type, lock_mode, lock_status, lock_data
FROM connections
JOIN INFORMATION_SCHEMA.INNODB_TRX USING (trx_mysql_thread_id)
JOIN performance_schema.data_locks ON (trx_id=engine_transaction_id)
ORDER BY 1,2,3,4,5,6,7;
END //
CREATE PROCEDURE register_connection (name VARCHAR(200))
BEGIN
INSERT INTO connections VALUES (CONNECTION_ID(), name);
END //
CALL register_connection("con1");
CALL register_connection("con2");
CALL register_connection("con3");
CREATE TABLE t1 (id INT PRIMARY KEY);
INSERT INTO t1 (id) VALUES (1);
BEGIN;
SELECT * FROM t1 FOR SHARE;
id
1
BEGIN;
SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con2_will_wait';
SELECT * FROM t1 FOR UPDATE;
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
SELECT * FROM t1 FOR UPDATE;
id
1
COMMIT;
id
1
COMMIT;
BEGIN;
SELECT * FROM t1 WHERE id=1 FOR UPDATE;
id
1
BEGIN;
SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con2_will_wait';
SELECT * FROM t1 FOR SHARE;
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IX GRANTED NULL
con1 t1 PRIMARY RECORD X,REC_NOT_GAP GRANTED 1
con2 t1 NULL TABLE IS GRANTED NULL
con2 t1 PRIMARY RECORD S WAITING 1
INSERT INTO t1 VALUES (0);
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IX GRANTED NULL
con1 t1 PRIMARY RECORD X,GAP,INSERT_INTENTION GRANTED 1
con1 t1 PRIMARY RECORD X,REC_NOT_GAP GRANTED 1
ROLLBACK;
ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
COMMIT;
BEGIN;
SELECT * FROM t1 FOR SHARE;
id
1
BEGIN;
SELECT * FROM t1 WHERE id=1 FOR SHARE;
id
1
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con2 t1 NULL TABLE IS GRANTED NULL
con2 t1 PRIMARY RECORD S,REC_NOT_GAP GRANTED 1
SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con3_will_wait';
SELECT * FROM t1 FOR UPDATE;
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con2 t1 NULL TABLE IS GRANTED NULL
con2 t1 PRIMARY RECORD S,REC_NOT_GAP GRANTED 1
con3 t1 NULL TABLE IX GRANTED NULL
con3 t1 PRIMARY RECORD X WAITING 1
SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con1_will_wait';
INSERT INTO t1 VALUES (0);
SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 NULL TABLE IX GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con1 t1 PRIMARY RECORD X,GAP,INSERT_INTENTION WAITING 1
con2 t1 NULL TABLE IS GRANTED NULL
con2 t1 PRIMARY RECORD S,REC_NOT_GAP GRANTED 1
con3 t1 NULL TABLE IX GRANTED NULL
con3 t1 PRIMARY RECORD X WAITING 1
COMMIT;
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 NULL TABLE IX GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con1 t1 PRIMARY RECORD S,GAP GRANTED 0
con1 t1 PRIMARY RECORD X,GAP,INSERT_INTENTION GRANTED 1
ROLLBACK;
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
BEGIN;
SELECT * FROM t1 FOR SHARE;
id
1
BEGIN;
SELECT * FROM t1 WHERE id=1 FOR SHARE;
id
1
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con2 t1 NULL TABLE IS GRANTED NULL
con2 t1 PRIMARY RECORD S,REC_NOT_GAP GRANTED 1
SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con3_will_wait';
SELECT * FROM t1 FOR UPDATE;
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con1_will_wait';
SELECT * FROM t1 WHERE id=1 FOR UPDATE;
SET DEBUG_SYNC = 'now WAIT_FOR con1_will_wait';
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 NULL TABLE IX GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con1 t1 PRIMARY RECORD X,REC_NOT_GAP WAITING 1
con2 t1 NULL TABLE IS GRANTED NULL
con2 t1 PRIMARY RECORD S,REC_NOT_GAP GRANTED 1
con3 t1 NULL TABLE IX GRANTED NULL
con3 t1 PRIMARY RECORD X WAITING 1
COMMIT;
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 NULL TABLE IX GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con1 t1 PRIMARY RECORD X,REC_NOT_GAP GRANTED 1
con3 t1 NULL TABLE IX GRANTED NULL
con3 t1 PRIMARY RECORD X WAITING 1
id
1
COMMIT;
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
id
1
COMMIT;
BEGIN;
SELECT * FROM t1 FOR SHARE;
id
1
SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con2_will_wait';
SELECT * FROM t1 FOR UPDATE;
SET DEBUG_SYNC = 'now WAIT_FOR con2_will_wait';
SET DEBUG_SYNC = 'lock_wait_will_wait SIGNAL con3_will_wait';
SELECT * FROM t1 FOR UPDATE;
SET DEBUG_SYNC = 'now WAIT_FOR con3_will_wait';
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con2 t1 NULL TABLE IX GRANTED NULL
con2 t1 PRIMARY RECORD X WAITING 1
con3 t1 NULL TABLE IX GRANTED NULL
con3 t1 PRIMARY RECORD X WAITING 1
SELECT * FROM t1 WHERE id=1 FOR UPDATE;
id
1
CALL show_locks();
SLEEP(1)
0
name object_name index_name lock_type lock_mode lock_status lock_data
con1 t1 NULL TABLE IS GRANTED NULL
con1 t1 NULL TABLE IX GRANTED NULL
con1 t1 PRIMARY RECORD S GRANTED 1
con1 t1 PRIMARY RECORD S GRANTED supremum pseudo-record
con1 t1 PRIMARY RECORD X,REC_NOT_GAP GRANTED 1
con2 t1 NULL TABLE IX GRANTED NULL
con2 t1 PRIMARY RECORD X WAITING 1
con3 t1 NULL TABLE IX GRANTED NULL
con3 t1 PRIMARY RECORD X WAITING 1
COMMIT;
id
1
COMMIT;
id
1
COMMIT;
DROP TABLE t1;
DROP PROCEDURE show_locks;
DROP PROCEDURE register_connection;
DROP TABLE connections;
3 changes: 2 additions & 1 deletion mysql-test/suite/innodb/r/innodb_replace.result
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,9 @@ SET DEBUG_SYNC='now WAIT_FOR default_will_wait';
SET DEBUG_SYNC='now SIGNAL select1';
ERROR 23000: Duplicate entry '3' for key 't1.PRIMARY'
INSERT INTO t1 VALUES(3,3) ON DUPLICATE KEY UPDATE b=b+10;
ERROR 40001: Deadlock found when trying to get lock; try restarting transaction
COMMIT;
a b
3 11
SET DEBUG_SYNC='write_row_replace SIGNAL insert2 WAIT_FOR select2';
REPLACE INTO t1 VALUES(3,4);
SET DEBUG_SYNC='now WAIT_FOR insert2';
Expand Down
Loading

0 comments on commit 7037a0b

Please sign in to comment.