Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Bug #11745929 CHANGE LOCK PRIORITY SO THAT THE TRANSACTION HOLDING S-…
…LOCK GETS X-LOCK F This is a fix for 15y.o. bug, which had been reported over and over again, for example: Bug #21356 Change lock priority so that the transaction holding S-lock gets X-lock first Bug #105655 Wait for the lock that has already been acquired. Bug #101695 Deadlock behavior The issue is related to our starvation-preventing mechanism, in which new lock requests have to avoid overtaking transactions which already have a conflicting waiting request. The motiviating scenario is that there is a constant flood of S-locking transactions, and one poor transaction waiting for an X-lock, and now there's a new transaction which also wants an S-lock, what do you do? If you handle such cases by granting the S-lock (as it doesn't conflict with already *granted* locks), then you risk starving the X-locker indefinitely. So, InnoDB, used to have a strict rule, that it doesn't matter if a lock is waiting or granted, when we check for conflicts, and thus the new-comming transaction's S-lock had to wait for the (still waiting) X-lock to be granted and released. This prevented starvation, and as a very simple FIFO model was easy to analyze. Then this become a bit more complicated with CATS algorithm, which follows the above rule only when adding the transaction to the queue, but later, whenever some transaction releases a lock and we rescan the queue to see if one of the waiting transactions can be granted the lock, we use a relaxed rule in which we do not care about FIFO order, nor about conflicts with waiting transactions. The starvation of X-waiters by S-lockers was prevented by other means in the CATS algorithm: if a newcomming transaction trx1 conflicted with some existing (even waiting) lock of trx2 then an edge "trx1 --waits-for--> trx2" was added to the wait for graph, and trx1 could not be granted a lock until trx2 released its lock. In particular a newcomming S-locker would see the (oldest) waiting X-latcher and stored the edge to it, and not overtake it. But all of these approaches had a difficulty in handling "lock escalation": a situation where trx1 already had an S-lock, but needed to upgrade it to an X-lock - in such case it often lead to deadlocks with other X-lock waiters, as follows: 1. trx1 gets an S-lock granted 2. trx2 wants an X-lock, but has to wait for trx1 3. trx1 now wants an X-lock, but the starvation-prevention mechanism forces it to wait for trx2 This lead to a cycle in the waits-for graph - a deadlock. You'll find this scenario in avoid_deadlock_with_blocked.test The solution proposed in this patch is based on the following realisation: When trx1's lock request conflicts with trx2's waiting request AND we see that trx2 is already blocked by trx1, then we know we have only two options: 1. trx1 should wait for trx2 2. trx1 should ignore trx2's waiting request Option 1 leads to a deadlock immediatelly, and deadlocks are costly to handle by InnoDB and apps, and are annoying to users. Option 2. avoids a deadlock in this simple case (to solve this in general we'd have to look at the full transitive closure of the waits-for relation) and is justifiable: - trx2 already had to wait for trx1, - in theory, trx1 could have taken that stronger lock earlier which would lead to equivalent result - our code correctness doesn't really depend on honouring waiting lock requests - for example High Priority transactions overtake them, and CATS ignores them during lock granting logic - this will not lead to starvation: trx2 already waited for trx1, so trx2 is not "a new comming transaction", it was in the queue before trx2, and such old transactions should drain eventually This patch uses Option 2. in the specific case where the trx1 is requesting an `X` or `X|REC_NOT_GAP` lock and trx2 is WAITING for `X` or `X|REC_NOT_GAP` lock and trx1 already holds a GRANTED `S`, `S|REC_NOT_GAP`, `X` or `X|REC_NOT_GAP` lock on the same heap_no. (Note: in reality it's unlikely that the trx1 would request `X` or `X|REC_NOT_GAP` lock again when already holding `X` lock, because the code checks for stronger or equal lock before requesting new one) It is important that this decission rule depends solely on locks held by trx1 and trx2 (in the sense, that nothing a third transaction trx3 can do, can influence the outcome of this rule) - thanks to that Lock System doesn't have to revaluate this rule every time some other transaction releases or requests a lock. It is important that this decission rule doesn't allow INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`, `X`, `X|GAP`), as inserting a record into a gap would split such WAITING lock, violating the invariant that each transaction can have at most single WAITING lock at any time. This patch also optimizes a few existing functions, as they will be used more often now: - `lock_rec_find_set_bit` which searches for the first bit set in a bitmap used bit-by-bit loop. Now it uses 13x times faster implementation which tries to skip 64,then 32,16, or 8 bits at a time. This is important for WAITING locks which have just a single bit set, in a bitmap with number of bits equal to the number of records on a page (which can be ~500). - `lock_rec_has_expl` checked if a transaction already had a GRANTED equal or stronger lock then the currently requested, but it didn't use the fact that the queue is ordered so that all GRANTED locks are before all WAITING locks. The new implementation stops as soon as it finds a WAITING lock This patch also adapts existing testcases: The innodb.innodb_replace is very old and not documented, so I am not sure if I understood its spirit. It looks like the original bug for which it was created complained about deadlocks, and the MTR did produce some deadlock, so I am not sure if the test was supposed to demonstrate the problem can't be fixed or some other kind of deadlock than the one from report. Anyway this code no longer causes a deadlock, so I've documented that. The main.xa test doesn't really care how we create a deadlock, because it wants to test how deadlocks are handled by XA, not what causes them. So, I've changed the scenario to one which is guaranteed to cause a deadlock in any conceivable algorithm: we have two resources, and trx1 and trx2 grab one of them, then try to grab the other. The only difficulty is that the test requires the specific of the two transaction to be chosen as the victim, and this depends on the implementation which currently favors to keep "heavier" transactions alive, so I've made sure one of them is heavier by letting it modify (DELETE) some rows. Reviewed by: Marcin Babij <[email protected]> RB:27456 Change-Id: I3f4cb5af7133e75e006d3c5ec1a9b0c7ee5987ad
- Loading branch information