-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix zero hash items eviction #152
Conversation
Updated version info : 1.4.19 to 1.4.25 |
@dormando ping? |
Hey, This is pretty fun. May I ask if you actually ran into this in production, or if it was something you found in an audit? Not sure if this patch is enough. Since cur_hv can be 0 in the same way as the other items: This could reintroduce a corruption issue if the item pulls itself from the tail (in the case of do_item_replace and friends). Or am I missing something? |
Thank you for your reply.
Slab OOM suddenly happend in our production after runnning memcached for 1 month. Then I investigated it with gdb.
This patch itself doesn't make a corruption because
Before that, I'd like to confirm the reason of |
That's pretty wild. you should play the lottery. I feel like that was added to fix a specific bug that came up during stress testing. Struggling to remember and my notes aren't clear. I'll dig around for more or re-test, in case I was wrong and it's useless. |
First, seems like your fix won't... fix it: The line in question:
thread.c:item_trylock():
The item_trylock() is using the same value compared against cur_hv, except it lops off most of the bits since the trylock table is pretty small. It's much more likely you'd have 5 items in the tail (and then a current item) that evaluate to the same lock entry... which would make it unable to evict. Maybe that doesn't happen as often or something. Second, the introduction of that line wasn't in f7bf26c, that was a re-introduction after I'd accidentally dropped it. It actually got added in: fd39b19 which was part of a fix started in 59bd02c with a test on cb120c0 The full detail of that bug are here: https://code.google.com/archive/p/memcached/issues/260 and here: #67 (in the comments. ignore the patch). It's pretty complicated, but refreshing from this: It looks like there are (or were?) some cases where an item's lock is released but references are still held, and a call to item_alloc() happens. Which means the lock test will succeed but the item being currently worked on could get pulled from the tail, "evicted", and handed back. which then causes corruption as the original item gets freed. Way before that it looks like the cur_hv != hv was added as an optimization... but I feel stupid now, because fd39b19 just looks like it makes the logic more clear to read, but the lines are identical? Or am I unable to parse code today? |
Thanks for the information about Let me clarify the cases which leads to memory corruption and eviction stuck. without this patch:
with this patch
Right? I prefer |
I believe that's correct, but in all of my extensive notes on the bug I never expressly listed the code paths which can do that. I've piled in enough small fixes into the 'next' tree for a release, so I'm going to try to wrap up the feature I was working on to go along with them. Please feel free to keep inspecting this problem and I'll keep checking/responding, but I won't be directly digging into it again for a little bit. |
I'm still on the way to inspecting the problem, found that the commit fd39b19 is the essential to fix the 'issue 260'. These lines are not identical and the key for avoiding 'Issue 260' is caused by the race of To see the race, just incr after Checkout diff --git a/memcached.h b/memcached.h
index 8a51352..4401ca6 100644
--- a/memcached.h
+++ b/memcached.h
@@ -76,7 +76,7 @@
/** How long an object can reasonably be assumed to be locked before
harvesting it on a low memory condition. */
-#define TAIL_REPAIR_TIME (3 * 3600)
+#define TAIL_REPAIR_TIME 3
/* warning: don't use these macros with a function, as it evals its arg twice */
#define ITEM_get_cas(i) (((i)->it_flags & ITEM_CAS) ? \ And just printf "set abc 0 0 1\r\n1\r\n" | nc localhost 11211
# wait TAIL_REPAIR_TIME
sleep 5
printf "incr abc 2\r\n" | nc localhost 11211
echo "stats items" | nc localhost 11211
echo "stats cachedump 1 5" | nc localhost 11211 Result:
I think my patch will not cause the race because it always try to lock before pulling itself in |
Thanks for looking into it :) I knew I wrote that one-liner fix for a good reason. |
After inspecting the codes and past fixes around item refcount, I've come to think that it's the best fix for the problem to just remove if ((hold_lock = item_trylock(hv)) == NULL) If there were some cases where an item's lock is released but references are still held, and a call to |
Sorry for the delay (1.4.26 went out!). Just re-read things again: The cur_hv is only ever passed in from do_store_item() with an append/prepend command, or in do_add_delta(). In all other cases it's zero, as thread.c's item_alloc() wrapper passes in zero there. The problem I was probably trying to avoid was deadlocking by avoiding having the thread owning a lock, lock it again... but nothing actually prevents that from happening already. This would be super dangerous if something holding an item lock called another item_lock(), as it'll just deadlock. The codebase isn't very clear about this being a problem. Thankfully since it's just a trylock() there, it'll return EBUSY. So, I think that check can just be removed. It could be fixed by changing it to a pointer that's passed in, then: I still think given the original bug report.. if 5 items with the same item_lock() value (which shaves bytes off of the full hv), you'd occasionally hit this. Holding an item lock while calling item_alloc() only seems to happen in append/prepend and do_add_delta as I noted above though. So we can start by removing the check... feel free to push a new change into this PR (replace the old one please), or I'll get to it at some point. You'd still be stuck because only item_alloc() can call lru_pull_tail() with do_evict == true. The background thread can't evict items... but it will expire them, if they're sitting in the bottom and expired. |
Please do not care. I like live logging feature, awsome work!
Yes, the range of masked hv used in
OK, I'll soon re-push the commit. |
2fd7e2e
to
1410be4
Compare
rebased on the latest master and re-pushed. |
If all hash values of five tail items are zero on the specified slab class, expire check is unintentionally skipped and items stay without being evicted. Consequently, new item allocation consume memory space every time an item is set, that leads to slab OOM errors.
1410be4
to
dd0739c
Compare
thanks! this won't quite make it into 1.4.27, but it'll go into the next one. The release after .27 will require burn-in testing, so it's a good fit for that one. Tests would end up being flaky; anytime the hash value changes/etc. I'll put it under the burn-in test to be absolutely sure there wasn't something I forgot. |
took a lot longer than anticipated to get the testing setup going again and then through the chunked item feature. it seems to be doing fine so far so I've pushed it up to next. sorry that took so long. |
If all hash values of five tail items are zero on the specified slab
class, expire check is unintentionally skipped and items stay without
being evicted. Consequently, new item allocation consume memory space
every time an item is set, that leads to slab OOM errors.
Details of the Problem
Version: 1.4.19 to 1.4.25
How reproducible: 100%
Steps to Reproduce:
Actual results:
Expected results: