-
Notifications
You must be signed in to change notification settings - Fork 913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for deadlock issue 1980 #2121
Fix for deadlock issue 1980 #2121
Conversation
This unit test fails until we can remove callbacks that are being executed.
This fixes a potential deadlock when a timer is being removed that's also being executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me. I also confirmed that this PR fixes the deadlock problem introduced here.
@fujitatomoya , is there anything else I can do to help get this issue merged and resolved? |
@jacobperron @mjcarroll @sloretz friendly ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good to me, thanks!
Also, thank you for the reproduction and unit test 🙇
* Add unit test for removing a callback that's being executed. This unit test fails until we can remove callbacks that are being executed. * Use the marked_for_removal flag if the callback is being executed. This fixes a potential deadlock when a timer is being removed that's also being executed.
We encountered the issue sketched in #1980 in a production system last week.
To understand the issue at hand I've made a minimal-non-working example here. Looking at the source code it seems that the callback queue is actually at fault, not the timer code. The
removeByID
deadlocks when a callback by that id is currently being executed.The first commit introduces a unit test that demonstrates this deadlock in a unit test.
The second commit is the proposed fix for this problem. We try to obtain the lock, if we fail to obtain the lock we defer removal until the next callback queue cycle using the already existing
marked_for_removal
boolean.FYI @guillaumeautran , @mikepurvis , @jasonimercer