Correctly handle empty watch event cache #49992
@kubernetes/sig-api-machinery-bugs
I follow all the logic
/test pull-kubernetes-verify
on master, TestClientGoCustomResourceExample fails 20% of the time locally for me. with this change, I cannot get it to fail (ran it ~150 times in a row before stopping the loop)
@ironcladlou this fix also resolves most of the timeout issues I was seeing with other integration tests depending on watch on CRD objects (like TestMixedRelationships)
@liggitt - thanks a lot for this fix /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: liggitt, wojtek-t. Associated issue: 49956.
/retest
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue (batch tested with PRs 49992, 48861, 49267, 49356, 49886)
Awesome, thanks.
…2-upstream-release-1.7
Automatic merge from submit-queue
Automated cherry pick of #49992
Cherry pick of #49992 on release-1.7.
#49992: Correctly handle empty watch event cache
```release-note
Fixed a bug in the API server watch cache, which could cause a missing watch event immediately after cache initialization.
```
Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
…2-upstream-release-1.6
Automatic merge from submit-queue
Automated cherry pick of #49992
Cherry pick of #49992 on release-1.6.
#49992: Correctly handle empty watch event cache
```release-note
Fixed a bug in the API server watch cache, which could cause a missing watch event immediately after cache initialization.
```
Fixes #49956
Introduced by ada6023, which did not adjust the oldest available resourceVersion for an empty watch event cache.
Exposed by 74b9ba3, which allowed controllers to get list results from etcd before the watch cache is ready (normally they list with resourceVersion=0, which serves the list request from the watch cache, blocking until it is ready).
When the watch cache has an empty cache of watch events, it currently allows establishing a watch as if it could deliver a watch event for its currently synced resourceVersion. This is an off-by-one error that can result in a missed watch event.
Scenario:
bob:
sally:
Watch cache:
The watch cache should have dropped sally's watch from resourceVersion=10 with a "gone" error, since it can't deliver the watch event for resourceVersion=11. This would force sally to relist (where she would get a list at resourceVersion=11) and rewatch (from resourceVersion=11).
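The check described above can be sketched in a few lines of Go. This is a simplified model rather than the actual watch cache code: the `eventCache` type, its fields, `errGone`, and `checkWatch` are hypothetical names introduced for illustration, and it assumes the cache is synced to resourceVersion=11 with no buffered watch events. The idea is that with an empty event buffer, the oldest event the cache can deliver is the next one it receives, so the oldest deliverable resourceVersion is one greater than the synced resourceVersion.

```go
package main

import (
	"errors"
	"fmt"
)

// errGone is a hypothetical stand-in for the "gone" error returned to watchers
// whose requested resourceVersion is too old to be served.
var errGone = errors.New("gone: requested resourceVersion is too old")

// eventCache is a hypothetical, simplified model of the watch event cache.
type eventCache struct {
	resourceVersion uint64   // resourceVersion the cache is currently synced to
	eventRVs        []uint64 // resourceVersions of buffered watch events, oldest first
}

// oldestDeliverable returns the resourceVersion of the oldest watch event the
// cache is able to deliver.
func (c *eventCache) oldestDeliverable() uint64 {
	if len(c.eventRVs) > 0 {
		return c.eventRVs[0]
	}
	// Empty buffer: the oldest deliverable event is the *next* one received,
	// which is at least one greater than the synced resourceVersion.
	// Returning c.resourceVersion here instead is the off-by-one described above.
	return c.resourceVersion + 1
}

// checkWatch reports whether a watch starting after fromRV can be served
// without skipping an event. A watcher at fromRV needs every event with a
// resourceVersion greater than fromRV.
func (c *eventCache) checkWatch(fromRV uint64) error {
	if fromRV+1 < c.oldestDeliverable() {
		return errGone
	}
	return nil
}

func main() {
	// Cache synced to resourceVersion=11 with no buffered events.
	c := &eventCache{resourceVersion: 11}

	fmt.Println(c.checkWatch(10)) // gone: the event for resourceVersion=11 cannot be delivered
	fmt.Println(c.checkWatch(11)) // <nil>: only future events are needed
}
```

In this model, a watch from resourceVersion=10 is rejected, which forces a relist and a rewatch from resourceVersion=11, while a watch from resourceVersion=11 is accepted.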
This particularly affects tests that create CRDs/TPRs and establish watches on the new types while the storage layer's watch cache for that type is still populating.