fix(e2e): retry on conflict when deleting extended resource #128954
Conversation
/retest
Force-pushed 6e18610 to 89e23dd
delete(node.Status.Capacity, extendedResource)
delete(node.Status.Allocatable, extendedResource)
_, err = clientSet.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{})
err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
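For context, a minimal sketch of what the retried removal could look like once the status update is wrapped in retry.RetryOnConflict; the removeExtendedResource wrapper, package name, and parameter names are illustrative, not the exact diff from this PR.

package e2esketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// removeExtendedResource drops the extended resource from the node's capacity and
// allocatable, retrying the status update when the apiserver reports a conflict.
func removeExtendedResource(ctx context.Context, clientSet kubernetes.Interface, nodeName string, extendedResource v1.ResourceName) error {
	return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
		// Re-read the node on every attempt so the update carries the latest resourceVersion.
		node, err := clientSet.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
		if err != nil {
			return err
		}
		delete(node.Status.Capacity, extendedResource)
		delete(node.Status.Allocatable, extendedResource)
		_, err = clientSet.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{})
		return err
	})
}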
No need to retry, I think. You can just do a patch with the status subresource.
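For reference, a hedged sketch of that alternative: a JSON patch sent to the node's "status" subresource carries no resourceVersion, so it cannot hit a conflict and needs no retry. The function name and the RFC 6901 escaping step are illustrative, not code from this PR.

package e2esketch

import (
	"context"
	"fmt"
	"strings"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// removeExtendedResourceByPatch removes the extended resource from capacity and
// allocatable via a JSON patch against the "status" subresource.
func removeExtendedResourceByPatch(ctx context.Context, clientSet kubernetes.Interface, nodeName string, extendedResource v1.ResourceName) error {
	// JSON Pointer (RFC 6901) requires '/' in map keys such as "example.com/dongle"
	// to be escaped as "~1".
	escaped := strings.ReplaceAll(string(extendedResource), "/", "~1")
	patch := []byte(fmt.Sprintf(
		`[{"op":"remove","path":"/status/capacity/%s"},{"op":"remove","path":"/status/allocatable/%s"}]`,
		escaped, escaped))
	_, err := clientSet.CoreV1().Nodes().Patch(ctx, nodeName, types.JSONPatchType, patch, metav1.PatchOptions{}, "status")
	return err
}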
But you intentionally used the update, not patch, for ease here, right?
#128194 (comment)
Yeah @AnishShah, since you moved from Patch to Update, you need to deal with possible conflicts client-side.
This LGTM too.
Yeah. I thought there wouldn't be a resource conflict and hence used UpdateStatus. We can revert to patch again to avoid the retry.
I meant, that was for implementation simplicity, right?
I'm not sure why you'd prefer to switch back to the patch here. Efficiency-wise, the flake is pretty rare, that is, the retry would happen only rarely, so I don't think we need to mind (and this is test code either way).
This is an e2e test, so we don't need to over-optimize. I think the current approach is OK and avoids more code churn.
Patch requests avoid resource conflicts, so there won't be any need to retry and we can keep the logic simple and similar to the AddExtendedResource func above.
Either way works for me; I also prefer to use patch if the patch is really simple. I'm just trying to say we should not overthink it: choose one solution and move on. A retry on conflict is common in multiple places of the code, so it is not a bad solution either.
We need either retry on update, or simply patch. As reported by @wendy-ha18, we got flakes and it's impacting the 1.32 release: https://kubernetes.slack.com/archives/C09TP78DV/p1733547680436909
Agree with @aojea that we should move on either way. (We can follow up with the approach of switching to patch.)
/triage accepted
/lgtm
LGTM label has been added. Git tree hash: 3a2f1802539034ca7af7be9ebf90f9763ed04337
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aojea, sanposhiho. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/milestone v1.32
What type of PR is this?
/kind bug
/kind flake
What this PR does / why we need it:
The test is flaky with:
Probably due to the recent refactoring: #128194.
Which issue(s) this PR fixes:
Fixes #128911
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: