fix(e2e): retry on conflict when deleting extended resource #128954
Conversation
/retest
Force-pushed 6e18610 to 89e23dd
delete(node.Status.Capacity, extendedResource)
delete(node.Status.Allocatable, extendedResource)
_, err = clientSet.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{})
err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
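For context, a minimal sketch of what the retried removal could look like once the status update is wrapped in retry.RetryOnConflict; the removeExtendedResource wrapper, package name, and parameter names are illustrative, not the exact diff from this PR.

package e2esketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// removeExtendedResource drops the extended resource from the node's capacity and
// allocatable, retrying the status update when the apiserver reports a conflict.
func removeExtendedResource(ctx context.Context, clientSet kubernetes.Interface, nodeName string, extendedResource v1.ResourceName) error {
	return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
		// Re-read the node on every attempt so the update carries the latest resourceVersion.
		node, err := clientSet.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
		if err != nil {
			return err
		}
		delete(node.Status.Capacity, extendedResource)
		delete(node.Status.Allocatable, extendedResource)
		_, err = clientSet.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{})
		return err
	})
}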
No need to retry, I think. You can just do a patch with the status subresource.
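For reference, a hedged sketch of that alternative: a JSON patch sent to the node's "status" subresource carries no resourceVersion, so it cannot hit a conflict and needs no retry. The function name and the RFC 6901 escaping step are illustrative, not code from this PR.

package e2esketch

import (
	"context"
	"fmt"
	"strings"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// removeExtendedResourceByPatch removes the extended resource from capacity and
// allocatable via a JSON patch against the "status" subresource.
func removeExtendedResourceByPatch(ctx context.Context, clientSet kubernetes.Interface, nodeName string, extendedResource v1.ResourceName) error {
	// JSON Pointer (RFC 6901) requires '/' in map keys such as "example.com/dongle"
	// to be escaped as "~1".
	escaped := strings.ReplaceAll(string(extendedResource), "/", "~1")
	patch := []byte(fmt.Sprintf(
		`[{"op":"remove","path":"/status/capacity/%s"},{"op":"remove","path":"/status/allocatable/%s"}]`,
		escaped, escaped))
	_, err := clientSet.CoreV1().Nodes().Patch(ctx, nodeName, types.JSONPatchType, patch, metav1.PatchOptions{}, "status")
	return err
}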
But you intentionally used the update, not patch, for ease here, right?
#128194 (comment)
Yeah @AnishShah, since you moved from Patch to Update, you need to deal with possible conflicts client-side.
This LGTM too.
Yeah. I thought there wouldn't be a resource conflict and hence used UpdateStatus. We can revert to patch again to avoid the retry.
I meant, that was for implementation simplicity, right?
I'm not sure why you'd prefer to switch back to the patch here. Efficiency-wise, the flake is pretty rare, that is, the retry would happen only rarely, so I don't think we need to mind (and this is test code either way).
This is an e2e test, so we don't need to over-optimize. I think the current approach is OK and avoids more code churn.
Patch requests avoid resource conflicts, so there won't be any need to retry and we can keep the logic simple and similar to the AddExtendedResource func above.
Either way works for me; I also prefer to use patch if the patch is really simple. I'm just trying to say we should not overthink it: choose one solution and move on. A retry on conflict is common in multiple places of the code, so it is not a bad solution either.
We need either retry on update, or simply patch. As reported by @wendy-ha18, we got flakes and it's impacting the 1.32 release: https://kubernetes.slack.com/archives/C09TP78DV/p1733547680436909
Agree with @aojea that we should move on either way. (We can follow up with the approach of switching to patch.)
/triage accepted
/lgtm
LGTM label has been added. Git tree hash: 3a2f1802539034ca7af7be9ebf90f9763ed04337
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aojea, sanposhiho. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/milestone v1.32
What type of PR is this?
/kind bug
/kind flake
What this PR does / why we need it:
The test is flaky with:
Probably due to the recent refactoring: #128194.
Which issue(s) this PR fixes:
Fixes #128911
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: