
@sanposhiho
Member

@sanposhiho sanposhiho commented Nov 24, 2024

What type of PR is this?

/kind bug
/kind flake

What this PR does / why we need it:

The test is flaky, failing with:

[FAILED] Operation cannot be fulfilled on nodes "kind-worker": the object has been modified; please apply your changes to the latest version and try again
In [AfterEach] at: k8s.io/kubernetes/test/e2e/framework/node/helper.go:212 @ 11/16/24 06:17:12.464
https://storage.googleapis.com/k8s-triage/index.html?test=%20validates%20pod%20disruption%20condition%20is%20added%20to%20the%20preempted%20pod

Probably due to the recent refactoring: #128194.

Which issue(s) this PR fixes:

Fixes #128911

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/flake Categorizes issue or PR as related to a flaky test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 24, 2024
@k8s-ci-robot k8s-ci-robot added area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 24, 2024
@sanposhiho
Member Author

/retest

delete(node.Status.Capacity, extendedResource)
delete(node.Status.Allocatable, extendedResource)
_, err = clientSet.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{})
err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
Contributor

No need to retry, I think. You can just do a patch with the status subresource.

Member Author

But you intentionally used Update rather than Patch here for simplicity, right?
#128194 (comment)

Member

Yeah @AnishShah, since you moved from Patch to Update, you need to handle possible conflicts on the client side.

This LGTM too

Contributor

Yeah. I thought there wouldn't be any resource conflicts, so I used UpdateStatus. We can revert to Patch to avoid the retry.

Member Author

I meant that was for implementation simplicity, right?
I'm not sure why you prefer to switch back to Patch here. Efficiency-wise, the flake is pretty rare, so the retry would happen only rarely, and I don't think we need to worry about it (... and this is test code either way).

Member

This is an e2e test, so we don't need to over-optimize. I think the current approach is OK and avoids more code churn.

Contributor

@AnishShah AnishShah Dec 3, 2024

Patch requests avoid resource conflicts, so there won't be any need to retry, and we can keep the logic simple and similar to the AddExtendedResource func above.
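For comparison, a patch-based version might look like the following sketch. It only builds the request body: an RFC 6902 JSON Patch that removes the resource from `status.capacity` and `status.allocatable`, with the `/` in the resource name escaped as `~1` per RFC 6901. I haven't checked how `AddExtendedResource` builds its patch, so `removeExtendedResourcePatch` and `example.com/fakecpu` are hypothetical; the body would then be sent with something like `clientSet.CoreV1().Nodes().Patch(ctx, nodeName, types.JSONPatchType, p, metav1.PatchOptions{}, "status")`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// patchOp is one RFC 6902 JSON Patch operation (only op and path are needed for "remove").
type patchOp struct {
	Op   string `json:"op"`
	Path string `json:"path"`
}

// escapeJSONPointer escapes a map key for an RFC 6901 JSON Pointer:
// "~" becomes "~0" first, then "/" becomes "~1".
func escapeJSONPointer(s string) string {
	return strings.ReplaceAll(strings.ReplaceAll(s, "~", "~0"), "/", "~1")
}

// removeExtendedResourcePatch (hypothetical) drops the resource from both
// status.capacity and status.allocatable. No resourceVersion is included,
// so the server applies the patch to whatever the latest object is.
func removeExtendedResourcePatch(resource string) ([]byte, error) {
	key := escapeJSONPointer(resource)
	ops := []patchOp{
		{Op: "remove", Path: "/status/capacity/" + key},
		{Op: "remove", Path: "/status/allocatable/" + key},
	}
	return json.Marshal(ops)
}

func main() {
	p, err := removeExtendedResourcePatch("example.com/fakecpu")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p))
}
```

Because the body carries no resourceVersion, a concurrent kubelet status update cannot cause a conflict. One caveat: a JSON Patch `remove` fails if the key is already gone; a strategic merge patch that sets the key to `null` (e.g. `{"status":{"capacity":{"example.com/fakecpu":null}}}`) would be idempotent instead.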

Member

Either way works for me. I also prefer Patch if the patch is really simple ... I'm just saying we should not overthink it: choose one solution and move on. Retry-on-conflict is common in many places in the code, so it's not a bad solution either.

Member

We need either a retry on Update or simply a Patch. As reported by @wendy-ha18, we are getting flakes and they're impacting the 1.32 release: https://kubernetes.slack.com/archives/C09TP78DV/p1733547680436909

Agree with @aojea that we should move on either way (we can follow up by switching to Patch).

@dims
Member

dims commented Nov 25, 2024

/assign @pohly @aojea

@BenTheElder
Member

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 26, 2024
@aojea
Member

aojea commented Nov 30, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 3a2f1802539034ca7af7be9ebf90f9763ed04337

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2024
@Huang-Wei
Member

/milestone v1.32


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/flake Categorizes issue or PR as related to a flaky test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.


Development

Successfully merging this pull request may close these issues.

[Flaking Test] SchedulerPreemption [Serial] validates pod disruption condition is added to the preempted pod [Conformance]

8 participants