Conversation


@dom4ha dom4ha commented Nov 25, 2024

What type of PR is this?

/kind feature
/kind flake

What this PR does / why we need it:

Adjust throughput threshold for new tests based on historical times to avoid flakiness.

Which issue(s) this PR fixes:

Part of #128221

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. kind/flake Categorizes issue or PR as related to a flaky test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 25, 2024
@k8s-ci-robot

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 25, 2024

@dom4ha dom4ha left a comment


/test pull-kubernetes-scheduler-perf

@dom4ha dom4ha changed the title Adjust performance test threshold limits Adjust performance test throughput threshold limits Nov 25, 2024

dom4ha commented Nov 25, 2024

/cc @macsko @pohly @sanposhiho

Adjusting the limits is very cumbersome. Yes, there are graphs of historical runs, but throughput varies a lot (in all tests). It would be good to normalize the results somehow, otherwise their value is a bit limited. Moreover, running tests locally gives very different results than running them on CI machines, so it's hard to predict the right limits. For instance, SchedulingBasic gives 1000+ on a local machine, but 300+ on CI.

I also noticed that the tests no longer show logs if there are no failures, so there is no way to verify the numbers even when the tests are triggered in a PR.

Do you have any recommendations for adjusting limits other than watching historical runs?


sanposhiho commented Nov 26, 2024

Well, I'm not sure we could normalize the results without running the tests multiple times in a single run (which sounds... not elegant?).

The scheduler-perf alert email arrives when there are three (IIRC) consecutive failures of the CI. So, in that sense, we have kind of normalized the results for alerting already: the minimum throughput of three runs is compared against the threshold, and the email comes only if it's lower.
That means we can actually ignore some spikes in the test results. I assume a few reds (due to spikes) in the testgrid are acceptable for this job, unless some team hopes to keep all jobs completely green (and I guess there isn't one, because scheduler-perf was all red until very recently anyway).
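The consecutive-failure rule described above could be sketched roughly as follows. This is a toy illustration only; `should_alert` and its parameters are hypothetical names, not the actual alerting implementation:

```python
def should_alert(throughputs, threshold, consecutive=3):
    """Alert only when the last `consecutive` runs all fall below the threshold.

    A single spike (one bad run) is ignored; only a sustained
    regression across `consecutive` runs would trigger the email.
    """
    if len(throughputs) < consecutive:
        return False
    return all(t < threshold for t in throughputs[-consecutive:])

# One dip among good runs does not alert; three consecutive dips do.
print(should_alert([320, 150, 310, 305], threshold=300))  # False
print(should_alert([320, 290, 280, 250], threshold=300))  # True
```

Under this rule, a red run on the testgrid only produces an alert if the two runs after it are also red.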


macsko commented Nov 26, 2024

It would be good to normalize results somehow

I was thinking about it and I think it's not possible (in a simple way). The performance drops during the test are not constant, which results in only a few tests being affected. A bit of flakiness in the tests is not critical, especially given that alerts are generated only after 3 consecutive failures, as @sanposhiho pointed out.

Moreover, running tests locally gives very different results than the ones on ci machines, so it's hard to predict right limits.

That's why it's better to set either pessimistic thresholds or no thresholds for new tests. They can always be adjusted later, once some history has been collected.

I also noticed that the tests no longer show logs if there are no failures, so there is no way to verify the numbers even when the tests are triggered in a PR.

That's right. For some reason the result JSON is not written into the job's artifacts for the pull-kubernetes-scheduler-perf presubmit. There were some PRs trying to fix this, but so far without success.


macsko commented Nov 26, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 26, 2024
@k8s-ci-robot

LGTM label has been added.

Git tree hash: 11056d628d6b7fdc0cf26543314481a7b5fa3e8f


@sanposhiho sanposhiho left a comment


/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dom4ha, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2024

pohly commented Nov 26, 2024

That's right. For some reason the result JSON is not written into the job's artifacts for the pull-kubernetes-scheduler-perf presubmit. There were some PRs trying to fix this, but so far without success.

I wasn't trying to fix that. As far as I remember, there's simply a command-line parameter missing in the pull job. But I'm not sure whether the pull job really runs in the same controlled environment as the CI job; that would have to be checked. Otherwise the results are not comparable and/or are more flaky.


dom4ha commented Nov 26, 2024

Well, I'm not sure we could normalize the results without running the tests multiple times in a single run (which sounds... not elegant?).

True. I was thinking about normalizing them post execution, for instance during performance-graph creation. I wasn't aware that alerting already does that, but using a rolling average of the n past runs might be good enough and could let us notice trends more easily (although with some additional delay). Taking the overall test duration into consideration could give even more stable results.
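The post-execution smoothing idea could look something like the sketch below. This is a hypothetical illustration, not part of the scheduler-perf framework; the function name and window size are made up:

```python
from collections import deque

def rolling_average(values, window=5):
    """Smooth a series of per-run throughputs with a simple moving average.

    Comparing the smoothed value (instead of a single run) against the
    threshold damps one-off spikes, at the cost of detecting a real
    regression a few runs later.
    """
    buf = deque(maxlen=window)  # keeps only the last `window` values
    smoothed = []
    for v in values:
        buf.append(v)
        smoothed.append(sum(buf) / len(buf))
    return smoothed

runs = [300, 300, 150, 300, 300]  # one outlier run
print(rolling_average(runs, window=3))
# → [300.0, 300.0, 250.0, 250.0, 250.0]
```

Note how the single 150 outlier is absorbed into a 250 average that may still clear a pessimistic threshold, while a genuine sustained drop would pull the average down run after run.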

That's why it's better to set either pessimistic thresholds or no thresholds for new tests. They can always be adjusted later, once some history has been collected.

Setting pessimistic or no threshold sounds like the right approach for now.

That's right. For some reason the result JSON is not written into the job's artifacts for the pull-kubernetes-scheduler-perf presubmit. There were some PRs trying to fix this, but so far without success.

Something must have changed recently: tests triggered in PRs used to produce the full log, but recent runs log only the test verdicts.


pohly commented Nov 26, 2024

Alerting doesn't average performance results, but it does provide a "buffer" against being alerted for every single failed run. So as long as we get some good runs before a failed one, we won't get notified.

@k8s-ci-robot k8s-ci-robot merged commit 7cc1eb9 into kubernetes:master Dec 12, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Dec 12, 2024