Adjust performance test throughput threshold limits #128968
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
dom4ha left a comment
/test pull-kubernetes-scheduler-perf
/cc @macsko @pohly @sanposhiho

Adjusting the limits is very cumbersome. Yes, there are graphs of historical runs, but throughput varies a lot (in all tests). It would be good to normalize the results somehow, otherwise their value is somewhat limited. Moreover, running the tests locally gives very different results than on the CI machines, so it's hard to predict the right limits. For instance, SchedulingBasic gives 1000+ on a local machine, but only 300+ on CI. I also noticed that the tests no longer show logs if there are no failures, so there is no way to verify the numbers even when triggered in PR tests. Do you have any recommendations for adjusting the limits other than watching historical runs?
Well, I'm not sure we could normalize the results without running the tests multiple times in a single run (which sounds... not elegant?). The scheduler-perf alert email arrives when there are three (IIRC) consecutive failures of the CI. So, in that sense, we kind of normalized the results for the alert already: the min throughput of three runs is compared against the threshold, and if it's lower, the email comes.
I was thinking about it and I think it's not possible (in a simple way). The performance drops during the test are not constant, which results in only a few tests being affected. A bit of flakiness in the tests is not critical, especially given that alerts are generated only after 3 consecutive failures, as @sanposhiho pointed out.

That's why it's better to set either pessimistic thresholds for new tests or no thresholds at all. They can always be adjusted after obtaining some history.

That's right. For some reason the result JSON is not written into the job's artifacts for the pull-kubernetes-scheduler-perf presubmit. There were some PRs trying to fix this, but so far without success.
/lgtm
LGTM label has been added.

Git tree hash: 11056d628d6b7fdc0cf26543314481a7b5fa3e8f
sanposhiho left a comment
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dom4ha, sanposhiho

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
I wasn't trying to fix that. As far as I remember, there's simply a command-line parameter missing in the pull job. But I am not sure whether the pull job really runs in the same controlled environment as the CI job - that would have to be checked. Otherwise the results are not comparable and/or more flaky.
True. I was thinking about normalizing them post-execution, for instance during performance graph creation. I wasn't aware that alerting does that, but using a rolling average of the n past runs might be good enough and could let us notice trends more easily (although with some additional delay). Taking the overall test duration into consideration could give even more stable results.

Setting a pessimistic threshold or no threshold sounds like the right approach for now.

Something must have changed recently, as tests triggered in PRs used to have full logs, but the recent runs log only the test verdicts.
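As a rough illustration of the rolling-average idea (a hypothetical Go sketch, not existing scheduler-perf tooling; the window size and names are assumptions):

```go
package main

import "fmt"

// rollingAverage returns the mean of the last `window` throughput
// samples. Smoothing hides single-run noise but reacts to a real
// regression only after a few runs, i.e. with some delay.
func rollingAverage(throughputs []float64, window int) float64 {
	if window > len(throughputs) {
		window = len(throughputs)
	}
	sum := 0.0
	for _, t := range throughputs[len(throughputs)-window:] {
		sum += t
	}
	return sum / float64(window)
}

func main() {
	// Noisy per-run throughputs (pods/s): a single dip to 250 barely
	// moves the smoothed value, while a sustained drop would.
	runs := []float64{320, 310, 250, 330, 315}
	fmt.Printf("smoothed throughput: %.1f pods/s\n", rollingAverage(runs, 5))
}
```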
Alerting doesn't average performance results. But it provides a "buffer" against being alerted for every single failed run. So as long as we get some good runs before a failed one, we won't get notified. |
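A rough sketch of that buffering behaviour (hypothetical Go, not the actual CI alerting code; the three-run window and the names are assumptions for illustration):

```go
package main

import "fmt"

// shouldAlert reports whether every one of the last `window` runs fell
// below the throughput threshold (pods/s). A single failed run
// surrounded by good runs therefore never triggers an alert; only
// `window` consecutive failures do.
func shouldAlert(throughputs []float64, window int, threshold float64) bool {
	if len(throughputs) < window {
		return false
	}
	for _, t := range throughputs[len(throughputs)-window:] {
		if t >= threshold {
			return false // at least one recent run was healthy
		}
	}
	return true
}

func main() {
	runs := []float64{350, 280, 310, 290}
	fmt.Println(shouldAlert(runs, 3, 300)) // false: the 310 run breaks the streak

	runs = append(runs, 270, 260)
	fmt.Println(shouldAlert(runs, 3, 300)) // true: 290, 270, 260 are all below 300
}
```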
What type of PR is this?
/kind feature
/kind flake
What this PR does / why we need it:
Adjust throughput thresholds for new tests based on historical results to avoid flakiness.
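For illustration only, one way a pessimistic threshold could be derived from historical CI throughput (a hypothetical Go sketch; the 20% margin and the names are assumptions, not how the values in this PR were chosen):

```go
package main

import "fmt"

// pessimisticThreshold picks a value some margin below the lowest
// throughput seen in healthy historical CI runs, so that normal
// run-to-run variance does not cause flakes.
func pessimisticThreshold(historical []float64, margin float64) float64 {
	if len(historical) == 0 {
		return 0 // no history yet: effectively no threshold
	}
	min := historical[0]
	for _, t := range historical[1:] {
		if t < min {
			min = t
		}
	}
	return min * (1 - margin)
}

func main() {
	ciRuns := []float64{340, 355, 310, 362, 328} // pods/s from past CI runs
	fmt.Printf("threshold: %.0f pods/s\n", pessimisticThreshold(ciRuns, 0.20))
}
```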
Which issue(s) this PR fixes:
Part of #128221
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: