Adjust performance test throughput threshold limits #128968
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
dom4ha left a comment
/test pull-kubernetes-scheduler-perf
/cc @macsko @pohly @sanposhiho

Adjusting the limits is very cumbersome. Yes, there are graphs of historical runs, but throughput varies a lot (in all tests). It would be good to normalize the results somehow, otherwise their value is somewhat limited. Moreover, running the tests locally gives very different results than on the CI machines, so it's hard to predict the right limits. For instance, SchedulingBasic gives 1000+ on a local machine, but only 300+ on CI. I also noticed that the tests no longer show logs if there are no failures, so there is no way to verify the numbers even when triggered in PR tests. Do you have any recommendations for adjusting the limits other than watching historical runs?
Well, I'm not sure we could normalize the results without running the tests multiple times in a single run (which sounds... not elegant?). The scheduler-perf alert email arrives when there are three (IIRC) consecutive failures of the CI. So, in that sense, we kind of normalized the results for the alert already: the min throughput of three runs is compared against the threshold, and if it's lower, the email comes.
I was thinking about it and I think it's not possible (in a simple way). The performance drops during the test are not constant, which results in only a few tests being affected. A bit of flakiness in the tests is not critical, especially given that alerts are generated only after 3 consecutive failures, as @sanposhiho pointed out.

That's why it's better to set either pessimistic thresholds for new tests or no thresholds at all. They can always be adjusted after obtaining some history.

That's right. For some reason the result JSON is not written into the job's artifacts for the pull-kubernetes-scheduler-perf presubmit. There were some PRs trying to fix this, but so far without success.
/lgtm
LGTM label has been added.

Git tree hash: 11056d628d6b7fdc0cf26543314481a7b5fa3e8f
sanposhiho left a comment
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dom4ha, sanposhiho

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
I wasn't trying to fix that. As far as I remember, there's simply a command-line parameter missing in the pull job. But I am not sure whether the pull job really runs in the same controlled environment as the CI job - that would have to be checked. Otherwise the results are not comparable and/or more flaky.
True. I was thinking about normalizing them post-execution, for instance during performance graph creation. I wasn't aware that alerting does that, but using a rolling average of the n past runs might be good enough and could let us notice trends more easily (although with some additional delay). Taking the overall test duration into consideration could give even more stable results.

Setting a pessimistic threshold or no threshold sounds like the right approach for now.

Something must have changed recently, as tests triggered in PRs used to have full logs, but the recent runs log only the test verdicts.
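As a rough illustration of the rolling-average idea (a hypothetical Go sketch, not existing scheduler-perf tooling; the window size and names are assumptions):

```go
package main

import "fmt"

// rollingAverage returns the mean of the last `window` throughput
// samples. Smoothing hides single-run noise but reacts to a real
// regression only after a few runs, i.e. with some delay.
func rollingAverage(throughputs []float64, window int) float64 {
	if window > len(throughputs) {
		window = len(throughputs)
	}
	sum := 0.0
	for _, t := range throughputs[len(throughputs)-window:] {
		sum += t
	}
	return sum / float64(window)
}

func main() {
	// Noisy per-run throughputs (pods/s): a single dip to 250 barely
	// moves the smoothed value, while a sustained drop would.
	runs := []float64{320, 310, 250, 330, 315}
	fmt.Printf("smoothed throughput: %.1f pods/s\n", rollingAverage(runs, 5))
}
```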
Alerting doesn't average performance results. But it provides a "buffer" against being alerted for every single failed run. So as long as we get some good runs before a failed one, we won't get notified. |
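A rough sketch of that buffering behaviour (hypothetical Go, not the actual CI alerting code; the three-run window and the names are assumptions for illustration):

```go
package main

import "fmt"

// shouldAlert reports whether every one of the last `window` runs fell
// below the throughput threshold (pods/s). A single failed run
// surrounded by good runs therefore never triggers an alert; only
// `window` consecutive failures do.
func shouldAlert(throughputs []float64, window int, threshold float64) bool {
	if len(throughputs) < window {
		return false
	}
	for _, t := range throughputs[len(throughputs)-window:] {
		if t >= threshold {
			return false // at least one recent run was healthy
		}
	}
	return true
}

func main() {
	runs := []float64{350, 280, 310, 290}
	fmt.Println(shouldAlert(runs, 3, 300)) // false: the 310 run breaks the streak

	runs = append(runs, 270, 260)
	fmt.Println(shouldAlert(runs, 3, 300)) // true: 290, 270, 260 are all below 300
}
```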
What type of PR is this?
/kind feature
/kind flake
What this PR does / why we need it:
Adjust throughput thresholds for new tests based on historical results to avoid flakiness.
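For illustration only, one way a pessimistic threshold could be derived from historical CI throughput (a hypothetical Go sketch; the 20% margin and the names are assumptions, not how the values in this PR were chosen):

```go
package main

import "fmt"

// pessimisticThreshold picks a value some margin below the lowest
// throughput seen in healthy historical CI runs, so that normal
// run-to-run variance does not cause flakes.
func pessimisticThreshold(historical []float64, margin float64) float64 {
	if len(historical) == 0 {
		return 0 // no history yet: effectively no threshold
	}
	min := historical[0]
	for _, t := range historical[1:] {
		if t < min {
			min = t
		}
	}
	return min * (1 - margin)
}

func main() {
	ciRuns := []float64{340, 355, 310, 362, 328} // pods/s from past CI runs
	fmt.Printf("threshold: %.0f pods/s\n", pessimisticThreshold(ciRuns, 0.20))
}
```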
Which issue(s) this PR fixes:
Part of #128221
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: