Closed
Labels
kind/failing-test (categorizes issue or PR as related to a consistently or frequently failing test) · release-blocker · sig/scalability (categorizes an issue or PR as relevant to SIG Scalability)
Milestone
Description
Kubemark-scale is currently failing constantly. There are several different symptoms:
- metrics exceeding their thresholds
- some nodes becoming unready
- some pods not starting
- some requests getting 429 (Too Many Requests) errors
However, after looking into the logs, it seems all of these share one common underlying root cause:
millions of conflicts on Node objects
Those conflicts result in:
- some nodes being unable to heartbeat -> becoming unready
- more requests in the system overall -> higher latencies
- more requests in the system -> 429 errors
- a more overloaded system -> pod status updates not being delivered on time
We should understand why these conflicts became so frequent and fix the problem.
This looks like a regression to me, so I'm putting it into the 1.7 milestone.
@kubernetes/sig-scalability-bugs