Millions of conflicts on Nodes in kubemark-scale tests #46851

@wojtek-t

Kubemark-scale is currently failing constantly. The failures take several forms:

  • metrics that are too high
  • some nodes becoming unready
  • some pods not starting
  • some requests getting "429" errors

However, I looked into the logs, and there seems to be one common underlying root cause for all of them:
millions of conflicts on Node objects.

Those conflicts result in:

  • some nodes not being able to heartbeat -> becoming unready
  • more requests in the system in general -> higher latencies
  • more requests in the system -> 429 errors
  • system more overloaded -> pod status updates not being delivered on time
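
To illustrate why conflicts turn into extra load, here is a minimal toy sketch in Python. It assumes the conflicts are ordinary optimistic-concurrency failures, where an update carrying a stale resourceVersion is rejected (a 409 Conflict from the API server) and the client has to re-read and retry. `ToyStore`, `heartbeat_with_retry`, and `competing_writer` are hypothetical names made up for this illustration; this is not Kubernetes code and it says nothing about which component is actually racing with the heartbeats here.

```python
class Conflict(Exception):
    """Stale-version update; stands in for an HTTP 409 from an API server."""


class ToyStore:
    """Hypothetical in-memory stand-in for one versioned Node object."""

    def __init__(self):
        self.version = 1
        self.data = {}
        self.requests = 0  # every get/update counts as one request

    def get(self):
        self.requests += 1
        return self.version, dict(self.data)

    def update(self, seen_version, data):
        self.requests += 1
        if seen_version != self.version:
            raise Conflict(f"stale version {seen_version}, current {self.version}")
        self.data = dict(data)
        self.version += 1
        return self.version


def heartbeat_with_retry(store, writer, interfere=None, max_attempts=5):
    """Read-modify-write loop; returns the number of attempts it took."""
    for attempt in range(1, max_attempts + 1):
        version, data = store.get()
        if interfere is not None:
            interfere(attempt)  # simulate another writer sneaking in
        data["heartbeat_by"] = writer
        try:
            store.update(version, data)
            return attempt
        except Conflict:
            continue  # our copy was stale: re-read and try again
    raise Conflict(f"gave up after {max_attempts} attempts")


store = ToyStore()


def competing_writer(attempt):
    # Assumption for the demo: some other client updates the same object
    # during the first two heartbeat attempts.
    if attempt <= 2:
        version, data = store.get()
        data["condition"] = f"set-{attempt}"
        store.update(version, data)


attempts = heartbeat_with_retry(store, "kubelet", competing_writer)
print(f"heartbeat succeeded after {attempts} attempts, "
      f"{store.requests} requests total")
```

Without interference a heartbeat costs one read and one write. Each conflict adds another read and another write on top of that, and if `max_attempts` runs out the heartbeat is not recorded at all, which is how a node would end up looking unready in this toy model.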

We should understand why those conflicts became so frequent and fix the problem.
This looks like a regression to me, so I am putting it into the 1.7 milestone.

@kubernetes/sig-scalability-bugs

Labels: kind/failing-test, release-blocker, sig/scalability
