-
Notifications
You must be signed in to change notification settings - Fork 42.1k
Add NotReady taint to new nodes during admission #73097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Yeah, that is right, eventually the pods may get evicted from those nodes which become unavailable over time, also the pods get scheduled onto those kind of nodes - more on that scenario can be found here: #72548 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/the taint/the NotReadytaint.
the comment could be further expanded to something like the new node may receive taint from nodecontroller with some delay..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Probably a test to ensure we don't add this taint, if the taint already exists.
|
I think we'd want this in its own admission plugin that we could enable by default here:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there an existing key for the notready taint the node controller manages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one in scheduler API, but I didn't want to create a dependency to the API only for this label.
|
@answer1991 Please take a look, this patch is critical and align with your original proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we may also need to point out who is expected to un-taint the node in the future (NodeLifecycleController etc).
3137cec to
02c0e87
Compare
|
Thanks @liggitt, @ravisantoshgudimetla, @resouer for your comments and reviews. I addressed all the comments. PTAL. |
77107aa to
f2b0959
Compare
|
Seems still some golints errors to fix. :-) |
f2b0959 to
b91be7d
Compare
|
The bigger problem is the fact that a NotReady taint is removed in a node update causing |
then the safest alternative would seem to be to disable this admission plugin specifically for the node-e2e test, if it wants to run in a partial environment |
Node e2e was introduced mainly to test kubelet and things below that in different environments (e.g., various os images). The apiserver was added simply to make interacting with kubelet easier. The node e2e suite helps us identify node failures easily (and also allows us to change the kubelet configuration easily to test different features on the node). The interaction between kubelet and controllers are tested in cluster e2e. It may be useful to have a simpler environment to test control plane on the master with nodes, but that wasn't what node e2e was created for.
+1 to disable this. |
|
Thanks, @yujuhong! I just pushed a commit to disable the new admission plugin in node e2e tests. |
resouer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
let's see how CI says then :-)
…le controller does not run in the tests
9b780d3 to
37be9f5
Compare
|
/lgtm |
|
/lgtm Thanks ALL working on this issue! |
|
/retest Review the full test history for this PR. Silence the bot with an |
| } | ||
|
|
||
| func addNotReadyTaint(node *api.Node) { | ||
| notReadyTaint := api.Taint{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this can be declared as singleton outside the func
|
@bsalamat: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
…97-upstream-release-1.12 Automated cherry pick of #73097 upstream release 1.12
…97-upstream-release-1.13 Automated cherry pick of #73097 upstream release 1.13
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fixes a race condition between node controller applying taints for new nodes and the scheduler using new nodes for scheduling pods by adding not ready taint to new nodes at admission time.
Which issue(s) this PR fixes:
Fixes #72129
Special notes for your reviewer:
My understanding is that we do not need to add the taint in node updates, mainly because node condition updates may arrive anytime and can race with other K8s operations. For example, pods may be scheduled on nodes that may become unavailable shortly afterwards in normal condition and K8s handles those scenarios well. The bigger problem that we are trying to fix is to avoid placing pods on newly added nodes.
UPDATE: Given that tainting a node at update reduces wasted work and provides better experience with respect to scheduling pods on more reliable nodes, I added the logic to taint nodes at updates as well.
Does this PR introduce a user-facing change?:
/sig scheduling
/sig node
/priority critical-urgent