support NoSchedule taints correctly in DaemonSet controller #48189
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: mikedanese. Assign the PR to them by writing an assign command in a comment. Associated issue: 48190. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files. You can indicate your approval by writing an approve command in a comment.
cc @kubernetes/sig-apps-pr-reviews @kubernetes/sig-scheduling-pr-reviews @kubernetes/kubernetes-release-managers
This is a regression in 1.6. We need to cherry-pick this there @enisoc @ethernetdan
// only emit this event if insufficient resource is the only thing
// preventing the daemon pod from scheduling
if shouldSchedule && insufficientResourceErr != nil {
	dsc.eventRecorder.Eventf(ds, v1.EventTypeWarning, FailedPlacementReason, "failed to place pod on %q: %s", node.ObjectMeta.Name, insufficientResourceErr.Error())
How often do we retry in case of this Error?
It would be much clearer to determine that if the following return statement listed the actual variables it returns. It seems that err is nil at this point, so unless the controller needs to retry for a different error further down the code path, this won't be retried.
I think it's ok to not requeue on insufficientResourceErr as long as we are requeueing when something changes that would cause resources to be freed up. Regardless of whether something is doing that right now, this is not a change in behavior and not the bug I am trying to fix.
}
if !fit {
	predicateFails = append(predicateFails, reasons...)
}

fit, _, err = predicates.PodToleratesNodeNoExecuteTaints(pod, nil, nodeInfo)
A question here: can you please add some comments on why the taint is checked twice, at both line 1190 and line 1198?
I see, this is mainly to check whether we want to evict the pod or not, but it seems a bit strange: PodToleratesNodeTaints above already checked both taints, and here we check the NoExecute taint separately.
I think that @janetkuo's proposal is better and would involve less code change.
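To make the answer to the "why twice" question concrete: a minimal, illustrative sketch (not the PR's code) of what each of the two taint checks decides, using only the predicate helpers that appear in the diff:

```go
// daemonPodTaintFit is an illustrative helper (not part of the PR) showing why
// the taints are checked twice: the full check decides whether a new daemon
// pod may be scheduled, the NoExecute-only check decides whether a daemon pod
// already on the node may keep running.
func daemonPodTaintFit(pod *v1.Pod, nodeInfo *schedulercache.NodeInfo) (shouldSchedule, shouldContinueRunning bool, err error) {
	// All taint effects (NoSchedule and NoExecute): a violation means a new
	// daemon pod must not be placed on this node.
	shouldSchedule, _, err = predicates.PodToleratesNodeTaints(pod, nil, nodeInfo)
	if err != nil {
		return false, false, err
	}
	// NoExecute taints only: a violation means a daemon pod already running on
	// the node has to be evicted; a NoSchedule-only taint does not evict it.
	shouldContinueRunning, _, err = predicates.PodToleratesNodeNoExecuteTaints(pod, nil, nodeInfo)
	if err != nil {
		return false, false, err
	}
	return shouldSchedule, shouldContinueRunning, nil
}
```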
@@ -1190,18 +1183,24 @@ func NewPod(ds *extensions.DaemonSet, nodeName string) *v1.Pod {

 // Predicates checks if a DaemonSet's pod can be scheduled on a node using GeneralPredicates
 // and PodToleratesNodeTaints predicate
-func Predicates(pod *v1.Pod, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
+func Predicates(pod *v1.Pod, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, bool, error) {
 	var predicateFails []algorithm.PredicateFailureReason
 	critical := utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) && kubelettypes.IsCriticalPod(pod)

 	fit, reasons, err := predicates.PodToleratesNodeTaints(pod, nil, nodeInfo)
IMO, we should consider the pod's status in PodToleratesNodeTaints: if the pod is running, it tolerates the node's NoSchedule effect (even if spec.Tolerations does not say so). Or create another func for that. And I think we need to check other code paths for this case.
@gmarek, WDYT?
Please bear with me for a few more days. I promise that I'll be more responsive after 1.7 is released...
sure, np :).
@k82cn I think that is a reasonable change for PodToleratesNodeTaints to take into account whether .spec.nodeName is set on the pod (which is what I consider to be the definition of scheduled, although others may disagree), but that change wouldn't help us here. The pod here always has a nodeName set when we call Predicates and never has a status. This could be fixed to work, but it would be a larger change than what I would want to cherry-pick.
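To make the alternative concrete, here is a hypothetical sketch of a nodeName-aware taint check along the lines discussed above; it is not the real predicate and not the gist linked further down, just an illustration built from the two helpers the diff already uses:

```go
// Hypothetical variant, for illustration only: once a pod is bound to a node
// (.spec.nodeName set), NoSchedule taints no longer matter for it and only
// NoExecute taints can force it off the node.
func podToleratesNodeTaintsWhenBound(pod *v1.Pod, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
	if pod.Spec.NodeName != "" {
		// Treat the pod as already scheduled: only NoExecute taints apply.
		return predicates.PodToleratesNodeNoExecuteTaints(pod, nil, nodeInfo)
	}
	// Not scheduled yet: all taint effects apply.
	return predicates.PodToleratesNodeTaints(pod, nil, nodeInfo)
}
```

As the comment above points out, this alone wouldn't help the DaemonSet controller, because the simulated pod always has nodeName set when Predicates is called.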
TL;DR: it'd be too big a change to cherry-pick for 1.7.
OK, we spoke about that, and it all depends on the definition of what 'scheduled' means. Here the Pod that's being checked is completely artificial, created specially for this simulation, with NodeName already set. One can argue that the semantics of 'NoSchedule' should prevent setting NodeName, i.e. it should be ignored if NodeName is set. But I'd propose another meaning of the word 'scheduled', i.e. 'the Node agent already knows about it'. Sadly this would mean that the only thing one can be certain of is that if a Pod doesn't have NodeName set, it's not scheduled; otherwise it can't be determined by looking at the Pod object alone. Which means that PodToleratesNodeTaints needs to just return which taints are possibly violated and depend on the caller to figure out what to do with them. Exactly as we're doing in this PR. Post 1.7 we can discuss this.
@mikedanese, I mean pod.Status.Phase is v1.PodRunning. As @gmarek suggested, let's discuss this after 1.7.
Here's the draft diff (not tested yet) that I have in mind: https://gist.github.com/k82cn/3dbd398ef2b138fee6a4c19ec3d2ddcf
It won't be anything, because this Pod is an artificial one, created for simulation.
Wow, yes. But we already get the pod of that node in nodeShouldRunDaemonPod; we can keep its status.phase and update the gist a bit.
But this would mean that everyone who does such a simulation needs to remember/know about it. Which is why I think it's not worth it, as I believe it'd introduce more bugs than it would solve.
Can we do this without reverting the change (and keep all the tests)?
- Leave Predicates as is.
- Before calling hasIntentionalPredicatesReasons, check PodToleratesNodeNoExecuteTaints: if it doesn't fit, return false, false, false immediately.
- Move the predicates.ErrTaintsTolerationsNotMatch case out of hasIntentionalPredicatesReasons and down to the for range reasons loop; this case should only set wantToRun and shouldSchedule to false (since we're sure it's not NoExecute).

(A rough sketch of this flow is given below.)
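A hypothetical sketch of that proposed flow inside nodeShouldRunDaemonPod, where newPod is the simulated daemon pod and reasons, nodeInfo, wantToRun, and shouldSchedule are that function's variables; this illustrates the proposal, not the merged code:

```go
// Sketch of the proposed flow (illustrative only).

// Check NoExecute taints up front: if the simulated daemon pod does not
// tolerate them, it should not run, be scheduled, or continue running here.
fitsNoExecute, _, err := predicates.PodToleratesNodeNoExecuteTaints(newPod, nil, nodeInfo)
if err != nil {
	return false, false, false, err
}
if !fitsNoExecute {
	return false, false, false, nil
}

// Then walk the remaining predicate failure reasons. Since NoExecute taints
// are already known to be tolerated, a taint/toleration mismatch here can only
// come from a NoSchedule taint: block scheduling a new daemon pod, but let an
// existing one keep running.
for _, r := range reasons {
	switch reason := r.(type) {
	case *predicates.PredicateFailureError:
		switch reason {
		case predicates.ErrTaintsTolerationsNotMatch:
			wantToRun, shouldSchedule = false, false
		}
	}
}
```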
…e intentional reasons." This partially reverts commit 2b311fe. We drop the changes to the DaemonSet controller but leave the test. By reverting the changes, we make it easier to return different values of shouldContinueRunning for intentional predicate failures, rather than lumping all intentional predicate failures together. The test should continue to pass after the fix.
…ller And add some unit tests.
I reverted changes to Predicates, but I don't like breaking up the switch. The function has a very clear contract (documented in a comment at the beginning, which I expanded on). Since
/test pull-kubernetes-kubemark-e2e-gce
I recalled the major concern is the function
I don't think breaking up the switch is the best way of doing that. It makes the logic very fragmented IMO. We can't treat all intentional predicates the same so
@mikedanese what do you mean by
I'm referring to the section where we simulate what would happen if we were to add a new daemon pod to the node. That looks like it would cleanly extract to a function with the signature: func (dsc *DaemonSetsController) simulate(pod, node) (reasons, error)
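For illustration, one hypothetical shape for that extraction; podsOnNode below is a made-up helper standing in for however the controller lists the pods already assigned to the node, and the real signature and body may differ:

```go
// simulate builds a NodeInfo from the pods already on the node, runs the
// DaemonSet Predicates for the artificial daemon pod, and hands the failure
// reasons back to the caller to interpret (NoSchedule vs. NoExecute, etc.).
func (dsc *DaemonSetsController) simulate(newPod *v1.Pod, node *v1.Node) ([]algorithm.PredicateFailureReason, error) {
	// Hypothetical helper: the pods currently assigned to this node.
	pods, err := dsc.podsOnNode(node.Name)
	if err != nil {
		return nil, err
	}

	nodeInfo := schedulercache.NewNodeInfo(pods...)
	if err := nodeInfo.SetNode(node); err != nil {
		return nil, err
	}

	_, reasons, err := Predicates(newPod, nodeInfo)
	if err != nil {
		return nil, err
	}
	return reasons, nil
}
```

The follow-up change referenced further down did factor a simulate function out of nodeShouldRunDaemonPod, though its exact shape is not shown here.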
I originally suggested doing this without reverting 2b311fe so that this change stays small and the 1.7 cherry-pick would be simpler. We can do code cleanup as a follow-up in master. However, reverting that commit makes cherry-picking to 1.6 easier, so I'm okay with it now.
@mikedanese got it, will try to create a PR for this later.
@mikedanese a PR #48189 for the comments above.
/retest
Automatic merge from submit-queue
Addressed comments from kubernetes#48189 (comment)
Automatic merge from submit-queue

Factored out simulate from nodeShouldRunDaemonPod. Addressed comments from #48189 (comment).

Release note: none

/sig apps
Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.
Fixes #48190
cc @kubernetes/sig-apps-pr-reviews