Ensure that DaemonSet respects termination #51279

kow3ns · 2017-08-24T17:02:23Z

What this PR does / why we need it:
#43077 correctly prevents the DaemonSet controller from adopting deleted Pods, but, as pointed out in #50477, the controller now has no sensitivity to the termination lifecycle (i.e., TerminationGracePeriodSeconds) of the Pods it creates. This PR attempts to balance the two. The DaemonSet controller will now consider deleted Pods owned by a DaemonSet when deciding whether to create new Pods, but it will not consider deleted Pods as targets for adoption.

fixes #50477

Fix a problem where the DaemonSet controller did not respect the TerminationGracePeriodSeconds of the Pods it creates.
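
For readers skimming the thread, the intended behavior can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the `pod` type and the `shouldCreateOnNode` and `adoptable` helpers are hypothetical, and the `terminating` field stands in for a non-nil DeletionTimestamp.

```go
package main

import "fmt"

// pod is a hypothetical, pared-down stand-in for v1.Pod.
type pod struct {
	name        string
	nodeName    string
	owned       bool // true if this DaemonSet is the Pod's controller
	terminating bool // stands in for DeletionTimestamp != nil
}

// shouldCreateOnNode reports whether a new daemon Pod is needed on node.
// An owned Pod that is still terminating counts as occupying the node, so
// no replacement is created until the old Pod is actually gone.
func shouldCreateOnNode(node string, pods []pod) bool {
	for _, p := range pods {
		if p.nodeName == node && p.owned {
			return false
		}
	}
	return true
}

// adoptable reports whether an orphan may be adopted.
// Terminating orphans are never adoption targets.
func adoptable(p pod) bool {
	return !p.owned && !p.terminating
}

func main() {
	pods := []pod{{name: "ds-abc", nodeName: "node-0", owned: true, terminating: true}}
	fmt.Println(shouldCreateOnNode("node-0", pods))                // false
	fmt.Println(shouldCreateOnNode("node-1", pods))                // true
	fmt.Println(adoptable(pod{name: "orphan", terminating: true})) // false
}
```

The point of the sketch is that the creation decision and the adoption decision treat a terminating Pod differently.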

kow3ns · 2017-08-24T17:49:14Z

/retest

enisoc · 2017-08-24T18:14:08Z

The purpose of #43077 was not to prevent adoption of terminating Pods. That was accomplished independently with #44150, which means you shouldn't need to make any changes to getDaemonPods.

kow3ns · 2017-08-24T20:19:56Z

@enisoc #43077 doesn't allow any Pods with a DeletionTimestamp to be considered by ControllerRefManager. #44150 filters Pods with a DeletionTimestamp. Both are referenced in #44144, which is the actual issue. #43077 introduces the code that makes the test in this PR fail. This line makes it impossible for the DaemonSet controller to be responsive to terminated Pods. If you're claiming this is unnecessary, I can buy that, but we most certainly need to change getDaemonPods to ensure that the main loop actually sees terminated Pods.

kow3ns · 2017-08-24T20:21:21Z

/retest

kow3ns · 2017-08-24T20:47:56Z

/test pull-kubernetes-unit

enisoc · 2017-08-24T20:56:33Z

The logic added in #44150 only applies to orphans. ClaimPods() will already include terminating Pods in its result, as long as you own them:

kubernetes/pkg/controller/controller_ref_manager.go

Lines 69 to 80 in 05e7f6d

    
    if controllerRef != nil {
    	if controllerRef.UID != m.Controller.GetUID() {
    		// Owned by someone else. Ignore.
    		return false, nil
    	}
    	if match(obj) {
    		// We already own it and the selector matches.
    		// Return true (successfully claimed) before checking deletion timestamp.
    		// We're still allowed to claim things we already own while being deleted
    		// because doing so requires taking no actions.
    		return true, nil
    	}

So your change to add back ownedAndDeleted Pods is unnecessary. They are already in claimed.
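
To make the distinction concrete, here is a simplified, self-contained model of the claim decision described above. It is only a sketch: `uid`, `object`, and `claim` are hypothetical names, the release path for owned-but-non-matching objects is omitted, and the orphan branch reflects only what this thread says about #44150 (terminating orphans are not adopted).

```go
package main

import "fmt"

type uid string

// object is a hypothetical stand-in for the metadata of a Pod.
type object struct {
	controllerUID uid  // "" means the object is an orphan
	matches       bool // whether the controller's selector matches
	terminating   bool // stands in for DeletionTimestamp != nil
}

// claim models whether the controller identified by self ends up
// claiming obj.
func claim(self uid, obj object) bool {
	if obj.controllerUID != "" {
		if obj.controllerUID != self {
			return false // owned by someone else
		}
		// Already owned: the deletion timestamp is not consulted, so a
		// terminating Pod we own is still claimed if the selector matches.
		return obj.matches
	}
	// Orphan: adopt only if the selector matches and the orphan is not
	// terminating.
	return obj.matches && !obj.terminating
}

func main() {
	fmt.Println(claim("ds-1", object{controllerUID: "ds-1", matches: true, terminating: true})) // true
	fmt.Println(claim("ds-1", object{matches: true, terminating: true}))                        // false
	fmt.Println(claim("ds-1", object{controllerUID: "ds-2", matches: true}))                    // false
}
```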

kow3ns · 2017-08-24T21:08:22Z

I see, so the code introduced in #43077 is made completely redundant by #44150. After this PR, ControllerRefManager does the right thing. That makes this PR shorter.

enisoc · 2017-08-24T21:23:18Z

Do we also need to bring back protection against double deletion?

https://github.com/kubernetes/kubernetes/pull/43077/files#diff-cb9055723bef8b239f9df77ac704ffacL556

Your PR is now basically a straight revert of #43077. I still think they intended that to have some effect unrelated to the problem of adopting terminating orphans (which they themselves fixed separately in #44150). We should make sure we understand what problem #43077 was trying to fix before we revert it.

cc @lukaszo @Kargakis

enisoc · 2017-08-25T17:55:23Z

pkg/controller/daemon/daemon_controller_test.go

@@ -1816,12 +1831,6 @@ func TestGetNodesToDaemonPods(t *testing.T) {
 			newPod("non-matching-owned-0-", "node-0", simpleDaemonSetLabel2, ds),
 			newPod("non-matching-orphan-1-", "node-1", simpleDaemonSetLabel2, nil),
 			newPod("matching-owned-by-other-0-", "node-0", simpleDaemonSetLabel, ds2),
-			func() *v1.Pod {


Maybe move this up to wantedPods so we are clear to future readers that this behavior is required.

enisoc · 2017-08-25T17:59:41Z

@kow3ns It seems like the intent of #43077 was to omit terminating-but-owned Pods from the "available" count. However, for some reason no test case was added for such a Pod:

https://github.com/kubernetes/kubernetes/pull/43077/files#diff-7f51010c4c3f3050dcce5c9d916fc3eeR138

Could you add a test case for a terminating-but-owned Pod in that function? I expect it will fail now that you've reverted the change from #43077. To avoid regressing on the bug that #43077 was meant to fix, we may have to add a new, more targeted fix in getUnavailableNumbers().

Also, you mentioned offline that you brought back the protection against double deletion. Did you not push that commit yet?

janetkuo · 2017-08-25T18:24:06Z

pkg/controller/daemon/daemon_controller_test.go

 		manager.dsStore.Add(ds)

 		syncAndValidateDaemonSets(t, manager, ds, podControl, 1, 0, 0)
 	}
 }

+// DaemonSet should launch a pod on a not ready node with taint notReady:NoExecute.


comment needs to be updated

janetkuo · 2017-08-25T23:24:09Z

I suggest we revert #43077 but keep the test cases. If we want to count terminating pods as unavailable (what #43077 originally tried to fix), we should update the code below so that numberAvailable is not incremented when the pod is terminating:

kubernetes/pkg/controller/daemon/daemon_controller.go

Lines 1033 to 1035 in 05e7f6d

    
    if podutil.IsPodAvailable(pod, ds.Spec.MinReadySeconds, metav1.Now()) {
    	numberAvailable++
    }

I originally suggested that IsPodAvailable should count terminating pods as unavailable, but the decision made then was to let controllers decide whether to filter out terminating pods or not:
https://github.com/kubernetes/kubernetes/pull/32771/files#diff-473b59d03b4c3dc229240ad9fb29c1edR84

We should probably revisit IsPodAvailable if we think it's a common requirement for all controllers to determine availability of pods.
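
A rough sketch of the suggested change to the availability count is below. It is illustrative only: the `pod` type is hypothetical, `isAvailable` is a stand-in for podutil.IsPodAvailable, and `terminating` stands in for a non-nil DeletionTimestamp.

```go
package main

import "fmt"

// pod is a hypothetical, pared-down stand-in for v1.Pod.
type pod struct {
	ready       bool // simplification of the readiness and MinReadySeconds check
	terminating bool // stands in for DeletionTimestamp != nil
}

// isAvailable is a stand-in for podutil.IsPodAvailable, which by design
// does not look at whether the Pod is terminating.
func isAvailable(p pod) bool {
	return p.ready
}

// countAvailable counts Pods that are available and not terminating, so
// the controller itself filters out terminating Pods.
func countAvailable(pods []pod) int {
	n := 0
	for _, p := range pods {
		if p.terminating {
			continue // terminating Pods count as unavailable
		}
		if isAvailable(p) {
			n++
		}
	}
	return n
}

func main() {
	pods := []pod{
		{ready: true},                    // counted
		{ready: true, terminating: true}, // skipped: terminating
		{ready: false},                   // skipped: not ready
	}
	fmt.Println(countAvailable(pods)) // 1
}
```

Keeping the terminating check in the controller, rather than inside the availability helper, matches the earlier decision mentioned above to let each controller choose.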

kow3ns · 2017-08-29T16:09:09Z

/retest

lukaszo · 2017-08-29T20:11:42Z

The goal of #43077 was to make daemon controller behavior similar to replicaset controller where pods marked for deletion are also ignored.
If the behavior is wrong then probably ReplicaSet controller also should be fixed.

lukaszo · 2017-08-29T20:15:44Z

> However, for some reason no test case was added for such a Pod:

@enisoc it's handled here https://github.com/kubernetes/kubernetes/pull/43077/files#diff-0dc431d36cd3969089646d1bd4640d1eR1270

janetkuo · 2017-08-29T20:18:23Z

> The goal of #43077 was to make daemon controller behavior similar to replicaset controller where pods marked for deletion are also ignored.
> If the behavior is wrong then probably ReplicaSet controller also should be fixed.

For DaemonSets:

It is okay to see terminating pods as unavailable during a rolling update (when making sure maxUnavailable isn't violated).
It is not okay to ignore terminating pods and try to create a new pod on the same node (DaemonSet controller start new pod when the old pod is still in Terminating state #50477).

kow3ns · 2017-08-29T22:17:13Z

/retest

kow3ns · 2017-08-30T05:57:58Z

/retest

enisoc · 2017-08-30T17:10:30Z

/lgtm

janetkuo · 2017-08-30T20:40:21Z

pkg/controller/daemon/daemon_controller.go

-		// Skip terminating pods
-		if pod.DeletionTimestamp != nil {
-			continue
-		}


Add these back to dsc.manage() (where it's deleted)?
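
One way to read this suggestion is as protection against double deletion: when choosing Pods to delete, skip any that are already terminating. The sketch below illustrates that reading only. The `pod` type and `podsToDelete` helper are hypothetical and do not reproduce the real dsc.manage() logic.

```go
package main

import "fmt"

// pod is a hypothetical, pared-down stand-in for v1.Pod.
type pod struct {
	name        string
	terminating bool // stands in for DeletionTimestamp != nil
}

// podsToDelete returns the names of Pods that should receive a delete
// request. Pods that are already terminating are skipped so that the
// controller does not issue a second delete for the same Pod.
func podsToDelete(unwanted []pod) []string {
	var names []string
	for _, p := range unwanted {
		// Skip terminating pods
		if p.terminating {
			continue
		}
		names = append(names, p.name)
	}
	return names
}

func main() {
	unwanted := []pod{{name: "ds-a"}, {name: "ds-b", terminating: true}}
	fmt.Println(podsToDelete(unwanted)) // [ds-a]
}
```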


kow3ns · 2017-08-31T20:30:59Z

@janetkuo @enisoc @Kargakis are we good with this?

enisoc · 2017-08-31T20:32:44Z

/lgtm

0xmichalis · 2017-08-31T21:03:46Z

/approve

janetkuo · 2017-08-31T21:55:22Z

/lgtm

k8s-github-robot · 2017-08-31T21:55:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enisoc, janetkuo, kargakis, kow3ns

Associated issue: 43077

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~pkg/controller/daemon/OWNERS~~ [janetkuo]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot · 2017-09-01T07:11:19Z

Automatic merge from submit-queue (batch tested with PRs 51628, 51637, 51490, 51279, 51302)

janetkuo · 2017-12-06T05:11:17Z

Since the regression was introduced in 1.7, we should probably cherrypick this one into 1.7

…79-upstream-release-1.7 Automatic merge from submit-queue. Automated cherry pick of #51279 upstream release 1.7 Cherrypick #51279 to 1.7 Fixes #56653

kow3ns requested a review from janetkuo August 24, 2017 17:02

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 24, 2017

kow3ns requested a review from enisoc August 24, 2017 17:02

k8s-github-robot assigned mikedanese and lukaszo Aug 24, 2017

k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 24, 2017

kow3ns changed the title ~~Ensure that DaemonSet respect termination~~ Ensure that DaemonSet respects termination Aug 24, 2017

kow3ns force-pushed the daemonset-respects-termination branch from 2fc0eae to f2c3153 Compare August 24, 2017 17:51

kow3ns force-pushed the daemonset-respects-termination branch from f2c3153 to 4248f2a Compare August 24, 2017 21:09

enisoc reviewed Aug 25, 2017


janetkuo self-assigned this Aug 25, 2017

janetkuo reviewed Aug 25, 2017


kow3ns force-pushed the daemonset-respects-termination branch from 4248f2a to 010ef98 Compare August 28, 2017 17:46

janetkuo added this to the v1.8 milestone Aug 28, 2017

janetkuo mentioned this pull request Aug 28, 2017

[DaemonSet] Consider pods that are terminating instead of counting them as unavailable #50533

Closed

hzxuzhonghu mentioned this pull request Aug 29, 2017

daemonset exists 2 pods on one node #51464

Closed

kow3ns force-pushed the daemonset-respects-termination branch from 010ef98 to e892999 Compare August 29, 2017 17:42

enisoc mentioned this pull request Aug 29, 2017

daemonset status.numberReady does not correspond to actual daemonset pod state #51278

Closed

k8s-ci-robot assigned enisoc Aug 30, 2017

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 30, 2017

janetkuo reviewed Aug 30, 2017


Ensures that the DaemonSet controller does not launch a Pod on a Node…

8ad18bf

… while waiting for a Pod that it has previously created to terminate.

kow3ns force-pushed the daemonset-respects-termination branch from e892999 to 8ad18bf Compare August 31, 2017 17:29

k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 31, 2017

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 31, 2017

janetkuo approved these changes Aug 31, 2017


k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 31, 2017

k8s-github-robot merged commit 8beb39d into kubernetes:master Sep 1, 2017

This was referenced Nov 1, 2017

Disable Kubernetes' termination grace period for calico/node projectcalico/calico#1293

Merged

Calico add-on: calico/node pod can take a long time to be restarted #55013

Closed

janetkuo mentioned this pull request Dec 11, 2017

Automated cherry pick of #51279 upstream release 1.7 #57048

Merged
