
Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test. #45748

Merged: 2 commits merged into kubernetes:master on Jun 4, 2017

Conversation

@mml (Contributor, author) commented May 12, 2017

This is still a WIP... needs to be squashed at least, and I don't think it's currently passing until I increase the scale of the RC, but please have a look at the general outline. Thanks!

Fixes #38336

@kow3ns @bdbauer @krousey @erictune @maisem @davidopp

On GCE, node upgrades will now respect PodDisruptionBudgets, if present.

@k8s-ci-robot added the "cncf-cla: yes" label (indicates the PR's author has signed the CNCF CLA) on May 12, 2017
@k8s-github-robot added the "size/L" label (denotes a PR that changes 100-499 lines, ignoring generated files) and the "release-note-label-needed" label on May 12, 2017
@mml (author) commented May 12, 2017

/assign krousey

@mml added this to the v1.7 milestone on May 12, 2017
@mml (author) commented May 12, 2017

/release-note-none

@k8s-ci-robot added the "release-note-none" label (denotes a PR that doesn't merit a release note) and removed the "release-note-label-needed" label on May 12, 2017
@mml removed the "release-note-none" label on May 12, 2017
@mml (author) commented May 12, 2017

No, actually, this probably needs a release note.

@mml assigned krousey and unassigned gmarek and mwielgus on May 12, 2017
@k8s-github-robot added the "needs-rebase" label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on May 12, 2017
@k8s-github-robot removed the "needs-rebase" label on May 12, 2017
@mml (author) commented May 12, 2017

OK! This is now passing for me. Whee!

@mml force-pushed the reliable-node-upgrade branch 2 times, most recently from def5ce8 to 0ee98f4, on May 15, 2017 21:17
@@ -73,7 +73,7 @@ func (t *ServiceUpgradeTest) Test(f *framework.Framework, done <-chan struct{},
case MasterUpgrade:
t.test(f, done, true)
default:
t.test(f, done, false)
t.test(f, done, true)
Contributor:
This should not be the default, for two reasons.

  1. The enum could expand to upgrade types that you don't intend to be tested this way.
  2. Some cloud providers might not support a graceful node upgrade like this. There should be a check for that.

@mml (author):

Done.

@mml (author) commented May 15, 2017

@krousey PTAL

@mml added the "release-note" label (denotes a PR that will be considered when it comes time to generate release notes) and removed the "release-note-label-needed" label on May 15, 2017
@mml (author) commented May 15, 2017

@krousey it's squashed now and should have a decent release note.

@krousey (Contributor) commented May 16, 2017

I only did a quick review earlier. Here's a more in-depth review. Overall it looks good.

fi
echo "Rolling update ${update} is still in ${result} state."
sleep 10
echo "== Waiting for ${instance} to become young. ==" >&2
Contributor:

Become young? That's... confusing. Why not wait for recreation to complete?

sleep 10
echo "== Waiting for ${instance} to become young. ==" >&2
while true; do
age=$(( $(date +%s) - $(date --date=$("${KUBE_ROOT}/cluster/kubectl.sh" get node "${instance}" --output='jsonpath={.metadata.creationTimestamp}' ) +%s) ))
Contributor:

This is very dense to read. I would rather something that checked the recreate operation's status. This can't really tell the difference between a recreate and a system crash/restart. Also, the 60 second window just makes me uncomfortable with timing flakiness. If this comes really quick after starting the cluster, you may mistake the instance as having been restarted. If for some reason this takes too long, then you might miss the window.

What about checking the status of the recreate operation? Something like

gcloud compute operations list --limit=1 --filter="targetLink:*${instance} AND operationType:recreate" --format='get(status)'

should print "DONE" when it's complete. It's probably as dense, but avoids timing ambiguities and date math. You should check with gcloud compute operations list --format=json to make sure I got the filters right.
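A minimal sketch of such a wait loop, assuming the filter above is right. The gcloud invocation is left as a comment and the status values are simulated here, so the loop logic itself is self-contained and runnable:

```shell
#!/usr/bin/env bash
# Sketch: poll the recreate operation until it reports DONE.
# The real status check would be the gcloud command suggested above;
# here it is stubbed with simulated values.
attempts=0
status=""
while [[ "${status}" != "DONE" ]]; do
  attempts=$((attempts + 1))
  # real version:
  # status=$(gcloud compute operations list --limit=1 \
  #   --filter="targetLink:*${instance} AND operationType:recreate" \
  #   --format='get(status)')
  if [[ "${attempts}" -lt 3 ]]; then status="RUNNING"; else status="DONE"; fi
  sleep 0  # a real script would sleep between polls, e.g. sleep 10
done
echo "operation DONE after ${attempts} checks"
```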

Contributor:

It's also worth noting that this currently works because kubelet deletes and recreates the node object on upgrade. This would not work if kubelet switched to just updating the node object.

@mml (author) commented May 16, 2017:

I agree about the timing flakiness, and I agree I'd like to be sure that the instance has been recreated.

But both of those can happen and it can still be the case that the kubelet hasn't deleted and recreated the node object, so we still need to make sure that the node is Ready, and we need to make sure we're checking that the new node is healthy, because we may observe the old node with the same name still being Ready.

I think I can do this if the script itself deletes the node once it's been drained. The order would be

  1. kubectl drain
  2. kubectl delete node
  3. gcloud ... recreate-instances
  4. poll gcloud compute operations list
  5. poll kubectl get node waiting for SchedulingDisabled=false and Ready=true.

This depends on a different assumption about kubelet's behavior, that it won't re-add itself after I delete it. That seems to be the case if nothing happens on the node, but if kubelet randomly gets restarted (or changes its behavior), it will get re-added, and it will no longer be drained. IOW, deleting the node opens the door for the node to acquire pods again after the code thinks it's drained. A different race.

So having written all that, I'm inclined to poll the operation. You're right that seems cleaner. But I still need a way to distinguish old node from new node, and I can't think of a better way than polling for age. I guess I could poll for version with kubectl get -o jsonpath='{.status.nodeInfo}' node/foo. That would work as long as we're sure that we're actually changing kubelet versions.
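The five-step ordering above could be sketched roughly as follows. INSTANCE and GROUP are placeholder names, the exact kubectl/gcloud flags are illustrative, and DRY_RUN=1 (the default here) only records the commands rather than executing them:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the drain -> delete -> recreate -> poll sequence.
set -u
INSTANCE="${INSTANCE:-e2e-minion-group-xxxx}"  # placeholder
GROUP="${GROUP:-e2e-minion-group}"             # placeholder
DRY_RUN="${DRY_RUN:-1}"
CMDS=()

run() {
  CMDS+=("$*")
  if [[ "${DRY_RUN}" != "1" ]]; then "$@"; fi
}

# 1. Drain the node; eviction respects PodDisruptionBudgets.
run kubectl drain "${INSTANCE}" --ignore-daemonsets
# 2. Delete the node object so the old registration can't be mistaken for the new one.
run kubectl delete node "${INSTANCE}"
# 3. Kick off the recreate in the managed instance group.
run gcloud compute instance-groups managed recreate-instances "${GROUP}" --instances="${INSTANCE}"
# 4. (poll the recreate operation until DONE)
# 5. (poll kubectl get node until SchedulingDisabled=false and Ready=true)
printf '%s\n' "${CMDS[@]}"
```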

Contributor:

Yes. I didn't mean not checking the node's reported status. I just meant that checking the operation status is preferable to checking timestamps to see if the node rebooted. You still have all the other issues you mentioned.

@mml (author):

Actually, there's another problem with the operation polling approach. The recreate isn't a single operation. It's two, apparently kicked off in succession asynchronously. So I need to wait for the insert operation to be DONE. But I need to make sure it's the right insert operation. Consider the output below if I'm working on rx0c. I don't want to accidentally match on the insert from 05-12 in this case if it's today.

This looks like I need more timestamp math. WDYT?

% gcloud compute operations list --limit=6 --filter='targetLink:us-central1-f/instances/e2e-minion-group-*' --format='table[box](insertTime,startTime,endTime,operationType,targetLink,status)' --sort-by='~insertTime'                         
┌───────────────────────────────┬───────────────────────────────┬───────────────────────────────┬────────┬───────────────────────────────────────────────┬────────┐
│           TIMESTAMP           │           START_TIME          │            END_TIME           │  TYPE  │                     TARGET                    │ STATUS │
├───────────────────────────────┼───────────────────────────────┼───────────────────────────────┼────────┼───────────────────────────────────────────────┼────────┤
│ 2017-05-16T15:18:37.154-07:00 │ 2017-05-16T15:18:37.709-07:00 │ 2017-05-16T15:18:57.966-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE   │
│ 2017-05-16T15:17:26.260-07:00 │ 2017-05-16T15:17:26.599-07:00 │ 2017-05-16T15:18:35.020-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE   │
│ 2017-05-16T15:14:30.903-07:00 │ 2017-05-16T15:14:31.369-07:00 │ 2017-05-16T15:14:52.354-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-6lw8 │ DONE   │
│ 2017-05-16T15:13:11.547-07:00 │ 2017-05-16T15:13:11.905-07:00 │ 2017-05-16T15:14:28.942-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-6lw8 │ DONE   │
│ 2017-05-12T16:20:35.272-07:00 │ 2017-05-12T16:20:35.990-07:00 │ 2017-05-12T16:20:54.404-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE   │
│ 2017-05-12T16:19:26.596-07:00 │ 2017-05-12T16:19:26.908-07:00 │ 2017-05-12T16:20:33.207-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE   │
└───────────────────────────────┴───────────────────────────────┴───────────────────────────────┴────────┴───────────────────────────────────────────────┴────────┘

Contributor:

So you don't think it would be sufficient to grab the latest insert op?

@maisem:

You can watch for the UID/externalID of the node object to change. Or you can keep fetching the instance from GCE and wait for the ID to change. This is the authoritative way to know that the instance did indeed get recreated.
Once the ID changes, you need to wait for the node to enter Ready state.

gcloud compute instances describe $INSTANCE --format='get(id)'

@mml (author):

@krousey, no because the "latest insert op" will actually be an old insert op for the first minute or so (maybe longer).

@maisem this is good. I cache the id before I kick off recreate-instances, then poll until that id changes. Then I poll get nodes until the externalID matches. Then I can start polling node state.

% cluster/kubectl.sh get no e2e-minion-group-rx0c --output=jsonpath='{.spec.externalID}'
7990879616757056595
% gcloud compute instances describe e2e-minion-group-rx0c --format='get(id)'
7990879616757056595
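That caching-and-polling approach might look roughly like this. The gcloud call is left as a comment, the old ID is the one shown above, and the new ID is a made-up placeholder so the loop can be exercised on its own:

```shell
#!/usr/bin/env bash
# Sketch: cache the instance ID before recreate-instances, then poll
# until it changes, which proves the instance was actually recreated.
old_id="7990879616757056595"   # cached before kicking off recreate-instances
polls=0
new_id="${old_id}"
while [[ "${new_id}" == "${old_id}" ]]; do
  polls=$((polls + 1))
  # real version:
  # new_id=$(gcloud compute instances describe "${instance}" --format='get(id)')
  if [[ "${polls}" -ge 3 ]]; then new_id="1234500000000000000"; fi  # placeholder new ID
  sleep 0  # a real script would sleep between polls
done
echo "instance recreated after ${polls} polls"
# Next: poll kubectl get node --output=jsonpath='{.spec.externalID}' until it
# matches the new ID, then wait for the node's Ready condition to be True.
```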

@mml (author):

@maisem I think you're the best one to review this file. Can you PTAL?

@@ -490,13 +501,18 @@ func (j *ServiceTestJig) newRCTemplate(namespace string) *v1.ReplicationControll
Labels: j.Labels,
},
Spec: v1.ReplicationControllerSpec{
Replicas: func(i int) *int32 { x := int32(i); return &x }(1),
Replicas: func(i int) *int32 { x := int32(i); return &x }(2),
Contributor:
... not your fault, but could use just var replicas int32 = 2 at the top of this function and use &replicas here.

@mml (author):

Done.

Selector: j.Labels,
Template: &v1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: j.Labels,
},
Spec: v1.PodSpec{
Affinity: &v1.Affinity{
Contributor:

So this is starting to change all of the services e2e tests. I don't know enough about them to say that this is OK or not. You should find the people who own those tests to review this.

Member:

This is PROBABLY alright, but would we be safer to only set this for a testcase that is explicitly targeting this behavior?

@mml (author):

There is actually a hook in RunOrFail where I can add this logic (tweak function). I'll externalize it and only set it up for this test.

@mml (author):

Done. @krousey can you take a look at how I structured it? I used the existing tweak hook, but the implementation is a little ugly.

@mml (author) commented May 25, 2017

cc @monopole

@@ -36,6 +36,8 @@ type ServiceUpgradeTest struct {

func (ServiceUpgradeTest) Name() string { return "service-upgrade" }

func shouldTestPDBs() bool { return framework.ProviderIs("gce") }
Review comment:

This should be enabled for GKE as well.

@maisem commented May 25, 2017

@mml - upgrade.sh changes LGTM.

@mml assigned maisem and unassigned krousey on Jun 2, 2017
@maisem left a review comment:

LGTM after comment fix.

@@ -72,6 +79,9 @@ func (t *ServiceUpgradeTest) Test(f *framework.Framework, done <-chan struct{},
switch upgrade {
case MasterUpgrade:
t.test(f, done, true)
case NodeUpgrade:
// Node upgrades should test during disruption only on GCE for now.
Review comment:

GCE/GKE

@mml (author):

Done.

@mml dismissed krousey’s stale review on June 2, 2017 00:12

Addressed.

@mml added the "lgtm" label ("Looks good to me"; indicates that a PR is ready to be merged) on Jun 2, 2017
@vishh (Contributor) commented Jun 2, 2017

/approve

@lavalamp (Member) commented Jun 2, 2017

/approve since @krousey is on vacation

(although someone else should review the shell :) )

@k8s-github-robot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) on Jun 2, 2017
mml added 2 commits June 1, 2017 17:58
Respect PDBs during node upgrades and add test coverage to the
ServiceTest upgrade test.  Modified that test so that we include pod
anti-affinity constraints and a PDB.
@k8s-github-robot removed the "lgtm" label on Jun 2, 2017
@maisem commented Jun 2, 2017

@k8s-bot unit test this

@maisem commented Jun 2, 2017

/lgtm

@k8s-ci-robot added the "lgtm" label on Jun 2, 2017
@k8s-github-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, maisem, mml, vishh

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@fejta (Contributor) commented Jun 2, 2017

@k8s-bot pull-kubernetes-federation-e2e-gce test this
ref: #46827

1 similar comment
@shashidharatd commented:

@k8s-bot pull-kubernetes-federation-e2e-gce test this
ref: #46827

@mml (author) commented Jun 2, 2017

@shashidharatd thanks!

@mml (author) commented Jun 2, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this #38275

@k8s-ci-robot (Contributor) commented Jun 2, 2017

@mml: The following test failed, say /retest to rerun them all:

Test name: Jenkins unit/integration
Commit: c71165ad34cd33873a77c8d3975c038cdfe91658
Rerun command: @k8s-bot unit test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mml (author) commented Jun 2, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@k8s-github-robot commented:

Automatic merge from submit-queue

@k8s-github-robot merged commit 3837d95 into kubernetes:master on Jun 4, 2017