Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test. #45748
Conversation
/assign krousey
/release-note-none
No, actually probably need a release note.
Force-pushed from 9d022d3 to 1d216df.
OK! This is now passing for me. Whee!
Force-pushed from def5ce8 to 0ee98f4.
test/e2e/upgrades/services.go
Outdated
@@ -73,7 +73,7 @@ func (t *ServiceUpgradeTest) Test(f *framework.Framework, done <-chan struct{},
 	case MasterUpgrade:
 		t.test(f, done, true)
 	default:
-		t.test(f, done, false)
+		t.test(f, done, true)
This should not be the default for two reasons.
- The enum could expand to include upgrade types that you don't intend to be tested this way.
- Some cloud providers might not support a graceful node upgrade like this. There should be a check for that.
Done.
@krousey PTAL
Force-pushed from 66ad721 to aa797a1.
@krousey it's squashed now and should have a decent release note.
I only did a quick review earlier. Here's a more in-depth review. Overall it looks good.
cluster/gce/upgrade.sh
Outdated
fi
echo "Rolling update ${update} is still in ${result} state."
sleep 10
echo "== Waiting for ${instance} to become young. ==" >&2
Become young? That's... confusing. Why not wait for recreation to complete?
cluster/gce/upgrade.sh
Outdated
sleep 10
echo "== Waiting for ${instance} to become young. ==" >&2
while true; do
  age=$(( $(date +%s) - $(date --date=$("${KUBE_ROOT}/cluster/kubectl.sh" get node "${instance}" --output='jsonpath={.metadata.creationTimestamp}' ) +%s) ))
This is very dense to read. I would rather have something that checks the recreate operation's status. This can't really tell the difference between a recreate and a system crash/restart. Also, the 60-second window makes me uncomfortable about timing flakiness: if this check runs very soon after the cluster starts, you may mistake the instance as having been restarted, and if for some reason it takes too long, you might miss the window.
What about checking the status of the recreate operation? Something like
gcloud compute operations list --limit=1 --filter="targetLink:*${instance} AND operationType:recreate" --format='get(status)'
should print "DONE" when it's complete. It's probably just as dense, but it avoids timing ambiguities and date math. You should check with gcloud compute operations list --format=json to make sure I got the filters right.
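For illustration, a minimal polling loop along those lines, assuming the filter above is correct (the instance name is just an example):
instance="e2e-minion-group-rx0c"   # illustrative instance name
# Poll until the most recent recreate operation for this instance reports DONE.
while [[ "$(gcloud compute operations list --limit=1 \
    --filter="targetLink:*${instance} AND operationType:recreate" \
    --format='get(status)')" != "DONE" ]]; do
  echo "== Waiting for recreate of ${instance} to finish. ==" >&2
  sleep 10
done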
It's also worth noting that this currently works because kubelet deletes and recreates the node object on upgrade. This would not work if kubelet switched to just updating the node object.
I agree about the timing flakiness, and I agree I'd like to be sure that the instance has been recreated.
But both of those can happen and it can still be the case that the kubelet hasn't deleted and recreated the node object, so we still need to make sure that the node is Ready, and we need to make sure we're checking that the new node is healthy, because we may observe the old node with the same name still being Ready.
I think I can do this if the script itself deletes the node once it's been drained. The order would be:
- kubectl drain
- kubectl delete node
- gcloud ... recreate-instances
- poll gcloud compute operations list
- poll kubectl get node, waiting for SchedulingDisabled=false and Ready=true.
This depends on a different assumption about kubelet's behavior, that it won't re-add itself after I delete it. That seems to be the case if nothing happens on the node, but if kubelet randomly gets restarted (or changes its behavior), it will get re-added, and it will no longer be drained. IOW, deleting the node opens the door for the node to acquire pods again after the code thinks it's drained. A different race.
So having written all that, I'm inclined to poll the operation. You're right that seems cleaner. But I still need a way to distinguish old node from new node, and I can't think of a better way than polling for age. I guess I could poll for version with kubectl get -o jsonpath='{.status.nodeInfo}' node/foo. That would work as long as we're sure that we're actually changing kubelet versions.
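For example, a version check could look roughly like this (the target version and the exact field path are illustrative assumptions, not what the script ended up doing):
instance="e2e-minion-group-rx0c"   # illustrative
target_version="v1.7.0-beta.0"     # illustrative target kubelet version
# Poll until the node reports the new kubelet version.
while [[ "$("${KUBE_ROOT}/cluster/kubectl.sh" get node "${instance}" \
    --output='jsonpath={.status.nodeInfo.kubeletVersion}')" != "${target_version}" ]]; do
  sleep 10
done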
Yes. I didn't mean not checking the node's reported status. I just meant that checking the operation status is preferable to checking timestamps to see if the node rebooted. You still have all the other issues you mentioned.
Actually, there's another problem with the operation polling approach. The recreate isn't a single operation. It's two, apparently kicked off in succession asynchronously. So I need to wait for the insert operation to be DONE. But I need to make sure it's the right insert operation. Consider the output below if I'm working on rx0c: I don't want to accidentally match on the insert from 05-12 in this case if it's today.
This looks like I need more timestamp math. WDYT?
% gcloud compute operations list --limit=6 --filter='targetLink:us-central1-f/instances/e2e-minion-group-*' --format='table[box](insertTime,startTime,endTime,operationType,targetLink,status)' --sort-by='~insertTime'
┌───────────────────────────────┬───────────────────────────────┬───────────────────────────────┬────────┬───────────────────────────────────────────────┬────────┐
│ TIMESTAMP │ START_TIME │ END_TIME │ TYPE │ TARGET │ STATUS │
├───────────────────────────────┼───────────────────────────────┼───────────────────────────────┼────────┼───────────────────────────────────────────────┼────────┤
│ 2017-05-16T15:18:37.154-07:00 │ 2017-05-16T15:18:37.709-07:00 │ 2017-05-16T15:18:57.966-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE │
│ 2017-05-16T15:17:26.260-07:00 │ 2017-05-16T15:17:26.599-07:00 │ 2017-05-16T15:18:35.020-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE │
│ 2017-05-16T15:14:30.903-07:00 │ 2017-05-16T15:14:31.369-07:00 │ 2017-05-16T15:14:52.354-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-6lw8 │ DONE │
│ 2017-05-16T15:13:11.547-07:00 │ 2017-05-16T15:13:11.905-07:00 │ 2017-05-16T15:14:28.942-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-6lw8 │ DONE │
│ 2017-05-12T16:20:35.272-07:00 │ 2017-05-12T16:20:35.990-07:00 │ 2017-05-12T16:20:54.404-07:00 │ insert │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE │
│ 2017-05-12T16:19:26.596-07:00 │ 2017-05-12T16:19:26.908-07:00 │ 2017-05-12T16:20:33.207-07:00 │ delete │ us-central1-f/instances/e2e-minion-group-rx0c │ DONE │
└───────────────────────────────┴───────────────────────────────┴───────────────────────────────┴────────┴───────────────────────────────────────────────┴────────┘
So you don't think it would be sufficient to grab the latest insert op?
You can watch for the UID/externalID of the node object to change. Or you can keep fetching the instance from GCE and wait for the ID to change. This is the authoritative way to know that the instance did indeed get recreated.
Once the ID changes, you need to wait for the node to enter the Ready state.
gcloud compute instances describe $INSTANCE --format='get(id)'
@krousey, no because the "latest insert op" will actually be an old insert op for the first minute or so (maybe longer).
@maisem this is good. I cache the id before I kick off recreate-instances, then poll until that id changes. Then I poll get nodes until the externalID matches. Then I can start polling node state.
% cluster/kubectl.sh get no e2e-minion-group-rx0c --output=jsonpath='{.spec.externalID}'
7990879616757056595
% gcloud compute instances describe e2e-minion-group-rx0c --format='get(id)'
7990879616757056595
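Putting that together, a rough sketch of the sequence described above (not the exact upgrade.sh code; the instance, group, and zone names are assumptions for illustration):
instance="e2e-minion-group-rx0c"   # illustrative
group="e2e-minion-group"           # illustrative managed instance group
zone="us-central1-f"               # illustrative zone

# Cache the current GCE instance ID before kicking off the recreate.
old_id=$(gcloud compute instances describe "${instance}" --zone="${zone}" --format='get(id)')

gcloud compute instance-groups managed recreate-instances "${group}" \
  --zone="${zone}" --instances="${instance}"

# Wait for GCE to report a new instance ID: the authoritative sign of recreation.
new_id="${old_id}"
while [[ "${new_id}" == "${old_id}" ]]; do
  sleep 10
  new_id=$(gcloud compute instances describe "${instance}" --zone="${zone}" --format='get(id)')
done

# Wait for the node object to refer to the new instance.
while [[ "$("${KUBE_ROOT}/cluster/kubectl.sh" get node "${instance}" \
    --output='jsonpath={.spec.externalID}')" != "${new_id}" ]]; do
  sleep 10
done

# Finally, wait for the new node to report Ready.
while [[ "$("${KUBE_ROOT}/cluster/kubectl.sh" get node "${instance}" \
    --output='jsonpath={.status.conditions[?(@.type=="Ready")].status}')" != "True" ]]; do
  sleep 10
done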
@maisem I think you're the best one to review this file. Can you PTAL?
test/e2e/framework/service_util.go
Outdated
@@ -490,13 +501,18 @@ func (j *ServiceTestJig) newRCTemplate(namespace string) *v1.ReplicationController
			Labels: j.Labels,
		},
		Spec: v1.ReplicationControllerSpec{
-			Replicas: func(i int) *int32 { x := int32(i); return &x }(1),
+			Replicas: func(i int) *int32 { x := int32(i); return &x }(2),
... not your fault, but could use just var replicas int32 = 2 at the top of this function and use &replicas here.
Done.
test/e2e/framework/service_util.go
Outdated
Selector: j.Labels, | ||
Template: &v1.PodTemplateSpec{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Labels: j.Labels, | ||
}, | ||
Spec: v1.PodSpec{ | ||
Affinity: &v1.Affinity{ |
So this is starting to change all of the services e2e tests. I don't know enough about them to say whether this is OK. You should find the people who own those tests to review this.
This is PROBABLY alright, but would we be safer to only set this for a test case that is explicitly targeting this behavior?
There is actually a hook in RunOrFail where I can add this logic (the tweak function). I'll externalize it and only set it up for this test.
Done. @krousey can you take a look at how I structured it? I used the existing tweak hook, but the implementation is a little ugly.
cc @monopole
test/e2e/upgrades/services.go
Outdated
@@ -36,6 +36,8 @@ type ServiceUpgradeTest struct {

func (ServiceUpgradeTest) Name() string { return "service-upgrade" }

+func shouldTestPDBs() bool { return framework.ProviderIs("gce") }
This should be enabled for GKE as well.
@mml - upgrade.sh changes LGTM.
Force-pushed from c71165a to 9fd96d3.
LGTM after comment fix.
test/e2e/upgrades/services.go
Outdated
@@ -72,6 +79,9 @@ func (t *ServiceUpgradeTest) Test(f *framework.Framework, done <-chan struct{},
	switch upgrade {
	case MasterUpgrade:
		t.test(f, done, true)
+	case NodeUpgrade:
+		// Node upgrades should test during disruption only on GCE for now.
GCE/GKE
Done.
Force-pushed from 9fd96d3 to ae99613.
/approve
/approve since @krousey is on vacation (although someone else should review the shell :) )
Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test. Modified that test so that we include pod anti-affinity constraints and a PDB.
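For context, a PDB of the kind the test sets up can also be created with kubectl like this (the name, selector, and threshold here are illustrative, not the test's actual values):
"${KUBE_ROOT}/cluster/kubectl.sh" create poddisruptionbudget service-upgrade-pdb \
  --selector='testid=service-upgrade' --min-available=1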
Force-pushed from ae99613 to 43e2bec.
@k8s-bot unit test this
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: lavalamp, maisem, mml, vishh. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing
@shashidharatd thanks!
@mml: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this
Automatic merge from submit-queue
This is still a WIP... needs to be squashed at least, and I don't think it's currently passing until I increase the scale of the RC, but please have a look at the general outline. Thanks!
Fixes #38336
@kow3ns @bdbauer @krousey @erictune @maisem @davidopp