-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support running Ubuntu image on GCE node #44744
Support running Ubuntu image on GCE node #44744
Conversation
Using Ubuntu on GCE to run cluster e2e tests requires slightly different node.yaml and master.yaml files than GCI, because Ubuntu uses systemd as PID 1, wheras GCI uses upstart with a systemd delegate. Therefore the e2e tests fail using those files since the kubernetes services are not brought back up after a node/master reboot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure why the ubuntu-specific subdirectory was deleted and instead synlinked to the GCI one, since in my experience this doesn't work.
Specifically, the kubernetes services are not restarted after a system reboot since the systemd target is not enabled, nor does it have an [Install]\nWantedBy= entry.
Or is the assumption that the result of some of the tests should be ignored? (at least one of them uses rebooting nodes to simulate network partition)
Is it assumed that user will never reboot nodes?
cluster/gce/ubuntu/node.yaml
Outdated
Description=Kubernetes | ||
|
||
[Install] | ||
WantedBy=multi-user.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed by Ubuntu.
The GCI image doesn't have this in its systemd services because GCI uses systemd as a delegate (pid 1 is upstart).
Therefore symlinking ubuntu -> gci does not work: if a node reboots the services won't be brought back up.
That is the reason I originally duplicated some of the files in the subdirectory instead of symlinking, the node.yaml and master.yaml files need to be different.
Or does this imply that GCI will change to use systemd as pid 1?
cluster/gce/ubuntu/node.yaml
Outdated
- systemctl enable kubelet-monitor.service | ||
- systemctl enable kube-logrotate.timer | ||
- systemctl enable kube-logrotate.service | ||
- systemctl enable kubernetes.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed by Ubuntu.
This extra "enable" call ensures that the appropriate symlink is created by systemd (so that it is brought back up after a reboot).
cluster/gce/ubuntu/node-helper.sh
Outdated
ensure-gci-metadata-files | ||
create-node-template "$template_name" "${scope_flags[*]}" \ | ||
"kube-env=${KUBE_TEMP}/node-kube-env.yaml" \ | ||
"user-data=${KUBE_ROOT}/cluster/gce/ubuntu/node.yaml" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existence of this file was to make sure the approrpiate node.yaml file is passed to the node instance (an ubuntu specific one, not the GCI one).
cluster/gce/ubuntu/master.yaml
Outdated
- systemctl enable kubelet-monitor.service | ||
- systemctl enable kube-logrotate.timer | ||
- systemctl enable kube-logrotate.service | ||
- systemctl enable kubernetes.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed by Ubuntu.
This extra "enable" call ensures that the appropriate symlink is created by systemd (so that it is brought back up after a reboot).
cluster/gce/ubuntu/master.yaml
Outdated
Description=Kubernetes | ||
|
||
[Install] | ||
WantedBy=multi-user.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed by Ubuntu.
The GCI image doesn't have this in its systemd services because GCI uses systemd as a delegate (pid 1 is upstart).
Therefore symlinking ubuntu -> gci does not work: if a node reboots the services won't be brought back up.
That is the reason I originally duplicated some of the files in the subdirectory instead of symlinking, the node.yaml and master.yaml files need to be different.
4ee450f
to
cf23d4c
Compare
You are right. I wasn't aware of the reboot issue. The tests I ran (the slow, serial, disruptive, flaky and feature tests were skipped) didn't cover this case. |
@k8s-bot test this |
if [[ "${master}" == "true" && ("${MASTER_OS_DISTRIBUTION}" == "trusty" || "${MASTER_OS_DISTRIBUTION}" == "gci" || "${MASTER_OS_DISTRIBUTION}" == "container-linux") ]] || \ | ||
[[ "${master}" == "false" && ("${NODE_OS_DISTRIBUTION}" == "trusty" || "${NODE_OS_DISTRIBUTION}" == "gci" || "${NODE_OS_DISTRIBUTION}" == "container-linux") ]] ; then | ||
if [[ "${master}" == "true" && ("${MASTER_OS_DISTRIBUTION}" == "trusty" || "${MASTER_OS_DISTRIBUTION}" == "gci" || "${MASTER_OS_DISTRIBUTION}" == "container-linux") || "${MASTER_OS_DISTRIBUTION}" == "ubuntu" ]] || \ | ||
[[ "${master}" == "false" && ("${NODE_OS_DISTRIBUTION}" == "trusty" || "${NODE_OS_DISTRIBUTION}" == "gci" || "${NODE_OS_DISTRIBUTION}" == "container-linux") || "${NODE_OS_DISTRIBUTION}" = "ubuntu" ]] ; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove all references to trusty since the trusty support was removed a while back. Let's do it in a separate pr.
cluster/gce/ubuntu/master-helper.sh
Outdated
@@ -0,0 +1,111 @@ | |||
#!/bin/bash | |||
|
|||
# Copyright 2016 The Kubernetes Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/2016/2017?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in f07669b
cluster/gce/ubuntu/master-helper.sh
Outdated
--tags "${MASTER_TAG}" \ | ||
--network "${NETWORK}" \ | ||
--scopes "storage-ro,compute-rw,monitoring,logging-write" \ | ||
--can-ip-forward \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisglass By comparing gce/ubuntu/master-helper.sh and gce/gci/master-helper.sh, adding --can-ip-forward upon creating the instance is the only real difference here. Why do we need this if GCI / COS doesn't require it? cc/ @roberthbailey @zmerlynn
The reason I care this subtle difference is that I want to make sure the difference between two package as minimal as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect I "forked" the script when GCI still had --can-ip-forward. I am fine with dropping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that COS doesn't require the --can-ip-forward flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dchen1107 -- Is this because we never contact a pod IP on the master from any other node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both master-helper.sh and node-helper.sh (via create-node-template
) call make-gcloud-network-argument
, which sets --can-ip-forward unless KUBE_GCE_ENABLE_IP_ALIASES
is set. So COS is using this flag and we are good.
cluster/gce/ubuntu/master-helper.sh
Outdated
|
||
echo "${kube_env}" > ${KUBE_TEMP}/master-kube-env.yaml | ||
get-metadata "${existing_master_zone}" "${existing_master_name}" cluster-name > "${KUBE_TEMP}/cluster-name.txt" | ||
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-update-strategy > "${KUBE_TEMP}/gci-update.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCI specific variable, no need here.
cluster/gce/ubuntu/master-helper.sh
Outdated
echo "${kube_env}" > ${KUBE_TEMP}/master-kube-env.yaml | ||
get-metadata "${existing_master_zone}" "${existing_master_name}" cluster-name > "${KUBE_TEMP}/cluster-name.txt" | ||
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-update-strategy > "${KUBE_TEMP}/gci-update.txt" | ||
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-ensure-gke-docker > "${KUBE_TEMP}/gci-ensure-gke-docker.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCI specific legacy variable, no need here, and should be removed from gci too.
cluster/gce/ubuntu/master-helper.sh
Outdated
get-metadata "${existing_master_zone}" "${existing_master_name}" cluster-name > "${KUBE_TEMP}/cluster-name.txt" | ||
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-update-strategy > "${KUBE_TEMP}/gci-update.txt" | ||
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-ensure-gke-docker > "${KUBE_TEMP}/gci-ensure-gke-docker.txt" | ||
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-docker-version > "${KUBE_TEMP}/gci-docker-version.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto...
cluster/gce/ubuntu/master.yaml
Outdated
- systemctl enable kubelet-monitor.service | ||
- systemctl enable kube-logrotate.timer | ||
- systemctl enable kube-logrotate.service | ||
- systemctl enable kubernetes.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding, this is the only line (another one in node.yaml) must required to creating a separate gce/ubuntu folder besides gce/common.sh & util.sh. I am wondering if we can introduce the same:
- systemctl enable kubernetes.target
before start kubernetes.target. If we could, we can hugely simplify the codebase, and avoid future unnecessary devergening and minimize the maintenance cost.
cc/ @kubernetes/sig-cluster-lifecycle-api-reviews
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked into this. In GCI, the cloud-init's specs in /var/lib/cloud
are not persisted across reboots, so cloud-init will always run everything as if it's on a newly created machine1. Therefore, systemctl enable kubernetes.target
is useless in this case.
However, this is not the case for Ubuntu, where cloud-init will skip those steps whose frequency is once
. That's why we need to have kubernetes.target
required by state multi-user.target
and have it enabled. But I'm not sure if it needs to be strictly started after docker.service
though. (BTW, the After
should be under section [Unit]
rather than [Install]
).
Therefore, I think it should be safe to make the changes for GCI and change ubuntu
to be a symlink to gci
. Both GCI and Ubuntu should be working with this change. (I'm not sure if Ubuntu should have the same behavior as GCI wrt cloud-init though.)
@roberthbailey and @kubernetes/sig-cluster-lifecycle-api-reviews, can you please confirm?
1 This is because the upstart jobs are in /etc/init, which is not persisted across reboots in GCI. In this case, the cloud-init will be rerun in each reboot but it will not recreate the jobs because their run frequency is once. In order to fix this, GCI needs to reset /var/lib/cloud during restart to make cloud-init always run everything.
cluster/gce/ubuntu/node-helper.sh
Outdated
"user-data=${KUBE_ROOT}/cluster/gce/ubuntu/node.yaml" \ | ||
"configure-sh=${KUBE_ROOT}/cluster/gce/gci/configure.sh" \ | ||
"cluster-name=${KUBE_TEMP}/cluster-name.txt" \ | ||
"gci-update-strategy=${KUBE_TEMP}/gci-update.txt" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need this.
cluster/gce/ubuntu/node-helper.sh
Outdated
"configure-sh=${KUBE_ROOT}/cluster/gce/gci/configure.sh" \ | ||
"cluster-name=${KUBE_TEMP}/cluster-name.txt" \ | ||
"gci-update-strategy=${KUBE_TEMP}/gci-update.txt" \ | ||
"gci-ensure-gke-docker=${KUBE_TEMP}/gci-ensure-gke-docker.txt" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto...
cluster/gce/ubuntu/node-helper.sh
Outdated
"cluster-name=${KUBE_TEMP}/cluster-name.txt" \ | ||
"gci-update-strategy=${KUBE_TEMP}/gci-update.txt" \ | ||
"gci-ensure-gke-docker=${KUBE_TEMP}/gci-ensure-gke-docker.txt" \ | ||
"gci-docker-version=${KUBE_TEMP}/gci-docker-version.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto...
cluster/gce/ubuntu/node.yaml
Outdated
- systemctl enable kubelet-monitor.service | ||
- systemctl enable kube-logrotate.timer | ||
- systemctl enable kube-logrotate.service | ||
- systemctl enable kubernetes.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as the one in master.yaml
…es the gci's [master|node].yaml to enable kubernetes.target. This enables Ubuntu to use the same provisioning scripts as GCI's. The change for enabling kubernetes.target is needed for Ubuntu but a no-op for GCI.
I've changed |
Besides my manual tests, also confirmed with @adityakali, the changes in [master|node].yaml should be a no-op for GCI. |
I updated the release-notes a little bit: Support Ubuntu 16.04 image on GCE /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dchen1107, yguo0905
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue |
Removing label |
Automatic merge from submit-queue Cherry pick #44744 on release-1.6 **What this PR does / why we need it**: Cherry pick of #44744 on release-1.6. The changes need to cherry picked are relatively minimal so the risky is low. This allows us to run Ubuntu e2e tests against 1.6 branch. **Special notes for your reviewer**: **Release note**: ``` Support Ubuntu 16.04 image on GCE ```
Commit found in the "release-1.6" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
@@ -1,6 +1,6 @@ | |||
#!/bin/bash | |||
|
|||
# Copyright 2014 The Kubernetes Authors. | |||
# Copyright 2017 The Kubernetes Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't update copyright dates. The dates should be when the file was added.
What this PR does / why we need it:
This PR (on top of #44629) contains the script changes for running Ubuntu image on GCE.
Special notes for your reviewer:
We made change in
gci/node.yaml
andgci/master.yaml
to ensure that Kubernetes jobs can start automatically after reboot. This is not needed for GCI but required by Ubuntu. See #44744 (comment) for details. With this change, Ubuntu could use the same provisioning scripts as GCI's.Ran e2e tests using the following command and all tests passed.
Also tested manually for both GCI and Ubuntu images.
Running Ubuntu image on master is not in the scope and has not been tested.
Release note:
Support Ubuntu 16.04 image on GCE