Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running Ubuntu image on GCE node #44744

Merged
merged 2 commits into from
Apr 26, 2017

Conversation

yguo0905
Copy link
Contributor

@yguo0905 yguo0905 commented Apr 21, 2017

What this PR does / why we need it:

This PR (on top of #44629) contains the script changes for running Ubuntu image on GCE.

Special notes for your reviewer:

We made change in gci/node.yaml and gci/master.yaml to ensure that Kubernetes jobs can start automatically after reboot. This is not needed for GCI but required by Ubuntu. See #44744 (comment) for details. With this change, Ubuntu could use the same provisioning scripts as GCI's.

Ran e2e tests using the following command and all tests passed.

KUBE_GCE_NODE_IMAGE=ubuntu-gke-1604-xenial-v20170420-1 KUBE_GCE_NODE_PROJECT=ubuntu-os-gke-cloud KUBE_NODE_OS_DISTRIBUTION=ubuntu GINKGO_PARALLEL=y GINKGO_PARALLEL_NODES=30 go run hack/e2e.go -- -v --build --up --test --test_args="--ginkgo.skip=\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]" --down

Also tested manually for both GCI and Ubuntu images.

Running Ubuntu image on master is not in the scope and has not been tested.

Release note:
Support Ubuntu 16.04 image on GCE

Using Ubuntu on GCE to run cluster e2e tests requires slightly different
node.yaml and master.yaml files than GCI, because Ubuntu uses systemd as
PID 1, wheras GCI uses upstart with a systemd delegate. Therefore the
e2e tests fail using those files since the kubernetes services are not
brought back up after a node/master reboot.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 21, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Apr 21, 2017
@yguo0905
Copy link
Contributor Author

@chrisglass
@dchen1107

Copy link

@chrisglass chrisglass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure why the ubuntu-specific subdirectory was deleted and instead synlinked to the GCI one, since in my experience this doesn't work.

Specifically, the kubernetes services are not restarted after a system reboot since the systemd target is not enabled, nor does it have an [Install]\nWantedBy= entry.

Or is the assumption that the result of some of the tests should be ignored? (at least one of them uses rebooting nodes to simulate network partition)
Is it assumed that user will never reboot nodes?

Description=Kubernetes

[Install]
WantedBy=multi-user.target

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed by Ubuntu.

The GCI image doesn't have this in its systemd services because GCI uses systemd as a delegate (pid 1 is upstart).

Therefore symlinking ubuntu -> gci does not work: if a node reboots the services won't be brought back up.

That is the reason I originally duplicated some of the files in the subdirectory instead of symlinking, the node.yaml and master.yaml files need to be different.

Or does this imply that GCI will change to use systemd as pid 1?

- systemctl enable kubelet-monitor.service
- systemctl enable kube-logrotate.timer
- systemctl enable kube-logrotate.service
- systemctl enable kubernetes.target

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed by Ubuntu.

This extra "enable" call ensures that the appropriate symlink is created by systemd (so that it is brought back up after a reboot).

ensure-gci-metadata-files
create-node-template "$template_name" "${scope_flags[*]}" \
"kube-env=${KUBE_TEMP}/node-kube-env.yaml" \
"user-data=${KUBE_ROOT}/cluster/gce/ubuntu/node.yaml" \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existence of this file was to make sure the approrpiate node.yaml file is passed to the node instance (an ubuntu specific one, not the GCI one).

- systemctl enable kubelet-monitor.service
- systemctl enable kube-logrotate.timer
- systemctl enable kube-logrotate.service
- systemctl enable kubernetes.target

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed by Ubuntu.

This extra "enable" call ensures that the appropriate symlink is created by systemd (so that it is brought back up after a reboot).

Description=Kubernetes

[Install]
WantedBy=multi-user.target

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed by Ubuntu.

The GCI image doesn't have this in its systemd services because GCI uses systemd as a delegate (pid 1 is upstart).

Therefore symlinking ubuntu -> gci does not work: if a node reboots the services won't be brought back up.

That is the reason I originally duplicated some of the files in the subdirectory instead of symlinking, the node.yaml and master.yaml files need to be different.

@yguo0905 yguo0905 force-pushed the ubuntu-gce-cluster-tests branch from 4ee450f to cf23d4c Compare April 21, 2017 17:48
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 21, 2017
@yguo0905
Copy link
Contributor Author

You are right. I wasn't aware of the reboot issue. The tests I ran (the slow, serial, disruptive, flaky and feature tests were skipped) didn't cover this case.
I will keep the original commit and set up Jenkins jobs to run more tests against it.

@spxtr
Copy link
Contributor

spxtr commented Apr 21, 2017

@k8s-bot test this

@chrisglass
Copy link

Since this is basically the same as #44629 and I don't have push access to this branch, I suggest this to be closed and efforts focused on #44629 instead.

if [[ "${master}" == "true" && ("${MASTER_OS_DISTRIBUTION}" == "trusty" || "${MASTER_OS_DISTRIBUTION}" == "gci" || "${MASTER_OS_DISTRIBUTION}" == "container-linux") ]] || \
[[ "${master}" == "false" && ("${NODE_OS_DISTRIBUTION}" == "trusty" || "${NODE_OS_DISTRIBUTION}" == "gci" || "${NODE_OS_DISTRIBUTION}" == "container-linux") ]] ; then
if [[ "${master}" == "true" && ("${MASTER_OS_DISTRIBUTION}" == "trusty" || "${MASTER_OS_DISTRIBUTION}" == "gci" || "${MASTER_OS_DISTRIBUTION}" == "container-linux") || "${MASTER_OS_DISTRIBUTION}" == "ubuntu" ]] || \
[[ "${master}" == "false" && ("${NODE_OS_DISTRIBUTION}" == "trusty" || "${NODE_OS_DISTRIBUTION}" == "gci" || "${NODE_OS_DISTRIBUTION}" == "container-linux") || "${NODE_OS_DISTRIBUTION}" = "ubuntu" ]] ; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove all references to trusty since the trusty support was removed a while back. Let's do it in a separate pr.

@@ -0,0 +1,111 @@
#!/bin/bash

# Copyright 2016 The Kubernetes Authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/2016/2017?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in f07669b

--tags "${MASTER_TAG}" \
--network "${NETWORK}" \
--scopes "storage-ro,compute-rw,monitoring,logging-write" \
--can-ip-forward \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisglass By comparing gce/ubuntu/master-helper.sh and gce/gci/master-helper.sh, adding --can-ip-forward upon creating the instance is the only real difference here. Why do we need this if GCI / COS doesn't require it? cc/ @roberthbailey @zmerlynn

The reason I care this subtle difference is that I want to make sure the difference between two package as minimal as possible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect I "forked" the script when GCI still had --can-ip-forward. I am fine with dropping it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that COS doesn't require the --can-ip-forward flag.

@bowei

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dchen1107 -- Is this because we never contact a pod IP on the master from any other node?

Copy link
Contributor Author

@yguo0905 yguo0905 Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both master-helper.sh and node-helper.sh (via create-node-template) call make-gcloud-network-argument, which sets --can-ip-forward unless KUBE_GCE_ENABLE_IP_ALIASES is set. So COS is using this flag and we are good.


echo "${kube_env}" > ${KUBE_TEMP}/master-kube-env.yaml
get-metadata "${existing_master_zone}" "${existing_master_name}" cluster-name > "${KUBE_TEMP}/cluster-name.txt"
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-update-strategy > "${KUBE_TEMP}/gci-update.txt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCI specific variable, no need here.

echo "${kube_env}" > ${KUBE_TEMP}/master-kube-env.yaml
get-metadata "${existing_master_zone}" "${existing_master_name}" cluster-name > "${KUBE_TEMP}/cluster-name.txt"
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-update-strategy > "${KUBE_TEMP}/gci-update.txt"
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-ensure-gke-docker > "${KUBE_TEMP}/gci-ensure-gke-docker.txt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCI specific legacy variable, no need here, and should be removed from gci too.

get-metadata "${existing_master_zone}" "${existing_master_name}" cluster-name > "${KUBE_TEMP}/cluster-name.txt"
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-update-strategy > "${KUBE_TEMP}/gci-update.txt"
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-ensure-gke-docker > "${KUBE_TEMP}/gci-ensure-gke-docker.txt"
get-metadata "${existing_master_zone}" "${existing_master_name}" gci-docker-version > "${KUBE_TEMP}/gci-docker-version.txt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto...

- systemctl enable kubelet-monitor.service
- systemctl enable kube-logrotate.timer
- systemctl enable kube-logrotate.service
- systemctl enable kubernetes.target
Copy link
Member

@dchen1107 dchen1107 Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding, this is the only line (another one in node.yaml) must required to creating a separate gce/ubuntu folder besides gce/common.sh & util.sh. I am wondering if we can introduce the same:

  • systemctl enable kubernetes.target
    before start kubernetes.target. If we could, we can hugely simplify the codebase, and avoid future unnecessary devergening and minimize the maintenance cost.

cc/ @kubernetes/sig-cluster-lifecycle-api-reviews

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked into this. In GCI, the cloud-init's specs in /var/lib/cloud are not persisted across reboots, so cloud-init will always run everything as if it's on a newly created machine1. Therefore, systemctl enable kubernetes.target is useless in this case.

However, this is not the case for Ubuntu, where cloud-init will skip those steps whose frequency is once. That's why we need to have kubernetes.target required by state multi-user.target and have it enabled. But I'm not sure if it needs to be strictly started after docker.service though. (BTW, the After should be under section [Unit] rather than [Install]).

Therefore, I think it should be safe to make the changes for GCI and change ubuntu to be a symlink to gci. Both GCI and Ubuntu should be working with this change. (I'm not sure if Ubuntu should have the same behavior as GCI wrt cloud-init though.)

@roberthbailey and @kubernetes/sig-cluster-lifecycle-api-reviews, can you please confirm?


1 This is because the upstart jobs are in /etc/init, which is not persisted across reboots in GCI. In this case, the cloud-init will be rerun in each reboot but it will not recreate the jobs because their run frequency is once. In order to fix this, GCI needs to reset /var/lib/cloud during restart to make cloud-init always run everything.

"user-data=${KUBE_ROOT}/cluster/gce/ubuntu/node.yaml" \
"configure-sh=${KUBE_ROOT}/cluster/gce/gci/configure.sh" \
"cluster-name=${KUBE_TEMP}/cluster-name.txt" \
"gci-update-strategy=${KUBE_TEMP}/gci-update.txt" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need this.

"configure-sh=${KUBE_ROOT}/cluster/gce/gci/configure.sh" \
"cluster-name=${KUBE_TEMP}/cluster-name.txt" \
"gci-update-strategy=${KUBE_TEMP}/gci-update.txt" \
"gci-ensure-gke-docker=${KUBE_TEMP}/gci-ensure-gke-docker.txt" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto...

"cluster-name=${KUBE_TEMP}/cluster-name.txt" \
"gci-update-strategy=${KUBE_TEMP}/gci-update.txt" \
"gci-ensure-gke-docker=${KUBE_TEMP}/gci-ensure-gke-docker.txt" \
"gci-docker-version=${KUBE_TEMP}/gci-docker-version.txt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto...

- systemctl enable kubelet-monitor.service
- systemctl enable kube-logrotate.timer
- systemctl enable kube-logrotate.service
- systemctl enable kubernetes.target
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as the one in master.yaml

…es the gci's [master|node].yaml to enable kubernetes.target.

This enables Ubuntu to use the same provisioning scripts as GCI's. The
change for enabling kubernetes.target is needed for Ubuntu but a no-op
for GCI.
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 25, 2017
@yguo0905
Copy link
Contributor Author

I've changed ubuntu to be a symlink to gci and made changes to start kubernetes.target automatically after reboot. Tested manually for both GCI and Ubuntu images (for reboots). Please take another look.

@yguo0905
Copy link
Contributor Author

Besides my manual tests, also confirmed with @adityakali, the changes in [master|node].yaml should be a no-op for GCI.

@dchen1107 dchen1107 added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Apr 26, 2017
@dchen1107
Copy link
Member

I updated the release-notes a little bit: Support Ubuntu 16.04 image on GCE

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 26, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, yguo0905

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 26, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 6d17ab3 into kubernetes:master Apr 26, 2017
@yguo0905 yguo0905 changed the title Support running Ubuntu image on GCE Support running Ubuntu image on GCE node Apr 27, 2017
@k8s-cherrypick-bot
Copy link

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@dchen1107 dchen1107 added this to the v1.6 milestone May 2, 2017
k8s-github-robot pushed a commit that referenced this pull request May 4, 2017
Automatic merge from submit-queue

Cherry pick #44744 on release-1.6

**What this PR does / why we need it**:

Cherry pick of #44744 on release-1.6.

The changes need to cherry picked are relatively minimal so the risky is low. This allows us to run Ubuntu e2e tests against 1.6 branch.

**Special notes for your reviewer**:

**Release note**:

```
Support Ubuntu 16.04 image on GCE
```
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.6" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright 2014 The Kubernetes Authors.
# Copyright 2017 The Kubernetes Authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't update copyright dates. The dates should be when the file was added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.