
don't wait for first kubelet to be ready and drop dummy deploy #43835

Merged: 1 commit into kubernetes:master, Mar 30, 2017

Conversation

@mikedanese (Member) commented Mar 30, 2017

Per #43815 (comment), I suggest that we drop both the node-ready and the dummy-deployment checks altogether for 1.6 and move them to a validation phase for 1.7.

I really think we should drop these checks altogether. CreateClientAndWaitForAPI should create a client and wait for the API, not create dummy deployments and wait for nodes to register and be healthy. These are end-to-end validations, and this is the wrong place to do this stuff. We need an explicit final validation phase for this.

Fix a deadlock in kubeadm master initialization.

Fixes #43815
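
To make the proposal concrete, here is a minimal sketch, not the actual kubeadm code, of what a pared-down CreateClientAndWaitForAPI could look like once the end-to-end checks move to their own phase. The kubeconfig handling, poll interval, and use of the discovery endpoint are assumptions for illustration.

    package apiclient

    import (
        "fmt"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    // CreateClientAndWaitForAPI builds a client from the given kubeconfig and
    // blocks until the apiserver responds. It deliberately does NOT wait for
    // nodes to register or create a dummy deployment; those checks would live
    // in a later, explicit validation phase.
    func CreateClientAndWaitForAPI(kubeconfigPath string) (*kubernetes.Clientset, error) {
        config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
        if err != nil {
            return nil, err
        }
        client, err := kubernetes.NewForConfig(config)
        if err != nil {
            return nil, err
        }

        start := time.Now()
        err = wait.PollInfinite(500*time.Millisecond, func() (bool, error) {
            // ServerVersion is a cheap request that succeeds as soon as the
            // apiserver is serving; node readiness is not consulted at all.
            if _, err := client.Discovery().ServerVersion(); err != nil {
                return false, nil
            }
            return true, nil
        })
        if err != nil {
            return nil, err
        }
        fmt.Printf("[apiclient] API is available after %f seconds\n", time.Since(start).Seconds())
        return client, nil
    }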

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 30, 2017
@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Mar 30, 2017

Review thread on the diff:

    return true, nil
    })

    if err := createAndWaitForADummyDeployment(client); err != nil {
Member:

Do we still need this check if the first node is already ready?

mikedanese (Member, Author):

I suggest we move this and the node-ready check to a separate final validation step. Do you mean we should drop it permanently?

Member:

Hmm, just noticed that #43824 did this just after firstNodeIsReady.

mikedanese (Member, Author):

I don't think this is the right place to check that. Commented below.

@jbeda (Contributor) commented Mar 30, 2017

This LGTM but, as noted in #43815, I prefer #43824. I'm cool with either, though.


Review thread on the diff (old line above, new line below):

    fmt.Printf("[apiclient] First node is ready after %f seconds\n", time.Since(start).Seconds())
    fmt.Printf("[apiclient] First node has registered after %f seconds\n", time.Since(start).Seconds())
Contributor:

Nit: you change this comment but don't fix up line 47. Most users won't parse registered vs. ready though.

mikedanese (Member, Author):

Fixed

@mikedanese (Member, Author):

@jbeda I really think we should drop these checks altogether, for the reasons given in the PR description above: CreateClientAndWaitForAPI should create a client and wait for the API, nothing more. We need an explicit final validation phase for this.

Commit: don't wait for first kubelet to be ready and skip dummy deployment
@mikedanese mikedanese added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Mar 30, 2017
@kensimon:

I agree with @mikedanese that it would be better to separate the validation steps, so that it's easier to automate the kubeadm init && kubectl apply -f cni-plugin.yaml workflow, or at least to let kubeadm manage the network plugin application for you (see kubernetes/kubeadm#214).

In the meantime I'm good with this PR or mine, but I'll leave it to someone else to make the final judgement here. Thanks everyone for your help!

@pipejakob (Contributor):

/lgtm

Code looks fine to me, but other reviewers are probably more familiar with the expectations around initialization. I'm happy so long as end-to-end testing works.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mikedanese, pipejakob

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@mikedanese mikedanese added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. and removed do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. labels Mar 30, 2017
@luxas (Member) commented Mar 30, 2017

If we do this (which we probably should), we should really make sure it doesn't break us again. IIRC, the dummy deployment was added because of race conditions where the controller manager and/or scheduler reported ready while they weren't (a deadlock). So the problem with removing it is that we might start seeing those race conditions (flakes) instead. I'm not 100% sure this still applies, but it should be noted.

Can't we just tolerate the taint for the dummy deployment?
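
For reference, the change @luxas is asking about would amount to adding a toleration to the dummy deployment's pod template. Below is a hypothetical sketch; the taint key is an assumption for illustration, and, as @kensimon explains further down, the 1.6 scheduler filtered out NotReady nodes before tolerations were evaluated, so this alone would not have helped.

    package main

    import (
        "fmt"

        v1 "k8s.io/api/core/v1" // in the 1.6 tree this lived at k8s.io/kubernetes/pkg/api/v1
    )

    func main() {
        // Toleration the dummy deployment's pod template would need to stay
        // schedulable on a node carrying a not-ready taint. The key below is
        // hypothetical; the exact taint key has varied across releases.
        tol := v1.Toleration{
            Key:      "node.alpha.kubernetes.io/notReady",
            Operator: v1.TolerationOpExists,
            Effect:   v1.TaintEffectNoExecute,
        }
        fmt.Printf("dummy deployment would tolerate: %+v\n", tol)
    }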

@mikedanese (Member, Author) commented Mar 30, 2017

The race was due to immediately creating the kube-discovery deployment, and it should no longer be a problem. I have been testing this extensively with @pipejakob, and will continue to before we cut 1.6.1.

@kensimon:

@luxas Apparently tolerations don't work for scheduling new pods on NotReady nodes; the scheduler always excludes those nodes from scheduling regardless of tolerations (see the discussion on the other PR: #43824 (comment)).

But if people are waiting on "which PR?" to proceed, I'm totally willing to say let's just do this one: whatever gets this out the door quickest, since 1.6.0 is kind of DOA for new clusters because of this bug.

@mikedanese mikedanese modified the milestones: v1.6, v1.6.1 Mar 30, 2017
@k8s-github-robot:

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 57b7c75 into kubernetes:master Mar 30, 2017
k8s-github-robot pushed a commit referencing this pull request Mar 31, 2017: …3835-release-1.6

Automatic merge from submit-queue

Automated cherry pick of #43835 release 1.6

fixes: #43815
@JunkTapes:

@mikedanese How do I apply your patch to my Kubernetes installation?

@lucianomagrao:

@Sadpyro change your apt repo to unstable and update kubeadm

@luxas (Member) commented Mar 31, 2017

LGTM as well, given the comments above.

@mikedanese mikedanese deleted the kubeadm-fix branch March 31, 2017 13:43
@lucianomagrao commented Mar 31, 2017

@Sadpyro

cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial-unstable main
EOF
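
After switching the source list to the unstable channel, an apt-get update followed by apt-get install -y kubeadm should pull in the patched build; if in doubt about which version you'll get, check with apt-cache policy kubeadm.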

@bgrant0607 (Member):

@mikedanese Do you intend to cherry-pick this into the release branch?

@mikedanese (Member, Author):

@bgrant0607 It has already been cherry-picked in #43837.

@k8s-cherrypick-bot:

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@mikedanese mikedanese modified the milestones: v1.6, v1.6.1 Mar 31, 2017
mintzhao pushed a commit to mintzhao/kubernetes referencing this pull request Jun 1, 2017: …pick-of-#43835-release-1.6

Automatic merge from submit-queue

Automated cherry pick of kubernetes#43835 release 1.6

fixes: kubernetes#43815
@k8s-cherrypick-bot:

Commit found in the "release-1.6" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.
