-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
don't wait for first kubelet to be ready and drop dummy deploy #43835
Conversation
return true, nil | ||
}) | ||
|
||
if err := createAndWaitForADummyDeployment(client); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this check if the first node has already ready?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we move this and the node ready check to a seperate final validation step. Do you mean we should drop it permanently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, just noticed #43824 did this just after firstNodeIsReady
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right place to check that. Commented below.
|
||
fmt.Printf("[apiclient] First node is ready after %f seconds\n", time.Since(start).Seconds()) | ||
fmt.Printf("[apiclient] First node has registered after %f seconds\n", time.Since(start).Seconds()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you change this comment but don't fix up line 47. Most users won't parse registered vs. ready though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@jbeda I really think we should drop these checks altogether. CreateClientAndWaitForAPI should create a client and wait for the API, not create dummy deployments and wait for nodes to register and be healthy. These are end to end validations and this is the wrong place to do this stuff. We need an explicit final validation phase for this. |
and skip dummy deployment
I agree with @mikedanese that it would be better to separate the validation steps so that it's easier to automate the In the mean time I'm good with this PR or mine, but I'll leave it to someone else to make the final judgement here. Thanks everyone for your help! |
/lgtm Code looks fine to me, but other reviewers are probably more familiar with the expectations around initialization. I'm happy so long as end-to-end testing works. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mikedanese, pipejakob
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
If we do this (which we probably should), we should really make sure it doesn't break us again. The dummy depkoyment was added because of race conditions when the controller manager and/or scheduler said they were ready while they weren't IIRC (deadlock). So I think the problem with removing this is that we might start getting these race conditions (flakes) instead. I'm not 100 % sure this applies anymore but it should be noted. Can't we just tolerate the taint for the dummy deployment? |
The race was due to immediately creating the kube-discovery deployment and should no longer be a problem. I have been and will continue to test this extensively with @pipejakob before we cut 1.6.1 |
@luxas apparently tolerations don't work for scheduling new pods on NotReady nodes, the scheduler always excludes those nodes from scheduling regardless of tolerations. (see the discussion on the other PR: #43824 (comment)) But if people are waiting on "which PR?" to proceed, I'm totally willing to say let's just do this one, whatever gets this out the door quickest since 1.6.0 is kind of DOA for new clusters due to this. |
Automatic merge from submit-queue |
@mikedanese how do i apply your patch to my kubernetes installment? |
@Sadpyro change your apt repo to unstable and update kubeadm |
LGTM to me as well given the comments above |
|
@mikedanese Do you intend to cherrypick this into the release branch? |
@bgrant0607 it has already been cherry-picked #43837 |
Removing label |
…pick-of-#43835-release-1.6 Automatic merge from submit-queue Automated cherry pick of kubernetes#43835 release 1.6 Automated cherry pick of kubernetes#43835 release 1.6 fixes: kubernetes#43815
Commit found in the "release-1.6" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
Per #43815 (comment), I suggest that we drop both the node ready and the dummy deployment check altogether for 1.6 and move them to a validation phase for 1.7.
I really think we should drop these checks altogether. CreateClientAndWaitForAPI should create a client and wait for the API, not create dummy deployments and wait for nodes to register and be healthy. These are end to end validations and this is the wrong place to do this stuff. We need an explicit final validation phase for this.
Fixes #43815