Skip to content

Conversation

@rosti
Copy link
Contributor

@rosti rosti commented Oct 31, 2018

What type of PR is this?
/kind feature

What this PR does / why we need it:

Until now the control plane timeout was fixed to 4 minutes and users did not have the ability to change it. This commit allows that timeout to be configured via the new timeoutForControlPlane option in the API server config (itself a member of the ClusterConfiguration).

The default timeout is still 4 minutes.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Refs kubernetes/kubeadm#911

Special notes for your reviewer:

This PR depends on:

Please, review only the last commit here.

This PR does not modify the self hosting timeout code, which is going to be deleted in another PR:

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/area kubeadm
/assign @fabriziopandini
/assign @timothysc

Does this PR introduce a user-facing change?:

kubeadm: timeoutForControlPlane is introduced as part of the API Server config, that controls the timeout for the wait for control plane to be up. Default value is 4 minutes.

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 31, 2018
@rosti rosti force-pushed the controlplane-timeout branch from 2aa3aca to 7eb0e8c Compare October 31, 2018 12:55
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @rosti

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the default here 4m?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @neolit123, please change to 4 b/c that's the default used all over.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 31, 2018
@rosti rosti force-pushed the controlplane-timeout branch from 7eb0e8c to dd32858 Compare November 1, 2018 11:07
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 1, 2018
@timothysc timothysc added the do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. label Nov 1, 2018
Copy link
Contributor

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Generally looks like what we talked about, thx for the patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @neolit123, please change to 4 b/c that's the default used all over.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 1, 2018
Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rosti!
/approve
Waiting for dependency to merge before final lgtm

@rosti rosti force-pushed the controlplane-timeout branch from dd32858 to 486c8f3 Compare November 2, 2018 16:08
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 2, 2018
@rosti
Copy link
Contributor Author

rosti commented Nov 2, 2018

Hold because of unmerged dependency PR.

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 2, 2018
Until now the control plane timeout was fixed to 4 minutes and users did not
have the ability to change it. This commit allows that timeout to be configured
via the new `timeoutForControlPlane` option in the API server config (itself a
member of the ClusterConfiguration).

The default timeout is still 4 minutes.

Signed-off-by: Rostislav M. Georgiev <[email protected]>
@rosti rosti force-pushed the controlplane-timeout branch from 486c8f3 to eb6f7b1 Compare November 5, 2018 10:37
@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 5, 2018
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 5, 2018
@rosti
Copy link
Contributor Author

rosti commented Nov 5, 2018

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2018
@neolit123
Copy link
Member

@rosti @fabriziopandini

https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/apiclient/wait.go#L159
^ it runs a couple of go routines concurrently an whichever fails first returns an error. so if f() there is for instance WaitForAPI() (for init / wait-control-plane phase) the control plane timeout would be respected but the other target is always WaitForHealthyKubelet() which has a timeout of total 195 seconds. so even if the user sets DefaultControlPlaneTimeout to 10 minutes it will always fail in no more than 195 seconds.

@fabriziopandini
Copy link
Member

@neolit123 @rosti
In my opionin it makes sense

  • if kubelet doesn't start, fail fast after 195 sec (does not make sense wait for the control plane)
    – if kubelet starts, wait for control plane max 4 minutes

@neolit123
Copy link
Member

i think what has to be done here is not use exponential backoff in WaitForHealthyKubelet, but rather allow us to make the function match a timeout the user has specified - e..g. DefaultKubeletTimeout


// TODO: NewControlPlaneWaiter should be converted to private after the self-hosting phase is removed.
waiter, err := phases.NewControlPlaneWaiter(i.dryRun, client, i.outputWriter)
timeout := i.cfg.ClusterConfiguration.APIServer.TimeoutForControlPlane.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why declaring this var, which is used only once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because the expression to take the timeout value is huge and best viewed on a single line. Also the call to NewControlPlaneWaiter on the following line continues to fit on screen.

@rosti
Copy link
Contributor Author

rosti commented Nov 5, 2018

@neolit123 I agree with you, that the waiter code can be simplified a bit (to say the least), but perhaps this is a job for another PR?

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosti Thanks for this PR!
/approve
/lgtm

@neolit123 @dixudx we can eventually iterate on details, but I'm moving on with this PR to get all the API changes in before code slush

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, rosti, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit b3d81e5 into kubernetes:master Nov 5, 2018
@rosti rosti deleted the controlplane-timeout branch November 5, 2018 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants