Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certificate rotation for kubelet server certs. #45059

Merged

Conversation

jcbsmpsn
Copy link
Contributor

@jcbsmpsn jcbsmpsn commented Apr 28, 2017

Replaces the current kubelet server side self signed certs with certs signed by
the Certificate Request Signing API on the API server. Also renews expiring
kubelet server certs as expiration approaches.

Two Points:

  1. With --feature-gates=RotateKubeletServerCertificate=true set, the kubelet will
    request a certificate during the boot cycle and pause waiting for the request to
    be satisfied.
  2. In order to have the kubelet's certificate signing request auto approved,
    --insecure-experimental-approve-all-kubelet-csrs-for-group= must be set on
    the cluster controller manager. There is an improved mechanism for auto
    approval proposed.

Release note:

With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a server certificate from the API server during the boot cycle and pause
waiting for the request to be satisfied. It will continually refresh the certificate as
the certificates expiration approaches.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 28, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. release-note-label-needed labels Apr 28, 2017
@jcbsmpsn
Copy link
Contributor Author

@ericchiang Initial bootstrapping of the server cert by requesting a signed cert from the certificate request signing API on the API server is in here.

Copy link
Contributor

@ericchiang ericchiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the Go 1.8.1 is getting rolled back so we might have to wait a bit for the CI :( #38228 (comment)

@@ -72,7 +72,7 @@ func (cc *groupApprover) AutoApprove(csr *certificates.CertificateSigningRequest
if len(x509cr.DNSNames)+len(x509cr.EmailAddresses)+len(x509cr.IPAddresses) != 0 {
return csr, nil
}
if !hasExactUsages(csr, kubeletClientUsages) {
if !hasExactUsages(csr, kubeletClientUsages) && !hasExactUsages(csr, kubeletServerUsages) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the server cert, I'd expect some matching of the user that requested it (system:node:(nodename)) and the common name requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What check are you thinking on the common name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During bootstrapping, the client cert contains the common name system:node:( nodename ). When it requests a serving cert from the certificates API, it'll be authenticated as the username system:node:( nodename ). When approving these requests, I'd expect that a node would only be able to request serving certs for it's nodename, and not others.

Actually, after the groupapprover is used to grant the client cert, the kubelet can use that cert and wont be part of the bootstrapping group after. This was brought up by @deads2k in the original client cert bootstrapping PR: #30094 (comment)

cc @kubernetes/sig-auth-pr-reviews

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. There are three distinct autoapproval cases we want to detect (see #45030)

  1. CSR shaped like a kubelet client cert, requested by the kubelet itself
  2. CSR shaped like a kubelet serving cert, requested by the kubelet itself
  3. CSR shaped like a kubelet client cert, requested by a different user (like a shared bootstrap user)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems beyond the scope of the group approver. Maybe a different PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is my hope to put some bare minimum into this PR and move on to #45030 next, after getting some feedback on that issue. @mikedanese says he thinks it would be a relatively small change, but it will depend on feedback.

If it seems reasonable to proceed here with some bare minimum, would addressing @ericchiang's comment be enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the checks this function does wouldn't match the shape of a node's serving cert... it would have IP and DNS SANs, at least. I'd expect to start to see the refactor into isNodeClientCSR(CSR) and isNodeServingCSR(CSR) here.

the way we eventually determine which IP and DNS names a node can ask for a serving cert for are will be interesting... should a kubelet have to register its Node API object before it can ask for a serving cert?

@luxas luxas added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Apr 29, 2017
@jcbsmpsn jcbsmpsn force-pushed the rotate-server-certificate branch from fc62cd1 to 8c462c2 Compare May 2, 2017 00:51
@jcbsmpsn jcbsmpsn force-pushed the rotate-server-certificate branch from 8c462c2 to f5c6484 Compare May 4, 2017 22:37
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 4, 2017
@jcbsmpsn jcbsmpsn force-pushed the rotate-server-certificate branch from f5c6484 to 7dc7cf3 Compare May 4, 2017 23:24
Copy link
Contributor

@ericchiang ericchiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi @liggitt the controller manager approver changes have been removed from this PR.

@@ -46,6 +46,9 @@ const (
// manager. In the background it communicates with the API server to get new
// certificates for certificates about to expire.
type Manager interface {
// CertificateSigningRequestClient sets the client interface that is used for
// signing new certificates generated as part of rotation.
CertificateSigningRequestClient(certificatesclient.CertificateSigningRequestInterface)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this to be part of the initializer. Any reason this was added to the interface? It doesn't actually look like it's being used outside of tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used in this PR, but it's used in #41912. It's specifically to work around the one case where the API client needs the certificate from the certificate manager to make a connection, and the certificate manager needs the API client to request certificate changes. The sequence of steps is:

  • Initialize certificate manager with local certs (either bootstrap or previously requested and stored cert). No client is passed in yet, so the certificate manager can't rotate/update yet.
  • Initialize the API client with the certificate manager
  • .Start() the certificate manager
    • certificate manager uses the client to request an updated certificate (if it needs too, either because of bootstrapping or expiration)

@jcbsmpsn
Copy link
Contributor Author

jcbsmpsn commented May 8, 2017

/assign @dchen1107

@jcbsmpsn
Copy link
Contributor Author

jcbsmpsn commented May 8, 2017

/assign @timstclair

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2017
@jcbsmpsn
Copy link
Contributor Author

jcbsmpsn commented May 8, 2017

/unassign @dchen1107

@jcbsmpsn jcbsmpsn force-pushed the rotate-server-certificate branch from 7dc7cf3 to b3530ac Compare May 8, 2017 22:42
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2017
@jcbsmpsn jcbsmpsn force-pushed the rotate-server-certificate branch from b3530ac to f0bbe84 Compare May 9, 2017 17:26
@jcbsmpsn
Copy link
Contributor Author

jcbsmpsn commented May 9, 2017

@k8s-bot gce etcd3 e2e test this

@jcbsmpsn jcbsmpsn force-pushed the rotate-server-certificate branch from f0bbe84 to 9a9d3c1 Compare May 13, 2017 10:09
@k8s-github-robot k8s-github-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 13, 2017
func initializeServerCertificateManager(kubeClient clientset.Interface, kubeCfg *componentconfig.KubeletConfiguration, nodeName types.NodeName, cloudIPs []net.IP, hostnames []string) (certificate.Manager, error) {
var certSigningRequestClient clientcertificates.CertificateSigningRequestInterface
if kubeClient != nil && kubeClient.Certificates() != nil {
certSigningRequestClient = kubeClient.Certificates().CertificateSigningRequests()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the implications if we don't have a client here? is it even possible for the cert manager to do anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The certificate manager can serve whatever local certificates it has, either received through the command line parameters, config file, or existing files on disk.

DNSNames: hostnames,
IPAddresses: ips,
},
Usages: []certificates.KeyUsage{
Copy link
Member

@liggitt liggitt May 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be worth having a helper in the certificates api package that returns usages for a TLS client cert, and one that returns usages for a TLS serving cert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will follow on with this change.

if err != nil {
return nil, fmt.Errorf("failed to initialize certificate store: %v", err)
}
ips, err := allLocalIPsWithoutLoopback()
Copy link
Member

@liggitt liggitt May 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hoist IP detection out of this function, this function should just take the lists of dns and ip SANs we want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

})
}

func allLocalIPsWithoutLoopback() ([]net.IP, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the kubelet takes an --address option that lets you bind to a specific interface. if that is set to a specific IP, we should just use that (we should only iterate over all interfaces if it is binding to 0.0.0.0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@liggitt liggitt added sig/auth Categorizes an issue or PR as relevant to SIG Auth. area/security labels May 27, 2017
@liggitt
Copy link
Member

liggitt commented May 27, 2017

looking good, just a couple questions and a couple structural changes around where/how we determine local ips. (needs a release note in the description as well)

@liggitt liggitt added the sig/node Categorizes an issue or PR as relevant to SIG Node. label May 27, 2017
@jcbsmpsn jcbsmpsn force-pushed the rotate-server-certificate branch from dfca6bc to 2b4b578 Compare May 29, 2017 13:43
Copy link

@timstclair timstclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for adding a more detailed release note.

if err != nil {
return nil, err
}
ips = append(ips, cloudIPs...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for allLocalIPsWithoutLoopback to include the external IPs? If so, I think you should probably de-dupe this list?

// Gets a server certificate for the kubelet from the Certificate Signing
// Request API instead of generating one self signed and auto rotates the
// certificate as expiration approaches.
RotateKubeletServerCertificate utilfeature.Feature = "RotateKubeletServerCertificate"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think feature gates should be a long-term option. If we want something to toggle this after it goes to GA, it should be a separate flag.

@jcbsmpsn jcbsmpsn force-pushed the rotate-server-certificate branch 2 times, most recently from d2febc8 to 92a9d07 Compare May 29, 2017 18:56
@jcbsmpsn
Copy link
Contributor Author

Added a release note.

Replaces the current kubelet server side self signed certs with certs
signed by the Certificate Request Signing API on the API server. Also
renews expiring kubelet server certs as expiration approaches.
@timstclair
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 30, 2017
@jcbsmpsn
Copy link
Contributor Author

/assign @smarterclayton

@smarterclayton
Copy link
Contributor

/approve

as per reviewers

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcbsmpsn, smarterclayton, timstclair

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 46635, 45619, 46637, 45059, 46415)

@k8s-github-robot k8s-github-robot merged commit f2074ba into kubernetes:master May 31, 2017
@smarterclayton
Copy link
Contributor

This broke hack/local-up-cluster - while this will be disabled by default in hack local up cluster, we need a fix to it that allows someone to locally test this change.

@liggitt
Copy link
Member

liggitt commented May 31, 2017

PR to disable alpha gates in local up cluster is already open. Would be easy to create a CA for the controller manager to enable signing.

@@ -1037,6 +1094,62 @@ type Kubelet struct {
dockerLegacyService dockershim.DockerLegacyService
}

func initializeServerCertificateManager(kubeClient clientset.Interface, kubeCfg *componentconfig.KubeletConfiguration, nodeName types.NodeName, ips []net.IP, hostnames []string) (certificate.Manager, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to move this into its own package (since it's not dependent on anything in the kubelet) and import it, primarily to separate it from this mega package, but also to make it easier to consume from other contexts (like tools that want to provide their own certificate management in a similar form).

Objections to a small refactor? I also want to do that for the client code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/security cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.