`kubeadm init` and `kubeadm join` together provide a nice user experience for creating a best-practice but bare Kubernetes cluster from scratch.
However, it might not be obvious how kubeadm does that.
This document strives to explain the phases of work that happen under the hood. Also included are the ComponentConfiguration API types for talking to kubeadm programmatically.
Note: Each and every one of the phases must be idempotent!
The scope of `kubeadm init` and `kubeadm join` is to provide a smooth user experience while bootstrapping a best-practice cluster.
The cluster that `kubeadm init` and `kubeadm join` set up should be:
- Secure:
  - It should adopt the latest best practices like:
    - enforcing RBAC
    - using the Node Authorizer
    - using secure communication between the control plane components
    - using secure communication between the API server and the kubelets
    - making it possible to lock down the kubelet API
    - locking down access to the API for system components like kube-proxy and kube-dns
    - locking down what a Bootstrap Token can access
- Easy to use:
  - The user should not have to run anything more than a couple of commands, including:
    - `kubeadm init` on the master
    - `export KUBECONFIG=/etc/kubernetes/admin.conf`
    - `kubectl apply -f <network-of-choice.yaml>`
    - `kubeadm join --token <token> <master-ip>:<master-port>`
  - The `kubeadm join` request to add a node should be automatically approved
- Extendable:
  - It should, for example, not favor any particular network provider; configuring the network is out of scope
  - It should provide the possibility to use a config file for customizing various parameters
We have to draw the line somewhere about what should be configurable, what shouldn't, and what should be hard-coded in the binary.
We've decided to make the Kubernetes directory `/etc/kubernetes` a constant in the application, since it is clearly the given path in a majority of cases and the most intuitive location. Having that path configurable would confuse readers of a deployment solution implemented on top of kubeadm.
This means we aim to standardize:
- `/etc/kubernetes/manifests` as the path where the kubelet should look for static Pod manifests. Temporarily when bootstrapping, these manifests are present:
  - `etcd.yaml`
  - `kube-apiserver.yaml`
  - `kube-controller-manager.yaml`
  - `kube-scheduler.yaml`
- `/etc/kubernetes/kubelet.conf` as the path where the kubelet should store its credentials to the API server.
- `/etc/kubernetes/admin.conf` as the path from where the admin can fetch his/her superuser credentials.
- Names of certificate files:
  - `ca.crt`, `ca.key` (CA certificate)
  - `apiserver.crt`, `apiserver.key` (API server certificate)
  - `apiserver-kubelet-client.crt`, `apiserver-kubelet-client.key` (client certificate for the API server to connect to the kubelets securely)
  - `sa.pub`, `sa.key` (a key pair for signing ServiceAccount tokens)
  - `front-proxy-ca.crt`, `front-proxy-ca.key` (CA for the front proxy)
  - `front-proxy-client.crt`, `front-proxy-client.key` (client certificate for the front proxy client)
- Names of kubeconfig files:
  - `admin.conf`
  - `kubelet.conf` (`bootstrap-kubelet.conf` during TLS bootstrap)
  - `controller-manager.conf`
  - `scheduler.conf`
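For orientation, the layout under `/etc/kubernetes` after a default `kubeadm init` run would then look roughly like the sketch below (illustrative only; exact contents vary with version and configuration):

```bash
tree /etc/kubernetes
# /etc/kubernetes
# ├── admin.conf
# ├── kubelet.conf
# ├── controller-manager.conf
# ├── scheduler.conf
# ├── manifests
# │   ├── etcd.yaml
# │   ├── kube-apiserver.yaml
# │   ├── kube-controller-manager.yaml
# │   └── kube-scheduler.yaml
# └── pki
#     ├── ca.crt / ca.key
#     ├── apiserver.crt / apiserver.key
#     ├── apiserver-kubelet-client.crt / apiserver-kubelet-client.key
#     ├── sa.pub / sa.key
#     ├── front-proxy-ca.crt / front-proxy-ca.key
#     └── front-proxy-client.crt / front-proxy-client.key
```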
The `kubeadm init` internal workflow consists of a sequence of atomic work tasks to perform.
The `kubeadm alpha phase` command allows users to invoke each task individually, and ultimately offers a reusable and composable API/toolbox that can be used by other Kubernetes bootstrap tools, by any IT automation tool, or by an advanced user for creating custom clusters.
`kubeadm` executes a set of preflight checks before starting the init, with the aim of verifying preconditions and avoiding common cluster startup problems.
In any case the user can skip specific preflight checks (or even all of them) with the `--ignore-preflight-errors` option.
- [warning] If the Kubernetes version to use (passed with the `--kubernetes-version` flag) is at least one minor version higher than the kubeadm CLI version.
- Kubernetes system requirements:
  - if running on linux:
    - [error] if not Kernel 3.10+ or 4+ with specific KernelSpec.
    - [error] if required cgroups subsystems aren't set up.
  - if using docker:
    - [error/warning] if the Docker endpoint does not exist or does not work, if Docker version >17.03. Note: starting from 1.9, kubeadm provides better support for CRI-generic functionality; in that case, docker-specific controls are skipped or replaced by similar controls for crictl.
- [error] if user is not root.
- [error] if hostname is not a valid DNS subdomain; [warning] if the host name cannot be reached via network lookup.
- [error] if kubelet version is lower than the minimum kubelet version supported by kubeadm (current minor -1).
- [error] if kubelet version is at least one minor higher than the required controlplane version (unsupported version skew).
- [warning] if kubelet service does not exist, if it is disabled.
- [warning] if firewalld is active.
- [error] if API.BindPort or ports 10250/10251/10252 are used.
- [Error] if the /etc/kubernetes/manifests folder already exists and it is not empty.
- [Error] if /proc/sys/net/bridge/bridge-nf-call-iptables file does not exist/does not contain 1.
- [Error] if swap is on.
- [Error] if "ip", "iptables", "mount", "nsenter" commands are not present in the command path.
- [warning] if "ebtables", "ethtool", "socat", "tc", "touch", "crictl" commands are not present in the command path.
- [warning] if extra arg flags for API server, controller manager, scheduler contains some invalid options.
- [warning] if connection to https://API.AdvertiseAddress:API.BindPort goes through a proxy.
- [warning] if connection to the services subnet goes through a proxy (only first address checked).
- [warning] if connection to the Pods subnet goes through a proxy (only first address checked).
- If using docker:
- [warning/error] if Docker service does not exist, if it is disabled, if it is not active.
- If using other cri engine:
- [error] if crictl socket does not answer.
- If external etcd is provided:
- [Error] if etcd version is less than 3.0.14.
- [Error] if certificates or keys are specified, but not provided.
- If external etcd is NOT provided:
- [Error] if port 2379 is in use.
- [Error] if Etcd.DataDir folder already exists and it is not empty.
- If authorization mode is ABAC, [Error] if abac_policy.json does not exist.
- If authorization mode is WebHook, [Error] if webhook_authz.conf does not exist.
- [Error] if advertise address is ipv6 and /proc/sys/net/bridge/bridge-nf-call-ip6tables does not exist/does not contain 1.
Please note that:
- Preflight checks can be invoked individually with the `kubeadm alpha phase preflight` command.
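For example, the checks can be run on their own, or a specific failing check can be ignored when you know why it fails (a sketch; the exact check names accepted by `--ignore-preflight-errors` depend on the kubeadm version):

```bash
# Run only the preflight checks
kubeadm alpha phase preflight

# Proceed with init while ignoring one named check (e.g. enabled swap)
kubeadm init --ignore-preflight-errors=Swap
```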
`kubeadm` generates certificate and private key pairs for different purposes.
Certificates are stored by default in `/etc/kubernetes/pki`. This directory is configurable.
There should be:
- A CA certificate (`ca.crt`) with its private key (`ca.key`).
- An API server certificate (`apiserver.crt`) using `ca.crt` as the CA with its private key (`apiserver.key`). The certificate should:
  - Be a serving server certificate (`x509.ExtKeyUsageServerAuth`).
  - Contain altnames for:
    - The Kubernetes service's internal clusterIP (the first address in the services CIDR, e.g. `10.96.0.1` if the service subnet is `10.96.0.0/12`).
    - Kubernetes DNS names (e.g. `kubernetes.default.svc.cluster.local` if `--service-dns-domain` is `cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`).
    - The node-name.
    - The `--apiserver-advertise-address`.
    - Optional extra altnames specified by the user.
- A client certificate for the API server to connect to the kubelets securely (`apiserver-kubelet-client.crt`) using `ca.crt` as the CA with its private key (`apiserver-kubelet-client.key`). The certificate should:
  - Be a client certificate (`x509.ExtKeyUsageClientAuth`).
  - Be in the `system:masters` organization.
- A private key for signing ServiceAccount tokens (`sa.key`) along with its public key (`sa.pub`).
- A CA for the front proxy (`front-proxy-ca.crt`) with its key (`front-proxy-ca.key`).
- A client certificate for the front proxy client (`front-proxy-client.crt`) with its key (`front-proxy-client.key`), generated using `front-proxy-ca.crt` as the CA.
Please note that:
- If a given certificate and private key pair both exist, and their content is evaluated as compliant with the above specs, the existing files will be used and the generation phase for the given certificate skipped. This means the user can, for example, copy an existing CA to `/etc/kubernetes/pki/ca.{crt,key}`, and then `kubeadm` will use those files for signing the rest of the certs.
- Only for the CA, it is possible to provide the `ca.crt` file but not the `ca.key` file. If all other certificates and kubeconfig files are already in place, `kubeadm` recognizes this condition and activates ExternalCA mode, which also implies that the `csrsigner` controller in controller-manager won't be started.
- If `kubeadm` is running in ExternalCA mode, all the certificates must be provided as well, because `kubeadm` cannot generate them by itself.
- In case of `kubeadm` executed in `--dry-run` mode, certificate files are written in a temporary folder.
- Certificate generation can be invoked individually with the `kubeadm alpha phase certs all` command.
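As a quick sanity check, the generated serving certificate can be inspected with standard openssl tooling to confirm the expected altnames are present (a sketch, assuming the default `/etc/kubernetes/pki` location):

```bash
# (Re)generate all certificates; existing compliant files are kept as-is
kubeadm alpha phase certs all

# Verify the API server certificate contains the expected SANs
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A2 'Subject Alternative Name'
```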
There should be:
- A kubeconfig file for kubeadm itself and the admin to use, `/etc/kubernetes/admin.conf`:
  - The "admin" here is defined as `kubeadm` itself and the actual person(s) administering the cluster and wanting to control it:
    - With this file, the admin has full control (root) over the cluster.
  - Inside this file, a client certificate is generated from the `ca.crt` and `ca.key`. The client cert should:
    - Be a client certificate (`x509.ExtKeyUsageClientAuth`).
    - Be in the `system:masters` organization.
    - Include a CN, but that can be anything. `kubeadm` uses the `kubernetes-admin` CN.
- A kubeconfig file for the kubelet to use, `/etc/kubernetes/kubelet.conf`:
  - Inside this file, a client certificate is generated from the `ca.crt` and `ca.key`. The client cert should:
    - Be a client certificate (`x509.ExtKeyUsageClientAuth`).
    - Be in the `system:nodes` organization.
    - Have the CN `system:node:<hostname-lowercased>`.
- A kubeconfig file for the controller-manager, `/etc/kubernetes/controller-manager.conf`:
  - Inside this file, a client certificate is generated from the `ca.crt` and `ca.key`. The client cert should:
    - Be a client certificate (`x509.ExtKeyUsageClientAuth`).
    - Have the CN `system:kube-controller-manager`.
- A kubeconfig file for the scheduler, `/etc/kubernetes/scheduler.conf`:
  - Inside this file, a client certificate is generated from the `ca.crt` and `ca.key`. The client cert should:
    - Be a client certificate (`x509.ExtKeyUsageClientAuth`).
    - Have the CN `system:kube-scheduler`.
Please note that:
- `ca.crt` is embedded in all the kubeconfig files.
- If a given kubeconfig file exists, and its content is evaluated as compliant with the above specs, the existing file will be used and the generation phase for the given kubeconfig skipped.
- If `kubeadm` is running in ExternalCA mode, all the required kubeconfig files must be provided by the user as well, because `kubeadm` cannot generate any of them by itself.
- In case of `kubeadm` executed in `--dry-run` mode, kubeconfig files are written in a temporary folder.
- Kubeconfig file generation can be invoked individually with the `kubeadm alpha phase kubeconfig all` command.
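A minimal sketch of generating the kubeconfig files and checking which identity one of them carries (assuming `kubectl` is available on the master):

```bash
kubeadm alpha phase kubeconfig all

# The admin kubeconfig should authenticate as the kubernetes-admin user
kubectl --kubeconfig=/etc/kubernetes/admin.conf config view -o jsonpath='{.users[*].name}'
```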
Common properties for the control plane components:
- All static Pods are deployed in the `kube-system` namespace.
- All static Pods get `tier:control-plane` and `component:{component-name}` labels.
- All static Pods get the `scheduler.alpha.kubernetes.io/critical-pod` annotation. Note: this will be moved over to the proper solution of using Pod Priority and Preemption when ready.
- `hostNetwork: true` is set on all static Pods to allow control plane startup before a network is configured; accordingly:
  - The `address` that the controller-manager and the scheduler use to refer to the API server is `127.0.0.1`.
  - If using a local etcd server, the `etcd-servers` address will be set to `127.0.0.1:2379`.
- Leader election is enabled for both the controller-manager and the scheduler.
- The controller-manager and the scheduler will reference kubeconfig files with their respective, unique identities.
- All static Pods get any extra flags specified by the user.
- All static Pods get any extra Volumes specified by the user (host path).
Please note that:
- All the images, for the `--kubernetes-version`/current architecture, will be pulled from `gcr.io/google_containers`. In case an alternative image repository or CI image repository is specified, that one will be used; in case a specific container image should be used for all control plane components, that one will be used.
- In case of `kubeadm` executed in `--dry-run` mode, static Pod files are written in a temporary folder.
- Static Pod manifest generation for master components can be invoked individually with the `kubeadm alpha phase controlplane all` command.
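For example, the manifests for the master components can be written without running the full init (a sketch; flags or a `--config` file can be added as needed):

```bash
kubeadm alpha phase controlplane all

ls /etc/kubernetes/manifests
# kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
```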
The API server needs to know this in particular:
- The `apiserver-advertise-address` and `apiserver-bind-port` to bind to (if not provided, those values default to the IP address of the default network interface and port 6443).
- The `service-cluster-ip-range` to use for services.
- The `etcd-servers` address and related TLS settings (`etcd-cafile`, `etcd-certfile`, `etcd-keyfile`) if required; if an external etcd server is not provided, a local etcd will be used (via host network).
- If a cloud provider is specified, the corresponding `--cloud-provider` is configured, together with the `--cloud-config` path if such a file exists. Note: this is experimental, alpha and will be removed in a future version.
- If kubeadm is invoked with `--feature-gates=HighAvailability`, the flag `--endpoint-reconciler-type=lease` is set, thus enabling automatic reconciliation of endpoints for the internal API server VIP.
- If kubeadm is invoked with `--feature-gates=DynamicKubeletConfig`, the corresponding feature on kube-apiserver is activated with the `--feature-gates=DynamicKubeletConfig=true` flag.
Other flags that are set unconditionally:
- `--insecure-port=0` to avoid insecure connections to the API server.
- `--enable-bootstrap-token-auth=true` to enable the `BootstrapTokenAuthenticator` authentication module.
- `--allow-privileged` to `true` (used e.g. by kube-proxy).
- `--client-ca-file` to `ca.crt`.
- `--tls-cert-file` to `apiserver.crt`.
- `--tls-private-key-file` to `apiserver.key`.
- `--kubelet-client-certificate` to `apiserver-kubelet-client.crt`.
- `--kubelet-client-key` to `apiserver-kubelet-client.key`.
- `--service-account-key-file` to `sa.pub`.
- `--requestheader-client-ca-file` to `front-proxy-ca.crt`.
- `--admission-control` to:
  - `Initializers` to enable Dynamic Admission Control.
  - `NamespaceLifecycle` to avoid deletion of system reserved namespaces etc.
  - `LimitRanger` and `ResourceQuota` to enforce limits on namespaces.
  - `ServiceAccount` to enforce service account automation.
  - `PersistentVolumeLabel` to attach region or zone labels to PersistentVolumes as defined by the cloud provider. Note: this admission controller is deprecated and will be removed in a future version. It is not deployed by kubeadm by default with v1.9 onwards when not explicitly opting into using `gce` or `aws` as cloud providers.
  - `DefaultStorageClass` to enforce a default storage class on `PersistentVolumeClaim` objects.
  - `DefaultTolerationSeconds`.
  - `NodeRestriction` to limit what a kubelet can modify (e.g. its own pods).
- `--kubelet-preferred-address-types` to `InternalIP,ExternalIP,Hostname`; this makes `kubectl logs` and other apiserver -> kubelet communication work in environments where the hostnames of the nodes aren't resolvable.
- `--requestheader-client-ca-file` to `front-proxy-ca.crt`, `--proxy-client-cert-file` to `front-proxy-client.crt`, `--proxy-client-key-file` to `front-proxy-client.key`, and `--requestheader-username-headers=X-Remote-User`, `--requestheader-group-headers=X-Remote-Group`, `--requestheader-extra-headers-prefix=X-Remote-Extra-`, `--requestheader-allowed-names=front-proxy-client` so the front proxy (API Aggregation) communication is secure.
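The exact flags kubeadm chose for a given cluster can always be read back from the generated static Pod manifest (illustrative; output depends on version and configuration):

```bash
# List the flags written into the API server static Pod manifest
grep -- '--' /etc/kubernetes/manifests/kube-apiserver.yaml
```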
The controller-manager needs to know this in particular:
- If kubeadm is invoked specifying a `--pod-network-cidr`, the subnet manager feature required for some CNI network plugins is enabled by setting `--allocate-node-cidrs=true`, and the `--cluster-cidr` and `--node-cidr-mask-size` flags are set.
- If a cloud provider is specified, the corresponding `--cloud-provider` is specified, together with the `--cloud-config` path if such a configuration file exists. Note: this is experimental, alpha and will be removed in a future version.
Other flags that are set unconditionally:
- Enables all the default controllers plus the `BootstrapSigner` and `TokenCleaner` controllers for TLS bootstrap.
- `--root-ca-file` to `ca.crt`.
- `--cluster-signing-cert-file` to `ca.crt`, if External CA mode is disabled, otherwise to `""`.
- `--cluster-signing-key-file` to `ca.key`, if External CA mode is disabled, otherwise to `""`.
- `--service-account-private-key-file` to `sa.key`.
- `--use-service-account-credentials` to `true`.
Kubeadm doesn't set any special scheduler flags.
If the user specified an external etcd, this step will be skipped; otherwise a static manifest file will be generated for creating a local etcd instance running in a Pod with the following attributes:
- listen on `localhost:2379` and use `HostNetwork=true`.
- make a `hostPath` mount out from the `dataDir` to the host's filesystem.
- any extra flags specified by the user.
Please note that:
- The etcd image will be pulled from `gcr.io/google_containers`. In case an alternative image repository is specified, that one will be used; in case an alternative image name is specified, that one will be used.
- In case of `kubeadm` executed in `--dry-run` mode, the etcd static Pod manifest is written in a temporary folder.
- Static Pod manifest generation for local etcd can be invoked individually with the `kubeadm alpha phase etcd local` command.
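A minimal sketch of generating only the local etcd manifest and checking that the instance answers on the loopback client port once the kubelet has started it (assuming the local etcd listens without TLS on localhost as described above):

```bash
kubeadm alpha phase etcd local

# etcd listens on localhost:2379 without TLS in this setup
curl -s http://127.0.0.1:2379/health
# {"health": "true"}
```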
If kubeadm is invoked with `--feature-gates=DynamicKubeletConfig`, it writes the kubelet init configuration into the `/var/lib/kubelet/config/init/kubelet` file.
The init configuration is used for starting the kubelet on this specific node, providing an alternative to the kubelet drop-in file; such configuration will be replaced by the kubelet base configuration as described in the following steps. See the online documentation for additional info.
Please note that:
- To make dynamic kubelet configuration work, the flag `--dynamic-config-dir=/var/lib/kubelet/config/dynamic` should be specified in `/etc/systemd/system/kubelet.service.d/10-kubeadm.conf`.
- The kubelet init configuration can be changed by using the kubeadm MasterConfiguration file (`.kubeletConfiguration.baseConfig`); see the sketch below.
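A hypothetical MasterConfiguration snippet overriding the kubelet base configuration might look like the following (the `maxPods` field and the exact schema under `baseConfig` are illustrative assumptions and must match the KubeletConfiguration version your kubeadm release expects):

```bash
# Write an illustrative kubeadm config file and use it for init
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
featureGates:
  DynamicKubeletConfig: true
kubeletConfiguration:
  baseConfig:
    maxPods: 110
EOF
kubeadm init --config kubeadm-config.yaml
```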
This is a critical moment in time for kubeadm clusters.
kubeadm waits until `localhost:6443/healthz` returns `ok`; however, in order to detect deadlock conditions, kubeadm fails fast if `localhost:10255/healthz` (kubelet liveness) or `localhost:10255/healthz/syncloop` (kubelet readiness) don't return `ok` within 40 and 60 seconds, respectively.
kubeadm relies on the kubelet to pull the control plane images and run them properly as static Pods. But there are (as we've seen) a lot of things that can go wrong. Most of them are network/resolv.conf/proxy related.
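When an init appears stuck at this point, the same endpoints kubeadm polls can be checked by hand from the master (assuming the default ports and that `curl` is installed):

```bash
curl -sk https://localhost:6443/healthz          # API server health
curl -s  http://localhost:10255/healthz          # kubelet liveness (read-only port)
curl -s  http://localhost:10255/healthz/syncloop # kubelet readiness
```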
After the control plane is up, kubeadm completes a couple of tasks described in following paragraphs.
If kubeadm is invoked with `--feature-gates=DynamicKubeletConfig`:
- Write the kubelet base configuration into the `kubelet-base-config-v1.9` ConfigMap in the `kube-system` namespace.
- Create RBAC rules for granting read access to that ConfigMap to all bootstrap tokens and all kubelets (the `system:bootstrappers:kubeadm:default-node-token` and `system:nodes` groups).
- Enable the dynamic kubelet configuration feature for the initial master node by pointing `Node.spec.configSource` to the newly-created ConfigMap.
kubeadm saves the configuration passed to `kubeadm init`, either via flags or the config file, in a ConfigMap named `kubeadm-config` in the `kube-system` namespace.
This will ensure that kubeadm actions executed in the future (e.g. `kubeadm upgrade`) will be able to determine the actual/current cluster state and make new decisions based on that data.
Please note that:
- Before uploading, sensitive information (e.g. the token) is stripped from the configuration.
- Upload of the master configuration can be invoked individually with the `kubeadm alpha phase upload-config` command.
- If you initialized your cluster using kubeadm v1.7.x or lower, you must manually create the master configuration ConfigMap before running `kubeadm upgrade` to v1.8. In order to facilitate this task, the `kubeadm config upload (from-flags|from-file)` command was implemented.
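The uploaded configuration can be inspected at any time with kubectl:

```bash
# Show the stored configuration (token fields are stripped before upload)
kubectl -n kube-system get configmap kubeadm-config -o yaml
```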
As soon as the control plane is available, kubeadm executes the following actions:
- Labels the master with `node-role.kubernetes.io/master=""`
- Taints the master with `node-role.kubernetes.io/master:NoSchedule`
Please note that:
- The mark-master phase can be invoked individually with the `kubeadm alpha phase mark-master` command.
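For reference, the label and taint applied by this phase correspond roughly to the following kubectl commands (illustrative; kubeadm applies them through the API directly):

```bash
kubectl label node <master-node-name> node-role.kubernetes.io/master=""
kubectl taint node <master-node-name> node-role.kubernetes.io/master:NoSchedule
```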
Kubeadm uses Authenticating with Bootstrap Tokens for joining new nodes to an existing cluster; for more details see also the design proposal.
`kubeadm init` ensures that everything is properly configured for this process; this includes the following steps as well as setting API server and controller flags as already described in previous paragraphs.
Please note that:
- TLS bootstrapping for nodes can be configured with the `kubeadm alpha phase bootstrap-token all` command, executing the configuration steps described in the following paragraphs; alternatively, each step can be invoked individually.
`kubeadm init` creates a first bootstrap token, either generated automatically or provided by the user with the `--token` flag; as documented in the bootstrap token specification, the token should be saved as a Secret named `bootstrap-token-<token-id>` in the `kube-system` namespace.
Please note that:
- The default token created by `kubeadm init` will be used to validate temporary users during the TLS bootstrap process; those users will be members of the `system:bootstrappers:kubeadm:default-node-token` group.
- The token has a limited validity, 24 hours by default (the interval may be changed with the `--token-ttl` flag).
- Additional tokens can be created with the `kubeadm token` command, which also provides other useful functions for token management.
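For example, extra tokens can be minted and inspected after the cluster is up (a sketch; see `kubeadm token --help` for the full set of subcommands):

```bash
# Create an additional bootstrap token valid for two hours
kubeadm token create --ttl 2h

# List all bootstrap tokens, including the one created by kubeadm init
kubeadm token list
```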
kubeadm ensures that users in the `system:bootstrappers:kubeadm:default-node-token` group are able to access the certificate signing API.
This is implemented by creating a ClusterRoleBinding named `kubeadm:kubelet-bootstrap` between the group above and the default RBAC role `system:node-bootstrapper`.
kubeadm ensures that the Bootstrap Token will get its CSR request automatically approved by the csrapprover controller.
This is implemented by creating a ClusterRoleBinding named `kubeadm:node-autoapprove-bootstrap` between the `system:bootstrappers:kubeadm:default-node-token` group and the default role `system:certificates.k8s.io:certificatesigningrequests:nodeclient`.
The role `system:certificates.k8s.io:certificatesigningrequests:nodeclient` should be created as well, granting POST permission to `/apis/certificates.k8s.io/certificatesigningrequests/nodeclient` (in v1.8 the role will be created automatically by default).
kubeadm ensures that certificate rotation is enabled for nodes, and that new certificate requests from nodes will get automatically approved by the csrapprover controller.
This is implemented by creating a ClusterRoleBinding named `kubeadm:node-autoapprove-certificate-rotation` between the `system:nodes` group and the default role `system:certificates.k8s.io:certificatesigningrequests:selfnodeclient`.
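The objects created by these steps can be verified with kubectl, using the binding names given above:

```bash
kubectl get clusterrolebinding kubeadm:kubelet-bootstrap -o yaml
kubectl get clusterrolebinding kubeadm:node-autoapprove-bootstrap
kubectl get clusterrolebinding kubeadm:node-autoapprove-certificate-rotation
```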
This phase creates the `cluster-info` ConfigMap in the `kube-public` namespace.
Additionally, a Role and a RoleBinding are created, granting access to the ConfigMap to unauthenticated users (i.e. users in the RBAC group `system:unauthenticated`).
Please note that:
- Access to the `cluster-info` ConfigMap is not rate-limited. This may or may not be a problem if you expose your master to the internet; the worst-case scenario here is a DoS attack where an attacker uses all the in-flight requests the kube-apiserver can handle to serve the `cluster-info` ConfigMap.
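The ConfigMap itself can be read back with kubectl; it contains the kubeconfig stub (cluster endpoint and CA data) used for discovery:

```bash
kubectl -n kube-public get configmap cluster-info -o yaml
```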
A ServiceAccount for `kube-proxy` is created in the `kube-system` namespace; then kube-proxy is deployed as a DaemonSet:
- the credentials (`ca.crt` and `token`) to the master come from the ServiceAccount
- the location of the master comes from a ConfigMap
- the `kube-proxy` ServiceAccount is bound to the privileges in the `system:node-proxier` ClusterRole
Please note that:
- This phase can be invoked individually with the `kubeadm alpha phase addon kube-proxy` command.
A ServiceAccount for `kube-dns` is created in the `kube-system` namespace.
Then the kube-dns Deployment and Service are deployed:
- it's the upstream kube-dns deployment, relatively unmodified
- the `kube-dns` ServiceAccount is bound to the privileges in the `system:kube-dns` ClusterRole
Please note that:
- If kubeadm is invoked with `--feature-gates=CoreDNS`, CoreDNS is installed instead of `kube-dns`.
- This phase can be invoked individually with the `kubeadm alpha phase addon kube-dns` command.
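Both addon phases can be verified after the fact with kubectl (resource names as deployed by kubeadm for this version):

```bash
kubectl -n kube-system get daemonset kube-proxy
kubectl -n kube-system get deployment kube-dns
```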
This phase is performed only if `kubeadm init` is invoked with `--feature-gates=self-hosting`.
The self-hosting phase basically replaces static Pods for control plane components with DaemonSets; this is achieved by executing the following procedure for the API server, scheduler and controller-manager static Pods:
- Load the static Pod specification from disk.
- Extract the PodSpec from that static Pod specification.
- Mutate the PodSpec to be compatible with self-hosting, and more in detail:
  - add a node selector attribute targeting nodes with the `node-role.kubernetes.io/master=""` label,
  - add a toleration for the `node-role.kubernetes.io/master:NoSchedule` taint,
  - set `spec.DNSPolicy` to `ClusterFirstWithHostNet`.
- Build a new DaemonSet object for the self-hosted component in question, using the above-mentioned PodSpec.
- Create the DaemonSet resource in the `kube-system` namespace. Wait until the Pods are running.
- Remove the static Pod manifest file. The kubelet will stop the original static Pod-hosted component that was running.
Please note that:
- Self-hosting is not yet resilient to node restarts; this can be fixed with external checkpointing or with kubelet checkpointing for the control plane Pods.
- If invoked with `--feature-gates=StoreCertsInSecrets`, the following additional steps will be executed:
  - creation of the `ca`, `apiserver`, `apiserver-kubelet-client`, `sa`, `front-proxy-ca`, `front-proxy-client` TLS secrets in the `kube-system` namespace with the respective certificates and keys. Note: storing the CA key in a Secret might have security implications.
  - creation of the `scheduler.conf` and `controller-manager.conf` secrets in the `kube-system` namespace with the respective kubeconfig files.
  - mutation of all the Pod specs by replacing host path volumes with projected volumes from the secrets above.
- This phase can be invoked individually with the `kubeadm alpha phase selfhosting convert-from-staticpods` command.
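After the conversion, the control plane components should show up as DaemonSets rather than as mirror Pods of static manifests:

```bash
kubectl -n kube-system get daemonset
kubectl -n kube-system get pods -o wide
```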
Similarly to `kubeadm init`, the `kubeadm join` internal workflow also consists of a sequence of atomic work tasks to perform.
This is split into discovery (having the Node trust the Kubernetes Master) and TLS bootstrap (having the Kubernetes Master trust the Node).
See Authenticating with Bootstrap Tokens and the design proposal.
`kubeadm` executes a set of preflight checks before starting the join, with the aim of verifying preconditions and avoiding common cluster startup problems.
Please note that:
- `kubeadm join` preflight checks are basically a subset of the `kubeadm init` preflight checks.
- Starting from 1.9, kubeadm provides better support for CRI-generic functionality; in that case, docker-specific controls are skipped or replaced by similar controls for crictl.
- Starting from 1.9, kubeadm provides support for joining nodes running on Windows; in that case, linux-specific controls are skipped.
- In any case the user can skip specific preflight checks (or even all of them) with the `--ignore-preflight-errors` option.
There are 2 main schemes for discovery. The first is to use a shared token along with the IP address of the API server. The second is to provide a file (a subset of the standard kubeconfig file).
If `kubeadm join` is invoked with `--discovery-token`, token discovery is used; in this case the node basically retrieves the cluster CA certificates from the `cluster-info` ConfigMap in the `kube-public` namespace.
In order to prevent "man in the middle" attacks, several steps are taken:
- First, the CA certificate is retrieved via an insecure connection (note: this is possible because `kubeadm init` granted access to `cluster-info` for `system:unauthenticated` users).
- Then the CA certificate goes through the following validation steps:
  - "Basic validation", using the token ID against a JWT signature.
  - "Pub key validation", using the provided `--discovery-token-ca-cert-hash`. This value is available in the output of `kubeadm init` or can be calculated using standard tools (the hash is calculated over the bytes of the Subject Public Key Info (SPKI) object as in RFC7469); see the example below. The `--discovery-token-ca-cert-hash` flag may be repeated multiple times to allow more than one public key.
  - As an additional validation, the CA certificate is retrieved via a secure connection and then compared with the CA retrieved initially.
Please note that:
- "Pub key validation" can be skipped by passing the `--discovery-token-unsafe-skip-ca-verification` flag; this weakens the kubeadm security model, since others can potentially impersonate the Kubernetes Master.
If `kubeadm join` is invoked with `--discovery-file`, file discovery is used; this file can be a local file or downloaded via an HTTPS URL; in case of HTTPS, the host-installed CA bundle is used to verify the connection.
With file discovery, the cluster CA certificate is provided in the file itself; in fact, the discovery file is a kubeconfig file with only the `server` and `certificate-authority-data` attributes set, e.g.:
```yaml
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <really long certificate data>
    server: https://10.138.0.2:6443
  name: ""
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []
```
Finally, when the connection with the cluster is established, kubeadm tries to access the `cluster-info` ConfigMap, and if available, uses it.
Once the cluster info is known, the file `bootstrap-kubelet.conf` is written, allowing the kubelet to do TLS bootstrapping (whereas in v1.7 TLS bootstrapping was managed by kubeadm).
The TLS bootstrap mechanism uses the shared token to temporarily authenticate with the Kubernetes Master to submit a certificate signing request (CSR) for a locally created key pair.
The request is then automatically approved and the operation completes, saving the `ca.crt` file and `kubelet.conf` file to be used by the kubelet for joining the cluster, while `bootstrap-kubelet.conf` is deleted.
Please note that:
- The temporary authentication is validated against the token saved during the `kubeadm init` process (or with additional tokens created with `kubeadm token`).
- The temporary authentication resolves to a user that is a member of the `system:bootstrappers:kubeadm:default-node-token` group, which was granted access to the CSR API during the `kubeadm init` process.
- The automatic CSR approval is managed by the csrapprover controller, according to the configuration done during the `kubeadm init` process.
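From the master, the CSR submitted during the TLS bootstrap and its automatic approval can be observed while a node joins (output shown is illustrative):

```bash
kubectl get csr
# NAME        AGE   REQUESTOR                    CONDITION
# node-csr-…  10s   system:bootstrap:<token-id>  Approved,Issued
```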
If kubeadm is invoked with `--feature-gates=DynamicKubeletConfig`:
- Read the kubelet base configuration from the `kubelet-base-config-v1.9` ConfigMap in the `kube-system` namespace using the Bootstrap Token credentials, and write it to disk as the kubelet init configuration file `/var/lib/kubelet/config/init/kubelet`.
- As soon as the kubelet starts with the Node's own credentials (`/etc/kubernetes/kubelet.conf`), update the current node configuration specifying that the source for the node/kubelet configuration is the above ConfigMap.
Please note that:
- To make dynamic kubelet configuration work, the flag `--dynamic-config-dir=/var/lib/kubelet/config/dynamic` should be specified in `/etc/systemd/system/kubelet.service.d/10-kubeadm.conf`.
There are two primary ways to extend `kubeadm`:
- By setting CLI arguments or editing the lightweight `kubeadm init` API.
- By running the phases you need separately and giving every phase the arguments it needs.
The `kubeadm init` and `kubeadm join` APIs are, respectively, very limited in scope by design; that is where `kubeadm alpha phase` comes in, which gives you the full power of cluster creation.
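As a rough sketch of the second approach, a custom bootstrap flow could call the phases described in this document one by one instead of running `kubeadm init` (the exact set and order of phases, and the flags each accepts, depend on the kubeadm version and your configuration):

```bash
kubeadm alpha phase preflight
kubeadm alpha phase certs all
kubeadm alpha phase kubeconfig all
kubeadm alpha phase etcd local
kubeadm alpha phase controlplane all
kubeadm alpha phase mark-master
kubeadm alpha phase bootstrap-token all
kubeadm alpha phase upload-config
kubeadm alpha phase addon kube-proxy
kubeadm alpha phase addon kube-dns
```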