A Kubernetes node is a machine that runs containerized workloads as part of a Kubernetes cluster. A node can be a physical machine or a virtual machine, and can be hosted on-premises or in the cloud. A Kubernetes cluster can have a large number of nodes—recent versions support up to 5,000 nodes.
There are two types of nodes:
This is part of an extensive series of guides about microservices.
A Kubernetes node is a single machine in a cluster that serves as an abstraction. Instead of managing specific physical or virtual machines, you can treat each node as pooled CPU and RAM resources on which you can run containerized workloads. When an application is deployed to the cluster, Kubernetes distributes the work across the nodes. Workloads can be moved seamlessly between nodes in the cluster.
A Kubernetes pod is the smallest unit of management in a Kubernetes cluster. A pod includes one or more containers, and operators can attach additional resources to a pod, such as storage volumes. Pods are stateless by design, meaning they are dispensable and replaced by an identical unit if one fails. A pod has its own IP, allowing pods to communicate with other pods on the same node or other nodes.
The Kubernetes Scheduler, running on the master (control plane) node, is responsible for finding eligible worker nodes for each pod and assigning the pod to one of them. Controllers such as Deployments use a pod template, together with a replica count, to define how many instances of the pod should run, and the template can restrict which types of nodes they run on. When a node fails or has insufficient resources to run a pod, the pod is evicted and a replacement is run on another node.
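To make the idea of "eligible" nodes concrete, here is a minimal sketch of a resource-fit check. It is not the scheduler's actual implementation, which applies many more filters and then scores the remaining nodes. The Node class, the eligible_nodes function, and all figures are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, simplified view of a node's schedulable resources."""
    name: str
    allocatable_cpu_m: int       # millicores the node can give to pods
    allocatable_mem_mib: int     # MiB of memory the node can give to pods
    requested_cpu_m: int = 0     # CPU already requested by pods on the node
    requested_mem_mib: int = 0   # memory already requested by pods on the node


def eligible_nodes(nodes, pod_cpu_m, pod_mem_mib):
    """Return the names of nodes with enough unrequested CPU and memory."""
    fits = []
    for node in nodes:
        free_cpu = node.allocatable_cpu_m - node.requested_cpu_m
        free_mem = node.allocatable_mem_mib - node.requested_mem_mib
        if pod_cpu_m <= free_cpu and pod_mem_mib <= free_mem:
            fits.append(node.name)
    return fits


# Illustrative data only: two nodes of equal size with different load.
nodes = [
    Node("node-a", 1500, 1444, requested_cpu_m=900, requested_mem_mib=962),
    Node("node-b", 1500, 1444, requested_cpu_m=1400, requested_mem_mib=200),
]

# node-b has only 100m of CPU left, so a 500m pod fits on node-a alone.
print(eligible_nodes(nodes, pod_cpu_m=500, pod_mem_mib=256))
```

The check uses requests rather than live usage, which mirrors how Kubernetes reserves capacity for a pod before it starts.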
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better manage Kubernetes nodes:
Keep your nodes updated with the latest security patches and Kubernetes versions.
Use tools like Prometheus and Grafana to monitor node health and performance.
Control pod placement on nodes using taints and tolerations.
Define rules to influence pod scheduling based on node labels.
Distribute workloads evenly across nodes to avoid overloading.
Here are the primary software components that run on every Kubernetes node:
The kubelet is a software agent that runs on Kubernetes nodes and communicates with the cluster control plane. It allows the control plane to monitor the node, see what it is running, and deliver instructions to the container runtime.
When Kubernetes wants to schedule a pod on a specific node, it sends the pod’s PodSpec to the kubelet. The kubelet reads the details of the containers specified in the PodSpec, pulls the images from the registry, and runs the containers. From that point onwards, the kubelet is responsible for ensuring these containers are healthy and maintaining them according to the declarative configuration.
kube-proxy enables networking on Kubernetes nodes, with network rules that allow communication between pods and entities outside the Kubernetes cluster. kube-proxy either forwards traffic directly or leverages the operating system packet filtering layer.
kube-proxy can run in three different modes: iptables, ipvs, and userspace (a deprecated mode that is not recommended for use). iptables, the default mode, is suitable for clusters of moderate size, but it evaluates network rules sequentially, which can slow routing as the number of services grows. ipvs can support a large number of services because it looks up rules in hash tables instead of scanning them in order.
The container runtime, such as Docker, containerd, or CRI-O, is a software component responsible for running containers on the node. Kubernetes itself does not start or stop containers or manage the basic container lifecycle; it delegates this work to the runtime. The kubelet interfaces with any container engine that supports the Container Runtime Interface (CRI), giving it instructions according to the needs of the Kubernetes cluster.
Interestingly, Kubernetes does not support Docker directly, because Docker does not implement CRI. Earlier versions bridged the gap with a built-in adapter called dockershim, which was deprecated in Kubernetes 1.20 and removed in 1.24. It is still technically possible to run Docker with Kubernetes through an external adapter, but in most cases Kubernetes runs with other, lightweight container engines that are more suitable for fully automated operations.
You can use the kubectl command line to view the status of a Kubernetes node.
kubectl describe node [node-name]
Here is an example of the status returned by a node:
Name:               kubernetes-node-861h
Role
Labels:             kubernetes.io/arch=amd64
                    kubernetes.io/os=linux
                    kubernetes.io/hostname=kubernetes-node-861h
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:
CreationTimestamp:  Mon, 04 Sep 2017 17:13:23 +0800
Phase:
Conditions:
  ...
Addresses:          10.240.115.55,104.197.0.26
Capacity:
  ...
Allocatable:
  ...
System Info:
  ...
The most important parts of a node status report are: Addresses, Conditions, Capacity/Allocatable, and System Info. The node status report also shows the node’s taints, which tell the Kubernetes scheduler which pods are appropriate for the node. Tolerations, by contrast, are defined on pods. You can read more about node affinities, taints and tolerations below.
The Addresses section of the node status report can represent the hostname, as reported by the kernel of the node, the external IP of the node, and the internal IP that is routable within the cluster. The way these fields are displayed depends on whether the node is a bare-metal machine or a compute instance running in the cloud.
The Conditions section of the node status report looks like this:
...
Conditions:
  Type            Status   LastHeartbeatTime                LastTransitionTime               Reason             Message
  ----            ------   -----------------                ------------------               ------             -------
  OutOfDisk       Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
  MemoryPressure  Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure    Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
  Ready           Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
...
Here are some of the common conditions that appear in a node status report:
The Capacity and Allocatable sections of the node status report look like this:
...
Capacity:
  cpu:        2
  hugePages:  0
  memory:     4046788Ki
  pods:       110
Allocatable:
  cpu:        1500m
  hugePages:  0
  memory:     1479263Ki
  pods:       110
...
These parameters reflect the node’s available resources, which determine how many pods can run on the node:
The System Info section of the node status report looks like this:
...
System Info:
  Machine ID:                 8e025a21a4254e11b028584d9d8b12c4
  System UUID:                349075D1-D169-4F25-9F2A-E886850C47E3
  Boot ID:                    5cd18b37-c5bd-4658-94e0-e436d3f110e0
  Kernel Version:             4.4.0-31-generic
  OS Image:                   Debian GNU/Linux 8 (jessie)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://1.12.5
  Kubelet Version:            v1.6.9+a3d1dfa6f4335
  Kube-Proxy Version:         v1.6.9+a3d1dfa6f4335
ExternalID:                   15233045891481496305
Non-terminated Pods:          (9 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------  ----  ------------  ----------  ---------------  -------------
  ...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits    Memory Requests   Memory Limits
  ------------  ----------    ---------------   -------------
  900m (60%)    2200m (146%)  1009286400 (66%)  5681286400 (375%)
Events:
  ...
This provides useful information about hardware and software on the node, including:
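The percentages in the Allocated resources table are relative to the node's Allocatable values, not its Capacity. The sketch below recomputes them from the sample figures shown above. The helper names are hypothetical, only the Ki, Mi and Gi suffixes are handled, and truncating to a whole number is an assumption that happens to reproduce the sample output rather than a confirmed description of how kubectl rounds.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '2' or '1500m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '1479263Ki' or a byte count to bytes."""
    binary_suffixes = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in binary_suffixes.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def percent_of(used: int, allocatable: int) -> int:
    """Whole-number percentage of allocatable, truncated (an assumption)."""
    return used * 100 // allocatable


# Allocatable values taken from the sample node status above.
alloc_cpu = parse_cpu_millicores("1500m")
alloc_mem = parse_memory_bytes("1479263Ki")

print(percent_of(parse_cpu_millicores("900m"), alloc_cpu))    # CPU requests
print(percent_of(parse_cpu_millicores("2200m"), alloc_cpu))   # CPU limits
print(percent_of(parse_memory_bytes("1009286400"), alloc_mem))  # memory requests
print(percent_of(parse_memory_bytes("5681286400"), alloc_mem))  # memory limits

# CPU held back from pods on this node: Capacity minus Allocatable.
print(parse_cpu_millicores("2") - alloc_cpu)
```

Limits above 100 percent, as in this sample, mean the node is overcommitted: the pods could not all reach their limits at the same time.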
Here are three criteria you can use to determine the optimal number of nodes in your Kubernetes cluster:
Kubernetes allows you to flexibly control which nodes should run your pods. It is possible to manually assign a pod to a node, but in most cases, you will define a mechanism that allows Kubernetes to dynamically assign pods to nodes. Two of these mechanisms are node selectors and node affinity.
Both node selectors and affinity are closely tied to Kubernetes labels. A label is a key-value pair of metadata you can attach to a Kubernetes resource, which lets you identify and manage it.
A node selector lets you specify which nodes the pod should be deployed on. The Kubernetes scheduler reads the pod template (also called pod specification), searches for eligible nodes and deploys the pod.
The simplest type of node selection is the nodeSelector field of the podSpec. It is a set of key-value pairs, which lets you define labels that the node needs to match in order to be eligible to run the pod. This is known as a label selector.
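The matching rule behind nodeSelector can be sketched in a few lines: a node is eligible only if it carries every key-value pair listed in the selector. The function name below is hypothetical, the node labels are taken from the sample node status earlier in this article, and the disktype label is an invented example.

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """True when every key-value pair in the selector is present on the node."""
    return all(
        node_labels.get(key) == value for key, value in node_selector.items()
    )


# Labels from the sample node status shown earlier.
node_labels = {
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
    "kubernetes.io/hostname": "kubernetes-node-861h",
}

# All pairs are present on the node, so it is eligible.
print(matches_node_selector(node_labels, {"kubernetes.io/os": "linux"}))

# 'disktype' is a hypothetical label the node does not carry: not eligible.
print(matches_node_selector(
    node_labels, {"kubernetes.io/os": "linux", "disktype": "ssd"}
))
```

Because every pair must match, adding entries to a nodeSelector can only narrow the set of eligible nodes.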
Node affinities provide an expressive language you can use to define which nodes to run a pod on. You can define:
Node affinity is conceptually similar to nodeSelector – it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
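Node affinity is more expressive because each rule uses an operator (In, NotIn, Exists, DoesNotExist, Gt or Lt) instead of plain equality. The following is a simplified sketch of how required affinity terms could be evaluated against node labels. The function names are hypothetical, the dictionaries mirror the key, operator and values fields of a matchExpressions entry, and the zone and gpu labels are invented examples.

```python
def matches_expression(node_labels: dict, expr: dict) -> bool:
    """Evaluate one match expression against a node's labels."""
    key, operator = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if operator == "In":
        return key in node_labels and node_labels[key] in values
    if operator == "NotIn":
        # A node without the label also satisfies NotIn.
        return node_labels.get(key) not in values
    if operator == "Exists":
        return key in node_labels
    if operator == "DoesNotExist":
        return key not in node_labels
    if operator in ("Gt", "Lt"):
        try:
            label_value, bound = int(node_labels[key]), int(values[0])
        except (KeyError, IndexError, ValueError):
            return False  # missing or non-integer values never match
        return label_value > bound if operator == "Gt" else label_value < bound
    raise ValueError(f"unknown operator: {operator}")


def matches_required_affinity(node_labels: dict, terms: list) -> bool:
    """Terms are ORed; the expressions inside a single term are ANDed."""
    return any(
        all(matches_expression(node_labels, expr) for expr in term)
        for term in terms
    )


node_labels = {"kubernetes.io/os": "linux", "zone": "east-1"}  # 'zone' is invented
terms = [
    [
        {"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]},
        {"key": "zone", "operator": "NotIn", "values": ["west-1"]},
    ],
    [{"key": "gpu", "operator": "Exists"}],
]
print(matches_required_affinity(node_labels, terms))
```

This covers only required rules. Preferred rules do not exclude nodes; they influence how the scheduler ranks the nodes that remain.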
Taints are the opposite of affinity – a taint is like defining that a node “doesn’t like” a certain set of pods and those pods will, if possible, not schedule on the node. A node can have one or more taints defined on it.
You can define tolerations in pod templates to indicate that, despite a taint, you want to allow (but not require) the pod to run on nodes that have a matching taint.
You can use taints and tolerations to ensure pods are not scheduled onto nodes that are not appropriate for them.
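The matching between a toleration and a taint follows a small set of rules, sketched below. A toleration matches when the keys and effects agree and either the operator is Exists or the operator is Equal (the default) and the values agree. An empty effect matches every effect, and an empty key with Exists matches every taint. The function names and the dedicated=gpu taint are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    # An empty effect on the toleration matches all effects.
    if effect and effect != taint["effect"]:
        return False
    # An empty key is only meaningful with Exists, and then matches any key.
    if not key:
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated_taints(taints: list, tolerations: list) -> list:
    """Taints on a node that none of the pod's tolerations match."""
    return [
        taint for taint in taints
        if not any(tolerates(tol, taint) for tol in tolerations)
    ]


# Hypothetical taint reserving a node for GPU workloads.
taints = [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]

gpu_pod = [{"key": "dedicated", "operator": "Equal", "value": "gpu",
            "effect": "NoSchedule"}]
plain_pod = []

print(untolerated_taints(taints, gpu_pod))    # nothing left: pod may be scheduled
print(untolerated_taints(taints, plain_pod))  # taint remains: pod is kept off
```

A toleration only permits placement on the tainted node. To also steer the pod toward that node, combine the toleration with a node selector or node affinity.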
Kubernetes node errors indicate an issue on a machine participating in a Kubernetes cluster, which can affect its ability to run and manage pods. Below are two common errors and what you can do about them.
If a node has a NotReady status for over five minutes, the status of pods running on it becomes Unknown, and new pods fail with a ContainerCreating error.
How to identify the issue
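One possible way to spot affected nodes programmatically is to inspect the Ready condition in the JSON returned by kubectl get nodes -o json. The sketch below is not an official procedure. It assumes that output has already been loaded into a dictionary, and it treats any Ready status other than "True" (including Unknown, as in the sample Conditions shown earlier) as not ready. The function name, the second node, and the sample data are invented for illustration.

```python
import json


def not_ready_nodes(node_list: dict) -> list:
    """Names of nodes whose Ready condition is missing or not 'True'."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names


# Trimmed, invented sample in the shape of 'kubectl get nodes -o json'.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "kubernetes-node-861h"},
      "status": {"conditions": [
        {"type": "MemoryPressure", "status": "Unknown"},
        {"type": "Ready", "status": "Unknown",
         "reason": "NodeStatusUnknown"}
      ]}
    },
    {
      "metadata": {"name": "example-healthy-node"},
      "status": {"conditions": [
        {"type": "Ready", "status": "True"}
      ]}
    }
  ]
}
""")

print(not_ready_nodes(sample))
```

For a node that shows up in this list, kubectl describe node [node-name] gives the Reason and Message for the condition, which is usually the next place to look.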
Resolving the issue
In some cases, this issue will be resolved on its own if the node is able to recover or the user reboots it. If this doesn’t happen, you can remove the failed node from the cluster using the kubectl delete node command.
Learn more about Node Not Ready issues in Kubernetes.
This error indicates that kubelet is not running properly on the node, so it cannot participate in the Kubernetes cluster.
How to identify the issue
Run systemctl status kubelet and look for the message: node [node-name] not found
Resolving the issue
A common way to resolve this issue is to reset the node using the kubeadm reset command, use kubeadm to create a new token, and then use the new token in a kubeadm join command.
Kubernetes troubleshooting relies on the ability to quickly contextualize the problem with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production. The major challenge is correlating service-level incidents with other events happening in the underlying infrastructure.
Komodor can help with our new ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. With this view you can quickly:
Beyond node error remediations, Komodor can help troubleshoot a variety of Kubernetes errors and issues, acting as a single source of truth (SSOT) for all of your K8s troubleshooting needs. Komodor provides:
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.