You are viewing documentation for Kubernetes version: v1.31
Kubernetes v1.31 documentation is no longer actively maintained. The version you are currently viewing is a static snapshot. For up-to-date information, see the latest version.
Configure a Security Context for a Pod or Container
A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:
-
Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
-
Security Enhanced Linux (SELinux): Objects are assigned security labels.
-
Running as privileged or unprivileged.
-
Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
-
AppArmor: Use program profiles to restrict the capabilities of individual programs.
-
Seccomp: Filter a process's system calls.
-
allowPrivilegeEscalation
: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether theno_new_privs
flag gets set on the container process.allowPrivilegeEscalation
is always true when the container:- is run as privileged, or
- has
CAP_SYS_ADMIN
-
readOnlyRootFilesystem
: Mounts the container's root filesystem as read-only.
The above bullets are not a complete set of security context settings -- please see SecurityContext for a comprehensive list.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of these Kubernetes playgrounds:
To check the version, enterkubectl version
.
Set the security context for a Pod
To specify security settings for a Pod, include the securityContext
field
in the Pod specification. The securityContext
field is a
PodSecurityContext object.
The security settings that you specify for a Pod apply to all Containers in the Pod.
Here is a configuration file for a Pod that has a securityContext
and an emptyDir
volume:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo
spec:
securityContext:
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
supplementalGroups: [4000]
volumes:
- name: sec-ctx-vol
emptyDir: {}
containers:
- name: sec-ctx-demo
image: busybox:1.28
command: [ "sh", "-c", "sleep 1h" ]
volumeMounts:
- name: sec-ctx-vol
mountPath: /data/demo
securityContext:
allowPrivilegeEscalation: false
In the configuration file, the runAsUser
field specifies that for any Containers in
the Pod, all processes run with user ID 1000. The runAsGroup
field specifies the primary group ID of 3000 for
all processes within any containers of the Pod. If this field is omitted, the primary group ID of the containers
will be root(0). Any files created will also be owned by user 1000 and group 3000 when runAsGroup
is specified.
Since fsGroup
field is specified, all processes of the container are also part of the supplementary group ID 2000.
The owner for volume /data/demo
and any files created in that volume will be Group ID 2000.
Additionally, when the supplementalGroups
field is specified, all processes of the container are also part of the
specified groups. If this field is omitted, it means empty.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo
Get a shell to the running Container:
kubectl exec -it security-context-demo -- sh
In your shell, list the running processes:
ps
The output shows that the processes are running as user 1000, which is the value of runAsUser
:
PID USER TIME COMMAND
1 1000 0:00 sleep 1h
6 1000 0:00 sh
...
In your shell, navigate to /data
, and list the one directory:
cd /data
ls -l
The output shows that the /data/demo
directory has group ID 2000, which is
the value of fsGroup
.
drwxrwsrwx 2 root 2000 4096 Jun 6 20:08 demo
In your shell, navigate to /data/demo
, and create a file:
cd demo
echo hello > testfile
List the file in the /data/demo
directory:
ls -l
The output shows that testfile
has group ID 2000, which is the value of fsGroup
.
-rw-r--r-- 1 1000 2000 6 Jun 6 20:08 testfile
Run the following command:
id
The output is similar to this:
uid=1000 gid=3000 groups=2000,3000,4000
From the output, you can see that gid
is 3000 which is same as the runAsGroup
field.
If the runAsGroup
was omitted, the gid
would remain as 0 (root) and the process will
be able to interact with files that are owned by the root(0) group and groups that have
the required group permissions for the root (0) group. You can also see that groups
contains the group IDs which are specified by fsGroup
and supplementalGroups
,
in addition to gid
.
Exit your shell:
exit
Implicit group memberships defined in /etc/group
in the container image
By default, kubernetes merges group information from the Pod with information defined in /etc/group
in the container image.
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo
spec:
securityContext:
runAsUser: 1000
runAsGroup: 3000
supplementalGroups: [4000]
containers:
- name: sec-ctx-demo
image: registry.k8s.io/e2e-test-images/agnhost:2.45
command: [ "sh", "-c", "sleep 1h" ]
securityContext:
allowPrivilegeEscalation: false
This Pod security context contains runAsUser
, runAsGroup
and supplementalGroups
.
However, you can see that the actual supplementary groups attached to the container process
will include group IDs which come from /etc/group
in the container image.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-5.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo
Get a shell to the running Container:
kubectl exec -it security-context-demo -- sh
Check the process identity:
$ id
The output is similar to this:
uid=1000 gid=3000 groups=3000,4000,50000
You can see that groups
includes group ID 50000
. This is because the user (uid=1000
),
which is defined in the image, belongs to the group (gid=50000
), which is defined in /etc/group
inside the container image.
Check the /etc/group
in the container image:
$ cat /etc/group
You can see that uid 1000
belongs to group 50000
.
...
user-defined-in-image:x:1000:
group-defined-in-image:x:50000:user-defined-in-image
Exit your shell:
exit
Note:
Implicitly merged supplementary groups may cause security problems particularly when accessing the volumes (see kubernetes/kubernetes#112879 for details). If you want to avoid this. Please see the below section.Configure fine-grained SupplementalGroups control for a Pod
Kubernetes v1.31 [alpha]
(enabled by default: false)
This feature can be enabled by setting the SupplementalGroupsPolicy
feature gate for kubelet and
kube-apiserver, and setting the .spec.securityContext.supplementalGroupsPolicy
field for a pod.
The supplementalGroupsPolicy
field defines the policy for calculating the
supplementary groups for the container processes in a pod. There are two valid
values for this field:
-
Merge
: The group membership defined in/etc/group
for the container's primary user will be merged. This is the default policy if not specified. -
Strict
: Only group IDs infsGroup
,supplementalGroups
, orrunAsGroup
fields are attached as the supplementary groups of the container processes. This means no group membership from/etc/group
for the container's primary user will be merged.
When the feature is enabled, it also exposes the process identity attached to the first container process
in .status.containerStatuses[].user.linux
field. It would be useful for detecting if
implicit group ID's are attached.
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo
spec:
securityContext:
runAsUser: 1000
runAsGroup: 3000
supplementalGroups: [4000]
supplementalGroupsPolicy: Strict
containers:
- name: sec-ctx-demo
image: registry.k8s.io/e2e-test-images/agnhost:2.45
command: [ "sh", "-c", "sleep 1h" ]
securityContext:
allowPrivilegeEscalation: false
This pod manifest defines supplementalGroupsPolicy=Strict
. You can see that no group memberships
defined in /etc/group
are merged to the supplementary groups for container processes.
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-6.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo
Check the process identity:
kubectl exec -it security-context-demo -- id
The output is similar to this:
uid=1000 gid=3000 groups=3000,4000
See the Pod's status:
kubectl get pod security-context-demo -o yaml
You can see that the status.containerStatuses[].user.linux
field exposes the process identitiy
attached to the first container process.
...
status:
containerStatuses:
- name: sec-ctx-demo
user:
linux:
gid: 3000
supplementalGroups:
- 3000
- 4000
uid: 1000
...
Note:
Please note that the values in thestatus.containerStatuses[].user.linux
field is the first attached
process identity to the first container process in the container. If the container has sufficient privilege
to make system calls related to process identity
(e.g. setuid(2)
,
setgid(2)
or
setgroups(2)
, etc.),
the container process can change its identity. Thus, the actual process identity will be dynamic.Implementations
The following container runtimes are known to support fine-grained SupplementalGroups control.
CRI-level:
- containerd, since v2.0
- CRI-O, since v1.31
You can see if the feature is supported in the Node status.
apiVersion: v1
kind: Node
...
status:
features:
supplementalGroupsPolicy: true
Configure volume permission and ownership change policy for Pods
Kubernetes v1.23 [stable]
By default, Kubernetes recursively changes ownership and permissions for the contents of each
volume to match the fsGroup
specified in a Pod's securityContext
when that volume is
mounted.
For large volumes, checking and changing ownership and permissions can take a lot of time,
slowing Pod startup. You can use the fsGroupChangePolicy
field inside a securityContext
to control the way that Kubernetes checks and manages ownership and permissions
for a volume.
fsGroupChangePolicy - fsGroupChangePolicy
defines behavior for changing ownership
and permission of the volume before being exposed inside a Pod.
This field only applies to volume types that support fsGroup
controlled ownership and permissions.
This field has two possible values:
- OnRootMismatch: Only change permissions and ownership if the permission and the ownership of root directory does not match with expected permissions of the volume. This could help shorten the time it takes to change ownership and permission of a volume.
- Always: Always change permission and ownership of the volume when volume is mounted.
For example:
securityContext:
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
fsGroupChangePolicy: "OnRootMismatch"
Delegating volume permission and ownership change to CSI driver
Kubernetes v1.26 [stable]
If you deploy a Container Storage Interface (CSI)
driver which supports the VOLUME_MOUNT_GROUP
NodeServiceCapability
, the
process of setting file ownership and permissions based on the
fsGroup
specified in the securityContext
will be performed by the CSI driver
instead of Kubernetes. In this case, since Kubernetes doesn't perform any
ownership and permission change, fsGroupChangePolicy
does not take effect, and
as specified by CSI, the driver is expected to mount the volume with the
provided fsGroup
, resulting in a volume that is readable/writable by the
fsGroup
.
Set the security context for a Container
To specify security settings for a Container, include the securityContext
field
in the Container manifest. The securityContext
field is a
SecurityContext object.
Security settings that you specify for a Container apply only to
the individual Container, and they override settings made at the Pod level when
there is overlap. Container settings do not affect the Pod's Volumes.
Here is the configuration file for a Pod that has one Container. Both the Pod
and the Container have a securityContext
field:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-2
spec:
securityContext:
runAsUser: 1000
containers:
- name: sec-ctx-demo-2
image: gcr.io/google-samples/hello-app:2.0
securityContext:
runAsUser: 2000
allowPrivilegeEscalation: false
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-2.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo-2
Get a shell into the running Container:
kubectl exec -it security-context-demo-2 -- sh
In your shell, list the running processes:
ps aux
The output shows that the processes are running as user 2000. This is the value
of runAsUser
specified for the Container. It overrides the value 1000 that is
specified for the Pod.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
2000 1 0.0 0.0 4336 764 ? Ss 20:36 0:00 /bin/sh -c node server.js
2000 8 0.1 0.5 772124 22604 ? Sl 20:36 0:00 node server.js
...
Exit your shell:
exit
Set capabilities for a Container
With Linux capabilities,
you can grant certain privileges to a process without granting all the privileges
of the root user. To add or remove Linux capabilities for a Container, include the
capabilities
field in the securityContext
section of the Container manifest.
First, see what happens when you don't include a capabilities
field.
Here is configuration file that does not add or remove any Container capabilities:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-3
spec:
containers:
- name: sec-ctx-3
image: gcr.io/google-samples/hello-app:2.0
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-3.yaml
Verify that the Pod's Container is running:
kubectl get pod security-context-demo-3
Get a shell into the running Container:
kubectl exec -it security-context-demo-3 -- sh
In your shell, list the running processes:
ps aux
The output shows the process IDs (PIDs) for the Container:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 4336 796 ? Ss 18:17 0:00 /bin/sh -c node server.js
root 5 0.1 0.5 772124 22700 ? Sl 18:17 0:00 node server.js
In your shell, view the status for process 1:
cd /proc/1
cat status
The output shows the capabilities bitmap for the process:
...
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
...
Make a note of the capabilities bitmap, and then exit your shell:
exit
Next, run a Container that is the same as the preceding container, except that it has additional capabilities set.
Here is the configuration file for a Pod that runs one Container. The configuration
adds the CAP_NET_ADMIN
and CAP_SYS_TIME
capabilities:
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo-4
spec:
containers:
- name: sec-ctx-4
image: gcr.io/google-samples/hello-app:2.0
securityContext:
capabilities:
add: ["NET_ADMIN", "SYS_TIME"]
Create the Pod:
kubectl apply -f https://k8s.io/examples/pods/security/security-context-4.yaml
Get a shell into the running Container:
kubectl exec -it security-context-demo-4 -- sh
In your shell, view the capabilities for process 1:
cd /proc/1
cat status
The output shows capabilities bitmap for the process:
...
CapPrm: 00000000aa0435fb
CapEff: 00000000aa0435fb
...
Compare the capabilities of the two Containers:
00000000a80425fb
00000000aa0435fb
In the capability bitmap of the first container, bits 12 and 25 are clear. In the second container,
bits 12 and 25 are set. Bit 12 is CAP_NET_ADMIN
, and bit 25 is CAP_SYS_TIME
.
See capability.h
for definitions of the capability constants.
Note:
Linux capability constants have the formCAP_XXX
.
But when you list capabilities in your container manifest, you must
omit the CAP_
portion of the constant.
For example, to add CAP_SYS_TIME
, include SYS_TIME
in your list of capabilities.Set the Seccomp Profile for a Container
To set the Seccomp profile for a Container, include the seccompProfile
field
in the securityContext
section of your Pod or Container manifest. The
seccompProfile
field is a
SeccompProfile object consisting of type
and localhostProfile
.
Valid options for type
include RuntimeDefault
, Unconfined
, and
Localhost
. localhostProfile
must only be set if type: Localhost
. It
indicates the path of the pre-configured profile on the node, relative to the
kubelet's configured Seccomp profile location (configured with the --root-dir
flag).
Here is an example that sets the Seccomp profile to the node's container runtime default profile:
...
securityContext:
seccompProfile:
type: RuntimeDefault
Here is an example that sets the Seccomp profile to a pre-configured file at
<kubelet-root-dir>/seccomp/my-profiles/profile-allow.json
:
...
securityContext:
seccompProfile:
type: Localhost
localhostProfile: my-profiles/profile-allow.json
Set the AppArmor Profile for a Container
To set the AppArmor profile for a Container, include the appArmorProfile
field
in the securityContext
section of your Container. The appArmorProfile
field
is a
AppArmorProfile object consisting of type
and localhostProfile
.
Valid options for type
include RuntimeDefault
(default), Unconfined
, and
Localhost
. localhostProfile
must only be set if type
is Localhost
. It
indicates the name of the pre-configured profile on the node. The profile needs
to be loaded onto all nodes suitable for the Pod, since you don't know where the
pod will be scheduled.
Approaches for setting up custom profiles are discussed in
Setting up nodes with profiles.
Note: If containers[*].securityContext.appArmorProfile.type
is explicitly set
to RuntimeDefault
, then the Pod will not be admitted if AppArmor is not
enabled on the Node. However if containers[*].securityContext.appArmorProfile.type
is not specified, then the default (which is also RuntimeDefault
) will only
be applied if the node has AppArmor enabled. If the node has AppArmor disabled
the Pod will be admitted but the Container will not be restricted by the
RuntimeDefault
profile.
Here is an example that sets the AppArmor profile to the node's container runtime default profile:
...
containers:
- name: container-1
securityContext:
appArmorProfile:
type: RuntimeDefault
Here is an example that sets the AppArmor profile to a pre-configured profile
named k8s-apparmor-example-deny-write
:
...
containers:
- name: container-1
securityContext:
appArmorProfile:
type: Localhost
localhostProfile: k8s-apparmor-example-deny-write
For more details please see, Restrict a Container's Access to Resources with AppArmor.
Assign SELinux labels to a Container
To assign SELinux labels to a Container, include the seLinuxOptions
field in
the securityContext
section of your Pod or Container manifest. The
seLinuxOptions
field is an
SELinuxOptions
object. Here's an example that applies an SELinux level:
...
securityContext:
seLinuxOptions:
level: "s0:c123,c456"
Note:
To assign SELinux labels, the SELinux security module must be loaded on the host operating system.Efficient SELinux volume relabeling
Kubernetes v1.28 [beta]
(enabled by default: true)
Note:
Kubernetes v1.27 introduced an early limited form of this behavior that was only applicable
to volumes (and PersistentVolumeClaims) using the ReadWriteOncePod
access mode.
As an alpha feature, you can enable the SELinuxMount
feature gate to widen that
performance improvement to other kinds of PersistentVolumeClaims, as explained in detail
below.
By default, the container runtime recursively assigns SELinux label to all
files on all Pod volumes. To speed up this process, Kubernetes can change the
SELinux label of a volume instantly by using a mount option
-o context=<label>
.
To benefit from this speedup, all these conditions must be met:
- The feature gates
ReadWriteOncePod
andSELinuxMountReadWriteOncePod
must be enabled. - Pod must use PersistentVolumeClaim with applicable
accessModes
and feature gates:- Either the volume has
accessModes: ["ReadWriteOncePod"]
, and feature gateSELinuxMountReadWriteOncePod
is enabled. - Or the volume can use any other access modes and both feature gates
SELinuxMountReadWriteOncePod
andSELinuxMount
must be enabled.
- Either the volume has
- Pod (or all its Containers that use the PersistentVolumeClaim) must
have
seLinuxOptions
set. - The corresponding PersistentVolume must be either:
- A volume that uses the legacy in-tree
iscsi
,rbd
orfc
volume type. - Or a volume that uses a CSI driver.
The CSI driver must announce that it supports mounting with
-o context
by settingspec.seLinuxMount: true
in its CSIDriver instance.
- A volume that uses the legacy in-tree
For any other volume types, SELinux relabelling happens another way: the container runtime recursively changes the SELinux label for all inodes (files and directories) in the volume. The more files and directories in the volume, the longer that relabelling takes.
Managing access to the /proc
filesystem
Kubernetes v1.31 [beta]
(enabled by default: false)
For runtimes that follow the OCI runtime specification, containers default to running in a mode where there are multiple paths that are both masked and read-only. The result of this is the container has these paths present inside the container's mount namespace, and they can function similarly to if the container was an isolated host, but the container process cannot write to them. The list of masked and read-only paths are as follows:
-
Masked Paths:
/proc/asound
/proc/acpi
/proc/kcore
/proc/keys
/proc/latency_stats
/proc/timer_list
/proc/timer_stats
/proc/sched_debug
/proc/scsi
/sys/firmware
/sys/devices/virtual/powercap
-
Read-Only Paths:
/proc/bus
/proc/fs
/proc/irq
/proc/sys
/proc/sysrq-trigger
For some Pods, you might want to bypass that default masking of paths. The most common context for wanting this is if you are trying to run containers within a Kubernetes container (within a pod).
The securityContext
field procMount
allows a user to request a container's /proc
be Unmasked
, or be mounted as read-write by the container process. This also
applies to /sys/firmware
which is not in /proc
.
...
securityContext:
procMount: Unmasked
Note:
SettingprocMount
to Unmasked requires the spec.hostUsers
value in the pod
spec to be false
. In other words: a container that wishes to have an Unmasked
/proc
or unmasked /sys
must also be in a
user namespace.
Kubernetes v1.12 to v1.29 did not enforce that requirement.Discussion
The security context for a Pod applies to the Pod's Containers and also to
the Pod's Volumes when applicable. Specifically fsGroup
and seLinuxOptions
are
applied to Volumes as follows:
-
fsGroup
: Volumes that support ownership management are modified to be owned and writable by the GID specified infsGroup
. See the Ownership Management design document for more details. -
seLinuxOptions
: Volumes that support SELinux labeling are relabeled to be accessible by the label specified underseLinuxOptions
. Usually you only need to set thelevel
section. This sets the Multi-Category Security (MCS) label given to all Containers in the Pod as well as the Volumes.
Warning:
After you specify an MCS label for a Pod, all Pods with the same label can access the Volume. If you need inter-Pod protection, you must assign a unique MCS label to each Pod.Clean up
Delete the Pod:
kubectl delete pod security-context-demo
kubectl delete pod security-context-demo-2
kubectl delete pod security-context-demo-3
kubectl delete pod security-context-demo-4
What's next
- PodSecurityContext
- SecurityContext
- CRI Plugin Config Guide
- Security Contexts design document
- Ownership Management design document
- PodSecurity Admission
- AllowPrivilegeEscalation design document
- For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features (Note: Some information is out of date)
- Read about User Namespaces for Linux pods.
- Masked Paths in the OCI Runtime Specification
Items on this page refer to third party products or projects that provide functionality required by Kubernetes. The Kubernetes project authors aren't responsible for those third-party products or projects. See the CNCF website guidelines for more details.
You should read the content guide before proposing a change that adds an extra third-party link.