Description
opened on Nov 7, 2024
How to categorize this issue?
/area os
/kind bug
What happened:
Gardener-Node-Agent deletes the entire containerd drop-in directory when a drop-in is removed from the OSC and no other reference to the systemd unit remains.
Details:
We deployed a shoot with an OSC containing the following extension units in the status field:
...
status:
  extensionUnits:
  - dropIns:
    - content: |
        [Service]
        ExecStartPre=/opt/gardener/bin/containerd_cgroup_driver.sh
      name: 10-configure-cgroup-driver.conf
    filePaths:
    - /opt/gardener/bin/g_functions.sh
    - /opt/gardener/bin/containerd_cgroup_driver.sh
    name: containerd.service
  - dropIns:
    - content: |
        [Service]
        ExecStartPre=/opt/gardener/bin/kubelet_cgroup_driver.sh
      name: 10-configure-cgroup-driver.conf
    filePaths:
    - /opt/gardener/bin/g_functions.sh
    - /opt/gardener/bin/kubelet_cgroup_driver.sh
    name: kubelet.service
This results in the following files being present:
ls /etc/systemd/system/containerd.service.d
10-configure-cgroup-driver.conf
11-exec_config.conf
30-env_config.conf
override.conf
Only one of these files was delivered by the OSC.
The operating system controller then updated the OSC status and removed the containerd extension unit from extensionUnits:
...
status:
  extensionUnits:
  - dropIns:
    - content: |
        [Service]
        ExecStartPre=/opt/gardener/bin/kubelet_cgroup_driver.sh
      name: 10-configure-cgroup-driver.conf
    filePaths:
    - /opt/gardener/bin/g_functions.sh
    - /opt/gardener/bin/kubelet_cgroup_driver.sh
    name: kubelet.service
As containerd.service was defined only once in the status field, removing that entry dropped the unit from the status entirely, and the entire drop-in directory was deleted.
Even though we only intended to remove one systemd drop-in file, the code in here identifies the unit as one to be deleted, and the code here deletes the entire systemd drop-in directory.
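To illustrate why the unit ends up flagged for deletion, here is a minimal sketch of a name-based diff between the old and new extension units. The `Unit` and `DropIn` types and the `removedUnits` function are hypothetical simplifications for this issue, not the actual gardener-node-agent code.

```go
package main

import "fmt"

// DropIn and Unit are simplified, hypothetical stand-ins for the entries
// under status.extensionUnits; they are not the real Gardener API types.
type DropIn struct {
	Name string
}

type Unit struct {
	Name    string
	DropIns []DropIn
}

// removedUnits returns the names of units that appear in oldUnits but no
// longer appear in newUnits.
func removedUnits(oldUnits, newUnits []Unit) []string {
	stillPresent := make(map[string]struct{}, len(newUnits))
	for _, u := range newUnits {
		stillPresent[u.Name] = struct{}{}
	}
	var removed []string
	for _, u := range oldUnits {
		if _, ok := stillPresent[u.Name]; !ok {
			removed = append(removed, u.Name)
		}
	}
	return removed
}

func main() {
	dropIn := DropIn{Name: "10-configure-cgroup-driver.conf"}
	oldUnits := []Unit{
		{Name: "containerd.service", DropIns: []DropIn{dropIn}},
		{Name: "kubelet.service", DropIns: []DropIn{dropIn}},
	}
	newUnits := []Unit{
		{Name: "kubelet.service", DropIns: []DropIn{dropIn}},
	}
	for _, name := range removedUnits(oldUnits, newUnits) {
		// A unit-level cleanup would now target the whole <name>.d directory,
		// including drop-ins that never came from the OSC.
		fmt.Printf("unit flagged for deletion: %s (drop-in dir /etc/systemd/system/%s.d)\n", name, name)
	}
}
```

With the two states from this issue, only containerd.service is reported. A diff at this granularity cannot tell "one drop-in was removed" apart from "the whole unit was removed", which is where the vendor-provided drop-ins get lost.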
What you expected to happen:
Gardener node agent should not delete an entire drop-in directory with drop-ins it never created.
How to reproduce it (as minimally and precisely as possible):
See above
Anything else we need to know?:
In the garden-linux extension, a drop-in for containerd is deployed that should no longer be deployed going forward. As a result, we are removing the entire unit from status.extensionUnits, as described above.
This bug only occurs when the drop-in was deployed for a systemd unit delivered by the OS vendor. It does not affect systemd units delivered by Gardener directly.
Environment:
- Gardener version: 1.107
- Kubernetes version (use kubectl version): ---
- Cloud provider or hardware configuration: ---
- Others: ---