Gardener Node Agent deletes containerd drop-in directory #10809

Open

Description

How to categorize this issue?

/area os
/kind bug

What happened:

Gardener-Node-Agent deletes the entire containerd drop-in directory when a drop-in is removed from the OSC and no other reference to the systemd unit remains.

Details:

We deployed a shoot with an OSC containing the following extension units in the status field:

...
status:
  extensionUnits:
  - dropIns:
    - content: |
        [Service]
        ExecStartPre=/opt/gardener/bin/containerd_cgroup_driver.sh
      name: 10-configure-cgroup-driver.conf
    filePaths:
    - /opt/gardener/bin/g_functions.sh
    - /opt/gardener/bin/containerd_cgroup_driver.sh
    name: containerd.service
  - dropIns:
    - content: |
        [Service]
        ExecStartPre=/opt/gardener/bin/kubelet_cgroup_driver.sh
      name: 10-configure-cgroup-driver.conf
    filePaths:
    - /opt/gardener/bin/g_functions.sh
    - /opt/gardener/bin/kubelet_cgroup_driver.sh
    name: kubelet.service

This results in the following files being present:

ls /etc/systemd/system/containerd.service.d

10-configure-cgroup-driver.conf  
11-exec_config.conf 
30-env_config.conf  
override.conf

Only one of these files was delivered by the OSC.

The operating system controller then updated the OSC status and removed the containerd extension unit from extensionUnits:

...
status:
  extensionUnits:
  - dropIns:
    - content: |
        [Service]
        ExecStartPre=/opt/gardener/bin/kubelet_cgroup_driver.sh
      name: 10-configure-cgroup-driver.conf
    filePaths:
    - /opt/gardener/bin/g_functions.sh
    - /opt/gardener/bin/kubelet_cgroup_driver.sh
    name: kubelet.service

Because containerd.service was defined only once in the status field, removing that entry caused the entire drop-in directory to be deleted.

Even though we only intended to remove one systemd drop-in file, the code here identifies the whole unit as one to be deleted, and the code here then deletes the entire systemd drop-in directory.
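To illustrate why the unit ends up flagged for deletion, here is a minimal Python sketch of a unit-level comparison between the old and new `extensionUnits`. It is not the node agent's actual code; the function name `deleted_units` and the simplified dictionary shape are hypothetical and only mirror the status snippets above.

```python
def deleted_units(old_units, new_units):
    """Return names of units listed in old_units but missing from new_units.

    Hypothetical helper: it compares units by name only, so a unit whose
    single drop-in was removed looks the same as a unit removed entirely.
    """
    new_names = {unit["name"] for unit in new_units}
    return [unit["name"] for unit in old_units if unit["name"] not in new_names]


# Simplified view of status.extensionUnits before and after the update.
old = [
    {"name": "containerd.service", "dropIns": ["10-configure-cgroup-driver.conf"]},
    {"name": "kubelet.service", "dropIns": ["10-configure-cgroup-driver.conf"]},
]
new = [
    {"name": "kubelet.service", "dropIns": ["10-configure-cgroup-driver.conf"]},
]

print(deleted_units(old, new))  # ['containerd.service']
```

A comparison at this granularity cannot tell "one drop-in was removed" apart from "the unit itself should be removed", which matches the behavior observed here.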

What you expected to happen:

Gardener node agent should not delete an entire drop-in directory with drop-ins it never created.
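One way the expected behavior could look, sketched in Python under the assumption that the agent knows which drop-in file names it previously wrote: delete only those files, and remove the directory only if nothing else is left in it. The function `remove_managed_dropins` and the `demo` helper are hypothetical and use a temporary directory instead of `/etc/systemd/system`.

```python
import os
import tempfile


def remove_managed_dropins(dropin_dir, managed_names):
    """Delete only the named drop-in files; keep everything else in place."""
    for name in managed_names:
        path = os.path.join(dropin_dir, name)
        if os.path.isfile(path):
            os.remove(path)
    # Remove the directory only when no foreign drop-ins remain in it.
    if os.path.isdir(dropin_dir) and not os.listdir(dropin_dir):
        os.rmdir(dropin_dir)


def demo():
    """Recreate the directory listing from this issue and clean up one drop-in."""
    with tempfile.TemporaryDirectory() as root:
        dropin_dir = os.path.join(root, "containerd.service.d")
        os.mkdir(dropin_dir)
        for name in (
            "10-configure-cgroup-driver.conf",
            "11-exec_config.conf",
            "30-env_config.conf",
            "override.conf",
        ):
            open(os.path.join(dropin_dir, name), "w").close()
        # Only this file was delivered by the OSC.
        remove_managed_dropins(dropin_dir, ["10-configure-cgroup-driver.conf"])
        return sorted(os.listdir(dropin_dir))


print(demo())  # ['11-exec_config.conf', '30-env_config.conf', 'override.conf']
```

With this approach the three drop-ins that did not come from the OSC survive, and the directory disappears only when the agent's own files were the last ones in it.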

How to reproduce it (as minimally and precisely as possible):

See above

Anything else we need to know?:

The garden-linux extension currently deploys a drop-in for containerd that should no longer be deployed going forward. As a result, we are removing the entire unit from status.extensionUnits as described above.

This bug only occurs when the drop-in targets a systemd unit delivered by the OS vendor. It does not affect systemd units delivered by Gardener directly.

Environment:

  • Gardener version: 1.107
  • Kubernetes version (use kubectl version): ---
  • Cloud provider or hardware configuration: ---
  • Others: ---
