MCM should post eviction error (after some time) to the Machine object #997

@Kumm-Kai

How to categorize this issue?
/area control-plane
/area usability
/kind enhancement
/priority 3

What would you like to be added:
MCM should, at some point, write general eviction/drain errors into the Status.LastOperation.Description field.

We are mainly interested in errors that contain "Cannot evict pod as it would violate the pod's disruption budget.", for the reason explained below.

Code locations:

  • after:
    klog.V(3).Infof("Pod %s/%s couldn't be evicted from node %s. This may also occur due to PDB violation. Will be retried. Error: %v", pod.Namespace, pod.Name, pod.Spec.NodeName, err)
    pdb := getPdbForPod(o.pdbLister, pod)
    if pdb != nil {
        if isMisconfiguredPdb(pdb) {
            pdbErr := fmt.Errorf("error while evicting pod %q: pod disruption budget %s/%s is misconfigured and requires zero voluntary evictions",
                pod.Name, pdb.Namespace, pdb.Name)
            returnCh <- pdbErr
            return
        }
    }
  • after:
    klog.V(3).Infof("Pod %s/%s couldn't be evicted from node %s. This may also occur due to PDB violation. Will be retried. Error: %v", pod.Namespace, pod.Name, pod.Spec.NodeName, err)
    pdb := getPdbForPod(o.pdbLister, pod)
    if pdb != nil {
        if isMisconfiguredPdb(pdb) {
            pdbErr := fmt.Errorf("error while evicting pod %q: pod disruption budget %s/%s is misconfigured and requires zero voluntary evictions",
                pod.Name, pdb.Namespace, pdb.Name)
            returnCh <- pdbErr
            o.checkAndDeleteWorker(volumeAttachmentEventCh)
            continue
        }
    }

Why is this needed:
Currently, the MCM only writes an error into the Status.LastOperation.Description field if it detects a misconfigured PDB.

When the PDB is not misconfigured but a drain would still violate it (e.g., because the workload is unhealthy), it would be very helpful to be able to easily distinguish "a node can't be drained because of a PDB" from "a node can't be drained because something else is broken".
