Pod stuck in ContainerCreating with log message "Volume is not yet attached according to node" #50721

Closed
@verult

Is this a BUG REPORT or FEATURE REQUEST?: bug

/kind bug

What happened:

  • Deployment running with read-only GCE PD.
  • A node containing one of the deployment pods is deleted and recreated soon after.
  • New pod is scheduled to the same node, but is stuck in ContainerCreating state. The node's kubelet logs show continuous "Volume is not yet attached according to node" errors.

What you expected to happen: Pod recreated successfully.

Anything else we need to know?: Caused by PR #45923.

The issue happens as follows:

  • When the node is deleted, the GCE PD is detached on the cloud provider side.
  • AttachDetachController discovers the PD has been detached, and marks it as detached.
  • The original pod still exists in the API server (i.e. before garbage collection), so AttachDetachController reattaches the PD, successfully.
  • Before the node recovers, ADC tries to update the node status but finds that the node doesn't exist. Following the logic introduced in #45923 ("Node status updater now deletes the node entry in attach updates..."), it never attempts to update the node status again.
  • The node recovers, a new pod is scheduled to it, waits for a successful attach signal in the node status but never gets it.
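
The sequence above can be sketched as a toy model. This is not the real AttachDetachController code: `apiServer`, `statusUpdater`, `dropOnNotFound`, `simulate`, and the names `node-1` and `pd-1` are all hypothetical, chosen only to show how dropping a pending node status update on a "node not found" error loses the attach signal, while keeping the entry for retry is one possible way to let it be delivered once the node returns.

```go
package main

import (
	"errors"
	"fmt"
)

// errNodeNotFound stands in for the API server's "not found" response.
var errNodeNotFound = errors.New("node not found")

// apiServer is a toy API server: node name -> volumes reported as attached.
type apiServer struct {
	nodes map[string][]string
}

// patchVolumesAttached writes the attached-volume list into the node status,
// failing if the node object does not exist.
func (a *apiServer) patchVolumesAttached(node string, vols []string) error {
	if _, ok := a.nodes[node]; !ok {
		return errNodeNotFound
	}
	a.nodes[node] = append([]string(nil), vols...)
	return nil
}

// statusUpdater models the controller's queue of pending node status updates.
type statusUpdater struct {
	api            *apiServer
	pending        map[string][]string // node -> volumes still to be reported
	dropOnNotFound bool                // true models the behavior described in this issue
}

// sync tries to push every pending update to the API server.
func (u *statusUpdater) sync() {
	for node, vols := range u.pending {
		err := u.api.patchVolumesAttached(node, vols)
		switch {
		case err == nil:
			delete(u.pending, node) // delivered
		case errors.Is(err, errNodeNotFound) && u.dropOnNotFound:
			delete(u.pending, node) // update is forgotten and never retried
		default:
			// keep the entry so the next sync retries it
		}
	}
}

// simulate replays the issue's timeline and reports whether the kubelet
// would eventually see pd-1 in the node status.
func simulate(dropOnNotFound bool) bool {
	api := &apiServer{nodes: map[string][]string{}} // node-1 has been deleted
	u := &statusUpdater{
		api:            api,
		pending:        map[string][]string{"node-1": {"pd-1"}}, // PD was reattached
		dropOnNotFound: dropOnNotFound,
	}
	u.sync()                  // node is still missing
	api.nodes["node-1"] = nil // node is recreated with an empty status
	u.sync()                  // controller syncs again after recovery
	for _, v := range api.nodes["node-1"] {
		if v == "pd-1" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("drop entry on not-found, kubelet sees attach:", simulate(true))  // false
	fmt.Println("keep entry for retry, kubelet sees attach:", simulate(false)) // true
}
```

With `dropOnNotFound` set, the second sync has nothing left to send, so the recreated node's status never lists the volume and a pod waiting on that signal would stay in ContainerCreating.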

/cc @jingxu97 @msau42
/sig storage
