
Pods with volumes stuck in ContainerCreating after cluster node is deleted from OpenStack #50200

Closed
@kars7e

Description

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
One of the cluster's worker nodes was deleted from OpenStack.
Pods that had been running on that node were rescheduled onto other nodes but got stuck in ContainerCreating. They stay stuck for 20+ minutes; the controller manager cannot reconcile the volume state without manual intervention (for example, restarting it). See the end of this report for actions that resolve the issue.

What you expected to happen:
The pod should start correctly on a different node, with its volumes attached.

How to reproduce it (as minimally and precisely as possible):

  • Create a pod (e.g. via a Deployment) that mounts a Cinder-backed PersistentVolumeClaim.
  • Delete the OpenStack instance backing the node the pod is running on.
  • Observe that the rescheduled pod stays in ContainerCreating with Multi-Attach errors.

Anything else we need to know?:
The underlying issue is that Cinder volumes get detached when the instance is deleted, but Kubernetes does not register this fact and keeps throwing:

Multi-Attach error for volume "pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c" Volume is already exclusively attached to one node and can't be attached to another
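
For reference, the actual state on the Cinder side can be checked out of band. Below is a minimal sketch, assuming gophercloud's blockstorage v2 client and the usual OS_* environment variables; the volume ID is the one that appears in the logs further down. It is illustrative only, not part of this cluster's tooling:

package main

import (
    "fmt"
    "log"

    "github.com/gophercloud/gophercloud"
    "github.com/gophercloud/gophercloud/openstack"
    "github.com/gophercloud/gophercloud/openstack/blockstorage/v2/volumes"
)

func main() {
    // Authenticate against OpenStack using the OS_* environment variables.
    opts, err := openstack.AuthOptionsFromEnv()
    if err != nil {
        log.Fatal(err)
    }
    provider, err := openstack.AuthenticatedClient(opts)
    if err != nil {
        log.Fatal(err)
    }
    client, err := openstack.NewBlockStorageV2(provider, gophercloud.EndpointOpts{})
    if err != nil {
        log.Fatal(err)
    }

    // Cinder volume ID taken from the controller manager errors in this report.
    vol, err := volumes.Get(client, "9dd4110b-e9f2-4eba-a2b5-22b6082b2c1b").Extract()
    if err != nil {
        log.Fatal(err)
    }

    // Once the instance is deleted, Cinder reports the volume as "available"
    // with an empty attachment list, while the attach/detach controller still
    // records it as attached to the deleted node.
    fmt.Printf("status=%s attachments=%v\n", vol.Status, vol.Attachments)
}

In this scenario the volume is already "available", which is exactly the state the detach error below complains about.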

It seems that the controller manager attempts the detach, but cannot handle the fact that the volume is in the available state (k8s-node-2 is the node that was deleted):

{"log":"E0805 19:36:33.196285 6 nestedpendingoperations.go:262] Operation for \"\\\"kubernetes.io/cinder/9dd4110b-e9f2-4eba-a2b5-22b6082b2c1b\\\"\" failed. No retries permitted until 2017-08-05 19:37:37.196173574 +0000 UTC (durationBeforeRetry 1m4s). Error: DetachVolume.Detach failed for volume \"pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c\" (UniqueName: \"kubernetes.io/cinder/9dd4110b-e9f2-4eba-a2b5-22b6082b2c1b\") on node \"k8s-node-2-31217f04-941c-48f2-b36e-8a97a3bf7515\" : can not detach volume kubernetes-dynamic-pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c, its status is available.\n","stream":"stderr","time":"2017-08-05T19:36:33.196396348Z"}

Inspecting the pod shows the following events (note: this pod eventually started because the controller manager was restarted):

Events:
  FirstSeen	LastSeen	Count	From								SubObjectPath		Type		Reason		  	Message
  ---------	--------	-----	----								-------------		--------	------			-------
  11m		11m		1	default-scheduler									Normal		Scheduled	  	Successfully assigned webapp-staging-3564021476-tq2dc to k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515
  11m		11m		1	kubelet, k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515				Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-bw55f"
  11m		2m		5144	attachdetach										Warning		FailedAttachVolume	Multi-Attach error for volume "pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c" Volume is already exclusively attached to one node and can't be attached to another
  9m		2m		4	kubelet, k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515				Warning		FailedMount	  	Unable to mount volumes for pod "webapp-staging-3564021476-tq2dc_default(63941f31-7a14-11e7-bd74-fa163eae2160)": timeout expired waiting for volumes to attach/mount for pod "default"/"webapp-staging-3564021476-tq2dc". list of unattached/unmounted volumes=[mypvc]
  9m		2m		4	kubelet, k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515				Warning		FailedSync	  	Error syncing pod
  1m		1m		1	kubelet, k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515				Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c"
  1m		1m		1	kubelet, k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515	spec.containers{nginx}	Normal		Pulled		  	Container image "nginx:1.7.9" already present on machine
  1m		1m		1	kubelet, k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515	spec.containers{nginx}	Normal		Created		  	Created container
  1m		1m		1	kubelet, k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515	spec.containers{nginx}	Normal		Started

Also, an excerpt from the controller manager log (with --v=4):

{"log":"I0805 19:36:33.156440       6 deployment_controller.go:562] Started syncing deployment \"default/webapp-staging\" (2017-08-05 19:36:33.156430897 +0000 UTC)\n","stream":"stderr","time":"2017-08-05T19:36:33.156586574Z"}
{"log":"I0805 19:36:33.157358       6 progress.go:231] Queueing up deployment \"webapp-staging\" for a progress check now\n","stream":"stderr","time":"2017-08-05T19:36:33.157472884Z"}
{"log":"I0805 19:36:33.157409       6 deployment_controller.go:564] Finished syncing deployment \"default/webapp-staging\" (954.263µs)\n","stream":"stderr","time":"2017-08-05T19:36:33.157484805Z"}
{"log":"E0805 19:36:33.196088       6 openstack_volumes.go:263] can not detach volume kubernetes-dynamic-pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c, its status is available.\n","stream":"stderr","time":"2017-08-05T19:36:33.196206222Z"}
{"log":"E0805 19:36:33.196104       6 attacher.go:394] Error detaching volume \"9dd4110b-e9f2-4eba-a2b5-22b6082b2c1b\" from node \"k8s-node-2-31217f04-941c-48f2-b36e-8a97a3bf7515\": can not detach volume kubernetes-dynamic-pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c, its status is available.\n","stream":"stderr","time":"2017-08-05T19:36:33.196226294Z"}
{"log":"I0805 19:36:33.196143       6 actual_state_of_world.go:478] Add new node \"k8s-node-2-31217f04-941c-48f2-b36e-8a97a3bf7515\" to nodesToUpdateStatusFor\n","stream":"stderr","time":"2017-08-05T19:36:33.196230931Z"}
{"log":"I0805 19:36:33.196157       6 actual_state_of_world.go:486] Report volume \"kubernetes.io/cinder/9dd4110b-e9f2-4eba-a2b5-22b6082b2c1b\" as attached to node \"k8s-node-2-31217f04-941c-48f2-b36e-8a97a3bf7515\"\n","stream":"stderr","time":"2017-08-05T19:36:33.196234322Z"}
{"log":"E0805 19:36:33.196285       6 nestedpendingoperations.go:262] Operation for \"\\\"kubernetes.io/cinder/9dd4110b-e9f2-4eba-a2b5-22b6082b2c1b\\\"\" failed. No retries permitted until 2017-08-05 19:37:37.196173574 +0000 UTC (durationBeforeRetry 1m4s). Error: DetachVolume.Detach failed for volume \"pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c\" (UniqueName: \"kubernetes.io/cinder/9dd4110b-e9f2-4eba-a2b5-22b6082b2c1b\") on node \"k8s-node-2-31217f04-941c-48f2-b36e-8a97a3bf7515\" : can not detach volume kubernetes-dynamic-pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c, its status is available.\n","stream":"stderr","time":"2017-08-05T19:36:33.196396348Z"}
{"log":"I0805 19:36:33.212211       6 node_status_updater.go:76] Could not update node status. Failed to find node \"k8s-node-2-31217f04-941c-48f2-b36e-8a97a3bf7515\" in NodeInformer cache. Error: 'node \"k8s-node-2-31217f04-941c-48f2-b36e-8a97a3bf7515\" not found'\n","stream":"stderr","time":"2017-08-05T19:36:33.212356492Z"}
{"log":"W0805 19:36:33.212444       6 reconciler.go:262] Multi-Attach error for volume \"pvc-7da59477-7a13-11e7-a1c3-fa163ec6b87c\" (UniqueName: \"kubernetes.io/cinder/9dd4110b-e9f2-4eba-a2b5-22b6082b2c1b\") from node \"k8s-node-1-31217f04-941c-48f2-b36e-8a97a3bf7515\" Volume is already exclusively attached to one node and can't be attached to another\n","stream":"stderr","time":"2017-08-05T19:36:33.2

Note: any of the following operations will resolve the issue:

  • restarting the active controller manager
  • attaching the volume in question to an arbitrary instance
  • deleting and recreating the pod (see the sketch after this list).
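
For the last workaround, the stuck pod can be deleted with kubectl delete pod so that its Deployment recreates it, or programmatically; a small client-go sketch (current client-go API, kubeconfig path assumed, pod name and namespace taken from the events above):

package main

import (
    "context"
    "log"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load credentials from a kubeconfig file (the path is an assumption).
    config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/admin.conf")
    if err != nil {
        log.Fatal(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatal(err)
    }

    // Delete the stuck pod; the Deployment's ReplicaSet recreates it
    // (one of the workarounds listed above).
    err = clientset.CoreV1().Pods("default").Delete(context.TODO(),
        "webapp-staging-3564021476-tq2dc", metav1.DeleteOptions{})
    if err != nil {
        log.Fatal(err)
    }
}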

Environment:

  • Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d70ffb940d12c648df2b6647580c150b8f113704", GitTreeState:"clean", BuildDate:"2017-08-04T07:14:35Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    OpenStack Mitaka
  • OS (e.g. from /etc/os-release):
    Ubuntu 16.04.2 LTS (Xenial Xerus)
  • Kernel (e.g. uname -a):
    Linux k8s-master-1-31217f04-941c-48f2-b36e-8a97a3bf7515 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
    kargo (kubespray)
  • Others:

Labels

area/provider/openstack, kind/bug, sig/storage
