
Pods with volumes stuck in ContainerCreating after cluster node is powered off in vCenter #50944

Closed
BaluDontu opened this issue Aug 18, 2017 · 18 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/storage Categorizes an issue or PR as relevant to SIG Storage.
Milestone

Comments

@BaluDontu
Contributor

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
One of the cluster worker nodes was powered off in vCenter.
Pods running on this node were rescheduled onto different nodes but got stuck in ContainerCreating. Attaching the volume on the new node failed with the error "Multi-Attach error for volume pvc-xxx, Volume is already exclusively attached to one node and can't be attached to another", so the application running in the pod had no data available because the volume was not attached to the new node. The volume was still attached to the powered-off node, and the attach/detach controller made no attempt to detach it, so every attach on the new node kept failing with the "Multi-Attach error". This lasted for 6 minutes, until the attach/detach controller forcefully detached the volume from the powered-off node. Once the volume was detached from the powered-off node, it was successfully attached on the new node and the application's data became available again.

What you expected to happen:
The pod should be started correctly on a different node, with its volumes attached, very quickly. Having the application down for 6 minutes is very bad.

How to reproduce it (as minimally and precisely as possible):

  • Use Kubernetes 1.7.0+ with the vSphere cloud provider.
  • Delete the node from the cluster by powering it off in vCenter.

Anything else we need to know?:

Below are the logs from controller manager.

{"log":"I0818 23:27:01.933677       1 controller_utils.go:285] Recording status change NodeNotReady event message for node node2\n","stream":"stderr","time":"2017-08-18T23:27:01.933908764Z"}
{"log":"I0818 23:27:01.934012       1 controller_utils.go:203] Update ready status of pods on node [node2]\n","stream":"stderr","time":"2017-08-18T23:27:01.934110529Z"}
{"log":"I0818 23:27:01.934384       1 event.go:218] Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"node2\", UID:\"14b8f7fb-846c-11e7-addd-0050569ccb52\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'NodeNotReady' Node node2 status is now: NodeNotReady\n","stream":"stderr","time":"2017-08-18T23:27:01.93448004Z"}
{"log":"I0818 23:27:01.939006       1 controller_utils.go:220] Updating ready status of pod redis-master-2401852951-w92hb to false\n","stream":"stderr","time":"2017-08-18T23:27:01.939064356Z"}
{"log":"I0818 23:27:01.944968       1 controller_utils.go:220] Updating ready status of pod heapster-v1.2.0-1794790591-94z4s to false\n","stream":"stderr","time":"2017-08-18T23:27:01.945054955Z"}
{"log":"I0818 23:27:01.956548       1 controller_utils.go:220] Updating ready status of pod kube-proxy-1lq8f to false\n","stream":"stderr","time":"2017-08-18T23:27:01.956639977Z"}
{"log":"W0818 23:27:02.043661       1 vsphere.go:625] VM node2, is not in poweredOn state\n","stream":"stderr","time":"2017-08-18T23:27:02.043753193Z"}
{"log":"I0818 23:27:02.043759       1 controller_utils.go:274] Recording Deleting Node node2 because it's not present according to cloud provider event message for node node2\n","stream":"stderr","time":"2017-08-18T23:27:02.043785273Z"}
{"log":"I0818 23:27:02.044235       1 event.go:218] Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"node2\", UID:\"14b8f7fb-846c-11e7-addd-0050569ccb52\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'DeletingNode' Node node2 event: Deleting Node node2 because it's not present according to cloud provider\n","stream":"stderr","time":"2017-08-18T23:27:02.044320987Z"}
{"log":"I0818 23:27:02.050426       1 attach_detach_controller.go:465] error removing node \"node2\" from desired-state-of-world: failed to delete node \"node2\" from list of nodes managed by attach/detach controller--the node still contains 1 volumes in its list of volumes to attach\n","stream":"stderr","time":"2017-08-18T23:27:02.050519226Z"}
{"log":"I0818 23:27:06.964846       1 gc_controller.go:162] Found orphaned Pod redis-master-2401852951-w92hb assigned to the Node node2. Deleting.\n","stream":"stderr","time":"2017-08-18T23:27:06.96761137Z"}
{"log":"I0818 23:27:06.964850       1 gc_controller.go:62] PodGC is force deleting Pod: default:redis-master-2401852951-w92hb\n","stream":"stderr","time":"2017-08-18T23:27:06.967620391Z"}
{"log":"I0818 23:27:06.987612       1 replica_set.go:455] Too few \"default\"/\"redis-master-2401852951\" replicas, need 1, creating 1\n","stream":"stderr","time":"2017-08-18T23:27:06.987687983Z"}
{"log":"I0818 23:27:07.005200       1 gc_controller.go:166] Forced deletion of orphaned Pod redis-master-2401852951-w92hb succeeded\n","stream":"stderr","time":"2017-08-18T23:27:07.005265456Z"}
{"log":"I0818 23:27:07.027452       1 event.go:218] Event(v1.ObjectReference{Kind:\"ReplicaSet\", Namespace:\"default\", Name:\"redis-master-2401852951\", UID:\"3f1729f7-846c-11e7-addd-0050569ccb52\", APIVersion:\"extensions\", ResourceVersion:\"909\", FieldPath:\"\"}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: redis-master-2401852951-wcx6h\n","stream":"stderr","time":"2017-08-18T23:27:07.027541497Z"}
{"log":"I0818 23:27:07.044256       1 nodecontroller.go:553] NodeController observed a Node deletion: node2\n","stream":"stderr","time":"2017-08-18T23:27:07.044322304Z"}
{"log":"I0818 23:27:07.044362       1 controller_utils.go:274] Recording Removing Node node2 from NodeController event message for node node2\n","stream":"stderr","time":"2017-08-18T23:27:07.048265998Z"}
{"log":"I0818 23:27:07.044485       1 event.go:218] Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"node2\", UID:\"14b8f7fb-846c-11e7-addd-0050569ccb52\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RemovingNode' Node node2 event: Removing Node node2 from NodeController\n","stream":"stderr","time":"2017-08-18T23:27:07.048273963Z"}
{"log":"W0818 23:27:07.054184       1 reconciler.go:262] Multi-Attach error for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") from node \"node3\" Volume is already exclusively attached to one node and can't be attached to another\n","stream":"stderr","time":"2017-08-18T23:27:07.05423019Z"}
{"log":"I0818 23:27:07.054279       1 event.go:218] Event(v1.ObjectReference{Kind:\"Pod\", Namespace:\"default\", Name:\"redis-master-2401852951-wcx6h\", UID:\"bff886d1-846c-11e7-addd-0050569ccb52\", APIVersion:\"v1\", ResourceVersion:\"945\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedAttachVolume' Multi-Attach error for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" Volume is already exclusively attached to one node and can't be attached to another\n","stream":"stderr","time":"2017-08-18T23:27:07.054350162Z"}
{"log":"W0818 23:27:07.054184       1 reconciler.go:262] Multi-Attach error for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") from node \"node3\" Volume is already exclusively attached to one node and can't be attached to another\n","stream":"stderr","time":"2017-08-18T23:27:07.05423019Z"}
{"log":"I0818 23:27:07.054279       1 event.go:218] Event(v1.ObjectReference{Kind:\"Pod\", Namespace:\"default\", Name:\"redis-master-2401852951-wcx6h\", UID:\"bff886d1-846c-11e7-addd-0050569ccb52\", APIVersion:\"v1\", ResourceVersion:\"945\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedAttachVolume' Multi-Attach error for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" Volume is already exclusively attached to one node and can't be attached to another\n","stream":"stderr","time":"2017-08-18T23:27:07.054350162Z"}
{"log":"W0818 23:27:07.154442       1 reconciler.go:262] Multi-Attach error for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") from node \"node3\" Volume is already exclusively attached to one node and can't be attached to another\n","stream":"stderr","time":"2017-08-18T23:27:07.154554127Z"}
{"log":"I0818 23:27:07.155116       1 event.go:218] Event(v1.ObjectReference{Kind:\"Pod\", Namespace:\"default\", Name:\"redis-master-2401852951-wcx6h\", UID:\"bff886d1-846c-11e7-addd-0050569ccb52\", APIVersion:\"v1\", ResourceVersion:\"945\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedAttachVolume' Multi-Attach error for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" Volume is already exclusively attached to one node and can't be attached to another\n","stream":"stderr","time":"2017-08-18T23:27:07.15522094Z"}
{"log":"W0818 23:27:07.254711       1 reconciler.go:262] Multi-Attach error for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") from node \"node3\" Volume is already exclusively attached to one node and can't be attached to another\n","stream":"stderr","time":"2017-08-18T23:27:07.254837081Z"}

**At the end of the 6 minutes, you will see the attach/detach controller attempt a force detach, after which attaching the volume on the new node succeeds.**

**{"log":"W0818 23:33:07.140938**       1 reconciler.go:228] attacherDetacher.DetachVolume started for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") on node \"node2\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching\n","stream":"stderr","time":"2017-08-18T23:33:07.140998742Z"}
{"log":"I0818 23:33:07.141103       1 attacher.go:231]   Detach in attacher.go with deviceMountPath: \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\"\n","stream":"stderr","time":"2017-08-18T23:33:07.141180226Z"}
{"log":"I0818 23:33:07.141201       1 attacher.go:233]   Calling DiskIsAttached in attacher.go with volPath: \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\"\n","stream":"stderr","time":"2017-08-18T23:33:07.141248186Z"}
{"log":"I0818 23:33:07.141303       1 vsphere.go:1003]   DiskIsAttached with volPath: \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\" and nodeName: \"node2\"\n","stream":"stderr","time":"2017-08-18T23:33:07.141325846Z"}
{"log":"I0818 23:33:07.413977       1 vsphere.go:1041]   DiskIsAttached result with volPath: \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\" and nodeName: \"node2\" is isAttached: true and err: \u003cnil\u003e\n","stream":"stderr","time":"2017-08-18T23:33:07.414101776Z"}
{"log":"I0818 23:33:07.414265       1 attacher.go:250]   Calling DetachDisk in attacher.go with volPath: \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\"\n","stream":"stderr","time":"2017-08-18T23:33:07.414329139Z"}
{"log":"I0818 23:33:07.414359       1 vsphere.go:1225]   DetachDisk with volPath: \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\" and nodeName: \"node2\"\n","stream":"stderr","time":"2017-08-18T23:33:07.41448605Z"}
{"log":"I0818 23:33:08.855287       1 vsphere.go:1282]   DetachDisk result with volPath: \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\" and nodeName: \"node2\" is err: \u003cnil\u003e\n","stream":"stderr","time":"2017-08-18T23:33:08.855487181Z"}
{"log":"I0818 23:33:08.855387       1 operation_generator.go:348] DetachVolume.Detach succeeded for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") on node \"node2\" \n","stream":"stderr","time":"2017-08-18T23:33:08.855531178Z"}
{"log":"I0818 23:33:08.945540       1 reconciler.go:272] attacherDetacher.AttachVolume started for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-08-18T23:33:08.945844415Z"}
{"log":"I0818 23:33:11.951325       1 reconciler.go:272] attacherDetacher.AttachVolume started for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-08-18T23:33:11.951450457Z"}
{"log":"I0818 23:33:12.581761       1 operation_generator.go:275] AttachVolume.Attach succeeded for volume \"pvc-2f53fe7e-846c-11e7-addd-0050569ccb52\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-2f53fe7e-846c-11e7-addd-0050569ccb52.vmdk\") from node \"node3\" \n","stream":"stderr","time":"2017-08-18T23:33:12.581846373Z"}

The issue is caused by a recent change in the Kubernetes 1.7 release, #45346, which introduced the "Multi-Attach error for volume" handling.

I think this is a very serious problem for us because, in the case of vSphere, volumes are not detached automatically when the node is powered off, unlike on GCE or AWS.
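
For context, the 1.7 behavior boils down to roughly the following (a simplified, self-contained sketch with stand-in types, not the actual code in the attach/detach reconciler): a PersistentVolume that only declares ReadWriteOnce, which is the usual mode for dynamically provisioned vSphere PVs, is treated as single-attach, so the reconciler refuses to start a second attach while the volume is still recorded as attached elsewhere.

```go
package main

import "fmt"

// Stand-ins for the Kubernetes API types involved; the real check operates on
// *volume.Spec and the v1 PersistentVolume types inside the attach/detach reconciler.
type AccessMode string

const (
	ReadWriteOnce AccessMode = "ReadWriteOnce"
	ReadOnlyMany  AccessMode = "ReadOnlyMany"
	ReadWriteMany AccessMode = "ReadWriteMany"
)

type PersistentVolume struct {
	AccessModes []AccessMode
}

// isMultiAttachForbidden approximates the guard added around #45346: a PV that only
// declares ReadWriteOnce is treated as single-attach, so the reconciler refuses to
// start a second attach while the volume is still recorded as attached to any node.
func isMultiAttachForbidden(pv *PersistentVolume) bool {
	if pv == nil || len(pv.AccessModes) == 0 {
		return false // unknown: let the attacher fail later if it must
	}
	for _, m := range pv.AccessModes {
		if m == ReadWriteMany || m == ReadOnlyMany {
			return false
		}
	}
	return true
}

func main() {
	// A dynamically provisioned vSphere PV is ReadWriteOnce, so the attach to the
	// new node is blocked until the stale attachment to the powered-off node is gone.
	vspherePV := &PersistentVolume{AccessModes: []AccessMode{ReadWriteOnce}}
	fmt.Println(isMultiAttachForbidden(vspherePV)) // prints: true
}
```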

@tusharnt @divyenpatel @luomiao @rohitjogvmw @pdhamdhere

@jingxu97 @saad-ali @msau42 @codablock: Can you please provide your inputs on this.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 18, 2017
@k8s-github-robot

@BaluDontu
There are no sig labels on this issue. Please add a sig label by:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <label>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. You can find the group list here and label list here.
The <group-suffix> in method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 18, 2017
@BaluDontu
Contributor Author

/sig storage
cc @kubernetes/sig-storage-bugs

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/bug Categorizes issue or PR as related to a bug. labels Aug 19, 2017
@k8s-ci-robot
Contributor

@BaluDontu: Reiterating the mentions to trigger a notification:
@kubernetes/sig-storage-bugs.

In response to this:

/sig storage
cc @kubernetes/sig-storage-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 19, 2017
@wenlxie
Contributor

wenlxie commented Aug 19, 2017

+1 also facing the same issue.

  1. When the node has been powered off, the node will be deleted from the cluster.
  2. For some cloud providers, the node will not be deleted, but it will become NotReady and the pods will be evicted.
    In these two scenarios, we could mark the volume as unmounted, so that at reconcile time we don't need to wait for maxWaitForUnmountDuration to detach the volume (a rough sketch of this idea follows below). The concern is: what if the node comes back?
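
A rough sketch of this idea, with names modeled on the attach/detach controller's actual-state-of-world cache but simplified (a hypothetical illustration, not a proposed patch):

```go
package adsketch

// Minimal stand-ins for the attach/detach controller's actual state of world; the
// method names are modeled on pkg/controller/volume/attachdetach/cache but the
// types are simplified, so treat this as an illustration rather than real code.
type AttachedVolume struct {
	VolumeName string
	NodeName   string
}

type ActualStateOfWorld interface {
	GetAttachedVolumesForNode(nodeName string) []AttachedVolume
	SetVolumeMountedByNode(volumeName, nodeName string, mounted bool) error
}

// onNodeDeleted sketches the idea above: when the node object is removed (the
// power-off case), clear the "mounted by node" flag for its volumes so the reconciler
// may detach right away instead of waiting out maxWaitForUnmountDuration. Whether
// this is safe if the node comes back is exactly the open question raised above.
func onNodeDeleted(asw ActualStateOfWorld, nodeName string) {
	for _, vol := range asw.GetAttachedVolumesForNode(nodeName) {
		// Error ignored for brevity; a real change would have to handle it.
		_ = asw.SetVolumeMountedByNode(vol.VolumeName, nodeName, false)
	}
}
```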

@jingxu97
Contributor

@wenlxie, I think that when a node is powered off, it is not supposed to be deleted from the cluster automatically. Only if you have some management tool, such as the GKE instance manager, might the node be deleted when it is powered off.
When a node deletion happens, for most volume plugins such as GCE PD, AWS, and Cinder, the volume will be detached automatically. The attach/detach controller can detect that and mark the volume as detached so that it can be attached to another node.
Please let me know if your case is different. Thanks!

@wenlxie
Contributor

wenlxie commented Aug 20, 2017

@jingxu97 I think the node will be deleted from the cluster for some cloud providers.
Here is the discussion about this: #46442.
Take Cinder, for example: if the node status is not 'active', it will be considered as not existing in the cloud (https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/openstack/openstack.go#L334), so if the node becomes NotReady, the node controller will delete it from the cluster.

When a node deletion happens, for most volume plugins such as GCE PD, AWS, and Cinder, the volume will be detached automatically.

It still needs to wait for maxWaitForUnmountDuration (6 minutes), right? So the discussion is about whether we can speed up the detach process in this scenario.

@jingxu97
Contributor

@wenlxie if the node is deleted, the volume is detached by the cloud provider, not by Kubernetes. Kubernetes at first is not aware of the detach at all, but we added a check that verifies whether the volume is still attached, and the controller marks it as detached if the verification returns false. So there is no 6-minute delay.

However, a few volume plugins, such as Cinder and vSphere, have a bug in their verifyVolumesAreAttached function; see the comments in #50266. This might be the reason you see a 6-minute delay: Kubernetes does not know the volume was already detached. There are some PRs open to fix it.
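
The verification path described above looks roughly like the following (a simplified, self-contained sketch; the real interface is each volume plugin's attacher and its VolumesAreAttached method, driven by the controller's periodic sync):

```go
package verifysketch

// Attacher is a simplified stand-in for the volume plugin attacher interface; the
// real VolumesAreAttached is implemented per plugin (the vSphere attacher checks
// DiskIsAttached, as seen in the logs above) and uses richer types than plain strings.
type Attacher interface {
	// VolumesAreAttached reports, per volume path, whether it is still attached to the node.
	VolumesAreAttached(volPaths []string, nodeName string) (map[string]bool, error)
}

// markDetachedVolumes sketches the verification step described above: any volume the
// cloud provider reports as no longer attached is removed from the controller's view
// of the node, so it can be attached elsewhere without the 6-minute force-detach wait.
// The bug referenced around #50266 is that some attachers kept reporting such volumes
// as attached (or returned an error) for powered-off or missing VMs, defeating this path.
func markDetachedVolumes(a Attacher, volPaths []string, nodeName string, markDetached func(volPath, nodeName string)) error {
	attached, err := a.VolumesAreAttached(volPaths, nodeName)
	if err != nil {
		return err
	}
	for volPath, isAttached := range attached {
		if !isAttached {
			markDetached(volPath, nodeName)
		}
	}
	return nil
}
```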

@wenlxie
Contributor

wenlxie commented Aug 20, 2017

@jingxu97 Thanks for your kind explanation.
I just realized that we did not enable reconciliationSync, so it will not do what you said when the node is deleted from the cloud.
However, there is another question: if the node is deleted from the cluster but still exists in the cloud, the volumes will be considered mounted, so when detaching, will it still wait for 6 minutes?

@BaluDontu
Contributor Author

BaluDontu commented Aug 20, 2017

@jingxu97: Hi Jing, in my case, when the node is powered off, the node is deleted from the Kubernetes cluster but not from vCenter. In vCenter the node still exists.
Now, even though the node is powered off and has been deleted by the node controller from the Kubernetes cluster, no detach call is issued immediately. I have to wait 6 minutes before it does a force detach. Until that moment, the attach on the other node fails because the volume is still attached to the powered-off node. You can clearly see these actions in the logs I have provided above.

I agree with @wenlxie's comment that no detach call to the cloud provider is issued by Kubernetes when the node is powered off or deleted from the Kubernetes cluster by the node controller. Only after 6 minutes does it issue a detach call to the cloud provider. Until then we cannot detach the volume, because no call is issued by the Kubernetes cluster. Please note that, in the case of vSphere, the volume is not detached automatically when the node is powered off.

Also, please note that the node is not deleted from vCenter; it is only deleted from the Kubernetes cluster. The node deletion only means the node is removed from the Kubernetes cluster's state, not from vCenter.

@BaluDontu
Contributor Author

Since the node is powered off, the volume attached to the powered-off node can be successfully attached to another powered-on node. You cannot, however, attach the same volume to two different powered-on nodes. The case here is that Kubernetes is preventing me from attaching the volume to the other node, even though vSphere supports this scenario.

@saad-ali
Member

Now, even though the node is powered off and has been deleted by the node controller from the Kubernetes cluster, no detach call is issued immediately

I have to wait 6 minutes before it does a force detach. Until that moment, the attach on the other node fails because the volume is still attached to the powered-off node. You can clearly see these actions in the logs I have provided above.

@BaluDontu This is by design. If a node is not safely cordoned and drained then it is up to the cluster to notice that something is not right and correct it. This involves:

  1. The Kubernetes node controller noticing that the node is down.
  2. The Kubernetes node controller draining (deleting pods) on the downed node.
  3. The attach/detach controller noticing the pod delete.
  4. The attach/detach controller waiting for the volume to be safely unmounted (up to 6 minutes).
  5. The attach/detach controller issuing the detach call.

This process can take several minutes to complete.
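
Steps 4 and 5 correspond roughly to the following check in the reconciler (a simplified, self-contained sketch with stand-in types, not the actual controller code):

```go
package main

import (
	"fmt"
	"time"
)

// maxWaitForUnmountDuration mirrors the controller's 6-minute default seen in the logs.
const maxWaitForUnmountDuration = 6 * time.Minute

// attachedVolume is a simplified stand-in for the controller's record of a volume
// that is attached to a node but no longer desired there.
type attachedVolume struct {
	Name            string
	Node            string
	MountedByNode   bool      // kubelet has not confirmed the volume was unmounted
	DetachRequested time.Time // when the controller first wanted it detached
}

// shouldDetach sketches steps 4 and 5: wait for a safe unmount, but force the detach
// once maxWaitForUnmountDuration has expired (the "force detaching" log line above).
func shouldDetach(v attachedVolume, now time.Time) bool {
	if !v.MountedByNode {
		return true // safe to detach immediately
	}
	return now.Sub(v.DetachRequested) > maxWaitForUnmountDuration
}

func main() {
	v := attachedVolume{
		Name: "pvc-2f53fe7e", Node: "node2", MountedByNode: true,
		DetachRequested: time.Now().Add(-7 * time.Minute),
	}
	// A powered-off node can never report the unmount, so only the timeout fires.
	fmt.Println(shouldDetach(v, time.Now())) // prints: true
}
```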

The pod should be started correctly on a different node, with its volumes attached, very quickly. Having the application down for 6 minutes is very bad.

Even if we reduce some of these wait times, Kubernetes can't instantly detach volumes as soon as a node is powered off. This means you cannot expect 100% uptime with a single pod/disk. Your application, at the service level, must be designed to handle downtime of a single instance.

@BaluDontu
Contributor Author

  1. The Kubernetes node controller noticing that the node is down.
  2. The Kubernetes node controller draining (deleting pods) on the downed node.
  3. The attach/detach controller noticing the pod delete.
  4. The attach/detach controller waiting for the volume to be safely unmounted (up to 6 minutes).
  5. The attach/detach controller issuing the detach call.

I am completely OK with the logic for detach. I agree that Kubernetes is reacting safely to keep the system in a consistent state. My only problem is that it could go ahead with attaching this volume on the new node where the pod is getting scheduled, so that we have minimal downtime for the application. I have tested the same scenario on the 1.6.5 release and see that Kubernetes goes ahead with the volume attach on the new node where the pod got scheduled, and eventually detaches this volume from the powered-off node after 6 minutes. This is the sort of solution we are currently looking for. Please check the 1.6.5 logs below for the same scenario.

  1. Initially, the volume is attached on node1, and the attach operation succeeds.
{"log":"I0821 19:24:40.128083       1 reconciler.go:208] Started AttachVolume for volume \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\" to node \"node1\"\n","stream":"stderr","time":"2017-08-21T19:24:40.12827462Z"}
{"log":"I0821 19:24:41.337013       1 operation_generator.go:290] AttachVolume.Attach succeeded for volume \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\" (spec.Name: \"pvc-208563c0-86a5-11e7-bff5-005056bd2502\") from node \"node1\".\n","stream":"stderr","time":"2017-08-21T19:24:41.337222884Z"}
{"log":"I0821 19:24:41.438450       1 node_status_updater.go:143] Updating status for node \"node1\" succeeded. patchBytes: \"{\\\"status\\\":{\\\"volumesAttached\\\":[{\\\"devicePath\\\":\\\"/dev/disk/by-id/wwn-0x6000c291fb5e79d81ea1958e888c9d37\\\",\\\"name\\\":\\\"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\\\"}]}}\" VolumesAttached: [{kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk /dev/disk/by-id/wwn-0x6000c291fb5e79d81ea1958e888c9d37}]\n","stream":"stderr","time":"2017-08-21T19:24:41.438746061Z"}
{"log":"I0821 19:25:12.646017       1 attacher.go:115] VolumesAreAttached: volume \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\" (specName: \"pvc-208563c0-86a5-11e7-bff5-005056bd2502\") is attached\n","stream":"stderr","time":"2017-08-21T19:25:12.646308705Z"}
{"log":"I0821 19:26:12.877397       1 attacher.go:115] VolumesAreAttached: volume \"[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\" (specName: \"pvc-208563c0-86a5-11e7-bff5-005056bd2502\") is attached\n","stream":"stderr","time":"2017-08-21T19:26:12.877636505Z"}
  2. Now node1 is powered off. The node controller recognizes this and eventually deletes the orphaned pod.
{"log":"W0821 19:27:12.632675       1 vsphere.go:1446] VM node1, is not in poweredOn state\n","stream":"stderr","time":"2017-08-21T19:27:12.632845421Z"}
{"log":"I0821 19:27:33.459577       1 controller_utils.go:285] Recording status change NodeNotReady event message for node node1\n","stream":"stderr","time":"2017-08-21T19:27:33.459956784Z"}{"log":"I0821 19:27:33.459653       1 controller_utils.go:203] Update ready status of pods on node [node1]\n","stream":"stderr","time":"2017-08-21T19:27:33.459975458Z"}
{"log":"I0821 19:27:33.459791       1 event.go:217] Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"node1\", UID:\"c32a5114-86a4-11e7-bff5-005056bd2502\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'NodeNotReady' Node node1 status is now: NodeNotReady\n","stream":"stderr","time":"2017-08-21T19:27:33.459981155Z"}
{"log":"W0821 19:27:33.553165       1 vsphere.go:611] VM node1, is not in poweredOn state\n","stream":"stderr","time":"2017-08-21T19:27:33.553316921Z"}
{"log":"I0821 19:27:33.553213       1 nodecontroller.go:752] Deleting node (no longer present in cloud provider): node1\n","stream":"stderr","time":"2017-08-21T19:27:33.553369294Z"}
{"log":"I0821 19:27:33.553220       1 controller_utils.go:274] Recording Deleting Node node1 because it's not present according to cloud provider event message for node node1\n","stream":"stderr","time":"2017-08-21T19:27:33.55337406Z"}
{"log":"I0821 19:27:33.553384       1 event.go:217] Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"node1\", UID:\"c32a5114-86a4-11e7-bff5-005056bd2502\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'DeletingNode' Node node1 event: Deleting Node node1 because it's not present according to cloud provider\n","stream":"stderr","time":"2017-08-21T19:27:33.553443403Z"}
{"log":"I0821 19:27:38.553459       1 nodecontroller.go:623] NodeController observed a Node deletion: node1\n","stream":"stderr","time":"2017-08-21T19:27:38.553720811Z"}
{"log":"I0821 19:27:38.553518       1 controller_utils.go:274] Recording Removing Node node1 from NodeController event message for node node1\n","stream":"stderr","time":"2017-08-21T19:27:38.553760747Z"}
{"log":"I0821 19:27:38.553811       1 event.go:217] Event(v1.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"node1\", UID:\"c32a5114-86a4-11e7-bff5-005056bd2502\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RemovingNode' Node node1 event: Removing Node node1 from NodeController\n","stream":"stderr","time":"2017-08-21T19:27:38.554956045Z"}
{"log":"I0821 19:27:53.531262       1 gc_controller.go:157] Found orphaned Pod redis-server-2448813087-m62zw assigned to the Node node1. Deleting.\n","stream":"stderr","time":"2017-08-21T19:27:53.531376237Z"}
{"log":"I0821 19:27:53.531267       1 gc_controller.go:62] PodGC is force deleting Pod: default:redis-server-2448813087-m62zw\n","stream":"stderr","time":"2017-08-21T19:27:53.53138159Z"}
{"log":"I0821 19:27:53.557972       1 gc_controller.go:161] Forced deletion of orphaned Pod redis-server-2448813087-m62zw succeeded\n","stream":"stderr","time":"2017-08-21T19:27:53.558039494Z"}
  3. The volume attach is immediately started on the new node (node3 in this case) and succeeds. This is what the 1.7 release prevents: its check for whether the volume is still attached to another node blocks any attach operation.
{"log":"I0821 19:27:56.771960       1 reconciler.go:208] Started AttachVolume for volume \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\" to node \"node3\"\n","stream":"stderr","time":"2017-08-21T19:27:56.772160877Z"}
{"log":"I0821 19:27:57.804622       1 operation_generator.go:290] AttachVolume.Attach succeeded for volume \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\" (spec.Name: \"pvc-208563c0-86a5-11e7-bff5-005056bd2502\") from node \"node3\".\n","stream":"stderr","time":"2017-08-21T19:27:57.804762946Z"}
{"log":"I0821 19:27:57.880295       1 node_status_updater.go:143] Updating status for node \"node3\" succeeded. patchBytes: \"{\\\"status\\\":{\\\"volumesAttached\\\":[{\\\"devicePath\\\":\\\"/dev/disk/by-id/wwn-0x6000c291fb5e79d81ea1958e888c9d37\\\",\\\"name\\\":\\\"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\\\"}]}}\" VolumesAttached: [{kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk /dev/disk/by-id/wwn-0x6000c291fb5e79d81ea1958e888c9d37}]\n","stream":"stderr","time":"2017-08-21T19:27:57.880397598Z"}
  4. Exactly 6 minutes later, you see the volume detached successfully from the powered-off node.
    {"log":"I0821 19:33:59.102956 1 operation_generator.go:374] DetachVolume.Detach succeeded for volume \"kubernetes.io/vsphere-volume/[vsanDatastore] kubevols/kubernetes-dynamic-pvc-208563c0-86a5-11e7-bff5-005056bd2502.vmdk\" (spec.Name: \"pvc-208563c0-86a5-11e7-bff5-005056bd2502\") from node \"node1\".\n","stream":"stderr","time":"2017-08-21T19:33:59.103081259Z"}

This is exactly the behavior we would want in the 1.7 release. In 1.7, this issue was introduced by #45346, which added the "Multi-Attach error for volume" handling for a specific Azure use case.

@saad-ali
Member

saad-ali commented Aug 21, 2017

My only problem is that it could go ahead with attaching this volume on the new node where the pod is getting scheduled, so that we have minimal downtime for the application.

Ah, ok. Then we need to fix isMultiAttachForbidden(...) to not be so aggressive.

I am ok with either of the following solutions:

  1. Modify isMultiAttachForbidden(...) to use a whitelist per volume plugin instead of looking at AccessModes for volumeSpec.PersistentVolume (just like it already does for volumeSpec.Volume).
  2. Special case the volumeSpec.PersistentVolume logic to always return false for vSphere regardless of AccessModes.

I'd prefer the first solution for consistency. CC @codablock
If there are no objections to the first approach, @BaluDontu, feel free to send out a PR for it.
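
A rough sketch of what option 1 could look like (illustrative only, with stand-in types; not the change that was actually merged):

```go
package main

import "fmt"

// PersistentVolume is a simplified stand-in: the real check would inspect
// volumeSpec.PersistentVolume.Spec to see which in-tree plugin backs the PV
// (GCEPersistentDisk, AWSElasticBlockStore, Cinder, AzureDisk, VsphereVolume, ...).
type PersistentVolume struct {
	PluginName  string
	AccessModes []string
}

// multiAttachGuardedPlugins is the per-plugin whitelist idea from option 1: only
// plugins known to break on multi-attach keep the strict AccessModes-based guard.
// The contents are illustrative, not the list that was actually merged.
var multiAttachGuardedPlugins = map[string]bool{
	"kubernetes.io/azure-disk": true,
	"kubernetes.io/cinder":     true,
}

func isMultiAttachForbidden(pv *PersistentVolume) bool {
	if pv == nil {
		return false
	}
	if !multiAttachGuardedPlugins[pv.PluginName] {
		// e.g. kubernetes.io/vsphere-volume: do not block here; let the attach to
		// the new node proceed and clean up the stale attachment afterwards.
		return false
	}
	for _, m := range pv.AccessModes {
		if m == "ReadWriteMany" || m == "ReadOnlyMany" {
			return false
		}
	}
	return true
}

func main() {
	vspherePV := &PersistentVolume{
		PluginName:  "kubernetes.io/vsphere-volume",
		AccessModes: []string{"ReadWriteOnce"},
	}
	fmt.Println(isMultiAttachForbidden(vspherePV)) // prints: false, attach to the new node proceeds
}
```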

@saad-ali saad-ali added this to the v1.8 milestone Aug 21, 2017
k8s-github-robot pushed a commit that referenced this issue Aug 23, 2017
Automatic merge from submit-queue

Allow attach of volumes to multiple nodes for vSphere

This is a fix for issue #50944, where a volume cannot be attached to a new node after the node it was previously attached to is powered off.

Current behaviour:
One of the cluster worker nodes was powered off in vCenter.
Pods running on this node were rescheduled onto different nodes but got stuck in ContainerCreating. Attaching the volume on the new node failed with the error "Multi-Attach error for volume pvc-xxx, Volume is already exclusively attached to one node and can't be attached to another", so the application running in the pod had no data available because the volume was not attached to the new node. Since the volume was still attached to the powered-off node, any attempt to attach the volume on the new node failed with the "Multi-Attach error". This lasted for 6 minutes, until the attach/detach controller forcefully detached the volume from the powered-off node. Once the volume was detached from the powered-off node, it was successfully attached on the new node and the application's data became available again.

What is expected to happen:
I would want the attach/detach controller to go ahead with attaching the volume on the new node where the pod got provisioned, instead of waiting for the volume to be detached from the powered-off node. It is OK to eventually detach the volume from the powered-off node after 6 minutes. This way the application downtime is low and pods are up as soon as possible.

The current fix makes the attach/detach controller skip the multi-attach check for vSphere volumes and persistent volumes.

@jingxu97 @saad-ali : Can you please take a look at it.

@tusharnt @divyenpatel @rohitjogvmw @luomiao 

```release-note
Allow attach of volumes to multiple nodes for vSphere
```
@BaluDontu
Contributor Author

BaluDontu commented Sep 1, 2017

/priority/critical-urgent

@msau42
Member

msau42 commented Sep 1, 2017

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 1, 2017
@BaluDontu BaluDontu reopened this Sep 1, 2017
@k8s-github-robot

[MILESTONENOTIFIER] Milestone Labels Complete

@BaluDontu

Issue label settings:

  • sig/storage: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Additional instructions available here

@jingxu97
Contributor

jingxu97 commented Sep 1, 2017

@BaluDontu why was this issue reopened? Does #51066 not solve it?

@BaluDontu
Contributor Author

Closing this issue.
