Pods with volumes stuck in ContainerCreating after cluster node is powered off in vCenter #50944
@BaluDontu
/sig storage
+1, also facing the same issue.
@wenlxie, I think that when a node is powered off, it is not supposed to be deleted from the cluster automatically. Only if you have some management tool, such as a GKE instance manager, might the node be deleted when it is powered off.
@jingxu97 I think the node will be deleted from the cluster for some cloud providers.
It still needs to wait for maxWaitForUnmountDuration (6 minutes), right? So the discussion is about whether we can speed up the detach process in this scenario.
@wenlxie if the node is deleted, the volume is detached by the cloud provider, not by Kubernetes. Kubernetes is at first not aware of the detach at all, but we added a check that verifies whether the volume is still attached and marks it as detached if the verification returns false. So there is no 6-minute delay. However, a few volume plugins, such as Cinder and vSphere, have a bug in their verifyVolumesAreAttached function (see the comment in #50266). This might be the reason you see a 6-minute delay: Kubernetes does not know the volume was already detached. There are some PRs open to fix it.
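To make the reconciliation described above concrete, here is a minimal Go sketch of the idea: the controller asks the cloud provider whether its cached attachments still exist and drops the ones that do not. This is not the actual Kubernetes attach/detach controller code; the types and names (`AttachedVolume`, `CloudVerifier`, `reconcileAttachments`, `fakeVerifier`) are simplified stand-ins used only for illustration.

```go
package main

import "fmt"

// AttachedVolume is a simplified record in the controller's cache of
// volumes it believes are attached ("actual state of world").
type AttachedVolume struct {
	VolumeID string
	NodeName string
}

// CloudVerifier answers whether volumes are still attached to a node,
// for example by querying vCenter, EC2, or GCE.
type CloudVerifier interface {
	VolumesAreAttached(volumes []AttachedVolume, node string) (map[string]bool, error)
}

// reconcileAttachments drops cache entries for volumes the cloud provider
// reports as no longer attached, so no 6-minute force-detach wait is needed.
// If the verifier wrongly reports "still attached" (the kind of bug referenced
// in #50266), the stale entry stays and the delay described above appears.
func reconcileAttachments(cache []AttachedVolume, node string, v CloudVerifier) []AttachedVolume {
	attached, err := v.VolumesAreAttached(cache, node)
	if err != nil {
		return cache // on error, keep the cache unchanged
	}
	var still []AttachedVolume
	for _, vol := range cache {
		if attached[vol.VolumeID] {
			still = append(still, vol)
		} else {
			fmt.Printf("marking %s as detached from %s\n", vol.VolumeID, node)
		}
	}
	return still
}

// fakeVerifier simulates a provider that has already detached everything
// from a deleted node.
type fakeVerifier struct{}

func (fakeVerifier) VolumesAreAttached(volumes []AttachedVolume, node string) (map[string]bool, error) {
	result := make(map[string]bool, len(volumes))
	for _, v := range volumes {
		result[v.VolumeID] = false
	}
	return result, nil
}

func main() {
	cache := []AttachedVolume{{VolumeID: "pvc-xxx", NodeName: "node-1"}}
	cache = reconcileAttachments(cache, "node-1", fakeVerifier{})
	fmt.Println("remaining cached attachments:", len(cache)) // 0
}
```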
@jingxu97 Thanks for your kind explanation.
@jingxu97: Hi Jing, in my case when the node is powered off, the node is deleted from the Kubernetes cluster but not from vCenter; in vCenter the node still exists. I agree with @wenlxie's comment that no detach call to the cloud provider is issued by Kubernetes when the node is powered off or deleted from the Kubernetes cluster by the node controller. Only after 6 minutes is a detach call issued to the cloud provider; until then we cannot detach the volume because no call is issued by the Kubernetes cluster. Please note that the volume is not detached automatically when a node is powered off in the case of vSphere. Also note that the node is not deleted from vCenter, only from the Kubernetes cluster; a node delete only removes the node from the Kubernetes cluster state, not from vCenter.
Since the node is powered off, the volume attached to the powered-off node can be attached to another powered-on node successfully. You cannot attach the same volume to two different powered-on nodes, but vSphere does support this multi-attach scenario. The problem here is that Kubernetes is preventing me from attaching the volume to the other node.
@BaluDontu This is by design. If a node is not safely cordoned and drained, then it is up to the cluster to notice that something is not right and correct it: noticing the node is unhealthy, evicting its pods, and eventually force-detaching its volumes.
This process can take several minutes to complete.
Even if we reduce some of these wait times, Kubernetes can't instantly detach volumes as soon as a node is powered off, meaning you cannot expect 100% uptime with a single pod/disk. Your application must be designed, at the service level, to handle downtime of a single instance.
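As a rough illustration of the wait involved, the sketch below shows a force-detach check gated on a 6-minute timeout. `maxWaitForUnmountDuration` mirrors the value mentioned earlier in this thread; the surrounding types and the `shouldForceDetach` helper are hypothetical, not the real reconciler.

```go
package main

import (
	"fmt"
	"time"
)

// maxWaitForUnmountDuration mirrors the 6-minute wait discussed in this thread.
const maxWaitForUnmountDuration = 6 * time.Minute

type volumeToDetach struct {
	volumeID        string
	nodeName        string
	detachRequested time.Time // when the controller first decided the volume should come off the node
	nodeHealthy     bool      // false once the node is reported as gone / NotReady
}

// shouldForceDetach returns true only when the node is unhealthy and the
// volume has been waiting longer than the safety timeout, which is why a
// powered-off node leads to roughly 6 minutes of pod downtime.
func shouldForceDetach(v volumeToDetach, now time.Time) bool {
	return !v.nodeHealthy && now.Sub(v.detachRequested) > maxWaitForUnmountDuration
}

func main() {
	v := volumeToDetach{
		volumeID:        "pvc-xxx",
		nodeName:        "node-1",
		detachRequested: time.Now().Add(-3 * time.Minute),
		nodeHealthy:     false,
	}
	fmt.Println(shouldForceDetach(v, time.Now())) // false: still inside the 6-minute window
}
```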
I am completely OK with the logic for detach. I agree that Kubernetes is reacting safely to keep the system in a consistent state. My only problem is that it could go ahead with attaching this volume on the new node where the pod is getting scheduled, so that we have minimal downtime for the application. I have tested the same scenario in the 1.6.5 release and see that Kubernetes goes ahead with the volume attach on the new node where the pod got scheduled, and eventually detaches the volume from the powered-off node after 6 minutes. This is the sort of solution we are currently looking for. Please check the 1.6.5 logs below for the same scenario.
This is exactly the behaviour we would want in the 1.7 release. In 1.7, this issue was introduced by #45346, which added the "Multi-Attach error for volume" handling for a specific Azure use case.
Ah, ok. Then we need to fix this. I am ok with either of the following solutions:
I'd prefer the first solution for consistency. CC @codablock
Automatic merge from submit-queue

Allow attach of volumes to multiple nodes for vSphere

This is a fix for issue #50944, which doesn't allow a volume to be attached to a new node after the node where the volume was previously attached is powered off.

Current behaviour: One of the cluster worker nodes was powered off in vCenter. Pods running on this node were rescheduled on different nodes but got stuck in ContainerCreating. Attaching the volume on the new node failed with the error "Multi-Attach error for volume pvc-xxx, Volume is already exclusively attached to one node and can't be attached to another", so the application running in the pod had no data available because the volume was not attached to the new node. Since the volume was still attached to the powered-off node, any attempt to attach it on the new node failed with the "Multi-Attach error". It stayed stuck for 6 minutes until the attach/detach controller forcefully detached the volume from the powered-off node; only then was the volume successfully attached on the new node and the application's data became available.

What is expected to happen: The attach/detach controller should go ahead with attaching the volume on the new node where the pod got provisioned instead of waiting for the volume to be detached from the powered-off node. It is fine to eventually detach the volume from the powered-off node after 6 minutes. This way the application downtime is low and pods are up as soon as possible.

The current fix makes the attach/detach controller skip the multi-attach check for vSphere volumes/persistent volumes.

@jingxu97 @saad-ali: Can you please take a look at it. @tusharnt @divyenpatel @rohitjogvmw @luomiao

```release-note
Allow attach of volumes to multiple nodes for vSphere
```
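The idea behind the fix can be sketched roughly as follows. This is not the actual PR diff; the exemption map, the `allowAttach` helper, and the plugin-name check are illustrative assumptions about how a multi-attach guard could skip vSphere volumes.

```go
package main

import "fmt"

type volumeSpec struct {
	pluginName string // e.g. "kubernetes.io/vsphere-volume"
	volumeID   string
}

// pluginsExemptFromMultiAttachCheck lists plugins allowed to attach to a new
// node even though the controller still records an attachment elsewhere.
var pluginsExemptFromMultiAttachCheck = map[string]bool{
	"kubernetes.io/vsphere-volume": true,
}

// allowAttach reports whether an attach to newNode may proceed given the set
// of nodes the volume is currently recorded as attached to.
func allowAttach(spec volumeSpec, attachedNodes []string, newNode string) bool {
	if len(attachedNodes) == 0 {
		return true // no existing attachment, nothing to guard against
	}
	if pluginsExemptFromMultiAttachCheck[spec.pluginName] {
		return true // e.g. vSphere: attach now, clean up the stale attachment later
	}
	return false // the "Multi-Attach error for volume ..." path
}

func main() {
	spec := volumeSpec{pluginName: "kubernetes.io/vsphere-volume", volumeID: "pvc-xxx"}
	fmt.Println(allowAttach(spec, []string{"powered-off-node"}, "new-node")) // true
}
```

Per the PR description above, the actual change skips the multi-attach check for vSphere volumes in the attach/detach controller; the map here merely stands in for however that exemption is expressed in the real code.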
/priority/critical-urgent
/priority critical-urgent
@BaluDontu why did you reopen this issue? Doesn't #51066 solve it?
Closing this issue. |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
One of the cluster worker nodes was powered off in vCenter.
Pods running on this node have been rescheduled on different nodes but got stuck in ContainerCreating. Attaching the volume on the new node failed with the error "Multi-Attach error for volume pvc-xxx, Volume is already exclusively attached to one node and can't be attached to another", and hence the application running in the pod has no data available because the volume is not attached to the new node. The volume is still attached to the powered-off node and the attach/detach controller makes no attempt to detach it, so attaching the volume on the new node fails with the "Multi-Attach error". It stays stuck for 6 minutes until the attach/detach controller forcefully detaches the volume from the powered-off node. Only after those 6 minutes, when the volume has been detached from the powered-off node, is it successfully attached on the new node and the application has its data available.
What you expected to happen:
The pod should be started correctly on a different node, with its volumes attached, very quickly. For an application to be down for 6 minutes is very bad.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Below are the logs from the controller manager.
The issue is caused by a recent change in the Kubernetes 1.7 release, #45346, which added the "Multi-Attach error for volume" handling.
I think this is a very serious problem for us, as in the case of vSphere, volumes are not detached automatically when the node is powered off, unlike in GCE or AWS.
@tusharnt @divyenpatel @luomiao @rohitjogvmw @pdhamdhere
@jingxu97 @saad-ali @msau42 @codablock: Can you please provide your inputs on this.