Allow attach of volumes to multiple nodes for vSphere #51066
Conversation
Hi @BaluDontu. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @saad-ali
```
@@ -143,11 +143,20 @@ func (rc *reconciler) isMultiAttachForbidden(volumeSpec *volume.Spec) bool {
		volumeSpec.Volume.Cinder != nil {
		return true
	}
	// Check for volume types which do not fail when trying to multi-attach
	if volumeSpec.Volume.VsphereVolume != nil {
```
This is not necessary since the default behavior should return false. I don't want this to become a list of every plugin that exists.
I missed it. Will remove it.
```
	if volumeSpec.PersistentVolume.Spec.VsphereVolume != nil {
		return false
	}

	if len(volumeSpec.PersistentVolume.Spec.AccessModes) == 0 {
```
I would be OK being more aggressive here and replacing the `volumeSpec.PersistentVolume.Spec.AccessModes` check altogether with a whitelist (as above).
Shouldn't we check access modes for other cloud providers? Other cloud providers might use the ReadWriteMany or ReadOnlyMany access modes for volumes, and in that case the attach/detach controller should allow multi-attach. Removing the entire access-mode check might cause a problem for other providers.
```
	if len(volumeSpec.PersistentVolume.Spec.AccessModes) == 0 {
		// No access mode specified so we don't know for sure. Let the attacher fail if needed
		return false
	}

	// check if this volume is allowed to be attached to multiple PODs/nodes, if yes, return false
	for _, accessMode := range volumeSpec.PersistentVolume.Spec.AccessModes {
		if accessMode == v1.ReadWriteMany || accessMode == v1.ReadOnlyMany {
			return false
		}
	}
```
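As a quick standalone illustration of how this access-mode loop behaves (a hypothetical example, not part of the PR; the `k8s.io/api/core/v1` import path assumes a current client library layout and may differ in older releases):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// allowsMultiAttach mirrors the loop quoted above: a persistent volume carrying
// ReadWriteMany or ReadOnlyMany is treated as safe to attach to multiple nodes.
func allowsMultiAttach(modes []v1.PersistentVolumeAccessMode) bool {
	for _, m := range modes {
		if m == v1.ReadWriteMany || m == v1.ReadOnlyMany {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(allowsMultiAttach([]v1.PersistentVolumeAccessMode{v1.ReadWriteOnce})) // false: multi-attach forbidden
	fmt.Println(allowsMultiAttach([]v1.PersistentVolumeAccessMode{v1.ReadOnlyMany}))  // true: multi-attach allowed
}
```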
Discussed offline. OK with leaving this as is for now; modifying it is also an issue for other storage plugins.
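A rough sketch of what the whitelist approach suggested above might look like (hypothetical; the function name and the specific plugins listed are illustrative, not necessarily the set the reviewer had in mind):

```go
package reconcilersketch

import v1 "k8s.io/api/core/v1"

// multiAttachForbiddenByWhitelist forbids multi-attach only for an explicit
// list of plugins known to be single-attach, and defaults to false for
// everything else instead of inspecting AccessModes.
func multiAttachForbiddenByWhitelist(spec *v1.PersistentVolumeSpec) bool {
	if spec.GCEPersistentDisk != nil ||
		spec.AWSElasticBlockStore != nil ||
		spec.Cinder != nil {
		// Known single-attach block devices (illustrative set).
		return true
	}
	// Default: do not forbid; let the attacher fail later if the volume
	// really cannot be attached to more than one node.
	return false
}
```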
/ok-to-test
Force-pushed from 82ce983 to cfdff1a.
/lgtm Please modify the first comment on this page to include a release note.
@wojtek-t for 1.7 cherry pick approval
/retest
/retest
/retest Review the full test history for this PR.
/retest
/lgtm
@jingxu97 : It took a minute or so for Kubernetes to recognize that the VM was powered off. After that, it deleted all the pods on the powered-off node (node1 in this case) and immediately started attaching the volume on the new node (node3 in this case), after which the pod became available. In total it took around 40-50 seconds for the pod to become available.
@jingxu97 : The PR still says the lgtm label is not present. Can you try it again?
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue
…-upstream-release-1.7
Automatic merge from submit-queue
Automated cherry pick of #51066 upstream release 1.7
Cherry pick of #51066 on release-1.7: #51066: Allow attach of volumes to multiple nodes for vSphere
@rohitjogvmw @divyenpatel @tusharnt @luomiao
```release-note
vSphere: Allow attach of volumes to multiple nodes for vSphere.
```
Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
Automatic merge from submit-queue (batch tested with PRs 67745, 67432, 67569, 67825, 67943). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix VMWare VM freezing bug by reverting #51066

**What this PR does / why we need it**: kube-controller-manager, vSphere specific: when the controller tries to attach a volume to Node A that is already attached to Node B, Node A freezes until the volume is attached. Kubernetes continues to try to attach the volume, as it thinks it is 'multi-attachable' when it's not. #51066 is the culprit.

**Which issue(s) this PR fixes**: Fixes vmware-archive#500 / vmware-archive#502 (same issue)

**Special notes for your reviewer**:
- Repro: vSphere installation, any k8s version from 1.8 and above, pod with attached PV/PVC/VMDK:
  1. Cordon the node which the pod is in.
  2. `kubectl delete po/[pod] --force --grace-period=0`
  3. The pod is immediately rescheduled to a new node. Grab the new node from a `kubectl describe [pod]` and attempt to ping it or SSH into it.
  4. You can see that pings/SSH fail to reach the new node. `kubectl get node` shows it as 'NotReady'. The new node is frozen until the volume is attached - usually a 1 minute freeze for 1 volume in a low-load cluster, and many minutes more with higher loads and more volumes involved.
- Patch verification: tested a custom patched 1.9.10 kube-controller-manager with #51066 reverted, and the above bug is resolved - can't repro it anymore. The new node doesn't freeze at all, and attaching happens quite quickly, in a few seconds.

**Release note**:
```
Fix VSphere VM Freezing bug by reverting #51066
```
When this change was applied/discussed, what was the vSphere integration expected to do in the "normal" case? When a pod is restarted on another node, this change makes the volume attach on the new node first, before the detach on the previous node is executed. We see kernel panics where the volume is attached before it is detached from the other node, so I am asking to find out if there is a bug lurking elsewhere even though this particular change was reverted. Looking at the discussion, it seems that the normal case would be handled by the vSphere cloud provider, but what exactly is the expected behaviour here? Should it report an error when asked to attach a volume that is attached to a healthy node, without attempting the attach? Should it detach as part of attach?
To clarify this for future users who may stumble on this: the change made here got reverted in #67825. The Kubernetes versions that no longer include the change introduced here are:
This is a fix for issue #50944, where a volume cannot be attached to a new node after the node to which it was previously attached is powered off.
Current behaviour:
One of the cluster worker nodes was powered off in vCenter.
Pods running on this node were rescheduled on different nodes but got stuck in ContainerCreating. Attaching the volume on the new node failed with the error "Multi-Attach error for volume pvc-xxx, Volume is already exclusively attached to one node and can't be attached to another", so the application running in the pod had no data available because the volume was not attached to the new node. Since the volume was still attached to the powered-off node, every attempt to attach it on the new node failed with the same "Multi-Attach error". It stayed stuck for 6 minutes, until the attach/detach controller forcefully detached the volume from the powered-off node. Only after those 6 minutes, once the volume was detached from the powered-off node, was it successfully attached on the new node and the data became available to the application.
What is expected to happen:
I would want the attach/detach controller to go ahead with attaching the volume on the new node where the pod got provisioned, instead of waiting for the volume to be detached from the powered-off node. It is OK to eventually detach the volume from the powered-off node after 6 minutes. This way the application downtime is low and the pods are up as soon as possible.
With this fix, the attach/detach controller skips the multi-attach check for vSphere volumes/persistent volumes.
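Assembling the diff fragments quoted in the review thread above, the changed check amounts to roughly the following (a sketch of the logic, not the verbatim merged code; inline-volume plugin checks other than Cinder are elided):

```go
package reconcilersketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/volume"
)

// isMultiAttachForbidden, roughly as changed by this PR: vSphere persistent
// volumes are exempted from the multi-attach check, so the attach/detach
// controller attaches the volume to the new node without waiting for the
// detach from the powered-off node.
func isMultiAttachForbidden(volumeSpec *volume.Spec) bool {
	if volumeSpec.Volume != nil {
		// Inline volume types known to fail when multi-attached
		// (other plugin checks elided).
		if volumeSpec.Volume.Cinder != nil {
			return true
		}
	}

	if volumeSpec.PersistentVolume != nil {
		// Persistent volume types which do not fail when trying to multi-attach.
		if volumeSpec.PersistentVolume.Spec.VsphereVolume != nil {
			return false
		}

		if len(volumeSpec.PersistentVolume.Spec.AccessModes) == 0 {
			// No access mode specified so we don't know for sure.
			// Let the attacher fail if needed.
			return false
		}

		// A volume with ReadWriteMany or ReadOnlyMany may be attached to
		// multiple pods/nodes.
		for _, accessMode := range volumeSpec.PersistentVolume.Spec.AccessModes {
			if accessMode == v1.ReadWriteMany || accessMode == v1.ReadOnlyMany {
				return false
			}
		}
		return true
	}

	// Not enough information: default to allowing the attach and let the
	// attacher fail if the volume cannot be multi-attached.
	return false
}
```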
@jingxu97 @saad-ali : Can you please take a look?
@tusharnt @divyenpatel @rohitjogvmw @luomiao