Mark volume as detached when node does not exist for vsphere #50281

FengyunPan · 2017-08-08T03:07:44Z

If node does not exist, node's volumes will be detached
automatically and become available. So mark them detached and
return false without error.
Fix #50266

Special notes for your reviewer:
/assign @jingxu97

Release note:

NONE

k8s-ci-robot · 2017-08-08T03:07:51Z

Hi @FengyunPan. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

FengyunPan · 2017-08-08T03:09:29Z

/assign @luomiao
/assign @kerneltime

divyenpatel · 2017-08-08T19:06:10Z

pkg/cloudprovider/providers/vsphere/vsphere.go

During host failover when node VM goes to disconnected state or is not accessible for any reason we are returning a Map of volumepath and its attachment status set to false. This will be misinterpreted as disks are already detached from the node and Kubernetes mark volumes as detached after orphaned pod is cleaned up.

This can causes volumes to remain attached to previous node, and new pod creation on another node will remains in the “containercreating” state. Since both the node vms are powered on and hot volumes can not be attached to other node.

We need to test this PR change for vSphere HA failover.

@BaluDontu @luomiao @tusharnt @rohitjogvmw

luomiao · 2017-08-08T20:19:59Z

@FengyunPan Thanks for coming up this PR regarding to the detached volumes.
However, I think this will revert a previous PR to fix another HA issue: #45569
As @divyenpatel has mentioned, directly setting the status of volumes to detached will cause a race condition when the previous dead node is restarted. More explanation can be found in that PR.

Edit:
sorry I missed that this is regarding to an issue when the instance is deleted from cloud provider.
In this case we will need a fix to handle both of the two situations: when a instance is deleted from cloud provider and when a VM is dead/restarted in vCenter. This PR will handle the first situation but will have a problem with the second situation.

jingxu97 · 2017-08-08T20:37:56Z

@luomiao Could you confirm that when a node is deleted, will the volume be detached automatically? For all other cloud providers such as GCE PD, AWS, that is the behavior. So the function DiskIsAttached() in cloud provider should return false (not attached) without error.

luomiao · 2017-08-08T21:44:57Z

@jingxu97 I feel there are two different issues are mixed here :) but the short answer is no, volumes are not detached automatically.

What @FengyunPan 's PR has changed is not when the node is deleted. Instead, it's when the node is powered off but still exists in vCenter. In this case, volumes are still attached to the VM. Thus we cannot simply return false (means disk is not attached). The variable and function names are misleading here. Instead of nodeExist, it should be nodeActive.

Regarding to your question, it also depends on how you define "deleting".

If "deleting" means "removing the node from kubernetes cluster":
Then the node is till alive in vCenter. So the volume is still attached to the VM.
If "deleting" means "removing the node from vCenter":
Removing node from vCenter involves two steps: power off the node and then delete the node.
When the node is powered-off but not deleted yet, the volumes are attached to the VM.

@divyenpatel would you please confirm if the above understanding is correct or not?

jingxu97 · 2017-08-08T21:53:24Z

@luomiao By "deleted" I mean deleted from vCenter, similar to the case of node instance is deleted from cloud provider.

The code nodeExist, err := vs.NodeExists(nodeName) what does it mean? I thought it means the node is deleted from vcenter already?

luomiao · 2017-08-08T22:12:01Z

@jingxu97
The current NodeExists is misleading.
It could return nodeExist=false and err=nil, when the node is powered-off but not deleted.
Now I think it's not a problem with this PR but a problem with the NodeExists function.
Please allow us some time to do a few testings to confirm :)
The change to NodeExists function might be required for this PR.

luomiao · 2017-08-08T22:40:57Z

@jingxu97
Just did some double check. Sorry to take back my words.
NodeExists with a return value nodeExist=false means the node is not in an active status (powered off, disconnected, not found...).
Nodes can be in a non-active state (powered off, disconnected) while volumes are still attached.
In the case of vsphere cloud provider, deleting the node from cloud provider can make the nodes into any of the non-active status (powered off, disconnected, not found...).
But this PR could mark volumes as detached even the volumes might still be attached. Thus it's not correct.

jingxu97 · 2017-08-08T22:51:12Z

Is there a way to identify whether the node is deleted from vsphere cloud provider. If no, could we add one?

FengyunPan · 2017-08-09T01:58:12Z

The variable and function names are misleading here. Instead of nodeExist, it should be nodeActive.
NodeExists with a return value nodeExist=false means the node is not in an active status (powered off, disconnected, not found...).

I see, @luomiao thank for your comments.

Is there a way to identify whether the node is deleted from vsphere cloud provider. If no, could we add one?

I think it is necessary for attachDetachManager, we should add one.

luomiao · 2017-08-09T20:18:50Z

@FengyunPan @jingxu97
Sorry for the delayed reply.
The current PR looks good to me.

We will submit another PR to fix NodeExists (NodeActive) function to return InstanceNotFound error when a node is not existing anymore.

/lgtm

dims · 2017-08-09T20:20:47Z

/ok-to-test

jingxu97 · 2017-08-09T20:23:36Z

@luomiao should we wait until your fix is merged to merge this PR? Thanks!

FengyunPan · 2017-08-10T03:59:24Z

@luomiao @divyenpatel @jingxu97 I have add codes to check if node exists and rebase the PR ( #49164), Please take a look.

FengyunPan · 2017-08-10T05:23:40Z

/retest

divyenpatel · 2017-08-10T17:24:55Z

pkg/cloudprovider/providers/vsphere/vclib/virtualmachine.go

Not related to the change made in the PR, but just came across this unwanted code block.
I think this is not required.

if vmMoList[0].Summary.Config.Template == false { glog.Warningf("VM is not in %s state", ActivePowerState) } else { glog.Warningf("VM is a template") }

I think this change of Exists() is helpful for reader, is right?
Yeah, that is not required.

Yes IsActive is helpful for reader.

divyenpatel · 2017-08-10T17:33:27Z

pkg/cloudprovider/providers/vsphere/vsphere.go

Returned error should be different. When the Node VM is powered off, we can still determine whether disks are attached to powered off VM or not. Current error message does not look good in this case.

Yes, I agree, there is no need to determine wherher node is powered off in DisksAreAttached() and DisksisAttached().

divyenpatel · 2017-08-10T17:47:43Z

pkg/cloudprovider/providers/vsphere/vsphere.go

Same as commented for DisksAreAttached. When node VM is powered off, we can still check if disks are attached to the Node VM, and return actual state of the disk.

divyenpatel

Provided my review comments.

divyenpatel

Changes looks good to me.

divyenpatel · 2017-08-11T17:28:31Z

pkg/cloudprovider/providers/vsphere/vclib/virtualmachine.go

Yes IsActive is helpful for reader.

BaluDontu · 2017-08-13T23:59:12Z

pkg/cloudprovider/providers/vsphere/vsphere.go

The current log statement is very confusing as when the node is not found, disks will not detached automatically.
Can you rephrase this info log to
glog.Infof("Node %q does not exist, vsphere CP will assume all disks %v are not attached to it.", nodeName, volPaths).
Same at other places as well.

Ok, thank for your comment.

If node does not exist, node's volumes will be detached automatically and become available. So mark them detached and return false without error. Fix kubernetes#50266

BaluDontu · 2017-08-14T20:35:28Z

pkg/cloudprovider/providers/vsphere/vsphere.go

 			glog.Errorf("Failed to get VM object for node: %q. err: +%v", vSphereInstance, err)
 			return nil, err
 		}
-		nodeExist, err := vm.Exists(ctx)


I have one more question. If node is powered off and the pod is placed on to other available nodes, will the volumes attached to powered off node detach automatically?

What logic in here, would cause it to happen?

Its because i have one PR which does this #40118. Throwing an error when node is powered off caused the volume to be detached automatically.

Doesn't it matter if the node is powered off or not when the DisksAreAttached() call is made.

@bacongobbler Have you see the comment from @divyenpatel:"Same as commented for DisksAreAttached. When node VM is powered off, we can still check if disks are attached to the Node VM, and return actual state of the disk."
It seem it doesn't matter if the node is powered off or not when the DisksAreAttached() call is made.
On the other hand, the "vm.Exists(ctx)" just check if vm is power on, can't check if vm is exist.

FengyunPan · 2017-08-17T09:07:57Z

@dims @jingxu97 @kerneltime @luomiao @BaluDontu - Any thoughts on it?

BaluDontu · 2017-08-17T22:29:34Z

The change looks good to me. /lgtm from my side.

luomiao · 2017-08-17T22:31:12Z

sorry for the late reply.
I thought this PR is already merged...
/lgtm
/approve

k8s-github-robot · 2017-08-17T22:31:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FengyunPan, luomiao

Associated issue: 50266

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these OWNERS Files:

~~pkg/cloudprovider/providers/vsphere/OWNERS~~ [luomiao]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot · 2017-08-17T22:38:15Z

/test all [submit-queue is verifying that this PR is safe to merge]

k8s-github-robot · 2017-08-17T23:33:26Z

Automatic merge from submit-queue

k8s-ci-robot assigned jingxu97 Aug 8, 2017

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 8, 2017

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 8, 2017

k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 8, 2017

k8s-ci-robot assigned kerneltime and luomiao Aug 8, 2017

divyenpatel reviewed Aug 8, 2017

View reviewed changes

FengyunPan force-pushed the mark-detached-vsphere branch from 2d9952a to 0f1a34e Compare August 9, 2017 02:41

k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 9, 2017

FengyunPan force-pushed the mark-detached-vsphere branch 3 times, most recently from e0a4030 to 4b0fca1 Compare August 9, 2017 03:14

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 9, 2017

k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 9, 2017

k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 9, 2017

FengyunPan force-pushed the mark-detached-vsphere branch from 48a5c58 to 05a748b Compare August 10, 2017 04:56

divyenpatel reviewed Aug 10, 2017

View reviewed changes

divyenpatel suggested changes Aug 10, 2017

View reviewed changes

FengyunPan force-pushed the mark-detached-vsphere branch from 05a748b to 8e8da99 Compare August 11, 2017 02:26

divyenpatel approved these changes Aug 11, 2017

View reviewed changes

pkg/cloudprovider/providers/vsphere/vclib/virtualmachine.go Outdated

Copy link

Member

divyenpatel Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes IsActive is helpful for reader.

BaluDontu reviewed Aug 13, 2017

View reviewed changes

FengyunPan force-pushed the mark-detached-vsphere branch from 8e8da99 to 0927121 Compare August 14, 2017 01:54

k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 14, 2017

FengyunPan added 2 commits August 14, 2017 10:09

Mark volume as detached when node does not exist for vsphere

b85743b

If node does not exist, node's volumes will be detached automatically and become available. So mark them detached and return false without error. Fix kubernetes#50266

[VSphere] Don't return err when node doesn't exist in DetachDisk()

ea32f06

FengyunPan force-pushed the mark-detached-vsphere branch from 0927121 to ea32f06 Compare August 14, 2017 02:16

k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 14, 2017

BaluDontu reviewed Aug 14, 2017

View reviewed changes

BaluDontu mentioned this pull request Aug 14, 2017

Guideline for Checking Volume attached/detached Status through Cloudprovider #50645

Closed

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2017

k8s-github-robot merged commit 7c13d65 into kubernetes:master Aug 17, 2017

Mark volume as detached when node does not exist for vsphere #50281

Mark volume as detached when node does not exist for vsphere #50281

Uh oh!

Conversation

FengyunPan commented Aug 8, 2017

Uh oh!

k8s-ci-robot commented Aug 8, 2017

Uh oh!

FengyunPan commented Aug 8, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

luomiao commented Aug 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jingxu97 commented Aug 8, 2017

Uh oh!

luomiao commented Aug 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jingxu97 commented Aug 8, 2017

Uh oh!

luomiao commented Aug 8, 2017

Uh oh!

luomiao commented Aug 8, 2017

Uh oh!

jingxu97 commented Aug 8, 2017

Uh oh!

FengyunPan commented Aug 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

luomiao commented Aug 9, 2017

Uh oh!

dims commented Aug 9, 2017

Uh oh!

jingxu97 commented Aug 9, 2017

Uh oh!

FengyunPan commented Aug 10, 2017

Uh oh!

FengyunPan commented Aug 10, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FengyunPan Aug 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

divyenpatel left a comment

Choose a reason for hiding this comment

Uh oh!

divyenpatel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BaluDontu Aug 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FengyunPan commented Aug 17, 2017

Uh oh!

BaluDontu commented Aug 17, 2017

Uh oh!

luomiao commented Aug 17, 2017

luomiao commented Aug 8, 2017 •

edited

Loading

luomiao commented Aug 8, 2017 •

edited

Loading

FengyunPan commented Aug 9, 2017 •

edited

Loading

FengyunPan Aug 11, 2017 •

edited

Loading

BaluDontu Aug 13, 2017 •

edited

Loading