Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AttachDisk should not call detach inside of Cinder volume provider #50042

Merged
merged 1 commit into from
Aug 7, 2017

Conversation

jingxu97
Copy link
Contributor

@jingxu97 jingxu97 commented Aug 2, 2017

This PR fixes #50038 which removes the detach call inside of AttachDisk.

Updates Cinder AttachDisk operation to be more reliable by delegating Detaches to volume manager.

This PR fixes kubernetes#50038 which removes the detach call inside of AttachDisk.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2017
@jingxu97 jingxu97 requested review from gnufied and saad-ali August 2, 2017 21:38
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Aug 2, 2017
return "", err
}
errmsg := fmt.Sprintf("Disk %s is attached to a different instance (%s)", volumeID, volume.AttachedServerId)
glog.V(2).Infof(errmsg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we preserve/log the original error too? something like

glog.V(2).Infof("%s : %s", errmsg, err.Error())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep the log message with glog.V(2).Infof(errmsg) right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops. never mind, i read this wrong, i think i was thinking about the err from DetachDisk and you have already removed that call :)

@dims
Copy link
Member

dims commented Aug 2, 2017

/retest
/approve

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2017
@zetaab
Copy link
Member

zetaab commented Aug 3, 2017

I do not fully agree with this PR. We had lots of problems without this detachdisk. If you delete this detachdisk from code, it means that if you have pod running in server1, server1 is shutdowned -> the pod cannot move to server2 without someone going and release the drive manually

please test it carefully before removing things like this. In bad case you are breaking someone production because the failover does not work anymore like it worked before.

@dims
Copy link
Member

dims commented Aug 3, 2017

/approve cancel

@k8s-github-robot k8s-github-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 3, 2017
@dims
Copy link
Member

dims commented Aug 3, 2017

Let's get @anguslees 's opinion here too

@jamiehannaford
Copy link
Contributor

@zetaab shouldn't the attach/detach controller in the controller manager be handling detach logic? I'm confused why you have to manually detach. Are you running an older version of the control plane?

@zetaab
Copy link
Member

zetaab commented Aug 3, 2017

well we have 1.7, but i have not tested it without that detachdisk line. We have another environment which is using openshift 3.5 (kubernetes 1.5) and we have to detach drives manually once in a week. Anyways I would say that another pr made by @jamiehannaford looks better than just removing this line(how well it is tested?).

@jingxu97
Copy link
Contributor Author

jingxu97 commented Aug 3, 2017

The logic of AttachDisk in all other attachable volume plugins (e.g., GCE PD, AWS EBS) is that if it is already attached to a different instance, it just returns error. Kubernetes attach_detach controller decides when is safe to detach and takes the action. Before detaching, we have to make sure the the volume is already unmounted, otherwise it will cause data corruption.
It often happens that a pod is killed due to some reason and gets recreated in a different node. The sequence should be unmount volume, detach volume from the old node, and then attach to the new node. But it is very possible that attach requests comes in first, and if the attach function call detach directly, it will cause problem as described above.
We have the tests for GCE PD for this workflow. If there is no such test cases for Cinder, then we need add one.

@jamiehannaford
Copy link
Contributor

I agree with @jingxu97. I didn't know about this PR when I submitted #50089, so I can close it in favour of this one.

BTW I've tested my PR on a live Cinder cluster and it fixes the bug we were seeing. Since our 2 PRs do the same thing, I'm pretty sure this will work too.

@dims
Copy link
Member

dims commented Aug 3, 2017

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, jingxu97

Associated issue: 50038

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. labels Aug 3, 2017
@dims
Copy link
Member

dims commented Aug 3, 2017

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Aug 3, 2017
@dims
Copy link
Member

dims commented Aug 3, 2017

Dang it! we need someone to clear the "do-not-merge" label :(

@jamiehannaford
Copy link
Contributor

@dims Just wondering, how does that usually work? Do we need to ping someone?

@wojtek-t wojtek-t removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Aug 7, 2017
@dims
Copy link
Member

dims commented Aug 7, 2017

/retest

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 50087, 39587, 50042, 50241, 49914)

@k8s-github-robot k8s-github-robot merged commit d6cb482 into kubernetes:master Aug 7, 2017
@jamiehannaford
Copy link
Contributor

@wojtek-t can this be cherry-picked for v1.7.4?

@k8s-cherrypick-bot
Copy link

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@wojtek-t
Copy link
Member

wojtek-t commented Aug 8, 2017

@jingxu97 @jamiehannaford - if you want this to be cherrypicked, please provide a release note

@jamiehannaford
Copy link
Contributor

@jingxu97 @wojtek-t Ah okay. How about:

Updates Cinder AttachDisk operation to be more reliable by delegating Detaches to volume manager.

@wojtek-t
Copy link
Member

wojtek-t commented Aug 8, 2017

SGTM

@wenlxie
Copy link
Contributor

wenlxie commented Aug 10, 2017

When do the volume attach and find the volume had been attached to another node, is it better to to check whether the volume should be detached or not? then we can decide to detach the volume or just return error. The previous code is just do detach in this situation and now this PR changed to not do the detach. If we can check the volume should be detached or not, that would be better and can solve many issues like volume not get detached

@jamiehannaford
Copy link
Contributor

@wenlxie What do you mean? Usually this is handled by the attach/detach controller, right?

@wojtek-t wojtek-t added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 11, 2017
@wojtek-t
Copy link
Member

Cherrypick approved and created.

@wojtek-t wojtek-t added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 11, 2017
k8s-github-robot pushed a commit that referenced this pull request Aug 12, 2017
…42-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #50042 upstream release 1.7 

Cherry pick of #50042 on release-1.7.
#50042: AttachDisk should not call detach inside of Cinder volume provider
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018
Automatic merge from submit-queue (batch tested with PRs 50087, 39587, 50042, 50241, 49914)

AttachDisk should not call detach inside of Cinder volume provider
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cinder cloud provider's AttachDisk should not issue detach inside of it.
10 participants