
Use local PX endpoint for mount, unmount, detach and attach calls #48898

Merged
merged 3 commits into kubernetes:master from the fix-px-volume-calls branch on Jul 17, 2017

Conversation

@harsh-px (Contributor) commented Jul 13, 2017:

What this PR does / why we need it:
This PR fixes an issue with the Setup and TearDown of Portworx volumes whose side effects include a Pod using a Portworx volume being unable to start on the minion.

Which issue this PR fixes: fixes #49034
This PR addresses an issue where mounting, attaching, unmounting, or detaching a volume fails when Kubernetes sends these requests to Portworx while its API server on that particular minion is down.

Portworx mount, unmount, attach and detach requests need to be received on the minion where the pod is running. So these calls need to talk to the Portworx API server running locally on that node (and NOT to the Portworx k8s service since it may route the request to any node in the cluster). This PR explicitly makes such requests local only.
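
To make the routing rule concrete, here is a minimal, self-contained sketch of the decision this PR enforces. It is illustrative only; the function name and hostname strings below are assumptions for the example, not the actual plugin code:

package main

import "fmt"

// Sketch of the routing rule: data-path calls (mount, unmount, attach,
// detach) must reach the Portworx API server on the pod's own node, while
// control-plane calls (create, delete) may go through the cluster service.
func portworxEndpoint(nodeHostname, serviceHostname string, localOnly bool) string {
	if localOnly {
		// mount/unmount/attach/detach: talk only to the local node.
		return nodeHostname
	}
	// create/delete: any node can serve these, so the k8s service is fine.
	return serviceHostname
}

func main() {
	fmt.Println(portworxEndpoint("minion-1", "portworx-service.kube-system", true))  // minion-1
	fmt.Println(portworxEndpoint("minion-1", "portworx-service.kube-system", false)) // portworx-service.kube-system
}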

Release note:

Fix Pods using Portworx volumes getting stuck in ContainerCreating phase.

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Jul 13, 2017
@k8s-ci-robot (Contributor):

Hi @harsh-px. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Jul 13, 2017
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Jul 13, 2017
@harsh-px force-pushed the fix-px-volume-calls branch from 76a8c74 to a0d6aa1 on July 13, 2017 19:56
@harsh-px (Contributor, Author):

/assign @gnufied

@gnufied (Member) left a comment:


Some minor nits - mostly lgtm

@@ -266,7 +266,7 @@ func (b *portworxVolumeMounter) SetUp(fsGroup *int64) error {
// SetUpAt attaches the disk and bind mounts to the volume path.
func (b *portworxVolumeMounter) SetUpAt(dir string, fsGroup *int64) error {
notMnt, err := b.mounter.IsLikelyNotMountPoint(dir)
- glog.V(4).Infof("Portworx Volume set up: %s %v %v", dir, !notMnt, err)
+ glog.Infof("Portworx Volume set up. Dir: %s %v %v", dir, !notMnt, err)
@gnufied (Member):

Do we really want to always log these messages in the log files? It looks to me like the older logging priority was more accurate.

@harsh-px (Contributor, Author):

We have found that tracking events for volume setup and teardown is very important when debugging field issues, hence I've changed the logging level. SetUpAt and TearDownAt get invoked only when pods get scheduled on a new node (a new pod, or an existing pod getting re-scheduled), so these log entries won't be very chatty.

return volumeID, requestGB, nil, err
}

// DeleteVolume deletes a Portworx volume
func (util *PortworxVolumeUtil) DeleteVolume(d *portworxVolumeDeleter) error {
- driver, err := util.getPortworxDriver(d.plugin.host)
+ driver, err := util.getPortworxDriver(d.plugin.host, false)
@gnufied (Member):

Everywhere we add a naked true or false argument - we should also add a comment of the form true /*localOnly*/

@harsh-px (Contributor, Author):

Done in incremental diff
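
For reference, the suggested style applied to the call in the diff above reads roughly as follows (the call itself is from the diff; the inline comment is the convention the reviewer asked for):

driver, err := util.getPortworxDriver(d.plugin.host, false /*localOnly*/)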

@@ -181,13 +186,24 @@ func createDriverClient(hostname string) (*osdclient.Client, error) {
}
}

- func (util *PortworxVolumeUtil) getPortworxDriver(volumeHost volume.VolumeHost) (volumeapi.VolumeDriver, error) {
+ func (util *PortworxVolumeUtil) getPortworxDriver(volumeHost volume.VolumeHost, localOnly bool) (volumeapi.VolumeDriver, error) {
@gnufied (Member):

Can you document what the localOnly flag does? Why do certain calls supply this as true while others supply false?

@harsh-px (Contributor, Author):

Added a function doc in the incremental

@gnufied (Member) commented Jul 14, 2017:

I read the comment - and I still don't understand why CreateDisk and DeleteDisk pass this flag as false while others as true.

@gnufied (Member) commented Jul 14, 2017:

Feel free to link to some external docs if the context is large enough to not fit in a code comment. We frequently do that in AWS and other volume plugins (or we should be doing that).

@harsh-px (Contributor, Author):

Please see if the updated comment gives more context. I have given high-level context on why the create and delete calls can use localOnly as false while the others cannot.

Currently, we don't have external docs describing our internal algorithms (most of them are proprietary).
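
To make the split concrete, here is a rough sketch of how the localOnly flag plausibly steers the driver lookup. It reuses the identifiers visible in the diff (getPortworxDriver, createDriverClient, volumeHost.GetHostName()); driverFromClient and clusterWideDriver are hypothetical helpers invented for illustration, and error handling plus client caching are simplified:

// Sketch, not the merged code. localOnly=true is used by mount, unmount,
// attach and detach so the request is served by the Portworx API server on
// this very minion. localOnly=false is used by create and delete, which any
// node in the cluster can serve via the Portworx k8s service.
func (util *PortworxVolumeUtil) getPortworxDriver(volumeHost volume.VolumeHost, localOnly bool) (volumeapi.VolumeDriver, error) {
	if localOnly {
		// Always build a client against the node-local endpoint; never reuse
		// a cached client that may point at the cluster-wide service.
		client, err := createDriverClient(volumeHost.GetHostName())
		if err != nil {
			return nil, err
		}
		return driverFromClient(client) // hypothetical wrapper, see note above
	}
	// Cluster-wide path: resolve the Portworx k8s service and build (or reuse)
	// a client against it; details elided here.
	return util.clusterWideDriver(volumeHost) // hypothetical helper, see note above
}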

if err != nil {
return nil, err
} else {
glog.Infof("Using portworx local service at: %v as api endpoint", volumeHost.GetHostName())
@gnufied (Member):

IMO - this log message should only appear if verbosity is set to a lower level, not always.

@harsh-px (Contributor, Author):

Changed to glog.V(4) in the incremental
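
In other words, the line from the diff above presumably becomes verbosity-gated along these lines, so it only shows up at -v=4 or higher:

glog.V(4).Infof("Using portworx local service at: %v as api endpoint", volumeHost.GetHostName())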

@gnufied (Member) commented Jul 14, 2017:

@k8s-bot ok to test

@k8s-ci-robot removed the needs-ok-to-test label on Jul 14, 2017
@harsh-px (Contributor, Author):

Re-run of the failing test succeeded.

@harsh-px force-pushed the fix-px-volume-calls branch from 8e4f1f8 to 90919e3 on July 14, 2017 20:20
@gnufied (Member) commented Jul 14, 2017:

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Jul 14, 2017
@harsh-px (Contributor, Author):

@gnufied Thanks for reviewing this. I am guessing you are the one who adds the approved label?

@gnufied (Member) commented Jul 17, 2017:

/approve

@k8s-github-robot:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gnufied, harsh-px

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@gnufied (Member) commented Jul 17, 2017:

@harsh-px can you open a github issue and link this PR there?

@k8s-github-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Jul 17, 2017
@harsh-px (Contributor, Author):

@gnufied Created issue and linked here.

@k8s-github-robot:

Automatic merge from submit-queue (batch tested with PRs 48997, 48595, 48898, 48711, 48972)

@k8s-github-robot merged commit 2c1c33d into kubernetes:master on Jul 17, 2017
@harsh-px (Contributor, Author):

@gnufied I need to cherry-pick this into the next 1.7 and 1.6 releases. However, the cherry-pick PRs would need the milestone labels for those releases to be set on this PR.

What is the process for getting these milestone labels?

@gnufied (Member) commented Jul 17, 2017:

@harsh-px you open the PRs against the release branches. Since the change here is mostly a bug fix, it would be cherry-picked by the release managers of those branches.

@harsh-px (Contributor, Author):

/release-note Fixes an issue where a pod using a Portworx volume gets stuck in ContainerCreating phase.

@enisoc enisoc added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jul 18, 2017
wojtek-t added a commit that referenced this pull request Jul 19, 2017
…98-upstream-release-1.7

Automated cherry pick of #48898 upstream release 1.7
k8s-github-robot pushed a commit that referenced this pull request Jul 19, 2017
…98-upstream-release-1.6

Automatic merge from submit-queue

Automated cherry pick of #48898 upstream release 1.6

**What this PR does / why we need it**:
This PR fixes an issue with the Setup and TearDown of Portworx volumes whose side effects include a Pod using a Portworx volume being unable to start on the minion.

**Which issue this PR fixes**:
This PR addresses an issue where mounting, attaching, unmounting, or detaching a volume fails when Kubernetes sends these requests to Portworx while its API server on that particular minion is down.

Portworx mount, unmount, attach and detach requests need to be received on the minion where the pod is running. So these calls need to talk to the Portworx API server running locally on that node (and NOT to the Portworx k8s service since it may route the request to any node in the cluster). This PR explicitly makes such requests local only.

Cherrypick of #48898
Labels
approved (indicates a PR has been approved by an approver from all required OWNERS files)
cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
lgtm ("Looks good to me", indicates that a PR is ready to be merged)
release-note (denotes a PR that will be considered when it comes time to generate release notes)
size/M (denotes a PR that changes 30-99 lines, ignoring generated files)
Successfully merging this pull request may close these issues.

A pod using a Portworx volume driver PVC gets stuck in ContainerCreating phase