-
Notifications
You must be signed in to change notification settings - Fork 42.1k
fix pod stuck in Terminating issue due to corrupt mnt point in flexvol plugin #75234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6b88c84 to
cbefeae
Compare
|
/assign |
|
thanks guys for the comments, let me hold this PR, and track the root case |
cbefeae to
cf69d84
Compare
|
PTAL, thanks. |
|
|
||
| if pathErr != nil && !mount.IsCorruptedMnt(pathErr) { | ||
| return fmt.Errorf("Error checking path: %v", pathErr) | ||
| if pathErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
if !pathExists & pathErr == nil {
return nil
}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then this function would depend on return value of PathExists, if it return <false, error>, it won't proceed, mainly I would like to cherry pick to old release, only for flexvol.TearDownAt func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, sorry my mistake. I corrected it.
only if it return (false, nil) which means path is definitely not exist , it will return immediately without any further actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jingxu97 logic of this PR is identical to this your suggested comment:
if pathErr != nil {
// only log warning here since plugins should anyways have to deal with errors
klog.Warningf("Error checking path: %v", pathErr)
} else {
if !pathExists {
klog.Warningf("Warning: Unmount skipped because path does not exist: %v", dir)
return nil
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the logic is the same. I am thinking to make it easier to read. And I think the other PR #75645 also uses the simpler way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jingxu97 shall we go with this PR first, and I would like to cherry pick this PR to old release, thanks.
|
/hold cancel |
|
/lgtm |
still needs |
|
/approve
…On Fri, Jun 7, 2019 at 6:20 PM Andy Zhang ***@***.***> wrote:
/lgtm
still needs /approve, and are we going to merge this into v1.15? @jingxu97
<https://github.com/jingxu97>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#75234?email_source=notifications&email_token=AESI3RNAELPMRYMHJEXCVYTPZMCORA5CNFSM4G42Q7X2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXHKHHI#issuecomment-500081565>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AESI3RJKVH377RY3OJQCPZ3PZMCORANCNFSM4G42Q7XQ>
.
--
- Jing
|
|
/assign @saad-ali @chakri-nelluri @MikaelCluseau |
|
@andyzhangx: GitHub didn't allow me to assign the following users: MikaelCluseau. Note that only kubernetes members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
IIUC, before, Unmount would not be called if PathExists returns non-corrupted error. Now what this fix does is it always calls Unmount if PathExists returns any error, corrupted or not. Is that the intended behavior @andyzhangx? From the PR description, it sounds like you wanted some change in behavior for corrupted errors? |
I have changed the PR description, current reality is there is some unexpected error in flexvolume path, and in this PR, it would call Unmount if PathExists returns any error. @msau42 |
|
/approve Thanks for clarifying! Can you also fix the release note? |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyzhangx, jingxu97, msau42 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
done, thanks. |
|
@andyzhangx: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
…5234-upstream-release-1.12 Automated cherry pick of #75234: fix flexvol stuck issue due to corrupted mnt point
…5234-upstream-release-1.13 Automated cherry pick of #75234: fix flexvol stuck issue due to corrupted mnt point
…5234-upstream-release-1.14 Automated cherry pick of #75234: fix flexvol stuck issue due to corrupted mnt point
…5234-upstream-release-1.15 Automated cherry pick of #75234: fix flexvol stuck issue due to corrupted mnt point
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixed this issue:
pod could not be terminated when pod volume path is corrupted due to smb server return error status: resource temporarily unavailable, in this condition, pod will be in
Terminatingforever. In this PR, it would call Unmount if PathExists returns any error.Which issue(s) this PR fixes:
Fixes #75233
Special notes for your reviewer:
EAGAIN - APPLICATION_ERROR: "resource temporarily unavailable"This error happens when SMB server is unavailable due to network issue, password change, smb path change, etc. , in this PR, it would call Unmount if PathExists returns any error.
workaround for this issue
mount | grep cifsto get all cifs mountssudo ls -lt /var/lib/kubelet/pods/5c949781-4c6d-11e9-825d-000d3a0dd47b/volumes/microsoft.com~smb/test ls: cannot access '/var/lib/kubelet/pods/5c949781-4c6d-11e9-825d-000d3a0dd47b/volumes/microsoft.com~smb/test': Permission deniedls -lt ...failed, runAnd then the pod will be in terminated soon.
Does this PR introduce a user-facing change?:
/priority important-soon
/assign @jingxu97 @msau42 @davidz627