Skip to content

fix race condition issue for smb mount on windows #75371

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 18, 2019

Conversation

andyzhangx
Copy link
Member

@andyzhangx andyzhangx commented Mar 14, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR brings the lock when do smb mounting on the node with same target source, otherwise there would be race condition as user already hits.

Which issue(s) this PR fixes:

Fixes #75180

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

fix race condition issue for smb mount on windows

/kind bug
/assign @jingxu97 @msau42 @PatrickLang @michmike @craiglpeters
/priority critical-urgent
/sig windows
/test pull-kubernetes-cross

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 14, 2019
@andyzhangx
Copy link
Member Author

/test pull-kubernetes-cross

@michmike
Copy link
Contributor

@andyzhangx is this something we need for 1.14? can you talk a little bit about how a user could hit this issue (under what conditions)?
cc: @craiglpeters

@andyzhangx
Copy link
Member Author

hi @michmike I could easily repro this issue by "kubectl create -f https://raw.githubusercontent.com/andyzhangx/demo/master/windows/azurefile/aspnet-azurefile-multiple-replicas.yaml", the error is like following:

Events:
  Type     Reason       Age    From                 Message
  ----     ------       ----   ----                 -------
  Normal   Scheduled    2m28s  default-scheduler    Successfully assigned default/deployment-azurefile-7c8599c8c8-4fbbn to 1426k8s001
  Warning  FailedMount  2m17s  kubelet, 1426k8s001  MountVolume.SetUp failed for volume "pvc-b731c131-46d2-11e9-ba1f-000d3aa28f1b" : Remove-SmbGlobalMapping failed: exit status 1, output: "Remove-SmbGlobalMapping : No MSFT_SmbGlobalMapping objects found with property 'RemotePath' equal to \r\n'\\\\fa33e7109465b11e98xxxx.file.core.windows.net\\andy-win1134-dynamic-pvc-b731c131-46d2-11e9-ba1f-000d3aa28f1b'.  \r\nVerify the value of the property and retry.\r\nAt line:1 char:1\r\n+ Remove-SmbGlobalMapping -RemotePath $Env:smbremotepath -Force\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : ObjectNotFound: (\\\\fa33e7109465b...1f-000d3xxx:String) [Remove-SmbGlobalMapping], Ci \r\n   mJobException\r\n    + FullyQualifiedErrorId : CmdletizationQuery_NotFound_RemotePath,Remove-SmbGlobalMapping\r\n \r\n"
  Warning  FailedMount  2m12s  kubelet, 1426k8s001  MountVolume.SetUp failed for volume "pvc-b731c131-46d2-11e9-ba1f-000d3aa28f1b" : Remove-SmbGlobalMapping failed: exit status 1, output: "Remove-SmbGlobalMapping : This network connection does not exist. \r\nAt line:1 char:1\r\n+ Remove-SmbGlobalMapping -RemotePath $Env:smbremotepath -Force\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (MSFT_SmbGlobalM...le.core.win...):ROOT/Microsoft/...mbGlobalMapping) [Rem \r\n   ove-SmbGlobalMapping], CimException\r\n    + FullyQualifiedErrorId : Windows System Error 2250,Remove-SmbGlobalMapping\r\n \r\n"
  Normal   Pulling      96s    kubelet, 1426k8s001  pulling image "mcr.microsoft.com/dotnet/framework/aspnet:4.7.2-windowsservercore-ltsc2019"

And I have verified this PR has fixed this issue. It's better make it into v1.14 if possible.
/test pull-kubernetes-cross

cc @harshaperera532

@andyzhangx
Copy link
Member Author

/test pull-kubernetes-cross

@michmike
Copy link
Contributor

i pinged sig-storage
cc: @kubernetes/sig-storage-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Mar 15, 2019
@jsafrane
Copy link
Member

I can't say anything about Windows, but the code looks OK.
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 15, 2019
@michmike
Copy link
Contributor

/milestone v1.14

@k8s-ci-robot k8s-ci-robot added this to the v1.14 milestone Mar 15, 2019
@jingxu97
Copy link
Contributor

jingxu97 commented Mar 15, 2019

I have a few questions

  1. Why we hit this issue in 1.13? Do we have the same issue before?

  2. So in windows, there will be race condition if user tries to mount with the same target? But according to your example, isn't the mount target different under different pod volume directory?

  3. How often you hit this problem, I tried the similar deployment, but could not reproduce it.

@andyzhangx
Copy link
Member Author

Why we hit this issue in 1.13? Do we have the same issue before?

I introduced a fix for remount same smb path issue on windows: #73661, that fix introduces this issue, it's not a regression btw since original remount on same smb path does not work.

So in windows, there will be race condition if user tries to mount with the same target? But according to your example, isn't the mount target different under different pod volume directory?

I used 6 replicas in my example, though those pod has different mount dirs, while they point to same smb path, that will cause race condition.

How often you hit this problem, I tried the similar deployment, but could not reproduce it.

If you use same smb path with multiple pods in parallel, I think it could reproduce easily.

@jingxu97 Let me explain a little more here. On Linux, every smb mount is independent even mount with same smb path, so there won't be race condition, while on Windows, we are using SMB Global Mapping, same smb path shares the smb mount, so we need to add lock for this entry.

@xmudrii
Copy link
Member

xmudrii commented Mar 18, 2019

@andyzhangx Is this PR still planned to be merged in time for 1.14? I'd like to remind that Code Thaw is starting tomorrow (Tuesday PST EOD), so it would be nice to merge this PR by then, otherwise it'll compete with herd of incoming PRs post-thaw.

@andyzhangx
Copy link
Member Author

@xmudrii yes, it should be merged into v1.14
could someone lgtm, thanks. @jingxu97 @msau42

@michmike
Copy link
Contributor

@neolit123 can you please take a look as well?

@spiffxp
Copy link
Member

spiffxp commented Mar 18, 2019

/lgtm
/hold
Before removing the hold I'd like to hear back from @jingxu97 to understand if their questions have been sufficiently answered

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Mar 18, 2019
@jingxu97
Copy link
Contributor

/lgtm

@spiffxp
Copy link
Member

spiffxp commented Mar 18, 2019

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2019
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michmike
Copy link
Contributor

thank you all for a quick review and approval!

@michmike
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit a4f2590 into kubernetes:master Mar 18, 2019
k8s-ci-robot added a commit that referenced this pull request Apr 18, 2019
…5371-upstream-release-1.12

Automated cherry pick of #75371: fix race condition issue for smb mount on windows
k8s-ci-robot added a commit that referenced this pull request Apr 30, 2019
…5371-upstream-release-1.13

Automated cherry pick of #75371: fix race condition issue for smb mount on windows
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failure to mount/remount Azure files
8 participants