-
Notifications
You must be signed in to change notification settings - Fork 42.1k
fix: change default timeout value in csi plugin #79529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @davidz627 |
| // for consistency. | ||
| csiAddrTemplate = "/var/lib/kubelet/plugins/%v/csi.sock" | ||
| csiTimeout = 15 * time.Second | ||
| csiTimeout = 2 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what is the best default value. If the csiTimeout can be configurable like other csi sidecar do that will be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hoyho yes, but now I would like cherry-pick this fix to old release from 1.13 to 1.15, 15s is really too small, it makes the timeout a lot. And when user switch to use CSI driver, they won't be happy to see that timeout issue every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with you. For our CSI driver, 15s is not enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also meet some issues caused by the default time.
LGTM
|
The same timeout is used for |
|
Also keep in mind that the CSI driver should be idempotent and handle another call coming in on the same volume even if one is already in progress. |
I think currently let's change the default timeout value to 2min, for other operations, like Mount, 15s is still too short if there is a large disk to format, 2min is much safer. If we found we need different timeouts in the future, then let's split them up at that time. |
|
/lgtm |
cwdsuzhou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also cherrypick it to 1.12 1.11
| // for consistency. | ||
| csiAddrTemplate = "/var/lib/kubelet/plugins/%v/csi.sock" | ||
| csiTimeout = 15 * time.Second | ||
| csiTimeout = 2 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also meet some issues caused by the default time.
LGTM
1.12, 1.11 is already code freezed, so the latest cherry pick version should be 1.13. @cwdsuzhou |
|
I must miss something. The linked issue mentions Attach/WaitForAttach timeouts. Taking Attach as example, each 15s waiting for attachment (=121.5s total) If the volume is attached in first 48.5 seconds, it takes only 2 second max for the controller to notice a volume is attached (that's the max sleep in 3 iterations). The real difference is the last sleep. If a volume gets attached 90.5s after start, the controller waits for extra 16 seconds to notice it. Is it really the case we are trying to fix here? What storage backend takes 90s to attach a volume?
|
Hi @jsafrane we are talking about the |
|
I have sent a PR #75963 in the past to expose the maxBackoff time which may help decrease the time |
Actually timeout value in the built-in driver is 2min (per kubernetes/pkg/kubelet/volumemanager/volume_manager.go Lines 65 to 74 in b812eaa
|
|
So the only problem here is a timeout error visible in |
Yes |
|
Oh, I expected there is some real issue :-) /approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyzhangx, jsafrane The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks, this will make some users nervous if first attach volume failed every time. |
…9529-upstream-release-1.15 Automated cherry pick of #79529: fix: change timeout value in csi plugin
…9529-upstream-release-1.13 Automated cherry pick of #79529: fix: change timeout value in csi plugin
…9529-upstream-release-1.14 Automated cherry pick of #79529: fix: change timeout value in csi plugin
See PR:kubernetes#79529 (cherry picked from commit d8c92f5)
…35774_20190702 Merge branch 'cherry-pick-kubernetesgh-79529 of [email protected]:antstack/cafe-kubernetes.git into EI61435774_20190702 Signed-off-by: 子波 <[email protected]>
See PR:kubernetes#79529 (cherry picked from commit d8c92f5)
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR changes timeout value in csi plugin from 15s to 2min which fixes the timeout issue
attach/WaitForAttach timeout value is too small in CSI driver, it's all 15s which is really too small:
kubernetes/pkg/volume/csi/csi_plugin.go
Line 57 in 967b23c
Compared to built-in driver timeout, it's really too small:
podAttachAndMountTimeoutis 2 minutes, totalwaitForAttachTimeoutis 10 minutesWhich issue(s) this PR fixes:
Fixes #79528
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
/kind bug
/assign @msau42
/priority important-soon
/sig storage