detach the volume when pod is terminated #45286
Conversation
@@ -395,8 +401,23 @@ func (adc *attachDetachController) GetDesiredStateOfWorld() cache.DesiredStateOf
	}

func (adc *attachDetachController) podUpdate(oldObj, newObj interface{}) {
`podUpdate` still only happens once every 12 hours though, right? I think we have to do it in one of the reconciler loops.
I would assume that `podUpdate` should get called when the pod's status changes, for example from Running to Complete - not every 12 hours.
Also, regardless of this event, I have also updated the DSWP to remove the pod from the DSW if the pod is terminated.
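For readers skimming the thread, roughly what that DSWP-side cleanup amounts to, as a sketch. The method and field names here (`GetPodToAdd`, `DeletePod`, `podLister`, the `PodToAdd` fields) are assumptions for illustration, not necessarily the PR's exact code; `v1.PodSucceeded`/`v1.PodFailed` are the real terminal phases.

```go
// Periodic populator pass: besides pods that have vanished from the
// informer, also drop pods that still exist but have reached a terminal
// phase, so the reconciler will detach their volumes. (The per-node
// keepTerminatedPodVolumes setting would be consulted here too; omitted
// for brevity.)
func (dswp *desiredStateOfWorldPopulator) findAndRemoveDeletedPods() {
	for dswPodUID, dswPodToAdd := range dswp.desiredStateOfWorld.GetPodToAdd() {
		pod, err := dswp.podLister.Pods(dswPodToAdd.Pod.Namespace).Get(dswPodToAdd.Pod.Name)
		stillRunning := err == nil &&
			pod.Status.Phase != v1.PodSucceeded && pod.Status.Phase != v1.PodFailed
		if stillRunning {
			continue // keep its volumes in the desired state of world
		}
		// Pod is gone or terminated: remove it so its volumes get detached.
		dswp.desiredStateOfWorld.DeletePod(
			dswPodUID, dswPodToAdd.VolumeName, dswPodToAdd.NodeName)
	}
}
```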
oh yeah...brain fart my b 🥇
Force-pushed from 319bfa0 to 2b30429
Thanks for this PR!
General comment without having context on this particular feature -- is there any way we could implement this without adding yet another flag? We really need to ask ourselves that question, otherwise we'll end up with thousands of flags. Just something to keep in mind ;)
/unassign
@luxas the reason for the flag is to preserve backwards behavior, which in some cases may be desirable.
@@ -233,6 +236,9 @@ type attachDetachController struct {

	// recorder is used to record events in the API server
	recorder record.EventRecorder

	// keep pod volumes attached for terminated pods
whether keep pod volumes
	return
}

util.ProcessPodVolumes(pod, true, /* addVolumes */
nit: make a boolean variable instead of calling ProcessPodVolumes
twice with mostly the same parameters.
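(The refactor as landed is visible in a later diff in this thread: an `addPodFlag` boolean computed once, followed by a single `util.ProcessPodVolumes(pod, addPodFlag, ...)` call.)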
ack
@@ -46,7 +46,8 @@ func Test_NewAttachDetachController_Positive(t *testing.T) {
	nil, /* cloud */
	nil, /* plugins */
	false,
-	time.Second*5)
+	time.Second*5,
+	false /*keep pod volumes attached for terminated pods */)
whether keep pod volumes...
You mean change the wording to `/* whether keep pod volumes */`?
@@ -211,6 +212,7 @@ func (s *CMServer) AddFlags(fs *pflag.FlagSet, allControllers []string, disabled
	fs.Int32Var(&s.LargeClusterSizeThreshold, "large-cluster-size-threshold", 50, "Number of nodes from which NodeController treats the cluster as large for the eviction logic purposes. --secondary-node-eviction-rate is implicitly overridden to 0 for clusters this size or smaller.")
	fs.Float32Var(&s.UnhealthyZoneThreshold, "unhealthy-zone-threshold", 0.55, "Fraction of Nodes in a zone which needs to be not Ready (minimum 3) for zone to be treated as unhealthy. ")
	fs.BoolVar(&s.DisableAttachDetachReconcilerSync, "disable-attach-detach-reconcile-sync", false, "Disable volume attach detach reconciler sync. Disabling this may cause volumes to be mismatched with pods. Use wisely.")
	fs.BoolVar(&s.KeepTerminatedPodVolumes, "keep-terminated-pod-volumes", false, "Keep terminated pod volumes attached to the node after the pod terminates. Can be useful for debugging volume related issues.")
-> keep volumes attached to node ?
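For context: the flag defaults to false, i.e. volumes of terminated pods get detached. Preserving the old behavior means passing `--keep-terminated-pod-volumes=true` to the controller manager, alongside the similarly named kubelet flag discussed later in this thread, which controls the unmount side.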
pkg/apis/componentconfig/types.go
@@ -896,6 +896,10 @@ type KubeControllerManagerConfiguration struct {
	// through the kube-aggregator when enabled, instead of using the legacy metrics client
	// through the API server proxy.
	HorizontalPodAutoscalerUseRESTClients bool

	// KeepTerminatedPodVolumes causes terminated pod volumes attached to the node after the pod terminates.
=> KeepTerminatedPodVolumes keeps volumes attached to node after the pod terminates?
Largely looks fine to me. A couple of concerns:
Will leave it to @jingxu97 for a more in-depth review.
For concern-1, I think this is needed because Completed, Failed or Terminated pods currently leave the volume attached to the node.

For concern-2, yeah it is a bit iffy. I am hoping that most users would run with the default, and hence the kubelet and controller flags will be in sync. But I am also thinking of improving the documentation of the flag in the controller and mentioning the corresponding flag in kubelet, so that users know about both of them.

ack about updating references in
	addPodFlag = false
}

util.ProcessPodVolumes(pod, addPodFlag, /* addVolumes */
I am thinking, to be consistent with kubelet, that for the podAdd function we can also check whether the pod is already terminated or not. The check would be unnecessary only if podAdd means the pod can never be in a terminated state...
We do check that, right? On line #415 (https://github.com/kubernetes/kubernetes/pull/45286/files#diff-72e66cba2ced392dc0f9cd10c424d6d0R415).
I do not think there is any chance we are going to detach volumes that are still mounted and being used inside a pod.
but that is for podUpdate, not podAdd?
fixed
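The shape of that fix, sketched below. `podIsTerminated` is a hypothetical predicate and the trailing `ProcessPodVolumes` arguments are abbreviated; the actual PR may structure this differently.

```go
// podAdd applies the same termination-aware logic as podUpdate (see the
// diff above), so a pod first observed already in a terminal state never
// gets its volumes added to the desired state of world.
func (adc *attachDetachController) podAdd(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok || pod == nil || pod.Spec.NodeName == "" {
		return // not a pod, or not yet scheduled to a node
	}
	addPodFlag := true
	if podIsTerminated(pod) { // hypothetical predicate
		addPodFlag = false
	}
	util.ProcessPodVolumes(pod, addPodFlag, adc.desiredStateOfWorld)
}
```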
We could make kubelet set "keep-terminated-pod-volumes" as a node annotation (similar to how we do master-attach-detach) and then use that? I hate it, but we can if someone says we have to. I want no new flags if there is any way to do that. Not everything needs to be a choice. I do question whether setting keep-terminated-pod-volumes per-node is/was really appropriate, or if it should be a cluster-wide thing.
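What that annotation-based alternative might look like, as a sketch. The annotation key and helper are hypothetical; the idea is that kubelet publishes its setting on the Node object and the controller reads it instead of taking a new flag.

```go
// keepTerminatedPodVolumesAnnotation is a hypothetical per-node annotation
// that kubelet could publish instead of the controller growing a new flag.
const keepTerminatedPodVolumesAnnotation = "volumes.kubernetes.io/keep-terminated-pod-volumes"

// keepTerminatedPodVolumes reports whether the node has opted to keep
// volumes of terminated pods attached.
func keepTerminatedPodVolumes(node *v1.Node) bool {
	return node.Annotations[keepTerminatedPodVolumesAnnotation] == "true"
}
```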
also see #45346
@eparis yeah, at the time I introduced that code and the new flag, it was geared toward freeing up emptydirs on pod termination. The detach-attach part of the PVs that might be associated with those mounts was outside the discussion at the time. Like Saad, I do worry about keeping the flags in sync. If they are not, the attach-detach controller could end up force-detaching mounted volumes and maybe causing corruption, if the data previously written by the terminated pod is not synced before the forced detach. However, if you can sync the data to volumes upon pod termination, I guess the worst that could happen is that your volumes are not there for debugging even if you set keep-terminated-pod-volumes.
Force-pushed from 0fc4be5 to 31c874a
/approve
/lgtm
@k8s-bot bazel test this.
Taking a quick final pass
A few nits. Feel free to address them here and do a self-LGTM, or in a follow-up PR.
@@ -144,6 +148,9 @@ type nodeManaged struct {
	// attached to this node. The key in the map is the name of the volume and
	// the value is a pod object containing more information about the volume.
	volumesToAttach map[v1.UniqueVolumeName]volumeToAttach

	// keepTerminatedPodVolumes
nit: More descriptive comment please!
@@ -46,7 +46,7 @@ type DesiredStateOfWorld interface {
	// AddNode adds the given node to the list of nodes managed by the attach/
	// detach controller.
	// If the node already exists this is a no-op.
-	AddNode(nodeName k8stypes.NodeName)
+	AddNode(nodeName k8stypes.NodeName, keepTerminatedPodVolumes bool)
nit: Update comment to describe the new parameter. This is particularly important for interfaces so that callers and implementers know what the expectations are.
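One way the updated interface comment could read (suggested wording only, not the PR's final text):

```go
// AddNode adds the given node to the list of nodes managed by the attach/
// detach controller. If the node already exists this is a no-op.
// keepTerminatedPodVolumes indicates whether the node opts to keep volumes
// of terminated pods attached, in which case the reconciler will not
// detach them when those pods terminate.
AddNode(nodeName k8stypes.NodeName, keepTerminatedPodVolumes bool)
```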
/lgtm
Given the rather long build times right now because of bazel problems (kubernetes/test-infra#2708, #45558), I am going to make the comment fixes as part of #45615. I will make a note for myself. Thanks @saad-ali
@k8s-bot bazel test this
@k8s-bot unit test this
Make sure the volume is detached when a pod is terminated for any reason and not deleted from the API server.
@eparis this needs a small code change now, which I am pushing. Basically, master has changed the contract of … Obviously the tests I added still use the old signature, and they broke. Likely the change that altered the method signature was merged after everything was green.
and let's make sure that the controller respects it and doesn't detach mounted volumes.
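The "doesn't detach mounted volumes" guarantee leans on the node reporting which volumes are still in use. A simplified sketch follows; `VolumesInUse` is the real field on `v1.NodeStatus`, while the surrounding reconciler logic (wait, then force-detach only after a timeout) is condensed into a comment.

```go
// volumeInUse reports whether the node still lists the volume as mounted.
// While this returns true, the reconciler holds off on detaching, and only
// force-detaches after a maximum wait for the unmount.
func volumeInUse(node *v1.Node, volumeName v1.UniqueVolumeName) bool {
	for _, inUse := range node.Status.VolumesInUse {
		if inUse == volumeName {
			return true
		}
	}
	return false
}
```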
Force-pushed from 826582d to 951a36a
I have pushed a fix for the integration tests, which were failing because of a change in …
/lgtm
@gnufied: you cannot LGTM your own PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: childsb, deads2k, derekwaynecarr, eparis, gnufied, jsafrane, saad-ali

Needs approval from an approver in each of these OWNERS files:

You can indicate your approval by writing
Automatic merge from submit-queue
When pods are terminated we should detach the volume.
Fixes #45191
Release note: