
detach the volume when pod is terminated #45286

Merged

Conversation

@gnufied (Member) commented May 3, 2017

When pods are terminated we should detach the volume.

Fixes #45191

Release note:

Detach the volume when pods are terminated.  

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 3, 2017
@k8s-reviewable: This change is Reviewable

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 3, 2017
@@ -395,8 +401,23 @@ func (adc *attachDetachController) GetDesiredStateOfWorld() cache.DesiredStateOf
}

func (adc *attachDetachController) podUpdate(oldObj, newObj interface{}) {
Contributor:

podUpdate still only happens once every 12 hours though right?

I think we have to do it in one of the reconciler loops

Member Author:

I would assume that podUpdate should get called when a pod's status changes, for example from Running to Completed, not every 12 hours.

Member Author:

Also, regardless of this event, I have also updated the DSWP to remove the pod from the DSW if the pod is terminated.

Contributor:
oh yeah...brain fart my b 🥇
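For readers following along, here is a minimal sketch (not the PR's actual code) of the idea described above: a desired-state-of-world populator pass that prunes terminated pods so their volumes become eligible for detach. The interface, function names, and the keepTerminatedPodVolumes parameter below are simplified stand-ins.

package sketch

import (
	v1 "k8s.io/api/core/v1"
)

// desiredStateOfWorld is a simplified stand-in for the controller's DSW.
type desiredStateOfWorld interface {
	GetPods() []*v1.Pod
	DeletePod(pod *v1.Pod)
}

// podIsTerminated reports whether the pod has reached a terminal phase.
func podIsTerminated(pod *v1.Pod) bool {
	return pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed
}

// pruneTerminatedPods removes terminated pods from the desired state of world
// unless the operator asked to keep terminated pod volumes attached; once a pod
// is removed, the reconciler can detach its volumes.
func pruneTerminatedPods(dsw desiredStateOfWorld, keepTerminatedPodVolumes bool) {
	if keepTerminatedPodVolumes {
		return
	}
	for _, pod := range dsw.GetPods() {
		if podIsTerminated(pod) {
			dsw.DeletePod(pod)
		}
	}
}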

@gnufied gnufied force-pushed the fix-terminated-pods-detach branch from 319bfa0 to 2b30429 Compare May 3, 2017 18:51
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 3, 2017
@gnufied gnufied changed the title [WIP] detach the volume when pod is terminated detach the volume when pod is terminated May 3, 2017
@luxas (Member) left a comment:

Thanks for this PR!

General comment without having context on this particular feature -- is there any way we could implement this without adding yet another flag? We really need to ask ourselves that question, otherwise we'll end up with thousands of flags. Just something to keep in mind ;)

@luxas (Member) commented May 3, 2017

/unassign

@childsb (Contributor) commented May 3, 2017

@luxas the reason for the flag is to preserve backwards-compatible behavior, which in some cases may be desirable.

@@ -233,6 +236,9 @@ type attachDetachController struct {

// recorder is used to record events in the API server
recorder record.EventRecorder

// keep pod volumes attached for terminated pods
Contributor:

whether keep pod volumes

return
}

util.ProcessPodVolumes(pod, true, /* addVolumes */
Contributor:

nit: make a boolean variable instead of calling ProcessPodVolumes twice with mostly the same parameters.

Member Author:
ack
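A minimal sketch of the refactor being suggested: compute the add-vs-remove decision once and make a single call, instead of calling ProcessPodVolumes in both branches. The processPodVolumes stand-in and the terminated-pod condition are assumptions for illustration, not the PR's exact code.

package sketch

import v1 "k8s.io/api/core/v1"

// processPodVolumes stands in for util.ProcessPodVolumes: true adds the pod's
// volumes to the desired state of world, false removes them.
func processPodVolumes(pod *v1.Pod, addVolumes bool) {
	// ... add or remove the pod's volumes ...
}

// handlePodUpdate carries the decision in a boolean so there is a single call site.
func handlePodUpdate(pod *v1.Pod, keepTerminatedPodVolumes bool) {
	addVolumes := true
	terminated := pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed
	if terminated && !keepTerminatedPodVolumes {
		addVolumes = false
	}
	processPodVolumes(pod, addVolumes)
}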

@@ -46,7 +46,8 @@ func Test_NewAttachDetachController_Positive(t *testing.T) {
nil, /* cloud */
nil, /* plugins */
false,
time.Second*5)
time.Second*5,
false /*keep pod volumes attached for terminated pods */)
Contributor:

whether keep pod volumes...

Member Author:

You mean change wording to /* whether keep pod volumes */ ?

@@ -211,6 +212,7 @@ func (s *CMServer) AddFlags(fs *pflag.FlagSet, allControllers []string, disabled
fs.Int32Var(&s.LargeClusterSizeThreshold, "large-cluster-size-threshold", 50, "Number of nodes from which NodeController treats the cluster as large for the eviction logic purposes. --secondary-node-eviction-rate is implicitly overridden to 0 for clusters this size or smaller.")
fs.Float32Var(&s.UnhealthyZoneThreshold, "unhealthy-zone-threshold", 0.55, "Fraction of Nodes in a zone which needs to be not Ready (minimum 3) for zone to be treated as unhealthy. ")
fs.BoolVar(&s.DisableAttachDetachReconcilerSync, "disable-attach-detach-reconcile-sync", false, "Disable volume attach detach reconciler sync. Disabling this may cause volumes to be mismatched with pods. Use wisely.")
fs.BoolVar(&s.KeepTerminatedPodVolumes, "keep-terminated-pod-volumes", false, "Keep terminated pod volumes attached to the node after the pod terminates. Can be useful for debugging volume related issues.")
Contributor:

-> keep volumes attached to node ?

@@ -896,6 +896,10 @@ type KubeControllerManagerConfiguration struct {
// through the kube-aggregator when enabled, instead of using the legacy metrics client
// through the API server proxy.
HorizontalPodAutoscalerUseRESTClients bool

// KeepTerminatedPodVolumes causes terminated pod volumes attached to the node after the pod terminates.
Contributor:

=> KeepTerminatedPodVolumes keeps volumes attached to node after the pod terminates?

@saad-ali (Member) commented May 3, 2017

Largely looks fine to me. A couple of concerns:

  1. This makes the detach routine even more aggressive. I'm wary of doing that unless it is absolutely necessary, since bugs here can result in a disk being pulled out from under a workload that is still using it (and potential data loss).
  2. keep-terminated-pod-volumes is a flag on the kubelet. You are adding it to the kube-controller-manager. How do they stay in sync? This is why flags are a poor way to configure k8s components, but I guess that's beyond the scope of this PR. At the very least, please update the existing references (e.g. hack/local-up-cluster.sh) to also set the new flag.

Will leave it to @jingxu97 for a more in depth review.

@gnufied (Member Author) commented May 3, 2017

For concern 1, I think this is needed because Completed, Failed, or otherwise terminated pods currently leave the volume attached to the node, and kubectl get pods, for example, doesn't even show completed pods, so users think they can reuse the PVC in another pod. But obviously that doesn't work. We had many users struggling with this. cc @eparis

For concern 2, yeah, it is a bit iffy. I am hoping that most users will run with the defaults, so the kubelet and controller flags will stay in sync. But I am also thinking of improving the documentation of the controller flag to mention the corresponding kubelet flag, so users know about both of them. Ack on updating the references under ./hack/**.
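A hedged sketch of what that documentation improvement could look like on the controller-manager side. The helper function and the exact help text below are assumptions for illustration, not the wording that was merged.

package sketch

import "github.com/spf13/pflag"

// registerKeepTerminatedPodVolumesFlag registers the controller-manager flag with
// help text that points operators at the kubelet flag of the same name, so the two
// settings are kept in sync. The wording here is hypothetical.
func registerKeepTerminatedPodVolumesFlag(fs *pflag.FlagSet, target *bool) {
	fs.BoolVar(target, "keep-terminated-pod-volumes", false,
		"Keep volumes attached to the node after the pod terminates. "+
			"Useful for debugging volume related issues. "+
			"Should be set consistently with the kubelet's --keep-terminated-pod-volumes flag on each node.")
}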

addPodFlag = false
}

util.ProcessPodVolumes(pod, addPodFlag, /* addVolumes */
Contributor:

To be consistent with the kubelet, I am thinking that in the podAdd function we can also check whether the pod is already terminated. It might not be necessary if podAdd means the pod would never be in a terminated state...

Member Author:

We do check that, right? See line #415 (https://github.com/kubernetes/kubernetes/pull/45286/files#diff-72e66cba2ced392dc0f9cd10c424d6d0R415).

I do not think there is any chance we are going to detach volumes that are still mounted and in use inside a pod.

Contributor:

but that is for podUpdate, not podAdd?

Member Author:
fixed
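To illustrate how this thread resolves, here is a minimal sketch of a podAdd handler that applies the same terminated-pod check as podUpdate. The controllerSketch type, its field, and the handler body are illustrative stand-ins, not the controller's real types.

package sketch

import v1 "k8s.io/api/core/v1"

type controllerSketch struct {
	keepTerminatedPodVolumes bool
}

// podAdd mirrors an informer add handler: a pod that is already Succeeded or
// Failed when first observed does not get its volumes added to the desired state
// of world, unless keep-terminated-pod-volumes is set.
func (c *controllerSketch) podAdd(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok || pod == nil {
		return
	}
	terminated := pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed
	if terminated && !c.keepTerminatedPodVolumes {
		// Nothing to attach for a pod that already finished.
		return
	}
	// ... add the pod's volumes to the desired state of world ...
}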

@eparis (Contributor) commented May 3, 2017

We could make the kubelet set "keep-terminated-pod-volumes" as a node annotation (similar to how we do master-attach-detach) and then use that? I hate it, but we can if someone says we have to. I want no new flags if there is any way to avoid that. Not everything needs to be a choice.

I do question whether setting keep-terminated-pod-volumes per-node is (or was) really appropriate, or whether it should be a cluster-wide thing.
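A hedged sketch of the annotation-based alternative floated here: the kubelet would publish its setting on the Node object and the attach/detach controller would read it per node. The annotation key below is an assumption for illustration; this PR itself does not define it.

package sketch

import (
	"strconv"

	v1 "k8s.io/api/core/v1"
)

// Hypothetical annotation key; not necessarily what Kubernetes would use.
const keepTerminatedPodVolumesAnnotation = "volumes.kubernetes.io/keep-terminated-pod-volumes"

// nodeKeepsTerminatedPodVolumes reports whether the node's kubelet asked to keep
// terminated pod volumes attached, defaulting to false when the annotation is
// absent or unparsable.
func nodeKeepsTerminatedPodVolumes(node *v1.Node) bool {
	val, ok := node.Annotations[keepTerminatedPodVolumesAnnotation]
	if !ok {
		return false
	}
	keep, err := strconv.ParseBool(val)
	return err == nil && keep
}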

@rootfs (Contributor) commented May 4, 2017

also see #45346

@sjenning (Contributor) commented May 4, 2017

@eparis yeah, at the time I introduced that code and the new flag, it was geared toward freeing up emptyDirs on pod termination. The attach/detach handling of the PVs that might be associated with those mounts was outside the scope of the discussion at the time.

Like Saad, I do worry about keeping the flags in sync. If they are not, the attach-detach controller could end up force-detaching mounted volumes, possibly causing corruption if the data previously written by the terminated pod is not synced before the forced detach.

However, if you can sync the data to the volumes upon pod termination, I guess the worst that could happen is that your volumes are not there for debugging, even if you set --keep-terminated-pod-volumes on the kubelet.

@gnufied gnufied force-pushed the fix-terminated-pods-detach branch from 0fc4be5 to 31c874a Compare May 4, 2017 20:26
@derekwaynecarr (Member)

/approve

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2017
@childsb (Contributor) commented May 11, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2017
@gnufied (Member Author) commented May 11, 2017

@k8s-bot bazel test this.

@saad-ali saad-ali added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label May 11, 2017
@saad-ali (Member)

Taking a quick final pass

@saad-ali (Member) left a comment:

A few nits. Feel free to address them here and do a self-LGTM, or handle them in a follow-up PR.

@@ -144,6 +148,9 @@ type nodeManaged struct {
// attached to this node. The key in the map is the name of the volume and
// the value is a pod object containing more information about the volume.
volumesToAttach map[v1.UniqueVolumeName]volumeToAttach

// keepTerminatedPodVolumes
Member:

nit: More descriptive comment please!

@@ -46,7 +46,7 @@ type DesiredStateOfWorld interface {
// AddNode adds the given node to the list of nodes managed by the attach/
// detach controller.
// If the node already exists this is a no-op.
AddNode(nodeName k8stypes.NodeName)
AddNode(nodeName k8stypes.NodeName, keepTerminatedPodVolumes bool)
Member:

nit: Update comment to describe the new parameter. This is particularly important for interfaces so that callers and implementers know what the expectations are.
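For example, the updated interface doc comment might read something like the following; the wording is only a suggestion in the spirit of the review, not what was merged.

// AddNode adds the given node to the list of nodes managed by the attach/
// detach controller. If the node already exists this is a no-op.
// keepTerminatedPodVolumes indicates whether volumes of terminated pods on this
// node should stay attached (mirroring the kubelet's --keep-terminated-pod-volumes
// setting) rather than being removed from the desired state of world and detached.
AddNode(nodeName k8stypes.NodeName, keepTerminatedPodVolumes bool)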

@saad-ali saad-ali removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label May 11, 2017
@saad-ali (Member)

/lgtm
/approve

@gnufied (Member Author) commented May 11, 2017

Given the rather long build times right now because of bazel problems (kubernetes/test-infra#2708, #45558), I am going to make the comment fixes as part of #45615.

I will make a note for myself. Thanks @saad-ali.

@gnufied (Member Author) commented May 12, 2017

@k8s-bot bazel test this

@eparis (Contributor) commented May 12, 2017

@k8s-bot unit test this

Make sure the volume is detached when the pod is terminated for any reason and not deleted from the API server.
@gnufied (Member Author) commented May 12, 2017

@eparis this needs a small code change now, which I am pushing. Basically, master has changed the contract of RunAMaster - https://github.com/kubernetes/kubernetes/blob/master/test/integration/volume/attach_detach_test.go#L80

Obviously the tests I added still use the old signature, so they broke. Likely the change to the method signature landed after everything here was green.

and let's make sure that the controller respects it and doesn't detach mounted volumes.
@gnufied gnufied force-pushed the fix-terminated-pods-detach branch from 826582d to 951a36a Compare May 12, 2017 02:33
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2017
@gnufied (Member Author) commented May 12, 2017

I have pushed a fix for the integration tests, which were failing because of the change in the RunAMaster signature. Also addressed the review comments from @saad-ali while at it.

@gnufied (Member Author) commented May 12, 2017

/lgtm

@k8s-ci-robot (Contributor)

@gnufied: you cannot LGTM your own PR.

In response to this:

/lgtm


@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: childsb, deads2k, derekwaynecarr, eparis, gnufied, jsafrane, saad-ali


@eparis eparis added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2017
@k8s-github-robot

Automatic merge from submit-queue
