
Delete all dead containers and sandboxes when under disk pressure. #45896

Merged: 1 commit into kubernetes:master on Jun 4, 2017

Conversation

@dashpole (Contributor) commented May 16, 2017:

This PR modifies the eviction manager to add dead container and sandbox garbage collection as a resource reclaim function for disk (see the sketch below). It also modifies the container GC logic so that containers belonging to pods that are terminated, but not yet deleted, can be removed.

It still does not delete containers younger than minGcAge. This should prevent nodes from entering a permanently bad state if the entire disk is occupied by pods that are terminated (in the Failed or Succeeded phase) but not deleted.
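For orientation, here is a minimal, self-contained sketch of the reclaim-function pattern this description refers to. The type, constant, and method names (`resourceName`, `DeleteAllUnusedContainers`, `deleteTerminatedContainers`) are stand-ins modeled on the snippet reviewed later in this thread, not the PR's exact code:

```go
// Stand-in types approximating the kubelet's v1.ResourceName,
// nodeReclaimFunc(s), and container GC; names are illustrative.
package main

import "fmt"

type resourceName string

const (
	resourceNodeFs       resourceName = "nodefs"
	resourceNodeFsInodes resourceName = "nodefsInodes"
)

type nodeReclaimFunc func() error
type nodeReclaimFuncs []nodeReclaimFunc

// containerGC abstracts "delete all dead containers and sandboxes".
type containerGC interface {
	DeleteAllUnusedContainers() error
}

// deleteTerminatedContainers adapts container GC into a reclaim function
// that the eviction manager can try before evicting pods.
func deleteTerminatedContainers(gc containerGC) nodeReclaimFunc {
	return func() error { return gc.DeleteAllUnusedContainers() }
}

type noopGC struct{}

func (noopGC) DeleteAllUnusedContainers() error { return nil }

func main() {
	reclaim := map[resourceName]nodeReclaimFuncs{
		resourceNodeFs:       {deleteTerminatedContainers(noopGC{})},
		resourceNodeFsInodes: {deleteTerminatedContainers(noopGC{})},
	}
	for r, fns := range reclaim {
		for _, fn := range fns {
			fmt.Printf("reclaiming %s: err=%v\n", r, fn())
		}
	}
}
```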

There are two improvements we should consider making in the future:

  • Track the disk space and inodes reclaimed by deleting containers. We currently do not track this, which prevents us from determining whether deleting containers resolved disk pressure. As a result, we may still evict a pod even when deleting dead containers has freed enough disk space. (A measurement sketch follows this list.)
  • Once we can track the disk space and inodes reclaimed, we should consider deleting only the containers we need to in order to relieve disk pressure. This would help avoid a scenario where we try to delete a massive number of containers all at once and overwhelm the runtime.
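A hypothetical sketch of the first improvement: measure free space before and after running a reclaim function (Linux statfs), so the eviction manager could tell whether GC alone relieved disk pressure. None of this is in the PR; it only illustrates the idea.

```go
// Package reclaimsketch is illustrative only (Linux-specific statfs).
package reclaimsketch

import "syscall"

// freeBytes returns the free bytes on the filesystem containing path.
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return st.Bavail * uint64(st.Bsize), nil
}

// reclaimAndMeasure runs reclaim and reports how many bytes it freed.
func reclaimAndMeasure(path string, reclaim func() error) (uint64, error) {
	before, err := freeBytes(path)
	if err != nil {
		return 0, err
	}
	if err := reclaim(); err != nil {
		return 0, err
	}
	after, err := freeBytes(path)
	if err != nil {
		return 0, err
	}
	if after <= before {
		return 0, nil // nothing measurably freed; caller proceeds to evict
	}
	return after - before, nil
}
```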

/assign @vishh
cc @derekwaynecarr

Disk Pressure triggers the deletion of terminated containers on the node.

@k8s-ci-robot added the `cncf-cla: yes` label May 16, 2017
@k8s-github-robot added the `size/L` and `release-note` labels and removed `release-note-label-needed` May 16, 2017
@dashpole force-pushed the disk_pressure_reclaim branch 2 times, most recently from 87a39e8 to 4b170e1, May 16, 2017 20:58
@dashpole (Contributor, Author):

@k8s-bot kops aws e2e test this

@dashpole (Contributor, Author):

@k8s-bot gce etcd3 e2e test this

@dashpole (Contributor, Author):

This passes the inode eviction test:

```go
resourceToReclaimFunc := map[v1.ResourceName]nodeReclaimFuncs{}
// usage of an imagefs is optional
if withImageFs {
	// with an imagefs, nodefs pressure should just delete logs
	resourceToReclaimFunc[resourceNodeFs] = nodeReclaimFuncs{deleteLogs()}
	resourceToReclaimFunc[resourceNodeFsInodes] = nodeReclaimFuncs{deleteLogs()}
	resourceToReclaimFunc[resourceNodeFs] = nodeReclaimFuncs{deleteTerminatedContainers(containerGC)}
```
Review comment from a Member on the line above:

shouldn't this be on the imageFs and not the nodeFs?

Reply from @dashpole (Contributor, Author):

Yep, I'll fix that.
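Applied to the snippet above, the agreed fix would move the container-GC reclaim function onto the imagefs resources, roughly as follows. This is a sketch: `resourceImageFs` and `resourceImageFsInodes` follow the naming of the nodefs constants and the final committed code may differ.

```go
resourceToReclaimFunc := map[v1.ResourceName]nodeReclaimFuncs{}
// usage of an imagefs is optional
if withImageFs {
	// with an imagefs, nodefs pressure should just delete logs
	resourceToReclaimFunc[resourceNodeFs] = nodeReclaimFuncs{deleteLogs()}
	resourceToReclaimFunc[resourceNodeFsInodes] = nodeReclaimFuncs{deleteLogs()}
	// ...while imagefs pressure deletes dead containers and sandboxes
	resourceToReclaimFunc[resourceImageFs] = nodeReclaimFuncs{deleteTerminatedContainers(containerGC)}
	resourceToReclaimFunc[resourceImageFsInodes] = nodeReclaimFuncs{deleteTerminatedContainers(containerGC)}
}
```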

@dashpole (Contributor, Author):

I also changed the ContainerGC to take a SourcesReadyProvider, so that callers of ContainerGC don't need a reference to the kubelet's AllReady function.
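The interface implied by this comment is small; the shape below is inferred from the comment, not copied from the kubelet source:

```go
package sketch

// SourcesReadyProvider reports whether all of the kubelet's pod config
// sources (apiserver, file, http) have been observed at least once.
// GC must not run before then, or it could delete containers belonging
// to pods it simply hasn't learned about yet.
type SourcesReadyProvider interface {
	AllReady() bool
}

// A GC implementation can then hold the provider instead of a reference
// to the kubelet's AllReady function (illustrative).
type gc struct {
	sourcesReady SourcesReadyProvider
}

func (g *gc) GarbageCollect() error {
	if !g.sourcesReady.AllReady() {
		return nil // too early to tell which containers are orphaned
	}
	// ...delete dead containers and sandboxes...
	return nil
}
```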

@derekwaynecarr (Member):

fyi @sjenning

@sjenning (Contributor):

What is the rationale for not deleting pod containers on pod termination? Getting the logs from them? Debugging?

@derekwaynecarr (Member):

@sjenning - yes

@dashpole (Contributor, Author):

@derekwaynecarr or @vishh, if either of you has the time, this would be good to include in 1.7.

@vishh (Contributor) commented Jun 1, 2017:

> Track the disk space and inodes reclaimed by deleting containers. We currently do not track this, and it prevents us from determining if deleting containers resolves disk pressure. So we may still evict a pod even if we are able to free disk space by deleting dead containers.

Can we get a count of the number of containers deleted? If that's 0, then we can safely assume no disk space has been reclaimed and proceed to evict pods.

> Once we can track disk space and inodes reclaimed, we should consider only deleting the containers we need to in order to relieve disk pressure. This should help avoid a scenario where we try to delete a massive number of containers all at once and overwhelm the runtime.

We need to rate-limit all our calls to the runtime in a central location. It doesn't make sense for each module in the kubelet to perform its own rate limiting.
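vishh's count-based proxy could look like this. Note this is a hypothetical helper with an assumed count-returning GC signature; the PR does not implement it.

```go
package sketch

// countingGC is an assumed variant of container GC that reports how many
// containers it deleted (hypothetical; the real interface returns only
// an error).
type countingGC interface {
	DeleteAllUnusedContainers() (deleted int, err error)
}

// reclaimedAnything treats "deleted > 0" as a cheap proxy for "some disk
// was freed": if nothing was deleted, the manager can safely proceed
// straight to pod eviction.
func reclaimedAnything(gc countingGC) (bool, error) {
	deleted, err := gc.DeleteAllUnusedContainers()
	if err != nil {
		return false, err
	}
	return deleted > 0, nil
}
```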

@vishh (Contributor) commented Jun 1, 2017:

With this change, will the kubelet avoid evicting pods if GC has reclaimed the required amount of disk?

@vishh (Contributor) commented Jun 1, 2017:

@dashpole if you can provide an answer to my previous question, this PR is ready to go.

@dashpole (Contributor, Author) commented Jun 1, 2017:

> With this change, will the kubelet avoid evicting pods if GC has reclaimed the required amount of disk?

No. We don't know how much disk we have reclaimed, so we assume we have reclaimed 0. This PR isn't meant to prevent the eviction of pods, but to prevent the node from entering a bad state when it is filled with dead containers. The behavior is essentially the same as before, except that containerGC is triggered.
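In sketch form (continuing the stand-in types `resourceName` and `nodeReclaimFuncs` from the first sketch above), the flow dashpole describes is roughly:

```go
// The manager runs every reclaim function for the starved resource,
// assumes zero bytes were reclaimed (nothing is measured), and lets a
// fresh observation of disk usage decide whether eviction is still
// needed. Illustrative only; not the eviction manager's actual code.
type manager struct {
	reclaimFuncs  map[resourceName]nodeReclaimFuncs
	underPressure func(resourceName) bool // re-observes real disk usage
}

func (m *manager) reclaimNodeLevelResources(r resourceName) bool {
	for _, reclaim := range m.reclaimFuncs[r] {
		_ = reclaim() // best effort; an error doesn't stop the attempt
	}
	// No accounting of freed space, so only re-checking the signal can
	// tell us whether pressure is resolved.
	return !m.underPressure(r)
}
```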

@dashpole (Contributor, Author) commented Jun 1, 2017:

@k8s-bot pull-kubernetes-federation-e2e-gce test this

@vishh (Contributor) commented Jun 1, 2017 via email

@vishh (Contributor) commented Jun 1, 2017 via email

@k8s-ci-robot added the `lgtm` label Jun 1, 2017
@k8s-github-robot added the `approved` label Jun 1, 2017
@falenn commented Jun 1, 2017 via email

@dashpole (Contributor, Author) commented Jun 1, 2017:

@vishh should we add the 1.7 milestone to this?

@vishh added this to the v1.7 milestone Jun 1, 2017
@k8s-github-robot added the `needs-rebase` label Jun 3, 2017
@k8s-github-robot removed the `lgtm` and `needs-rebase` labels Jun 3, 2017
@dashpole (Contributor, Author) commented Jun 3, 2017:

@vishh rebased. Please reapply lgtm

@dashpole (Contributor, Author) commented Jun 4, 2017:

@k8s-bot pull-kubernetes-federation-e2e-gce test this
@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@vishh (Contributor) commented Jun 4, 2017:

/lgtm

@k8s-ci-robot added the `lgtm` label Jun 4, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, vishh

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@dashpole (Contributor, Author) commented Jun 4, 2017:

@k8s-bot pull-kubernetes-federation-e2e-gce test this

@k8s-github-robot:

Automatic merge from submit-queue

@k8s-github-robot merged commit 3fdf6c3 into kubernetes:master Jun 4, 2017
@dashpole deleted the disk_pressure_reclaim branch June 4, 2017 19:10
@blakebarnett:

Any chance this can go into 1.6.8? We're seeing it (or something very like it) occasionally on 1.6.6. The nodes never seemed to be under disk pressure, but we're now seeing these messages constantly in the kubelet logs.

@dashpole (Contributor, Author):

@blakebarnett as this is a large change, I would rather not cherry-pick it to 1.6. As a workaround, it should be fairly trivial to run a job that periodically deletes terminated pods (see the sketch below). I have also heard there is a controller called podgc, but I am not familiar enough with it to know whether it would be a good solution to this issue.
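The suggested workaround can be a small in-cluster job. Here is a sketch using client-go; modern API signatures are shown, and the 1.6-era client differed, so treat the paths and signatures as illustrative rather than a drop-in solution:

```go
// Deletes pods whose phase is Succeeded or Failed across all namespaces.
// Run it in-cluster on a schedule (for example, via a CronJob).
package main

import (
	"context"
	"log"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range pods.Items {
		if p.Status.Phase == v1.PodSucceeded || p.Status.Phase == v1.PodFailed {
			if err := client.CoreV1().Pods(p.Namespace).Delete(context.TODO(), p.Name, metav1.DeleteOptions{}); err != nil {
				log.Printf("delete %s/%s: %v", p.Namespace, p.Name, err)
			}
		}
	}
}
```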

@blakebarnett:

Thanks for the quick response. Sounds like we may just go to 1.7 ahead of schedule then. :)

k8s-github-robot pushed a commit that referenced this pull request Sep 29, 2017
Automatic merge from submit-queue (batch tested with PRs 44596, 52708, 53163, 53167, 52692). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Do not GC exited containers in running pods

This fixes a regression introduced by #45896, and was identified by #52462.
This bug causes the kubelet to garbage collect exited containers in a running pod.
This manifests in strange and confusing state when viewing the cluster.  For example, it can show running pods as having no init container (see #52462), if that container has exited and been removed.

This PR solves this problem by only removing containers and sandboxes from terminated pods.
The important line change is from:

`if cgc.podDeletionProvider.IsPodDeleted(podUID) || evictNonDeletedPods {`

to:

`if cgc.podStateProvider.IsPodDeleted(podUID) || (cgc.podStateProvider.IsPodTerminated(podUID) && evictTerminatedPods) {`

cc @MrHohn @yujuhong @kubernetes/sig-node-bugs 

```release-note
BugFix: Exited containers are not Garbage Collected by the kubelet while the pod is running
```
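The follow-up fix hinges on distinguishing "deleted" pods from "terminated" ones. The provider shape below is inferred from the quoted condition, not copied from the kubelet source:

```go
package sketch

import "k8s.io/apimachinery/pkg/types"

// PodStateProvider distinguishes pods that are gone (deleted) from pods
// that have merely finished running (terminated).
type PodStateProvider interface {
	IsPodDeleted(types.UID) bool
	IsPodTerminated(types.UID) bool
}

// shouldRemoveContainers mirrors the quoted condition: remove a pod's
// exited containers only if the pod is deleted, or if it is terminated
// and the caller explicitly opted in to evicting terminated pods.
func shouldRemoveContainers(sp PodStateProvider, podUID types.UID, evictTerminatedPods bool) bool {
	return sp.IsPodDeleted(podUID) ||
		(sp.IsPodTerminated(podUID) && evictTerminatedPods)
}
```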