Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EmptyDir volume and container overlay capacity isolation #45686

Merged
merged 1 commit into from
Jun 6, 2017

Conversation

jingxu97
Copy link
Contributor

@jingxu97 jingxu97 commented May 11, 2017

This PR adds the support for isolating the emptyDir volume use. If user
sets a size limit for emptyDir volume, kubelet's eviction manager monitors its usage
and evict the pod if the usage exceeds the limit.

This feature is part of local storage capacity isolation and described in the proposal kubernetes/community#306

Release note:

Alpha feature: allows users to set storage limit to isolate EmptyDir volumes. It enforces the limit by evicting pods that exceed their storage limits  

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 11, 2017
@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels May 11, 2017
if used != nil && size.Cmp(*zero) > 0 && used.Cmp(size) > 0 {
// the emptyDir usage exceeds the size limit, evict the pod
gracePeriod := int64(0)
m.evictPod(pod, v1.ResourceName("EmptyDir"), gracePeriod, fmt.Sprintf("emptyDir usage exceeds the limit %q", size.String()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err = m.evictPod(...) ?

@@ -0,0 +1,213 @@
/*
Copyright 2016 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/2016/2017

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files should not have capital letters in their names

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not review all the eviction logic, just the API

@@ -668,6 +668,11 @@ type EmptyDirVolumeSource struct {
// More info: http://kubernetes.io/docs/user-guide/volumes#emptydir
// +optional
Medium StorageMedium `json:"medium,omitempty" protobuf:"bytes,1,opt,name=medium,casttype=StorageMedium"`
// Total amount of local storage required for this directory.
// The default is nil which means that the directory can use all available local storage on the node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't document the default as "nil" (that's a go-ism). Instead document as "If not specified, the default is to not enforce a limit". Though we might want to loosen this and say "If not specified, the default limit is implementation defined". Default tmpfs is (I think) 1/2 physical mem. Default disk is all storage, etc. I don't think we need to define that here, though.

if !utilfeature.DefaultFeatureGate.Enabled(features.LocalStorageCapacityIsolation) {
unsetSizeLimit := resource.Quantity{}
if unsetSizeLimit.Cmp(source.EmptyDir.SizeLimit) != 0 {
allErrs = append(allErrs, field.Forbidden(fldPath.Child("emptyDir").Child("sizeLimit"), "LocalStorageCapacityIsolation are disabled by feature-gate"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is disabled"

@@ -0,0 +1,213 @@
/*
Copyright 2016 The Kubernetes Authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files should not have capital letters in their names

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 18, 2017
@@ -32,6 +32,7 @@ import (
"k8s.io/client-go/util/clock"
"k8s.io/kubernetes/pkg/api/v1"
"k8s.io/kubernetes/pkg/features"
statsapi "k8s.io/kubernetes/pkg/kubelet/api/v1alpha1/stats"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching it. fix it. strange it was working in this way too...

glog.Errorf("eviction manager: pod %s exceeds emptyDir sizeLimit but failed to evict %v", format.Pod(pod), err)
} else {
glog.Infof("eviction manager: pod %s exceeds emptyDir sizeLimit and is evicted successfully", format.Pod(pod))
evict = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we return early since the pod is already evicted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is to evict all pods that violate the size limit for emptyDir volume. So we have to continue with the loop even after some pods are evicted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you are right, return early is fine.

@jingxu97 jingxu97 requested a review from dashpole May 26, 2017 19:05
@jingxu97 jingxu97 added this to the v1.7 milestone May 26, 2017
@jingxu97 jingxu97 force-pushed the May/emptyDir branch 2 times, most recently from adb9650 to 431baee Compare May 26, 2017 23:37
@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 26, 2017

// evict pods if there is a resource uage violation from local volume temporary storage
// If eviction happenes in localVolumeEviction function, skip the rest of eviction action
if m.localVolumeEviction(activePods) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vishh @derekwaynecarr, any preference as to whether doing local volume evictions is part of the same control loop as the rest of evictions? The main advantage of having them together is to not perform other evictions when pods are already being evicted for exceeding their local volume storage space

return evict
}

func (m *managerImpl) evictPod(pod *v1.Pod, resourceName v1.ResourceName, gracePeriod int64, evictMsg string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gracePeriod is always passed as 0, so you dont need it as an argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this change in the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but adding a parameter to a function is easy enough. Better to leave it out for now.

@dashpole
Copy link
Contributor

Changes to the eviction manager need unit testing as well.

Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple small things

size := source.EmptyDir.SizeLimit
used := podVolumeUsed[pod.Spec.Volumes[i].Name]
zero := resource.NewQuantity(int64(0), resource.BinarySI)
glog.Infof("Jing pod %s limit %v, used %v", format.Pod(pod), size, used)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably left over from testing. Remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

used := podVolumeUsed[pod.Spec.Volumes[i].Name]
zero := resource.NewQuantity(int64(0), resource.BinarySI)
glog.Infof("Jing pod %s limit %v, used %v", format.Pod(pod), size, used)
if used != nil && size.Cmp(*zero) > 0 && used.Cmp(size) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of Cmp(*zero), I think you can use Sign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

if overlayThreshold, ok := thresholdsMap[containerStat.Name]; ok {
if overlayThreshold.Cmp(*rootfs) < 0 {
gracePeriod := int64(0)
err := m.evictPod(pod, v1.ResourceName("containerOverlay"), gracePeriod, fmt.Sprintf("container's overlay usage exceeds the limit %q", overlayThreshold.String()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are passing in a relevant message to evictPod, why not do the logging inside of evictPod using that message. evictPod doesnt need to return an error in this case, and we can avoid having essentially the same log messages duplicated in 3 places.

Copy link
Contributor Author

@jingxu97 jingxu97 May 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, put logging inside of evictPod, and return bool to indicate whether pod is evicted or not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter whether the pod is evicted or not? I believe we set the pod's status to failed, so kubelet housekeeping will terminate it even if the eviction manager failed.
Essentially, we should treat any pod that we attempt to evict as evicted.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2017
@jingxu97
Copy link
Contributor Author

addressed the comments. @dashpole PTAL

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2017
@falenn
Copy link

falenn commented Jun 1, 2017 via email

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: falenn, jingxu97, thockin, vishh

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@jingxu97
Copy link
Contributor Author

jingxu97 commented Jun 1, 2017

@k8s-bot unit test this

@jingxu97 jingxu97 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2017
@jingxu97
Copy link
Contributor Author

jingxu97 commented Jun 1, 2017

@k8s-bot pull-kubernetes-e2e-kops-aws test this

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2017
This PR adds two features:
1. add support for isolating the emptyDir volume use. If user
sets a size limit for emptyDir volume, kubelet's eviction manager
monitors its usage
and evict the pod if the usage exceeds the limit.
2. add support for isolating the local storage for container overlay. If
the container's overly usage exceeds the limit defined in container
spec, eviction manager will evict the pod.
@k8s-github-robot k8s-github-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 5, 2017
@jingxu97
Copy link
Contributor Author

jingxu97 commented Jun 5, 2017

@k8s-bot gce etcd3 e2e test this

@jingxu97
Copy link
Contributor Author

jingxu97 commented Jun 5, 2017

@k8s-bot pull-kubernetes-verify test this

@jingxu97
Copy link
Contributor Author

jingxu97 commented Jun 5, 2017

@k8s-bot kops aws e2e test this

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jun 5, 2017

@jingxu97: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
Jenkins unit/integration f4cedab5e4c7481d41311a6b26eb984e21fcd476 link @k8s-bot unit test this
Jenkins kops AWS e2e f4cedab5e4c7481d41311a6b26eb984e21fcd476 link @k8s-bot kops aws e2e test this
Jenkins Kubemark GCE e2e f4cedab5e4c7481d41311a6b26eb984e21fcd476 link @k8s-bot kubemark e2e test this
Jenkins Bazel Build f4cedab5e4c7481d41311a6b26eb984e21fcd476 link @k8s-bot bazel test this
Jenkins GCE Node e2e f4cedab5e4c7481d41311a6b26eb984e21fcd476 link @k8s-bot node e2e test this
Jenkins GCE etcd3 e2e f4cedab5e4c7481d41311a6b26eb984e21fcd476 link @k8s-bot gce etcd3 e2e test this
Jenkins verification f4cedab5e4c7481d41311a6b26eb984e21fcd476 link @k8s-bot verify test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jingxu97 jingxu97 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2017
@jingxu97
Copy link
Contributor Author

jingxu97 commented Jun 6, 2017

@k8s-bot kops aws e2e test this

@jingxu97
Copy link
Contributor Author

jingxu97 commented Jun 6, 2017

rebased, add /lgtm back

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit cb68132 into kubernetes:master Jun 6, 2017
@jingxu97 jingxu97 changed the title Add EmptyDir volume capacity isolation Add EmptyDir volume and container overlay capacity isolation Jun 30, 2017
@andyxning
Copy link
Member

@jingxu97

  • Why we choose to evict a pod when the usage if exceed the limit?
  • IMO, we use the monitor and check logic to enforce this logic? Kubelet does not depends on any disk quota hard limit that underlying fs supports, i.e., xfs quota.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.