Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reset resultRun on pod restart #46371

Merged

Conversation

sjenning
Copy link
Contributor

@sjenning sjenning commented May 24, 2017

xref https://bugzilla.redhat.com/show_bug.cgi?id=1455056

There is currently an issue where, if the pod is restarted due to liveness probe failures exceeding failureThreshold, the failure count is not reset on the probe worker. When the pod restarts, if the liveness probe fails even once, the pod is restarted again, not honoring failureThreshold on the restart.

apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - sleep
    - "3600"
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      timeoutSeconds: 1
      periodSeconds: 3
      successThreshold: 1
      failureThreshold: 5
  terminationGracePeriodSeconds: 0

Before this PR:

$ kubectl create -f busybox-probe-fail.yaml 
pod "busybox" created
$ kubectl get pod -w
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   0          4s
busybox   1/1       Running   1         24s
busybox   1/1       Running   2         33s
busybox   0/1       CrashLoopBackOff   2         39s

After this PR:

$ kubectl create -f busybox-probe-fail.yaml
$ kubectl get pod -w
NAME      READY     STATUS              RESTARTS   AGE
busybox   0/1       ContainerCreating   0          2s
busybox   1/1       Running   0         4s
busybox   1/1       Running   1         27s
busybox   1/1       Running   2         45s
Fix kubelet reset liveness probe failure count across pod restart boundaries.

Restarts now happen at even intervals.

@derekwaynecarr

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 24, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels May 24, 2017
@derekwaynecarr derekwaynecarr self-assigned this May 24, 2017
@derekwaynecarr derekwaynecarr added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels May 24, 2017
@derekwaynecarr derekwaynecarr added this to the v1.6 milestone May 24, 2017
@derekwaynecarr
Copy link
Member

@sjenning - added release note text, poke when you have a test.

@sjenning sjenning force-pushed the fix-liveness-probe-reset branch from 2087aa0 to e87ed0a Compare May 24, 2017 19:53
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 24, 2017
@sjenning sjenning force-pushed the fix-liveness-probe-reset branch from e87ed0a to 2c866a7 Compare May 24, 2017 19:56
@sjenning
Copy link
Contributor Author

@derekwaynecarr test is ready

@derekwaynecarr
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, sjenning

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2017
@sjenning
Copy link
Contributor Author

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

2 similar comments
@sjenning
Copy link
Contributor Author

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@sjenning
Copy link
Contributor Author

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@sjenning
Copy link
Contributor Author

@k8s-bot pull-kubernetes-unit test this

@sjenning
Copy link
Contributor Author

flake for unit test failure #46446

@ravilr
Copy link
Contributor

ravilr commented Jun 2, 2017

@sjenning Have been running into this in 1.4 and 1.5 clusters also. Looks like this issue has been around for a long time. Thanks for the fix. Waiting for this to be released in 1.6.5.

@sjenning
Copy link
Contributor Author

sjenning commented Jun 3, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@sjenning
Copy link
Contributor Author

sjenning commented Jun 3, 2017

@k8s-bot pull-kubernetes-unit test this

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit b641aed into kubernetes:master Jun 3, 2017
@ravilr
Copy link
Contributor

ravilr commented Jun 5, 2017

@sjenning Can we cherrypick this to 1.6 please.

@sjenning
Copy link
Contributor Author

sjenning commented Jun 5, 2017

@ravilr i don't have the authority. the cherrypick-candidate label has already been applied. i assume that label puts this on the radar of the person that has the authority to pick it to 1.6.

@ravilr
Copy link
Contributor

ravilr commented Jun 26, 2017

@enisoc @yujuhong @derekwaynecarr what needs to be done here to get this cherry-pick'ed to 1.6 release?

@enisoc
Copy link
Member

enisoc commented Jun 26, 2017

I'm the one who approves 1.6 cherrypicks, but I don't actually monitor the cherrypick-candidate label because it seemed redundant. I just wait until a PR shows up targeting the release-1.6 branch, such as would be created by the cherry-pick script.

@sjenning
Copy link
Contributor Author

@enisoc good to know. I'll open a PR then.

@yujuhong
Copy link
Contributor

@sjenning I just did #48099

@sjenning
Copy link
Contributor Author

@yujuhong great, thanks!

@wojtek-t wojtek-t added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jul 20, 2017
@sjenning sjenning deleted the fix-liveness-probe-reset branch August 16, 2017 02:16
@joelittlejohn
Copy link

Hi all. I think something may have gone wrong with the cherry-pick conversation here. @yujuhong referred to #48099 but that has nothing to do with this issue as far as I can tell. That relates to cherry-picking of #46246.

So 1.6.11 still doesn't have the required fix:

https://github.com/kubernetes/kubernetes/blob/v1.6.11/pkg/kubelet/prober/worker.go#L226

but 1.7.8 does have it:

https://github.com/kubernetes/kubernetes/blob/v1.7.8/pkg/kubelet/prober/worker.go#L227

Have I missed something here? Do we need a new PR that covers this cherry-pick?

@enisoc
Copy link
Member

enisoc commented Oct 6, 2017

@joelittlejohn Thanks for pointing this out! It does look like a miscommunication of some sort. @yujuhong can you confirm?

In the meantime, I went ahead and created a cherrypick of this: #53544.

@yujuhong
Copy link
Contributor

yujuhong commented Oct 9, 2017

@enisoc oops...I probably replied to the wrong issue. Go ahead and cherrypick this.

k8s-github-robot pushed a commit that referenced this pull request Oct 10, 2017
…-upstream-release-1.6

Automatic merge from submit-queue.

Automated cherry pick of #46371

Cherry pick of #46371 on release-1.6.

#46371: reset resultRun on pod restart
@enisoc
Copy link
Member

enisoc commented Oct 10, 2017

This is now merged into the 1.6 branch, for real. Thanks again for pointing it out @joelittlejohn!

@joelittlejohn
Copy link

Np, thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.