-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix sync loop health check #47124
fix sync loop health check #47124
Conversation
10346b9
to
0009b68
Compare
@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this |
pkg/kubelet/kubelet.go
Outdated
@@ -1805,6 +1805,21 @@ func (kl *Kubelet) syncLoop(updates <-chan kubetypes.PodUpdate, handler SyncHand | |||
if !kl.syncLoopIteration(updates, handler, syncTicker.C, housekeepingTicker.C, plegCh) { | |||
break | |||
} | |||
|
|||
kl.syncLoopMonitorCheck() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why you moved the check here.
How about moving the kl.syncLoopMonitor
operations to this loop instead? This way, it gets updated even if runtime is not functioning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving the kl.syncLoopMonitor operations to this loop instead? This way, it gets updated even if runtime is not functioning.
There are two update to thee syncLoopMonitor
. One is at he start of syncLoopIteration
, the other is at the end of syncLoopIteration
. I am not quite sure why we need the two updation. Maybe the end update of syncLoopMonitor
is to indicate the long running of sync operation.
If that is true, we may need to do this:
kl.syncLoopMonitor.Store(kl.clock.Now())
if !kl.syncLoopIteration(updates, handler, syncTicker.C, housekeepingTicker.C, plegCh) {
break
}
kl.syncLoopMonitor.Store(kl.clock.Now())
I am not sure why you moved the check here.
The intention is to remove sync loop healthz endpoint and instead just logging this situation. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code snippet looks good.
I think we still want the healthz checkpoint in case the goroutine hangs, but we don't have to fail the health check when runtime is not ready like you said :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. Will update the code later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yujuhong Done. I have also update the PR's title and release note. PTAL. :)
0009b68
to
4b772a9
Compare
@k8s-bot pull-kubernetes-unit test this |
@k8s-bot pull-kubernetes-bazel test this |
4b772a9
to
96cb439
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyxning, yujuhong Associated issue: 37865 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
@yujuhong Thanks for review. There is still a problem that the PR has not been automatically merged. :) |
/cc @dchen1107 |
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124) |
Automatic merge from submit-queue (batch tested with PRs 47000, 47188, 47094, 47323, 47124) fix sync loop health check This PR will do error logging about the fall behind sync for kubelet instead of sync loop healthz checking. The reason is kubelet can not do sync loop and therefore can not update sync loop time when there is any runtime error, such as docker hung. When there is any runtime error, according to current implementation, kubelet will not do sync operation and thus kubelet's sync loop time will not be updated. This will make when there is any runtime error, kubelet will also return non 200 response status code when accessing healthz endpoint. This is contrary with #37865 which prevents kubelet from being killed when docker hangs. **Release note**: ```release-note fix sync loop health check with seperating runtime errors ``` /cc @yujuhong @Random-Liu @dchen1107
This PR will do error logging about the fall behind sync for kubelet instead of sync loop healthz checking.
The reason is kubelet can not do sync loop and therefore can not update sync loop time when there is any runtime error, such as docker hung.
When there is any runtime error, according to current implementation, kubelet will not do sync operation and thus kubelet's sync loop time will not be updated. This will make when there is any runtime error, kubelet will also return non 200 response status code when accessing healthz endpoint. This is contrary with #37865 which prevents kubelet from being killed when docker hangs.
Release note:
/cc @yujuhong @Random-Liu @dchen1107