Start synchronizing pods after network is ready. #68752
/assign @yujuhong
/assign mwielgus
We intentionally allow pods to start syncing because pods using host network should be able to run (and they often are required to set up the pod networks).
If you want to avoid hitting the backoff error, perhaps you'll need to add a check in the per-pod sync routine that skips the rest of the work (without incurring the backoff penalty) when the network is not ready and the pod does not use the host network.
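A minimal sketch of the per-pod check suggested above, assuming a simplified pod type with a hostNetwork flag and a list of network error strings (both hypothetical stand-ins, not the kubelet's actual types). Pods that need the pod network return a "network is not ready" error early, so the caller can recognize it and retry without the usual backoff, while host-network pods keep syncing.

```go
package main

import "fmt"

// pod is a hypothetical, trimmed-down stand-in for the kubelet's pod type;
// only the field needed for this check is modeled.
type pod struct {
	name        string
	hostNetwork bool
}

// syncPod sketches the per-pod check: when the network is not ready, pods
// that need the pod network return early with a recognizable error, while
// host-network pods (which may be the ones setting up the pod network)
// continue with the rest of the sync.
func syncPod(p pod, networkErrors []string) error {
	if len(networkErrors) != 0 && !p.hostNetwork {
		return fmt.Errorf("network is not ready: %v", networkErrors)
	}
	// ... the rest of the per-pod sync work would run here ...
	return nil
}

func main() {
	netErrs := []string{"cni config uninitialized"}
	fmt.Println(syncPod(pod{name: "cni-daemon", hostNetwork: true}, netErrs)) // <nil>
	fmt.Println(syncPod(pod{name: "web"}, netErrs))                           // network is not ready: [cni config uninitialized]
}
```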
/kind bug
pkg/kubelet/kubelet.go
@@ -1823,6 +1823,13 @@ func (kl *Kubelet) syncLoop(updates <-chan kubetypes.PodUpdate, handler SyncHand
duration = time.Duration(math.Min(float64(max), factor*float64(duration)))
continue
}
if ns := kl.runtimeState.networkErrors(); len(ns) != 0 {
Why not combine this with the code above, like:
rs := kl.runtimeState.runtimeErrors()
rs = append(rs, kl.runtimeState.networkErrors()...)
if len(rs) != 0 {
    ...
}
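A runnable version of the merged check, assuming both runtimeErrors() and networkErrors() return []string (the len(ns) != 0 test in the diff suggests a slice). The runtimeState type here is a hypothetical stand-in. Note the trailing ... when appending one slice to another; without it the append does not compile.

```go
package main

import "fmt"

// runtimeState is a hypothetical stand-in for the kubelet's runtime state.
type runtimeState struct {
	runtimeErrs []string
	networkErrs []string
}

func (s *runtimeState) runtimeErrors() []string { return s.runtimeErrs }
func (s *runtimeState) networkErrors() []string { return s.networkErrs }

// readinessErrors merges both error lists into a fresh slice, so appending
// never writes into the backing array of either source slice.
func readinessErrors(s *runtimeState) []string {
	var rs []string
	rs = append(rs, s.runtimeErrors()...)
	rs = append(rs, s.networkErrors()...) // ... spreads the second slice
	return rs
}

func main() {
	s := &runtimeState{networkErrs: []string{"network plugin not ready"}}
	if rs := readinessErrors(s); len(rs) != 0 {
		fmt.Printf("skipping pod synchronization - %v\n", rs)
	}
}
```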
I've removed these lines.
e92ce36 to 7e9c6f3
/ok-to-test
pkg/kubelet/pod_workers.go
@@ -263,6 +267,9 @@ func (p *podWorkers) wrapUp(uid types.UID, syncErr error) {
case syncErr == nil:
// No error; requeue at the regular resync interval.
p.workQueue.Enqueue(uid, wait.Jitter(p.resyncInterval, workerResyncIntervalJitterFactor))
case strings.Contains(syncErr.Error(), "network is not ready"):
Declare "network is not ready" as a constant and use it in both places to avoid unnecessary breakage in the future.
done
pkg/kubelet/pod_workers.go
workerBackOffPeriodJitterFactor = 0.5 | ||
|
||
// backoff period when network is not ready. | ||
backOffOnNetworkNotReadyPeriod = time.Second |
We should generalize this for other non-pod-specific issues, but I think it's okay to do it later when there are more use cases.
Done
7e9c6f3 to f1e195c
f1e195c to ad330f7
/test pull-kubernetes-e2e-kops-aws
/lgtm
A unit test would be good, but I didn't find any existing tests for this specific part of the code. Given the small scope of this PR, I think it's okay to let it in.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: krzysztof-jastrzebski, yujuhong
/test pull-kubernetes-integration
What this PR does / why we need it:
Start synchronizing pods after the network is ready. If a pod is synchronized before the network is ready, the sync fails and the kubelet retries it only after 10 seconds.
Which issue(s) this PR fixes: Fixes #68751
Release note: