Use separate client for leader election in scheduler #53793
Conversation
/approve
}

	kubeconfig.ContentType = s.ContentType
	// Override kubeconfig qps/burst settings from flags
	kubeconfig.QPS = s.KubeAPIQPS
	kubeconfig.Burst = int(s.KubeAPIBurst)

-	cli, err := clientset.NewForConfig(restclient.AddUserAgent(kubeconfig, "leader-election"))
+	kubeClient, err := clientset.NewForConfig(restclient.AddUserAgent(kubeconfig, "scheduler"))
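For readers coming to this diff cold, the reason a dedicated leader-election client helps is that client-go throttles on the client side. The following is a minimal sketch, assuming client-go's default behaviour of building a token-bucket limiter from QPS/Burst for each client it constructs when no explicit RateLimiter is set on the rest.Config; the QPS/burst values and the loop size are made up for illustration and are not taken from the scheduler's flags.

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// Two clients created from the same kubeconfig each get their own
	// token bucket, so they throttle independently.
	schedulerLimiter := flowcontrol.NewTokenBucketRateLimiter(50, 100)      // hypothetical flag values
	leaderElectionLimiter := flowcontrol.NewTokenBucketRateLimiter(50, 100) // hypothetical flag values

	// A misbehaving scheduler exhausts its own budget...
	for i := 0; i < 300; i++ {
		schedulerLimiter.Accept() // blocks once the bucket is drained
	}

	// ...but the leader-election client's budget is untouched, so a lock
	// renewal does not have to queue behind the backlog above.
	start := time.Now()
	leaderElectionLimiter.Accept()
	fmt.Printf("renewal waited %v\n", time.Since(start))
}

With a single shared client, the renewal in the last step would have to wait behind the 300 queued calls; with separate clients it goes through immediately.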
Use NewForConfigOrDie here too?
No - this is by design.
With the first call here, I want to catch an error, if any.
With the second call, I have already validated the config (via the first call), so it's safe to call ...OrDie - something really bad would need to happen for it to actually die.
Interesting.. Thanks for the explanation.
@@ -56,22 +56,22 @@ func createRecorder(kubecli *clientset.Clientset, s *options.SchedulerServer) re
	return eventBroadcaster.NewRecorder(api.Scheme, v1.EventSource{Component: s.SchedulerName})
}

-func createClient(s *options.SchedulerServer) (*clientset.Clientset, error) {
+func createClient(s *options.SchedulerServer) (*clientset.Clientset, *clientset.Clientset, error) {
s/createClient/createClients?
done
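To make the exchange above concrete, here is a rough sketch of what the renamed helper could look like. It approximates, rather than copies, the PR's code: the parameter list stands in for the SchedulerServer flags shown in the diff, and the clientcmd-based config loading is an assumption for the sake of a self-contained example.

// Sketch only - assumed imports:
//
//   clientset "k8s.io/client-go/kubernetes"
//   restclient "k8s.io/client-go/rest"
//   "k8s.io/client-go/tools/clientcmd"
func createClients(master, kubeconfigPath, contentType string, qps float32, burst int) (*clientset.Clientset, *clientset.Clientset, error) {
	kubeconfig, err := clientcmd.BuildConfigFromFlags(master, kubeconfigPath)
	if err != nil {
		return nil, nil, err
	}

	kubeconfig.ContentType = contentType
	// Override kubeconfig qps/burst settings from flags.
	kubeconfig.QPS = qps
	kubeconfig.Burst = burst

	// First client: NewForConfig, so a broken config surfaces as a returned error.
	kubeClient, err := clientset.NewForConfig(restclient.AddUserAgent(kubeconfig, "scheduler"))
	if err != nil {
		return nil, nil, err
	}

	// Second client: the config was just validated above, so ...OrDie is
	// acceptable here; it would only panic on something catastrophic. A
	// dedicated client keeps leader-election traffic on its own rate limiter.
	leaderElectionClient := clientset.NewForConfigOrDie(restclient.AddUserAgent(kubeconfig, "leader-election"))

	return kubeClient, leaderElectionClient, nil
}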
Couple of nits and lgtm.
Force-pushed from 7b3bb54 to 234e20b.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shyamjvs, wojtek-t

Associated issue: 53327

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment.
/test all [submit-queue is verifying that this PR is safe to merge]

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.
This totally seems like a hack. I'm just getting through my backlog - why exactly did you do this? LE should really switch to configmaps for scale concerns; otherwise we're just shuffling the problem around.
I completely don't understand. It doesn't matter whether we are using configmaps, endpoints, or a dedicated object - the client used for it should be a separate client. Otherwise, if for some reason the scheduler (or any other component) generates too many API calls, they may starve leader election and thus cause unnecessary leader re-election.
Part of the reason for my configmap comment is that LE is a source of endpoint spamming whose fanout still causes the api-server to notify all the watchers. I don't see how the LE client affects the api-server in any significant way.
Whether it is the same or a different client doesn't impact the apiserver at all. As I wrote above, if we are using the same client for regular API calls and for leader election, and we generate API calls faster than the QPS limit (due to how the client library is implemented), they accumulate, and this may block refreshing the leader lock. That can lead to losing the lock and unnecessary leader re-election.
I've seen this a couple of times, roughly a year ago, before we did all the api-server improvements. FWIW, you can always change the default leader election time too.
It's not about the apiserver at all. If there is a bug in any component (and it tries to send too many API calls), the client-side QPS limit may starve leader election heartbeats even if the apiserver were infinitely fast and efficient.

That's not a good workaround, because it affects the legitimate re-elections that should happen. BTW - I don't understand your concerns around this. Is there any scenario you have in mind that this PR makes worse?
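For context on the timing knobs mentioned above, here is a minimal leader-election sketch using modern client-go (Lease locks rather than the Endpoints-based lock this PR predates). The namespace, lock name, identity, and durations are illustrative rather than the scheduler's actual defaults; the point is that renewals delayed past LeaseDuration cost the component its leadership, which is why a throttled shared client is dangerous.

package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}

	// Dedicated client for the lock, so regular traffic cannot throttle renewals.
	leaderElectionClient := kubernetes.NewForConfigOrDie(rest.AddUserAgent(cfg, "leader-election"))

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "kube-system", Name: "example-scheduler"},
		Client:     leaderElectionClient.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: "scheduler-1"},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock: lock,
		// If renewals are delayed past LeaseDuration (for example, queued
		// behind a throttled shared client), another candidate can take over.
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				klog.Info("became leader; run the main loop here")
			},
			OnStoppedLeading: func() {
				klog.Info("lost leadership")
			},
		},
	})
}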
I think I misread the original problem, but it still leaves me unsettled, b/c it looks like it's tap-dancing around the root of the problem and makes the code cryptic to anyone reading it for the first time.
I don't think we saw a spike in the traffic. This change is more of a correctness fix, iiuc from @wojtek-t's comment.
@timothysc - this isn't solving any problem we've seen. It's just something that I think we may potentially see at some point. Regarding using backpressure - I agree, but that won't fully solve the problem: there is no way to guarantee that the apiserver will reject other requests rather than these. The feature we would need (and which would solve this problem) is priority of API requests - those coming from leader election should have the highest priority and should always be processed before others. The trick with a separate client is a bit of a workaround for this problem. So in general, I agree that this isn't the perfect solution. But to solve it properly we will need:
OTOH, those changes are pretty local and don't pollute the code, so it's not that big a deal in my opinion.
Automatic merge from submit-queue.

UPSTREAM: 53989: Remove repeated random string generations in scheduler volume predicate

@sjenning @smarterclayton Though the upstream PR 53793 has been backported to the kube 1.7 branch (53884), I am not sure if we have a plan for another origin rebase to the latest kube 1.7, or whether we would want to wait for that. So this backports the following 3 PRs:
kubernetes/kubernetes#53793
kubernetes/kubernetes#53720 (partial)
kubernetes/kubernetes#53989
Ref #53327
@kubernetes/sig-scheduling-bugs @bsalamat @davidopp