Use separate client for leader election in scheduler #53793

Conversation

wojtek-t
Member

@wojtek-t wojtek-t commented Oct 12, 2017

Ref #53327

@kubernetes/sig-scheduling-bugs @bsalamat @davidopp

Use a separate client for leader election in the scheduler, so that regular scheduler operations cannot starve leader-election calls.

@wojtek-t wojtek-t added cherrypick-candidate release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Oct 12, 2017
@wojtek-t wojtek-t added this to the v1.7 milestone Oct 12, 2017
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Oct 12, 2017
@wojtek-t wojtek-t added release-note-none Denotes a PR that doesn't merit a release note. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. release-note-none Denotes a PR that doesn't merit a release note. labels Oct 12, 2017
@wojtek-t
Member Author

/approve

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 12, 2017
}

kubeconfig.ContentType = s.ContentType
// Override kubeconfig qps/burst settings from flags
kubeconfig.QPS = s.KubeAPIQPS
kubeconfig.Burst = int(s.KubeAPIBurst)

-	cli, err := clientset.NewForConfig(restclient.AddUserAgent(kubeconfig, "leader-election"))
+	kubeClient, err := clientset.NewForConfig(restclient.AddUserAgent(kubeconfig, "scheduler"))
Member

Use NewForConfigOrDie here too?

Member Author

No - this is by design.
With the first call here, I want to catch an error, if any.
With the second call, I have already validated the config (via the first call), so it's safe to call ...OrDie: something really bad would need to happen for it to actually die.
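For readers landing here later, a minimal sketch of the two-call pattern described above (this is not the PR's exact diff; names, import paths and the QPS/burst numbers are illustrative, assuming current client-go packages): validate the shared config once with NewForConfig, then build the dedicated leader-election clientset with NewForConfigOrDie, each clientset getting its own user agent and its own client-side rate limiter.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// createClients builds two independent clientsets from one config: one for
// regular scheduler traffic and one dedicated to leader election.
func createClients(master, kubeconfigPath string, qps float32, burst int) (*kubernetes.Clientset, *kubernetes.Clientset, error) {
	config, err := clientcmd.BuildConfigFromFlags(master, kubeconfigPath)
	if err != nil {
		return nil, nil, fmt.Errorf("unable to build config: %v", err)
	}
	// Client-side rate limits; each clientset created below gets its own
	// token bucket with these values.
	config.QPS = qps
	config.Burst = burst

	// First call: NewForConfig, so that a bad config surfaces as an error.
	schedulerClient, err := kubernetes.NewForConfig(rest.AddUserAgent(config, "scheduler"))
	if err != nil {
		return nil, nil, fmt.Errorf("invalid API configuration: %v", err)
	}
	// Second call: the config is already known to be valid, so ...OrDie is safe.
	leaderElectionClient := kubernetes.NewForConfigOrDie(rest.AddUserAgent(config, "leader-election"))
	return schedulerClient, leaderElectionClient, nil
}

func main() {
	// The kubeconfig path is a placeholder for illustration only.
	sched, le, err := createClients("", "/path/to/kubeconfig", 50, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println("clients created:", sched != nil, le != nil)
}
```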

Member

Interesting. Thanks for the explanation.

@@ -56,22 +56,22 @@ func createRecorder(kubecli *clientset.Clientset, s *options.SchedulerServer) re
return eventBroadcaster.NewRecorder(api.Scheme, v1.EventSource{Component: s.SchedulerName})
}

-func createClient(s *options.SchedulerServer) (*clientset.Clientset, error) {
+func createClient(s *options.SchedulerServer) (*clientset.Clientset, *clientset.Clientset, error) {
Member

s/createClient/createClients?

Member Author

done

@shyamjvs
Member

Couple of nits and lgtm.

@wojtek-t wojtek-t force-pushed the separate_leader_election_in_scheduler branch from 7b3bb54 to 234e20b on October 12, 2017 at 11:44
@shyamjvs
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shyamjvs, wojtek-t

Associated issue: 53327

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 9c1796a into kubernetes:master Oct 12, 2017
@wojtek-t wojtek-t changed the title from "User separate client for leader election in scheduler" to "Use separate client for leader election in scheduler" Oct 13, 2017
@wojtek-t wojtek-t added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Oct 13, 2017
k8s-github-robot pushed a commit that referenced this pull request Oct 13, 2017
…93-upstream-release-1.7

Automatic merge from submit-queue.

Automated cherry pick of #53793 upstream release 1.7 

Cherry pick of #53793 on release-1.7.

#53793: Use separate client for leader election in scheduler
@k8s-cherrypick-bot

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@timothysc
Member

This totally seems like a hack.

I'm just getting through my backlog; why exactly did you do this? LE should really switch to configmaps for scale concerns. Otherwise we're just shuffling around the problem.

@wojtek-t
Member Author

I'm just getting through my backlog; why exactly did you do this? LE should really switch to configmaps for scale concerns. Otherwise we're just shuffling around the problem.

I don't understand this at all. It doesn't matter whether we are using configmaps, endpoints or a dedicated object - the client used for it should be a separate client. Otherwise, if for some reason the scheduler (or any other component) generates too many API calls, those calls may starve leader election and thus cause unnecessary leader re-election.
BTW - a separate leader election client is exactly what we are already doing in other components (like the controller manager).

@timothysc
Member

Part of the reason for my configmap comment is that LE is a source of endpoint spamming whose fanout still causes the api-server to notify all the watchers.

I don't see how the LE client affects the api-server in any significant way.

k8s-github-robot pushed a commit that referenced this pull request Oct 14, 2017
…93-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #53793 upstream release 1.8 

Cherry pick of #53793 on release-1.8.

#53793: Use separate client for leader election in scheduler
@wojtek-t
Member Author

Part of the reason for my configmap comment is that LE is a source of endpoint spamming whose fanout still causes the api-server to notify all the watchers.
I don't see how the LE client affects the api-server in any significant way.

Whether it is the same or a different client doesn't impact the apiserver at all.

As I wrote above, if we use the same client for regular API calls and for leader election, then (due to how the client library is implemented) generating API calls faster than the QPS limit makes them accumulate, and this may block refreshing the leader lock. That can mean losing the lock and an unnecessary leader re-election.
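To make the mechanism concrete, here is a small, hedged illustration using client-go's flowcontrol package (the same kind of token-bucket limiter a clientset installs when no custom rate limiter is set); the QPS/burst values and the "heartbeat" framing are illustrative. When scheduler traffic and heartbeats share one limiter, a burst of scheduler calls drains the bucket and the next heartbeat has to wait; a dedicated limiter (that is, a dedicated client) still has tokens available.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// One shared token bucket, as when scheduler calls and leader-election
	// heartbeats go through the same clientset (values are illustrative).
	shared := flowcontrol.NewTokenBucketRateLimiter(5 /* qps */, 10 /* burst */)

	// A burst of "scheduler" calls drains the bucket; each Accept blocks
	// until a token is available.
	for i := 0; i < 20; i++ {
		shared.Accept()
	}

	// The next "heartbeat" now queues behind the scheduler traffic.
	start := time.Now()
	shared.Accept()
	fmt.Printf("heartbeat on shared client waited %v\n", time.Since(start))

	// A dedicated limiter (separate client) still has a full bucket,
	// so the heartbeat goes through immediately.
	dedicated := flowcontrol.NewTokenBucketRateLimiter(5, 10)
	start = time.Now()
	dedicated.Accept()
	fmt.Printf("heartbeat on dedicated client waited %v\n", time.Since(start))
}
```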

@timothysc
Member

I've seen this a couple of times before, about a year ago, before we did all the api-server improvements.

FWIW You can always change the default leader election time too.

@wojtek-t
Member Author

I've seen this a couple of times before, about a year ago, before we did all the api-server improvements.

It's not about the apiserver at all. If there is a bug in any component (and it tries to send too many API calls), the client-side QPS limit may starve leader election heartbeats even if the apiserver were infinitely fast and efficient.

FWIW You can always change the default leader election time too.

That's not a good workaround, because it affects the legitimate re-elections that should happen.

BTW - I don't understand your concerns around this. Is there any scenario you have in mind that this PR is making worse?
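For reference, the leader-election timing knobs touched on above look roughly like this. The values shown match the usual client-go defaults (exposed by kube-scheduler as --leader-elect-* flags) but should be treated as illustrative; the point is that stretching the lease to tolerate heartbeat starvation also stretches every legitimate failover.

```go
package leaderelectiontiming

import "time"

// Illustrative values matching the common client-go leader-election defaults.
const (
	leaseDuration = 15 * time.Second // how long a lease holds before others may claim it
	renewDeadline = 10 * time.Second // the leader must renew its lease within this window
	retryPeriod   = 2 * time.Second  // how often acquisition/renewal is attempted
)

// Raising leaseDuration to paper over client-side QPS starvation delays every
// legitimate re-election by the same amount, which is why it is a poor workaround.
var _ = []time.Duration{leaseDuration, renewDeadline, retryPeriod}
```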

@timothysc
Member

I don't understand your concerns around this. Is there any scenario you have in mind that this PR is making worse?

I think I misread the original problem, but it still leaves me unsettled because it looks like it's tap-dancing around the root of the problem, and it makes the code cryptic to anyone reading it for the first time.

  1. We should have updated the QPS limits a while ago, and used back-pressure, but we pushed it off...
  2. Why is there now a spike in traffic, to the point where it exhausts QPS, and why isn't the QPS limit adjusted appropriately?

@shyamjvs
Member

Why is there now a spike in traffic, to the point where it exhausts QPS, and why isn't the QPS limit adjusted appropriately?

I don't think we saw a spike in traffic. This change is more of a correctness fix, IIUC from @wojtek-t's comment.

@wojtek-t
Member Author

@timothysc - this isn't solving any problem we've seen. It's just something that I think we may potentially see at some point.

Regarding using back-pressure - I agree, but it won't fully solve this problem. There is no way we can guarantee that the apiserver will reject other requests rather than these ones.

The feature that we would need (and that would solve this problem) is priority of API requests. Requests coming from leader election should have the highest priority and should always be processed before others. The trick with a separate client is a bit of a workaround for this problem.

So in general, I agree that this isn't the perfect solution. But to solve it properly we will need:

  • back-pressure in the apiserver
  • priority of API calls

Those two features are both quite big and no one really has the capacity to do them. That's why we're doing things like this.

OTOH, these changes are pretty local and don't pollute the code, so it's not that big a deal in my opinion.

openshift-merge-robot added a commit to openshift/origin that referenced this pull request Oct 25, 2017
Automatic merge from submit-queue.

UPSTREAM: 53989: Remove repeated random string generations in scheduler volume predicate

@sjenning @smarterclayton

Though the upstream PR 53793 has been backported to the kube 1.7 branch (53884), I am not sure if we have a plan for another origin rebase to the latest kube 1.7, or if we would want to wait for that.

So this backports the following 3 PRs:
kubernetes/kubernetes#53793 
kubernetes/kubernetes#53720 (partial)
kubernetes/kubernetes#53989
@wojtek-t wojtek-t deleted the separate_leader_election_in_scheduler branch February 1, 2018 13:39