-
Notifications
You must be signed in to change notification settings - Fork 40.6k
Added Bind method to SchedulerExtender interface #41447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @ravigadde. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
[APPROVALNOTIFIER] This PR is NOT APPROVED The following people have approved this PR: ravigadde Needs approval from an approver in each of these OWNERS Files: We suggest the following people: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!! Just check the interface :). I'll check it more today.
} | ||
|
||
if len(g.extenders) != 0 { | ||
for _, extender := range g.extenders { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking to implement Binder
interface; HTTPBinder & DefaultBinder (call apiserver directly). We can put implementation in HTTPExtender.go
. I'd like to re-use bind's goroutine to avoid sync execution.
// Bind informs the extender about the scheduling decision, before the decision is | ||
// conveyed to the apiserver. The binding information contains the pod and the | ||
// selected node. | ||
Bind(pod *v1.Pod, node string) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to following interface:
Bind(pod *v1.Pod, nodeCandidates []*v1.Node) error
In extender, the node
maybe invalid. If only one node, extender can not re-schedule to other node because unknown scheduler's predicates; it may introduce more community between scheduler & extender.
nodeCandidates
are prepared by scheduler, so extender search the list to find a vaoid node. But it seems make affinity/ant-affinity hard to consider assumed Pods, let me check the code and back to you today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filter does that job. By the time it gets to Bind, node selection is done. The sequence of calls is 1) Filter 2) Prioritize and 3) Bind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the case in my mind:
- S->E: Filter request: n1, n2, n3, n4
- E->S: n1, n2, n3
- S->E: prioritize request: n1, n2, n3
- E->S: n1, n2, n3
- E: n1 is not valid, e.g. revoked by Mesos
- S->E: bind Pod to n1
- E->S: reject
If we changed step 6 to S-E: bind Pod to candidates: n1, n2, n3
, extender will return successful at step 7, e.g. E-S: bind to n2
. But the problem is that: the assumed pods are also considered for affinity/anti-affnity; if we assumed all candidates (e.g. n1, n2, n3), it'll make scheduler slower.
As a conclusion, I'm OK to keep current interface to send only one host: step 5 is less case compared to normal task launching :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this behavior is no different with k8s apiserver. If the node became unschedulable after the scheduler selected it, the binding can fail and the pod is scheduled again.
Pod: *pod, | ||
Node: node, | ||
} | ||
return h.send(h.bindVerb, &binding, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there is a nil, shouldn't the extender return something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming a relevant http error code is adequate. Lmk if you think an explicit error is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok is you use the response.status and response.statuscode, but the sent func will always return nil if your result param is null, that means the bind always succeed if the http request succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wanghaoran1988 I missed that. Added a check and an integration test case for bind error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no data to be returned. Is http code adequate or should I add an explicit error? I assumed http error code is adequate.
if err != nil { | ||
return err | ||
} | ||
if result != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am saying here, if the result is nil, always succeed.
@kubernetes/sig-scheduling-pr-reviews |
@ravigadde Do you really want to get this in for 1.6? |
@timothysc Yes, it simplifies the extender implementation quite a bit. I am waiting for #41119 to be merged to refresh this PR. |
@timothysc , that'll be great if in 1.6 for kube-mesos :). |
plugin/pkg/scheduler/api/types.go
Outdated
@@ -118,6 +118,8 @@ type ExtenderConfig struct { | |||
FilterVerb string | |||
// Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix when issuing the prioritize call to extender. | |||
PrioritizeVerb string | |||
// Verb for the bind call, empty if not supported. This verb is appended to the URLPrefix when issuing the bind call to extender. | |||
BindVerb string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there really a case for something other than "bind" as the verb?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention today is that this combines both the extender's intent to handle this method and the path where it is handled. If an extender didn't care about handling Bind OR if its an existing extender that doesn't handle binds at all, this is not populated. It helps preserve backward compatibility also.
} else if strings.Contains(req.URL.Path, bind) { | ||
var args schedulerapi.Binding | ||
|
||
decoder := json.NewDecoder(req.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a nit, but you have repeating blocks that can be pulled out above the 1st if.
t.Fatalf("Failed to create pod: %v", err) | ||
} | ||
|
||
time.Sleep(time.Second * 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally hard sleeps are discouraged in the tests if they can be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make another pass once the other PR is merged. I wanted to quickly put something together so folks get an idea on how much of a change this is (per the last sig discussion).
ping -- there's approaching zero time left to get this into 1.6 -- need to act quickly |
@ravigadde: you can't request testing unless you are a kubernetes member. In response to this comment:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Rebased.. it will be great if any of you can retrigger tests |
@k8s-bot test this #IGNORE |
I dont understand what the failure is: I0303 19:09:54.214] Ran 164 of 212 Specs in 1782.522 seconds |
The test failures (I clicked on the three links) seem to be flakes? I'm not sure, we can re-run the tests. BTW I don't have time to review this carefully right now, but I noticed you decided to stop allowing multiple extenders. The scheduler extension API doesn't say it's alpha or beta, so people might be considering that it is GA (or at the very least it is not alpha). I think this means you need to continue to support the old API, i.e. multiple extenders, for backward compatibility (of course you don't need to support Bind method when people are using the old API). |
@ravigadde: The following test(s) failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@@ -14,15 +14,15 @@ | |||
{"name" : "ServiceSpreadingPriority", "weight" : 1}, | |||
{"name" : "EqualPriority", "weight" : 1} | |||
], | |||
"extenders":[ | |||
"extender": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a design doc that explains the rational for the loss of multiple extenders :) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RenaudWasTaken not aware of anyone using multiple extenders. Do you have a use case for it? Please see discussion on this PR about getting bind implemented.
- Bind is used to delegate the binding of pod to extender - Removed support for multiple extenders
@davidopp Please share your thoughts on this question I asked earlier when you get a chance. To implement Bind the way kube-mesos wants it (delegate Bind to extender as opposed to inform the extender of the Bind), we need to pick option #1 or #2 below.
Originally I implemented #3. Based on @timothysc's feedback, switched to #1. #2 may satisfy both concerns. I can implement in such a way that all extenders can be informed of the Bind, but atmost one is delegated to do the Bind with apiserver. |
so this will be 1.7? |
Yes, it's way too late for 1.6 and we don't add new features in patch releases, only bug fixes. Sorry @ravigadde ... |
Sorry I haven't been following this carefully, did you decide to change extender so you can only have one per scheduler? I think this is a breaking change so probably not good. As an aside, @RenaudWasTaken has some use case that I believe involves multiple extenders. which he mentioned at the Resource Management Working Group meeting earlier today. |
As a quick follow up on what @davidopp is saying, we believe that it might be interesting to use the extenders as "custom resource managers". The "full" picture is here: |
Reading this PR and the related issue (#41235), it looks like this is exactly what is needed for my use case. However, @ravigadde solution's #3 or #2 would seem to fit more in my use case. Additionally, it looks like the race condition that @ravigadde and @davidopp talked about would still present if #2 was chosen (as the Bind is ran in a goroutine). Example:
|
@RenaudWasTaken , the race condition is always here :). The point is about error handling, for example, if 3 extender, how to recover states if one of them failed. # 2 provide a central place for developer to keep sync. Cached states introduced several issue for kube-mesos. |
@RenaudWasTaken @davidopp @k82cn Sorry for the delay in responding. I will implement Option #2 for 1.7, preserving support for multiple extenders. @RenaudWasTaken If the extender serializes Binds, there is no race condition to deal with. If you have a specific situation in mind, please describe it with an example. |
@ravigadde PR needs rebase |
Working on the 1.7 changes, which I will post in a few days |
Submitted #44883 , closing this now. |
What this PR does / why we need it:
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #Special notes for your reviewer:
Release note: