Added Bind method to SchedulerExtender interface #41447

ravigadde · 2017-02-14T23:24:58Z

Bind is used to let the extender know of the scheduling decision
Extender uses this to set aside resources against the pod

What this PR does / why we need it:

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Release note:

k8s-ci-robot · 2017-02-14T23:24:59Z

Hi @ravigadde. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-reviewable · 2017-02-14T23:25:05Z

This change is

k8s-github-robot · 2017-02-14T23:25:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

The following people have approved this PR: ravigadde

Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @spxtr @wojtek-t @brendandburns
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

ravigadde · 2017-02-14T23:25:47Z

cc @k82cn @davidopp @timstclair @sarat-k

k82cn

Great!! Just check the interface :). I'll check it more today.

k82cn · 2017-02-15T00:06:00Z

plugin/pkg/scheduler/generic_scheduler.go

+	}
+
+	if len(g.extenders) != 0 {
+		for _, extender := range g.extenders {


I'm thinking to implement Binder interface; HTTPBinder & DefaultBinder (call apiserver directly). We can put implementation in HTTPExtender.go. I'd like to re-use bind's goroutine to avoid sync execution.

k82cn · 2017-02-15T00:17:16Z

plugin/pkg/scheduler/algorithm/scheduler_interface.go

+	// Bind informs the extender about the scheduling decision, before the decision is
+	// conveyed to the apiserver. The binding information contains the pod and the
+	// selected node.
+	Bind(pod *v1.Pod, node string) error


Prefer to following interface:

Bind(pod *v1.Pod, nodeCandidates []*v1.Node) error

In extender, the node maybe invalid. If only one node, extender can not re-schedule to other node because unknown scheduler's predicates; it may introduce more community between scheduler & extender.

nodeCandidates are prepared by scheduler, so extender search the list to find a vaoid node. But it seems make affinity/ant-affinity hard to consider assumed Pods, let me check the code and back to you today.

Filter does that job. By the time it gets to Bind, node selection is done. The sequence of calls is 1) Filter 2) Prioritize and 3) Bind

Here's the case in my mind:

S->E: Filter request: n1, n2, n3, n4

E->S: n1, n2, n3

S->E: prioritize request: n1, n2, n3

E->S: n1, n2, n3

E: n1 is not valid, e.g. revoked by Mesos

S->E: bind Pod to n1

E->S: reject

If we changed step 6 to S-E: bind Pod to candidates: n1, n2, n3, extender will return successful at step 7, e.g. E-S: bind to n2. But the problem is that: the assumed pods are also considered for affinity/anti-affnity; if we assumed all candidates (e.g. n1, n2, n3), it'll make scheduler slower.

As a conclusion, I'm OK to keep current interface to send only one host: step 5 is less case compared to normal task launching :).

Yes, this behavior is no different with k8s apiserver. If the node became unschedulable after the scheduler selected it, the binding can fail and the pod is scheduled again.

wanghaoran1988 · 2017-02-17T15:47:23Z

plugin/pkg/scheduler/core/extender.go

+		Pod:  *pod,
+		Node: node,
+	}
+	return h.send(h.bindVerb, &binding, nil)


Why there is a nil, shouldn't the extender return something?

I am assuming a relevant http error code is adequate. Lmk if you think an explicit error is better.

It's ok is you use the response.status and response.statuscode, but the sent func will always return nil if your result param is null, that means the bind always succeed if the http request succeed.

Thanks @wanghaoran1988 I missed that. Added a check and an integration test case for bind error.

There is no data to be returned. Is http code adequate or should I add an explicit error? I assumed http error code is adequate.

wanghaoran1988 · 2017-02-18T06:34:56Z

plugin/pkg/scheduler/core/extender.go

-	if err != nil {
-		return err
-	}
+	if result != nil {


I am saying here, if the result is nil, always succeed.

davidopp · 2017-02-18T22:31:07Z

@kubernetes/sig-scheduling-pr-reviews

timothysc · 2017-02-24T00:37:37Z

@ravigadde Do you really want to get this in for 1.6?

ravigadde · 2017-02-24T01:47:01Z

@timothysc Yes, it simplifies the extender implementation quite a bit. I am waiting for #41119 to be merged to refresh this PR.

k82cn · 2017-02-24T01:49:30Z

@timothysc , that'll be great if in 1.6 for kube-mesos :).

timothysc · 2017-02-24T00:19:45Z

plugin/pkg/scheduler/api/types.go

@@ -118,6 +118,8 @@ type ExtenderConfig struct {
 	FilterVerb string
 	// Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix when issuing the prioritize call to extender.
 	PrioritizeVerb string
+	// Verb for the bind call, empty if not supported. This verb is appended to the URLPrefix when issuing the bind call to extender.
+	BindVerb string


Is there really a case for something other than "bind" as the verb?

The convention today is that this combines both the extender's intent to handle this method and the path where it is handled. If an extender didn't care about handling Bind OR if its an existing extender that doesn't handle binds at all, this is not populated. It helps preserve backward compatibility also.

timothysc · 2017-02-24T16:32:08Z

test/integration/scheduler/extender_test.go

+	} else if strings.Contains(req.URL.Path, bind) {
+		var args schedulerapi.Binding
+
+		decoder := json.NewDecoder(req.Body)


More of a nit, but you have repeating blocks that can be pulled out above the 1st if.

timothysc · 2017-02-24T16:33:59Z

test/integration/scheduler/extender_test.go

+		t.Fatalf("Failed to create pod: %v", err)
+	}
+
+	time.Sleep(time.Second * 2)


Generally hard sleeps are discouraged in the tests if they can be avoided.

I will make another pass once the other PR is merged. I wanted to quickly put something together so folks get an idea on how much of a change this is (per the last sig discussion).

davidopp · 2017-02-28T01:33:40Z

ping -- there's approaching zero time left to get this into 1.6 -- need to act quickly

ravigadde · 2017-03-04T02:29:57Z

@k8s-bot cvm gke e2e test this
@k8s-bot gci gke e2e test this

k8s-ci-robot · 2017-03-04T02:29:58Z

@ravigadde: you can't request testing unless you are a kubernetes member.

In response to this comment:

@k8s-bot cvm gke e2e test this
@k8s-bot gci gke e2e test this

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

ravigadde · 2017-03-04T02:32:02Z

Rebased.. it will be great if any of you can retrigger tests

@k82cn @davidopp @timothysc

k82cn · 2017-03-04T02:32:40Z

@k8s-bot test this #IGNORE

ravigadde · 2017-03-04T03:54:12Z

I dont understand what the failure is:

I0303 19:09:54.214] Ran 164 of 212 Specs in 1782.522 seconds
I0303 19:09:54.214] SUCCESS! -- 164 Passed | 0 Failed | 0 Flaked | 0 Pending | 48 Skipped
I0303 19:09:54.214]
I0303 19:09:54.214] Ginkgo ran 1 suite in 29m45.287562542s
I0303 19:09:54.214] Test Suite Passed
I0303 19:09:54.214]
I0303 19:09:54.214] Success Finished Test Suite on Host tmp-node-e2e-6f15cd87-gci-stable-56-9000-84-2
I0303 19:09:54.214] <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0303 19:09:54.214] < FINISH TEST <
I0303 19:09:54.215] <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0303 19:09:54.215]
I0303 19:09:54.215] Failure: 1 errors encountered.
W0303 19:09:54.315] exit status 1
W0303 19:09:54.946] Traceback (most recent call last):
W0303 19:09:54.946] File "/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/./test-infra/jenkins/../scenarios/kubernetes_kubelet.py", line 128, in
W0303 19:09:54.947] ARGS.service_account,
W0303 19:09:54.947] File "/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/./test-infra/jenkins/../scenarios/kubernetes_kubelet.py", line 92, in main
W0303 19:09:54.947] img,
W0303 19:09:54.947] File "/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/./test-infra/jenkins/../scenarios/kubernetes_kubelet.py", line 44, in check
W0303 19:09:54.947] subprocess.check_call(cmd)
W0303 19:09:54.947] File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
W0303 19:09:54.947] raise CalledProcessError(retcode, cmd)
W0303 19:09:54.948] subprocess.CalledProcessError: Command '('docker', 'run', '--rm=true', '-v', '/etc/localtime:/etc/localtime:ro', '-v', '/var/run/docker.sock:/var/run/docker.sock', '-v', '/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/go/src/k8s.io/kubernetes:/go/src/k8s.io/kubernetes', '-v', '/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/go/src/k8s.io/kubernetes/_artifacts:/workspace/_artifacts', '-v', '/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri@tmp/secretFiles/3888997b-7871-4a2a-aaf3-29dba03dc5cb/kubernetes-jenkins-pull-c67f9519f993.json:/service-account.json:ro', '-v', '/home/jenkins/.ssh/google_compute_engine:/root/.ssh/google_compute_engine:ro', '-v', '/home/jenkins/.ssh/google_compute_engine.pub:/root/.ssh/google_compute_engine.pub:ro', '-e', 'GCE_USER=jenkins', '-e', 'GOOGLE_APPLICATION_CREDENTIALS=/service-account.json', '-e', 'JENKINS_GCE_SSH_PRIVATE_KEY_FILE=/root/.ssh/google_compute_engine', '-e', 'JENKINS_GCE_SSH_PUBLIC_KEY_FILE=/root/.ssh/google_compute_engine.pub', '-e', 'NODE_TEST_PROPERTIES=test/e2e_node/jenkins/cri_validation/jenkins-pull.properties', '-e', 'NODE_TEST_SCRIPT=./test/e2e_node/jenkins/e2e-node-jenkins.sh', '-e', 'REPO_DIR=/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/go/src/k8s.io/kubernetes', 'gcr.io/k8s-testimages/kubekins-node:1.6-latest')' returned non-zero exit status 1
E0303 19:09:54.953] Build failed

davidopp · 2017-03-04T20:14:16Z

The test failures (I clicked on the three links) seem to be flakes? I'm not sure, we can re-run the tests.

BTW I don't have time to review this carefully right now, but I noticed you decided to stop allowing multiple extenders. The scheduler extension API doesn't say it's alpha or beta, so people might be considering that it is GA (or at the very least it is not alpha). I think this means you need to continue to support the old API, i.e. multiple extenders, for backward compatibility (of course you don't need to support Bind method when people are using the old API).

davidopp · 2017-03-04T20:14:41Z

@k8s-bot cvm gke e2e test this
@k8s-bot gci gke e2e test this
@k8s-bot non-cri node e2e test this

k82cn · 2017-03-05T00:52:29Z

@k8s-bot gci gke e2e test this
@k8s-bot cvm gke e2e test this

k8s-ci-robot · 2017-03-05T02:06:47Z

@ravigadde: The following test(s) failed:

Test name	Commit	Details	Rerun command
Jenkins GCI GKE smoke e2e	`216acc9`	link	`@k8s-bot gci gke e2e test this`
Jenkins GKE smoke e2e	`216acc9`	link	`@k8s-bot cvm gke e2e test this`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

RenaudWasTaken · 2017-03-06T22:10:45Z

examples/scheduler-policy-config-with-extender.json

@@ -14,15 +14,15 @@
 	{"name" : "ServiceSpreadingPriority", "weight" : 1},
 	{"name" : "EqualPriority", "weight" : 1}
 	],
-"extenders":[ 
+"extender": 


Is there a design doc that explains the rational for the loss of multiple extenders :) ?

@RenaudWasTaken not aware of anyone using multiple extenders. Do you have a use case for it? Please see discussion on this PR about getting bind implemented.

- Bind is used to delegate the binding of pod to extender - Removed support for multiple extenders

ravigadde · 2017-03-07T23:03:43Z

@davidopp Please share your thoughts on this question I asked earlier when you get a chance. To implement Bind the way kube-mesos wants it (delegate Bind to extender as opposed to inform the extender of the Bind), we need to pick option #1 or #2 below.

Drop support for multiple extenders and implement Bind the way kube-mesos wants it. Not sure if anyone uses multiple extenders. But your point about not breaking backward compatibility is valid.
Check that only one extender can implement Bind method and offload Bind to that extender.
Keep Bind the same way it was originally implemented in this PR and invent another mechanism to satisfy the kube-mesos case.

Originally I implemented #3. Based on @timothysc's feedback, switched to #1. #2 may satisfy both concerns. I can implement in such a way that all extenders can be informed of the Bind, but atmost one is delegated to do the Bind with apiserver.

k82cn · 2017-03-13T02:21:05Z

so this will be 1.7?

davidopp · 2017-03-13T05:20:31Z

so this will be 1.7?

Yes, it's way too late for 1.6 and we don't add new features in patch releases, only bug fixes.

Sorry @ravigadde ...

davidopp · 2017-03-15T04:32:27Z

Sorry I haven't been following this carefully, did you decide to change extender so you can only have one per scheduler? I think this is a breaking change so probably not good. As an aside, @RenaudWasTaken has some use case that I believe involves multiple extenders. which he mentioned at the Resource Management Working Group meeting earlier today.

RenaudWasTaken · 2017-03-15T05:25:11Z

As a quick follow up on what @davidopp is saying, we believe that it might be interesting to use the extenders as "custom resource managers".
The idea being that you would have one per resource (e.g: GPU, FPGA, ...).

The "full" picture is here:
https://groups.google.com/forum/#!topic/kubernetes-sig-scheduling/a-nvvjjP70M

RenaudWasTaken · 2017-03-16T00:11:45Z

Reading this PR and the related issue (#41235), it looks like this is exactly what is needed for my use case.

However, @ravigadde solution's #3 or #2 would seem to fit more in my use case.

Additionally, it looks like the race condition that @ravigadde and @davidopp talked about would still present if #2 was chosen (as the Bind is ran in a goroutine).

Example:

Extender's Filter and Prioritize are run on Pod-1
Scheduler starts goroutine to bind Pod-1
Scheduler starts to Process next Pod (Pod-2)
Scheduler calls Extender's Filter on Pod-2
Scheduler calls Extender's Bind on Pod-1

k82cn · 2017-03-16T01:16:53Z

@RenaudWasTaken , the race condition is always here :). The point is about error handling, for example, if 3 extender, how to recover states if one of them failed. # 2 provide a central place for developer to keep sync. Cached states introduced several issue for kube-mesos.

ravigadde · 2017-03-17T05:57:05Z

@RenaudWasTaken @davidopp @k82cn Sorry for the delay in responding. I will implement Option #2 for 1.7, preserving support for multiple extenders.

@RenaudWasTaken If the extender serializes Binds, there is no race condition to deal with. If you have a specific situation in mind, please describe it with an example.

k8s-github-robot · 2017-03-30T20:00:42Z

@ravigadde PR needs rebase

ravigadde · 2017-04-24T21:18:52Z

Working on the 1.7 changes, which I will post in a few days

ravigadde · 2017-04-24T22:39:31Z

Submitted #44883 , closing this now.

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 14, 2017

k8s-github-robot assigned wojtek-t Feb 14, 2017

k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Feb 14, 2017

ravigadde mentioned this pull request Feb 14, 2017

Add Binder to HTTPExtender of scheduler #41235

Closed

k82cn reviewed Feb 15, 2017

View reviewed changes

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 15, 2017

k82cn mentioned this pull request Feb 15, 2017

Build HTTPExtender service kubernetes-retired/kube-mesos-framework#66

Closed

3 tasks

ravigadde force-pushed the bind branch from c52d61c to e32a9a6 Compare February 15, 2017 20:18

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 15, 2017

wanghaoran1988 reviewed Feb 17, 2017

View reviewed changes

wanghaoran1988 reviewed Feb 18, 2017

View reviewed changes

davidopp assigned davidopp and unassigned wojtek-t Feb 18, 2017

ravigadde force-pushed the bind branch from e32a9a6 to f806440 Compare February 19, 2017 00:31

timothysc added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 24, 2017

timothysc self-assigned this Feb 24, 2017

timothysc suggested changes Feb 24, 2017

View reviewed changes

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2017

ravigadde force-pushed the bind branch from f806440 to 02c44d2 Compare February 28, 2017 02:37

RenaudWasTaken reviewed Mar 6, 2017

View reviewed changes

Added Bind method to SchedulerExtender interface

05da185

- Bind is used to delegate the binding of pod to extender - Removed support for multiple extenders

ravigadde force-pushed the bind branch from 216acc9 to 05da185 Compare March 6, 2017 22:13

timothysc mentioned this pull request Mar 9, 2017

Updated with Bind information kubernetes/community#423

Merged

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 30, 2017

chakri-nelluri mentioned this pull request Mar 30, 2017

Add support for attacher/detacher interface in Flex volume #41804

Merged

grodrigues3 added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 5, 2017

ravigadde closed this Apr 24, 2017

ravigadde mentioned this pull request Apr 24, 2017

Added Bind method to Scheduler Extender #44883

Merged

k82cn mentioned this pull request Jun 28, 2017

Bind method in scheduler extender kubernetes/enhancements#270

Closed

Added Bind method to SchedulerExtender interface #41447

Added Bind method to SchedulerExtender interface #41447

Conversation

ravigadde commented Feb 14, 2017

k8s-ci-robot commented Feb 14, 2017

k8s-reviewable commented Feb 14, 2017

k8s-github-robot commented Feb 14, 2017

ravigadde commented Feb 14, 2017

k82cn left a comment

Choose a reason for hiding this comment

k82cn Feb 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidopp commented Feb 18, 2017

timothysc commented Feb 24, 2017

ravigadde commented Feb 24, 2017

k82cn commented Feb 24, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidopp commented Feb 28, 2017

ravigadde commented Mar 4, 2017

k8s-ci-robot commented Mar 4, 2017

ravigadde commented Mar 4, 2017

k82cn commented Mar 4, 2017

ravigadde commented Mar 4, 2017

davidopp commented Mar 4, 2017

davidopp commented Mar 4, 2017

k82cn commented Mar 5, 2017

k8s-ci-robot commented Mar 5, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ravigadde commented Mar 7, 2017

k82cn commented Mar 13, 2017

davidopp commented Mar 13, 2017

davidopp commented Mar 15, 2017

RenaudWasTaken commented Mar 15, 2017

RenaudWasTaken commented Mar 16, 2017 • edited Loading

k82cn commented Mar 16, 2017

ravigadde commented Mar 17, 2017

k8s-github-robot commented Mar 30, 2017

ravigadde commented Apr 24, 2017

ravigadde commented Apr 24, 2017

k82cn Feb 15, 2017 •

edited

Loading

RenaudWasTaken commented Mar 16, 2017 •

edited

Loading