Skip to content

Added Bind method to SchedulerExtender interface #41447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ravigadde
Copy link
Contributor

  • Bind is used to let the extender know of the scheduling decision
  • Extender uses this to set aside resources against the pod

What this PR does / why we need it:

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Release note:

@k8s-ci-robot
Copy link
Contributor

Hi @ravigadde. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 14, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

The following people have approved this PR: ravigadde

Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @spxtr @wojtek-t @brendandburns
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Feb 14, 2017
@ravigadde
Copy link
Contributor Author

Copy link
Member

@k82cn k82cn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!! Just check the interface :). I'll check it more today.

}

if len(g.extenders) != 0 {
for _, extender := range g.extenders {
Copy link
Member

@k82cn k82cn Feb 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking to implement Binder interface; HTTPBinder & DefaultBinder (call apiserver directly). We can put implementation in HTTPExtender.go. I'd like to re-use bind's goroutine to avoid sync execution.

// Bind informs the extender about the scheduling decision, before the decision is
// conveyed to the apiserver. The binding information contains the pod and the
// selected node.
Bind(pod *v1.Pod, node string) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to following interface:

Bind(pod *v1.Pod, nodeCandidates []*v1.Node) error

In extender, the node maybe invalid. If only one node, extender can not re-schedule to other node because unknown scheduler's predicates; it may introduce more community between scheduler & extender.

nodeCandidates are prepared by scheduler, so extender search the list to find a vaoid node. But it seems make affinity/ant-affinity hard to consider assumed Pods, let me check the code and back to you today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter does that job. By the time it gets to Bind, node selection is done. The sequence of calls is 1) Filter 2) Prioritize and 3) Bind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the case in my mind:

  1. S->E: Filter request: n1, n2, n3, n4
  2. E->S: n1, n2, n3
  3. S->E: prioritize request: n1, n2, n3
  4. E->S: n1, n2, n3
  5. E: n1 is not valid, e.g. revoked by Mesos
  6. S->E: bind Pod to n1
  7. E->S: reject

If we changed step 6 to S-E: bind Pod to candidates: n1, n2, n3, extender will return successful at step 7, e.g. E-S: bind to n2. But the problem is that: the assumed pods are also considered for affinity/anti-affnity; if we assumed all candidates (e.g. n1, n2, n3), it'll make scheduler slower.

As a conclusion, I'm OK to keep current interface to send only one host: step 5 is less case compared to normal task launching :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this behavior is no different with k8s apiserver. If the node became unschedulable after the scheduler selected it, the binding can fail and the pod is scheduled again.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 15, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 15, 2017
Pod: *pod,
Node: node,
}
return h.send(h.bindVerb, &binding, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there is a nil, shouldn't the extender return something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming a relevant http error code is adequate. Lmk if you think an explicit error is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok is you use the response.status and response.statuscode, but the sent func will always return nil if your result param is null, that means the bind always succeed if the http request succeed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wanghaoran1988 I missed that. Added a check and an integration test case for bind error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no data to be returned. Is http code adequate or should I add an explicit error? I assumed http error code is adequate.

if err != nil {
return err
}
if result != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am saying here, if the result is nil, always succeed.

@davidopp
Copy link
Member

@kubernetes/sig-scheduling-pr-reviews

@timothysc
Copy link
Member

@ravigadde Do you really want to get this in for 1.6?

@ravigadde
Copy link
Contributor Author

@timothysc Yes, it simplifies the extender implementation quite a bit. I am waiting for #41119 to be merged to refresh this PR.

@k82cn
Copy link
Member

k82cn commented Feb 24, 2017

@timothysc , that'll be great if in 1.6 for kube-mesos :).

@timothysc timothysc added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 24, 2017
@timothysc timothysc self-assigned this Feb 24, 2017
@@ -118,6 +118,8 @@ type ExtenderConfig struct {
FilterVerb string
// Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix when issuing the prioritize call to extender.
PrioritizeVerb string
// Verb for the bind call, empty if not supported. This verb is appended to the URLPrefix when issuing the bind call to extender.
BindVerb string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there really a case for something other than "bind" as the verb?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention today is that this combines both the extender's intent to handle this method and the path where it is handled. If an extender didn't care about handling Bind OR if its an existing extender that doesn't handle binds at all, this is not populated. It helps preserve backward compatibility also.

} else if strings.Contains(req.URL.Path, bind) {
var args schedulerapi.Binding

decoder := json.NewDecoder(req.Body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a nit, but you have repeating blocks that can be pulled out above the 1st if.

t.Fatalf("Failed to create pod: %v", err)
}

time.Sleep(time.Second * 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally hard sleeps are discouraged in the tests if they can be avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make another pass once the other PR is merged. I wanted to quickly put something together so folks get an idea on how much of a change this is (per the last sig discussion).

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2017
@davidopp
Copy link
Member

ping -- there's approaching zero time left to get this into 1.6 -- need to act quickly

@ravigadde
Copy link
Contributor Author

@k8s-bot cvm gke e2e test this
@k8s-bot gci gke e2e test this

@k8s-ci-robot
Copy link
Contributor

@ravigadde: you can't request testing unless you are a kubernetes member.

In response to this comment:

@k8s-bot cvm gke e2e test this
@k8s-bot gci gke e2e test this

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ravigadde
Copy link
Contributor Author

Rebased.. it will be great if any of you can retrigger tests

@k82cn @davidopp @timothysc

@k82cn
Copy link
Member

k82cn commented Mar 4, 2017

@k8s-bot test this #IGNORE

@ravigadde
Copy link
Contributor Author

I dont understand what the failure is:

I0303 19:09:54.214] Ran 164 of 212 Specs in 1782.522 seconds
I0303 19:09:54.214] SUCCESS! -- 164 Passed | 0 Failed | 0 Flaked | 0 Pending | 48 Skipped
I0303 19:09:54.214]
I0303 19:09:54.214] Ginkgo ran 1 suite in 29m45.287562542s
I0303 19:09:54.214] Test Suite Passed
I0303 19:09:54.214]
I0303 19:09:54.214] Success Finished Test Suite on Host tmp-node-e2e-6f15cd87-gci-stable-56-9000-84-2
I0303 19:09:54.214] <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0303 19:09:54.214] < FINISH TEST <
I0303 19:09:54.215] <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I0303 19:09:54.215]
I0303 19:09:54.215] Failure: 1 errors encountered.
W0303 19:09:54.315] exit status 1
W0303 19:09:54.946] Traceback (most recent call last):
W0303 19:09:54.946] File "/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/./test-infra/jenkins/../scenarios/kubernetes_kubelet.py", line 128, in
W0303 19:09:54.947] ARGS.service_account,
W0303 19:09:54.947] File "/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/./test-infra/jenkins/../scenarios/kubernetes_kubelet.py", line 92, in main
W0303 19:09:54.947] img,
W0303 19:09:54.947] File "/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/./test-infra/jenkins/../scenarios/kubernetes_kubelet.py", line 44, in check
W0303 19:09:54.947] subprocess.check_call(cmd)
W0303 19:09:54.947] File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
W0303 19:09:54.947] raise CalledProcessError(retcode, cmd)
W0303 19:09:54.948] subprocess.CalledProcessError: Command '('docker', 'run', '--rm=true', '-v', '/etc/localtime:/etc/localtime:ro', '-v', '/var/run/docker.sock:/var/run/docker.sock', '-v', '/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/go/src/k8s.io/kubernetes:/go/src/k8s.io/kubernetes', '-v', '/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/go/src/k8s.io/kubernetes/_artifacts:/workspace/_artifacts', '-v', '/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri@tmp/secretFiles/3888997b-7871-4a2a-aaf3-29dba03dc5cb/kubernetes-jenkins-pull-c67f9519f993.json:/service-account.json:ro', '-v', '/home/jenkins/.ssh/google_compute_engine:/root/.ssh/google_compute_engine:ro', '-v', '/home/jenkins/.ssh/google_compute_engine.pub:/root/.ssh/google_compute_engine.pub:ro', '-e', 'GCE_USER=jenkins', '-e', 'GOOGLE_APPLICATION_CREDENTIALS=/service-account.json', '-e', 'JENKINS_GCE_SSH_PRIVATE_KEY_FILE=/root/.ssh/google_compute_engine', '-e', 'JENKINS_GCE_SSH_PUBLIC_KEY_FILE=/root/.ssh/google_compute_engine.pub', '-e', 'NODE_TEST_PROPERTIES=test/e2e_node/jenkins/cri_validation/jenkins-pull.properties', '-e', 'NODE_TEST_SCRIPT=./test/e2e_node/jenkins/e2e-node-jenkins.sh', '-e', 'REPO_DIR=/var/lib/jenkins/workspace/pull-kubernetes-node-e2e-non-cri/go/src/k8s.io/kubernetes', 'gcr.io/k8s-testimages/kubekins-node:1.6-latest')' returned non-zero exit status 1
E0303 19:09:54.953] Build failed

@davidopp
Copy link
Member

davidopp commented Mar 4, 2017

The test failures (I clicked on the three links) seem to be flakes? I'm not sure, we can re-run the tests.

BTW I don't have time to review this carefully right now, but I noticed you decided to stop allowing multiple extenders. The scheduler extension API doesn't say it's alpha or beta, so people might be considering that it is GA (or at the very least it is not alpha). I think this means you need to continue to support the old API, i.e. multiple extenders, for backward compatibility (of course you don't need to support Bind method when people are using the old API).

@davidopp
Copy link
Member

davidopp commented Mar 4, 2017

@k8s-bot cvm gke e2e test this
@k8s-bot gci gke e2e test this
@k8s-bot non-cri node e2e test this

@k82cn
Copy link
Member

k82cn commented Mar 5, 2017

@k8s-bot gci gke e2e test this
@k8s-bot cvm gke e2e test this

@k8s-ci-robot
Copy link
Contributor

@ravigadde: The following test(s) failed:

Test name Commit Details Rerun command
Jenkins GCI GKE smoke e2e 216acc9 link @k8s-bot gci gke e2e test this
Jenkins GKE smoke e2e 216acc9 link @k8s-bot cvm gke e2e test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -14,15 +14,15 @@
{"name" : "ServiceSpreadingPriority", "weight" : 1},
{"name" : "EqualPriority", "weight" : 1}
],
"extenders":[
"extender":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a design doc that explains the rational for the loss of multiple extenders :) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RenaudWasTaken not aware of anyone using multiple extenders. Do you have a use case for it? Please see discussion on this PR about getting bind implemented.

- Bind is used to delegate the binding of pod to extender
- Removed support for multiple extenders
@ravigadde
Copy link
Contributor Author

@davidopp Please share your thoughts on this question I asked earlier when you get a chance. To implement Bind the way kube-mesos wants it (delegate Bind to extender as opposed to inform the extender of the Bind), we need to pick option #1 or #2 below.

  1. Drop support for multiple extenders and implement Bind the way kube-mesos wants it. Not sure if anyone uses multiple extenders. But your point about not breaking backward compatibility is valid.
  2. Check that only one extender can implement Bind method and offload Bind to that extender.
  3. Keep Bind the same way it was originally implemented in this PR and invent another mechanism to satisfy the kube-mesos case.

Originally I implemented #3. Based on @timothysc's feedback, switched to #1. #2 may satisfy both concerns. I can implement in such a way that all extenders can be informed of the Bind, but atmost one is delegated to do the Bind with apiserver.

@k82cn
Copy link
Member

k82cn commented Mar 13, 2017

so this will be 1.7?

@davidopp
Copy link
Member

so this will be 1.7?

Yes, it's way too late for 1.6 and we don't add new features in patch releases, only bug fixes.

Sorry @ravigadde ...

@davidopp
Copy link
Member

Sorry I haven't been following this carefully, did you decide to change extender so you can only have one per scheduler? I think this is a breaking change so probably not good. As an aside, @RenaudWasTaken has some use case that I believe involves multiple extenders. which he mentioned at the Resource Management Working Group meeting earlier today.

@RenaudWasTaken
Copy link
Contributor

As a quick follow up on what @davidopp is saying, we believe that it might be interesting to use the extenders as "custom resource managers".
The idea being that you would have one per resource (e.g: GPU, FPGA, ...).

The "full" picture is here:
https://groups.google.com/forum/#!topic/kubernetes-sig-scheduling/a-nvvjjP70M

@RenaudWasTaken
Copy link
Contributor

RenaudWasTaken commented Mar 16, 2017

Reading this PR and the related issue (#41235), it looks like this is exactly what is needed for my use case.

However, @ravigadde solution's #3 or #2 would seem to fit more in my use case.

Additionally, it looks like the race condition that @ravigadde and @davidopp talked about would still present if #2 was chosen (as the Bind is ran in a goroutine).

Example:

  1. Extender's Filter and Prioritize are run on Pod-1

  2. Scheduler starts goroutine to bind Pod-1

  3. Scheduler starts to Process next Pod (Pod-2)

  4. Scheduler calls Extender's Filter on Pod-2

  5. Scheduler calls Extender's Bind on Pod-1

@k82cn
Copy link
Member

k82cn commented Mar 16, 2017

@RenaudWasTaken , the race condition is always here :). The point is about error handling, for example, if 3 extender, how to recover states if one of them failed. # 2 provide a central place for developer to keep sync. Cached states introduced several issue for kube-mesos.

@ravigadde
Copy link
Contributor Author

@RenaudWasTaken @davidopp @k82cn Sorry for the delay in responding. I will implement Option #2 for 1.7, preserving support for multiple extenders.

@RenaudWasTaken If the extender serializes Binds, there is no race condition to deal with. If you have a specific situation in mind, please describe it with an example.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 30, 2017
@k8s-github-robot
Copy link

@ravigadde PR needs rebase

@grodrigues3 grodrigues3 added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 5, 2017
@ravigadde
Copy link
Contributor Author

Working on the 1.7 changes, which I will post in a few days

@ravigadde
Copy link
Contributor Author

Submitted #44883 , closing this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.