
Re-architecture of scheduler perf tests to make them more extendable #44770

Conversation

@ravisantoshgudimetla (Contributor) commented Apr 21, 2017:

What this PR does / why we need it:

Special notes for your reviewer:
This is a re-architecture of the scheduler perf tests, so that we can enable or disable certain predicates and priorities and see their impact.

Release note:

Scheduler perf modular extensions.

@k8s-ci-robot added the cncf-cla: yes label Apr 21, 2017

@k8s-ci-robot (Contributor) commented:

Hi @ravisantoshgudimetla. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot added the needs-ok-to-test label Apr 21, 2017
@k8s-github-robot added the size/L and release-note labels Apr 21, 2017
@ravisantoshgudimetla (Author) commented:

/cc @jayunit100

@jayunit100 (Member) left a comment:

@gmarek this mostly looks good to me; I have some nits. This implements the changes discussed, with the exception of a couple of minor details (like taint predicates and forcing a limit of 1 logical operation).

Notwithstanding a formal review, I think it's basically exactly what we want. Ravi is already working on the next iteration, but we thought we should get this out first since it's backwards compatible and doesn't add any new features.

@0xmichalis assigned gmarek and unassigned 0xmichalis Apr 23, 2017
// High Level Configuration for all predicates and priorities.
type schedulerPerfConfig struct {
NodeAffinity nodeAffinity
InterpodAffinity interpodAffinity
Contributor:

InterPodAffinity, or just PodAffinity

Member:

Agree, InterPodAffinity.

type testConfig struct {
numPods int
numNodes int
// Note: We don't need numPods, numNodes anymore in this struct but keeping them for backward compatibility
Contributor:

Where did numNodes get replaced, i.e. how do you set the number of Nodes created now? From what I can tell it's still used by the mutate function.

@ravisantoshgudimetla (Author) commented Apr 24, 2017:

So, the idea is to push these values into the schedulerPerfConfig struct, so that there will be a single entry point into the code base. Once we have them in that struct, we can remove them from the testConfig struct (which will keep only fields specific to the internal data representation, like nodeStrategies).

Contributor:

I don't think it's a good idea, but I don't care about having the comment for now ;)
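For concreteness, the shape being proposed here would look roughly like the sketch below (hypothetical only; the NumNodes/NumPods field names are assumptions, not what the PR merges, and gmarek's reservation above stands):

```go
// Hypothetical: the scale knobs move into the single user-facing entry
// point, and testConfig keeps only internal plumbing such as the node/pod
// creation strategies.
type schedulerPerfConfig struct {
	NumNodes int
	NumPods  int
	// per-predicate/priority sections (NodeAffinity, InterPodAffinity, ...)
}
```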

@@ -48,5 +48,6 @@ if ${RUN_BENCHMARK:-false}; then
fi
# Running density tests. It might take a long time.
kube::log::status "performance test (density) start"
go test -test.run=. -test.timeout=60m -test.short=false
go test -test.timeout=60m -test.short=false -test.run=.
Contributor:

Is there a reason for this change?

Contributor Author:

Nothing. I was testing with a few other changes and this got pushed. There's no need to touch this file, so I will remove the change.


// nodeAffinity priority configuration details.
type nodeAffinity struct {
Enabled bool //If not enabled, node affinity is disabled.
Contributor:

I'd just use a pointer in the higher-level struct instead.

Member:

For that matter, interpret nil as "false"? That way there's no coupling of the high-level struct to predicate/priority names.

Contributor:

Yup. We do it in a number of places in the codebase.

Contributor Author:

Makes sense going with nil. I will make that change.
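A minimal runnable sketch of the nil-as-disabled idea being agreed on here (type shapes are simplified from the snippets above; this is not the PR's final code):

```go
package main

import "fmt"

type nodeAffinity struct {
	numGroups       int
	nodeAffinityKey string
}

type interPodAffinity struct{}

// A nil pointer means "disabled", so no Enabled flag is needed and the
// high-level struct stays decoupled from predicate/priority semantics.
type schedulerPerfConfig struct {
	NodeAffinity     *nodeAffinity
	InterPodAffinity *interPodAffinity
}

func main() {
	cfg := schedulerPerfConfig{NodeAffinity: &nodeAffinity{numGroups: 10}}
	if cfg.NodeAffinity != nil {
		fmt.Println("node affinity enabled,", cfg.NodeAffinity.numGroups, "groups")
	}
	if cfg.InterPodAffinity == nil {
		fmt.Println("inter-pod affinity disabled")
	}
}
```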

// nodeAffinity priority configuration details.
type nodeAffinity struct {
Enabled bool //If not enabled, node affinity is disabled.
numGroups int // the % of nodes and pods that should match. Higher # -> smaller performance deficit at scale.
Contributor:

I have no idea how this comment matches this variable.

Member:

These comments were originally my idea, based on internal discussions; I agree they have less useful context here, though... Ravi, any ideas? A .md file or a struct header comment?

Contributor Author:

A README.md would be a better place, I guess.

type nodeAffinity struct {
Enabled bool //If not enabled, node affinity is disabled.
numGroups int // the % of nodes and pods that should match. Higher # -> smaller performance deficit at scale.
nodeAffinityKey string // the number of labels needed to match. Higher # -> larger performance deficit at scale.
Contributor:

Same as above.

)

// High Level Configuration for all predicates and priorities.
type schedulerPerfConfig struct {
Contributor:

This is a pretty non-extensible design - you allow only one NodeAffinity and one PodAffinity. It may be fine, but it may bite us in the future as well.

Member:

Agreed, we could have multiple affinities. Are you suggesting a slice? That's fine by me and not a major change to implement.

Member:

I think we could, via composition, compose multiple schedulerPerfConfigs to support scenarios with multiple competing node affinities in the future. Either way, I think leaving it as-is for now is good for the first iteration: we can expand it further later, and this is a step forward from what is already there... Is that OK with you @gmarek?

Contributor:

Composing multiple schedulerPerfConfigs doesn't sound too natural to me. It's probably better to have a slice here if you expect multiple affinities in the future (but this is actually a hard problem; I don't really know how to solve it correctly).

Generally it'd be great if you could discuss this config with @jeremyeder and @sjug.

@jayunit100 (Member) commented Apr 27, 2017:

Yeah, composition is awkward once we have a 1->N relationship between multiple pod+node mutations, but I think the slice is also awkward since we aren't sure how we want to distribute label 'intersections' yet. Sebastian isn't currently experimenting deeply with scheduler predicates, per internal conversation.

Member:

So, IMO, there isn't much value in adding more cardinality to nodeAffinity when we're only supporting one filter for now. I guess we will move forward and make handling the more complex scenarios our next priority for improvements.
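To make the trade-off concrete, the slice alternative discussed above might look like this (purely hypothetical; the PR keeps single structs for now, and how label "intersections" would be distributed across entries is the open question):

```go
// Hypothetical slice-based variant: several competing affinity scenarios
// per run, at the cost of deciding how labels are split across entries.
type schedulerPerfConfig struct {
	NodeAffinities     []nodeAffinity
	InterPodAffinities []interPodAffinity
}
```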

// interpodAffinity priority configuration details.
type interpodAffinity struct {
Enabled bool
Operator metav1.LabelSelectorOperator
Contributor:

I have no idea what the Operator is in the context of Affinity. Is this together with affinityKey just a way to implement NodeSelector? Why not include it explicitly? Plus I believe it makes no sense in PodAffinity, as it's used only in the NodeAffinity if I'm not mistaken.

Contributor Author:

Even for pod affinity we need a label selector, so I guess we need the operator. I was looking at https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/configuration/pod-with-pod-affinity.yaml. Please let me know if it is not needed.

Contributor:

Then you just need a LabelSelector. There's no need to inline its definition here.
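A sketch of what gmarek is suggesting, reusing the existing API type instead of inlining its fields (the Selector field name is an assumption; metav1 here is k8s.io/apimachinery/pkg/apis/meta/v1):

```go
// Reuse the API's selector type rather than re-declaring an Operator and
// an affinity key; a nil selector can also double as "disabled".
type interPodAffinity struct {
	Selector *metav1.LabelSelector // field name illustrative
}
```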

}

// readInConfiguration reads the input parameters for every predicate and priority that are needed for test run.
// TODO: As of now, returning configs hardcoded, need to read from yaml or some other file.
Contributor:

No. First figure out how this should work when it's done; then we can discuss what part of it is needed for the MVP. I want to see an interface first. We certainly don't want to lose the ability to run tests without affinity, etc.

Member:

I don't think readInConfig is necessary; these tests can be extended easily by an engineer simply writing a new test with a struct in it.

@jayunit100 (Member) commented Apr 24, 2017:

To be clear: I originally figured we should have default behavior for readInConfig, but I'm now thinking it really should be moved out to be done later (or maybe never ;)). Good point that the MVP shouldn't hint at a user interface.

@@ -44,8 +39,10 @@ func TestSchedule100Node3KPods(t *testing.T) {
if testing.Short() {
t.Skip("Skipping because we want to run short tests")
}

config := defaultSchedulerBenchmarkConfig(100, 3000)
config := baseConfig()
Contributor:

Instead of removing the default... function, maybe just re-implement it with those 3 lines?
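Something along these lines would keep the old helper as a thin wrapper over baseConfig (a sketch only; the field names follow the testConfig snippet above and may not match the final diff):

```go
// Sketch: retain the convenience constructor, implemented on top of
// baseConfig so existing tests keep compiling.
func defaultSchedulerBenchmarkConfig(numNodes, numPods int) *testConfig {
	config := baseConfig()
	config.numNodes = numNodes
	config.numPods = numPods
	return config
}
```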

@gmarek (Contributor) commented Apr 24, 2017:

@jayunit100 @ravisantoshgudimetla - I mostly have comments on the API; the rest looks more or less OK-ish. The thing I'd like to see is actually handling inputs somehow, plus fixing the API. Those are somewhat coupled, as we need to understand what knobs there will be in order to know how to expose them.

nodePreparer testutils.TestNodePreparer
podCreator *testutils.TestPodCreator
schedulerSupportFunctions scheduler.Configurator
destroyFunc func()
}

// baseConfig returns a minimal testConfig to be customized be different tests.
Member:

Typo: "customized be" should read "customized by".

// parseInput reads a configuration and then applies it to a test configuration.
Member:

I think this function isn't necessary.

@jayunit100 (Member) commented:

@gmarek what kind of an interface did you want to see... one describing the mutation operations, or an interface for configuration that could be sent to a run() testing stub?

I see potential value in both, but I'm just curious what you're looking for.

@gmarek (Contributor) commented Apr 24, 2017:

By interface I meant API :). Is this somehow connected with cluster-loader? It's trying to accomplish a similar thing (i.e. specify how to load the cluster). @sjug @jeremyeder

@sjug (Contributor) commented Apr 24, 2017:

@gmarek, @jayunit100 and I need to discuss further how best to integrate scheduler-perf; I do like the concept.

}
return
}

Contributor:

Remove the extra empty lines here?

Contributor Author:

Done. Thanks.

@ravisantoshgudimetla force-pushed the scheduler_perf_tests_makeover branch 3 times, most recently from 05ba315 to 3e7b239 (April 25, 2017 01:17)
@ravisantoshgudimetla (Author) commented:

@gmarek @jayunit100 - This is ready for another round of review. I have addressed most of your comments.

@gmarek (Contributor) commented Apr 25, 2017:

One comment, but except for that it looks fine-ish, as long as you can come up with a consistent API with cluster-loader. @sjug @jeremyeder

@jayunit100 (Member) commented:

@gmarek maybe cluster-loader could leverage the predicate definitions structure as a mechanism for specifying and building its own pods? Is that what you're suggesting? That would be a clean and decoupled way to share APIs, I guess.

@gmarek (Contributor) commented Apr 25, 2017:

No, it's not about decoupling. It's that we're creating two separate APIs to define how to put some load on the cluster. Of course cluster-loader is more complex, as it needs to handle way more stuff, but at a high level it does the same thing. I believe there can be some common ground here, but if both you and the CL people decide that there's no point in working together, then I'm fine with what's there now (modulo the one comment I had).

@jayunit100 (Member) commented Apr 25, 2017:

@gmarek gotcha; how do you feel about possibly creating a separate utility for load generation that generates pods, nodes... not tied to either scheduler_perf or clusterloader? Then we could both pull in the same dependency. Created kubernetes/perf-tests#41 to discuss there.

@ravisantoshgudimetla (Author) commented:

@gmarek @jayunit100 I am inclined to think that the data structure we are creating should be part of clusterloader or some other utility that can be imported as a library by both clusterloader and scheduler_test.go. This includes the mutatePod and mutateNode functions (or probably generators). Let me know what you think.

@gmarek (Contributor) commented Apr 27, 2017:

@ravisantoshgudimetla I'm fine with whatever you decide together with @sjug and @jeremyeder. You work at the same company, so it might be easier for you to just talk internally :)

@k8s-github-robot added the needs-rebase label May 4, 2017
@ravisantoshgudimetla force-pushed the scheduler_perf_tests_makeover branch from d1146ad to fee3a13 (May 4, 2017 22:07)
@k8s-github-robot removed the needs-rebase label May 4, 2017
@ravisantoshgudimetla force-pushed the scheduler_perf_tests_makeover branch from 816a554 to a223d5b (May 4, 2017 23:56)
@k8s-github-robot added the needs-rebase label May 4, 2017
@ravisantoshgudimetla force-pushed the scheduler_perf_tests_makeover branch from a223d5b to 509c2d3 (May 5, 2017 00:20)
@k8s-github-robot removed the needs-rebase label May 5, 2017
@jayunit100 (Member) commented:

@k8s-bot bazel test this

@@ -1,5 +1,5 @@
/*
Copyright 2015 The Kubernetes Authors.
Copyright 2017 The Kubernetes Authors.
Member:

? This should still be 2015.

@@ -209,7 +195,7 @@ func schedulePods(config *testConfig) int32 {

// Bake in time for the first pod scheduling event.
for {
time.Sleep(50 * time.Millisecond)
//time.Sleep(50 * time.Millisecond)
@jayunit100 (Member) commented May 5, 2017:

Why are we commenting this out? I don't recall this from the last one.

Contributor Author:

So, I tried multiple iterations and it worked even with this commented out, so I commented it out (instead of deleting it, for future reference).

@@ -228,6 +214,7 @@ func schedulePods(config *testConfig) int32 {
// This can potentially affect performance of scheduler, since List() is done under mutex.
// Listing 10000 pods is an expensive operation, so running it frequently may impact scheduler.
// TODO: Setup watch on apiserver and wait until all pods scheduled.
time.Sleep(10)
Member:

see above... ?

Contributor Author:

This one is unnecessary. I will remove it. Thanks for catching it.
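For context on why the bare literal was worth flagging: time.Sleep takes a time.Duration, which counts nanoseconds, so time.Sleep(10) pauses for only 10ns, effectively a no-op in a polling loop. A small standalone demonstration:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now()
	time.Sleep(10) // bare literal = 10 nanoseconds; almost certainly unintended
	fmt.Println("bare literal slept:", time.Since(start))

	start = time.Now()
	time.Sleep(10 * time.Millisecond) // explicit unit: what a poll loop usually wants
	fmt.Println("explicit unit slept:", time.Since(start))
}
```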

@@ -242,7 +229,7 @@ func schedulePods(config *testConfig) int32 {
return minQps
}

// There's no point in printing it for the last iteration, as the value is random
// There's no point in printing it for the last iteration, as the\ value is random
Member:

What is this slash here for?

Contributor Author:

Typo, I will remove it. Thanks.

@ravisantoshgudimetla force-pushed the scheduler_perf_tests_makeover branch 2 times, most recently from 2f75a8d to 764a11a (May 5, 2017 01:50)
@ravisantoshgudimetla force-pushed the scheduler_perf_tests_makeover branch from 764a11a to d3df9f5 (May 5, 2017 01:51)
@gmarek (Contributor) commented May 5, 2017:

I have a few minor comments, but we can fix them later, so it LGTM now.

@gmarek assigned jayunit100 and unassigned saad-ali May 5, 2017
@gmarek (Contributor) commented May 5, 2017:

/approve

@k8s-github-robot added the approved label May 5, 2017
@jayunit100 (Member) commented:

/lgtm

@k8s-ci-robot added the lgtm label May 5, 2017
@k8s-github-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gmarek, jayunit100, ravisantoshgudimetla

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot commented:

Automatic merge from submit-queue (batch tested with PRs 45322, 44770, 45411)

@k8s-github-robot merged commit a8522b0 into kubernetes:master May 5, 2017
@ravisantoshgudimetla deleted the scheduler_perf_tests_makeover branch May 7, 2017 16:02