Re-architecture of scheduler perf tests to make them more extendable #44770
Conversation
Hi @ravisantoshgudimetla. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/cc @jayunit100
@gmarek this mostly looks good to me; I have some nits. This implements the changes discussed, with the exception of a couple of minor details (like taint predicates and forcing the limitation to 1 logical operation).
Notwithstanding a formal review, I think it's basically exactly what we want. Ravi is already working on the next iteration, but we thought we should get this out first since it's backwards compatible and doesn't add any new features.
// High Level Configuration for all predicates and priorities.
type schedulerPerfConfig struct {
	NodeAffinity     nodeAffinity
	InterpodAffinity interpodAffinity
InterPodAffinity, or just PodAffinity
agree InterPodAffinity
type testConfig struct {
	numPods  int
	numNodes int
	// Note: We don't need numPods, numNodes anymore in this struct but keeping them for backward compatibility
Where did numNodes get replaced, i.e. how do you set the number of Nodes created? From what I can tell it's still used by the mutate function.
So, the idea is to push these values into the schedulerPerfConfig struct, so that there will be a single entry point into the code base. Once we have them in that struct, we can remove them from the testConfig struct. (That struct will have fields that are specific to the internal data representation, like nodeStrategies etc.)
I don't think it's a good idea, but I don't care about having the comment for now;)
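A rough sketch of what Ravi describes, where the field names below are assumptions rather than the PR's final shape: the high-level struct would absorb the counts and testConfig would keep only internal representation details.

```go
// Sketch only: NumNodes/NumPods are hypothetical names for the values being
// pushed up out of testConfig so that schedulerPerfConfig is the single entry point.
type schedulerPerfConfig struct {
	NumNodes         int
	NumPods          int
	NodeAffinity     nodeAffinity     // structs as defined elsewhere in this PR
	InterpodAffinity interpodAffinity
}

// testConfig would then hold only internal state: node strategies, pod
// creators, teardown functions, and the like.
```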
@@ -48,5 +48,6 @@ if ${RUN_BENCHMARK:-false}; then
fi
# Running density tests. It might take a long time.
kube::log::status "performance test (density) start"
go test -test.run=. -test.timeout=60m -test.short=false
go test -test.timeout=60m -test.short=false -test.run=.
Is there a reason for this change?
Nothing. I was testing with a few other changes and this got pushed. No need to touch this file as such. I will remove it.

// nodeAffinity priority configuration details.
type nodeAffinity struct {
	Enabled bool // If not enabled, node affinity is disabled.
I'd just use a pointer in the higher-level struct instead.
For that matter, interpret nil as "false"? That way there's no coupling of the high-level struct to predicate/priority names.
Yup. We do it in a number of places in the codebase.
makes sense going with nil. I will make that change.
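A minimal sketch of the nil-as-disabled idea being agreed on here; generateNodes is a hypothetical helper for illustration only.

```go
// The Enabled flag goes away; the higher-level struct holds a pointer and a
// nil pointer simply means the feature is off.
type nodeAffinity struct {
	numGroups       int
	nodeAffinityKey string
}

type schedulerPerfConfig struct {
	NodeAffinity *nodeAffinity // nil => do not generate node-affinity terms
}

func generateNodes(config schedulerPerfConfig) {
	if config.NodeAffinity == nil {
		return // node affinity disabled; create plain nodes only
	}
	// ...label nodes into config.NodeAffinity.numGroups groups here...
}
```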
// nodeAffinity priority configuration details.
type nodeAffinity struct {
	Enabled bool // If not enabled, node affinity is disabled.
	numGroups int // the % of nodes and pods that should match. Higher # -> smaller performance deficit at scale.
I have no idea how this comment matches this variable.
These comments were originally my idea based on internal discussions; I agree they have less useful context here though... Ravi, any ideas? A .md file or a struct header comment?
A README.md would be a better place, I guess.
type nodeAffinity struct {
	Enabled bool // If not enabled, node affinity is disabled.
	numGroups int // the % of nodes and pods that should match. Higher # -> smaller performance deficit at scale.
	nodeAffinityKey string // the number of labels needed to match. Higher # -> larger performance deficit at scale.
Same as above.
)

// High Level Configuration for all predicates and priorities.
type schedulerPerfConfig struct {
This is a pretty non-extensible design - you allow only one NodeAffinity and one PodAffinity. It may be fine, but it may bite us in the future as well.
Agree, we could have multiple affinities. Are you suggesting a slice? That's fine by me and not a major change to implement.
I think we could, via composition, compose multiple schedulerPerfConfigs to support scenarios where there are multiple competing node affinities in the future. Either way, I think leaving it as-is for now is good for the first iteration: we can expand it further in the future, and this is a step forward from what is already there... Is that OK with you @gmarek?
Composing multiple schedulerPerfConfigs doesn't sound too natural to me. It's probably better to have a slice here if you consider having multiple affinities in the future (but this is actually a hard problem; I don't really know how to solve it correctly).
Generally it'd be great if you could discuss this config with @jeremyeder and @sjug.
Yeah, composition is awkward once we have a 1->N relationship between multiple pod+node mutations, but I think the slice is also awkward since we aren't sure how we want to distribute label 'intersections' yet... Sebastian isn't currently experimenting deeply with scheduler predicates, per internal conversation.
So, IMO, there isn't much value in adding more cardinality to the nodeAffinity when we're only supporting one filter for now. I guess we will move forward and make handling the more complex scenarios our next priority for the improvements.
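For reference, the slice-based alternative floated above might look like the sketch below; it is not what this PR implements, and how labels would be distributed across entries remains the open question.

```go
// Hypothetical slice-based shape: each entry is an independent rule, so a
// test could exercise N competing affinities; an empty slice means disabled.
type schedulerPerfConfig struct {
	NodeAffinities []nodeAffinity
	PodAffinities  []interpodAffinity
}
```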
// interpodAffinity priority configuration details.
type interpodAffinity struct {
	Enabled  bool
	Operator metav1.LabelSelectorOperator
I have no idea what the Operator is in the context of Affinity. Is this, together with affinityKey, just a way to implement NodeSelector? Why not include it explicitly? Plus I believe it makes no sense in PodAffinity, as it's used only in NodeAffinity, if I'm not mistaken.
Even for pod affinity, we need a label selector, so I guess we need the operator. I was looking at https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/configuration/pod-with-pod-affinity.yaml. Please let me know if it is not needed.
Then you just need LabelSelector. No need to inline its definition here.
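A sketch of that suggestion, assuming the struct simply embeds the existing metav1.LabelSelector type rather than a bare operator field:

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Holding a full LabelSelector keeps operators and keys together and mirrors
// the shape of the real affinity API.
type interpodAffinity struct {
	Enabled  bool
	Selector *metav1.LabelSelector // nil => no pod-affinity term is generated
}
```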
}

// readInConfiguration reads the input parameters for every predicate and priority that are needed for test run.
// TODO: As of now, returning configs hardcoded, need to read from yaml or some other file.
No. First figure out how this should work when it's done, then we can discuss what part of this is needed for the MVP. I want to see an interface first. We certainly don't want to lose the ability to run tests without affinity, etc.
I don't think readInConfig is necessary; these tests can be extended easily by an engineer simply writing a new test with a struct in it.
To be clear: I originally figured we should have default behavior for readInConfig... but am now thinking it really should be moved out to be done later (or maybe never ;)). Good point that the MVP shouldn't hint at a user interface.
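To illustrate the "just write a new test with a struct in it" point, a hypothetical extension might look like the sketch below; field and helper names beyond those shown in the diff are assumptions.

```go
func TestSchedule100Node3KPodsWithNodeAffinity(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping because we want to run short tests")
	}
	config := baseConfig()
	config.numNodes = 100 // assumes these fields stay on testConfig for now
	config.numPods = 3000
	config.schedulerPerfConfig = schedulerPerfConfig{ // assumed field name
		NodeAffinity: nodeAffinity{
			Enabled:         true,
			numGroups:       10,
			nodeAffinityKey: "example.io/zone", // hypothetical label key
		},
	}
	// mutate nodes/pods according to the config, then measure as the other tests do
	schedulePods(config)
}
```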
@@ -44,8 +39,10 @@ func TestSchedule100Node3KPods(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping because we want to run short tests")
	}

	config := defaultSchedulerBenchmarkConfig(100, 3000)
	config := baseConfig()
Instead of removing the default... function, maybe just re-implement it with those 3 lines?
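The suggestion, sketched out under the assumption that baseConfig() returns a *testConfig with settable numNodes/numPods fields:

```go
// Keep the old helper as a thin wrapper over baseConfig() instead of deleting it.
func defaultSchedulerBenchmarkConfig(numNodes, numPods int) *testConfig {
	config := baseConfig()
	config.numNodes = numNodes
	config.numPods = numPods
	return config
}
```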
@jayunit100 @ravisantoshgudimetla - I mostly have comments on the API; the rest looks more or less OK-ish. The thing I'd like to see would be actually handling inputs somehow + fixing the API. Those are somewhat coupled, as we need to understand what knobs there'll be to know how to expose them.
	nodePreparer              testutils.TestNodePreparer
	podCreator                *testutils.TestPodCreator
	schedulerSupportFunctions scheduler.Configurator
	destroyFunc               func()
}

// baseConfig returns a minimal testConfig to be customized be different tests.
Typo
// parseInput reads a configuration and then applies it to a test configuration. |
I think this function isn't necessary.
@gmarek what kind of an interface did you want to see... one describing the mutation operations? Or an interface for configuration that could be sent to a run() testing stub? I see potential value in both but am just curious what you're looking for.
By interface I meant API :). Is this somehow connected with cluster-loader? It's trying to accomplish a similar thing (i.e. specify how to load the cluster). @sjug @jeremyeder
@gmarek @jayunit100 and I need to discuss further to see how it would be best to integrate scheduler-perf; I do like the concept.
	}
	return
}
remove extra empty lines here?
Done. Thanx.
@gmarek @jayunit100 - This is ready for another round of review. I have addressed most of your comments.
One comment, but except for that it looks fine-ish, as long as you can come up with some consistent API with cluster loader. @sjug @jeremyeder
@gmarek maybe cluster-loader could leverage the predicate definitions structure as a mechanism for specifying and building its own pods? Is that what you're suggesting? That is a clean and decoupled way to share APIs, I guess.
No, it's not about decoupling. It's that we're creating two separate APIs to define how to put some load on the cluster. Of course cluster loader is more complex, as it needs to handle way more stuff, but on a high level it does the same thing. I believe there can be some common ground here, but if both you and the CL people decide that there's no point in working together, then I'm fine with what's there now (modulo the one comment I had).
@gmarek gotcha; how do you feel about possibly creating a separate utility for load generation that generates pods, nodes... really dependent on either scheduler_perf or clusterloader? Then we could both pull the same dependency... Created kubernetes/perf-tests#41 to discuss there.
@gmarek @jayunit100 I am kind of inclined to think that the data structure we are creating should be part of clusterloader or some other utility (which can be imported as a library by both clusterloader and scheduler_test.go). This includes the mutatePod and mutateNode functions (or probably generators). LMK what you think.
@ravisantoshgudimetla I'm fine with whatever you decide together with @sjug and @jeremyeder. You work in the same company, so it might be easier for you to just talk internally :)
@k8s-bot bazel test this
@@ -1,5 +1,5 @@
/*
Copyright 2015 The Kubernetes Authors.
Copyright 2017 The Kubernetes Authors.
? this should still be 2015
@@ -209,7 +195,7 @@ func schedulePods(config *testConfig) int32 {

	// Bake in time for the first pod scheduling event.
	for {
		time.Sleep(50 * time.Millisecond)
		//time.Sleep(50 * time.Millisecond)
Why are we commenting this out? Don't recall this from the last one.
So, I tried multiple iterations and it worked even with this change, so I commented it out (instead of deleting it) for future reference.
@@ -228,6 +214,7 @@ func schedulePods(config *testConfig) int32 {
	// This can potentially affect performance of scheduler, since List() is done under mutex.
	// Listing 10000 pods is an expensive operation, so running it frequently may impact scheduler.
	// TODO: Setup watch on apiserver and wait until all pods scheduled.
	time.Sleep(10)
see above... ?
This is an unnecessary one. Will remove it. Thnx for catching it.
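As background on why the bare sleep looks accidental: time.Sleep takes a time.Duration counted in nanoseconds, so an untyped constant 10 pauses for only 10ns, which is effectively no pause at all.

```go
time.Sleep(10)                    // 10 nanoseconds: effectively no pause
time.Sleep(10 * time.Millisecond) // an explicit unit gives a real delay
```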
@@ -242,7 +229,7 @@ func schedulePods(config *testConfig) int32 {
	return minQps
}

	// There's no point in printing it for the last iteration, as the value is random
	// There's no point in printing it for the last iteration, as the\ value is random
what is this slash here for?
Typo, I will remove it. Thnx.
I have a few minor comments, but we can fix them later, so it LGTM now.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gmarek, jayunit100, ravisantoshgudimetla
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
Automatic merge from submit-queue (batch tested with PRs 45322, 44770, 45411)
What this PR does / why we need it:
Special notes for your reviewer:
This is a re-architecture of the scheduler perf tests, so that we can enable or disable certain predicates and priorities and see their impact.
Release note: