
Conversation

@cofyc
Member

@cofyc cofyc commented Aug 12, 2018

What this PR does / why we need it:

Use monotonically increasing generation to prevent equivalence cache race.

This happens because invalidating the predicate cache only clears the latest cached entries; in-flight predicate goroutines can still write stale results back into the cache afterwards. This can occur when events happen frequently:

| objectVersion | invalidation goroutine | predicate goroutine | cacheVersion | status |
| --- | --- | --- | --- | --- |
| v0 -> v1 | cache invalidated | begin to calculate result.v1 | cache.v0 | OK |
| v1 -> v2 | cache invalidated | begin to write result.v1 into cache | no cache | OK |
| v2 | | result.v1 written into cache | cache.v1 | cache mismatch |
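
A minimal Go sketch of the idea, with illustrative names only (eCacheEntry, cachedResult, invalidate, update are not the actual types this PR adds): every invalidation bumps a generation counter, each predicate run records the generation it started from, and a write whose recorded generation no longer matches the live one is dropped, so the stale write in the last row above never lands.

```go
package ecache

import "sync"

// Illustrative types only; the real equivalence cache in the scheduler
// differs in detail.
type cachedResult struct {
	fits bool
}

type eCacheEntry struct {
	mu         sync.Mutex
	generation uint64 // bumped on every invalidation
	result     *cachedResult
}

// invalidate clears the cached result and advances the generation so that any
// in-flight predicate run started before this point becomes stale.
func (e *eCacheEntry) invalidate() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.generation++
	e.result = nil
}

// snapshotGeneration is read before the predicate starts computing.
func (e *eCacheEntry) snapshotGeneration() uint64 {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.generation
}

// update stores a result only if no invalidation happened since the snapshot;
// the stale write in the last row of the table above would be dropped here.
func (e *eCacheEntry) update(r *cachedResult, snapshotted uint64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.generation != snapshotted {
		return // stale result from an in-flight goroutine
	}
	e.result = r
}
```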

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #67260

Special notes for your reviewer:

Release note:

Use monotonically increasing generation to prevent scheduler equivalence cache race.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Aug 12, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 12, 2018
@cofyc
Member Author

cofyc commented Aug 12, 2018

@kubernetes/sig-scheduling-pr-reviews
/assign @msau42

@cofyc cofyc force-pushed the fix67260 branch 2 times, most recently from 0a41fc2 to 3bbf639 Compare August 12, 2018 04:31
@msau42
Member

msau42 commented Aug 13, 2018

/assign @resouer @misterikkit

Member

@yastij yastij left a comment


Just a nit

Member

Rename to receivedInvalidationRequests

Member Author

Fixed.


It seems like anybody who does a lookup will clear the flag for that predicate, causing the next updater to assume that the cache has not been invalidated.

Member Author

@cofyc cofyc Aug 14, 2018

Yes, in theory the flag should be per cache result (per predicate, per node, and per pod),
but that would be too complex, and the current per-predicate, per-node flag does not have this problem: the scheduler runs predicates for nodes in parallel but processes pods sequentially, so while a goroutine for a given predicate on a given node is running, no other goroutine can look up the result for that predicate on that node.
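
For illustration, a minimal sketch of that concurrency model (scheduleAll and runPredicates are made-up names, not the scheduler's real code):

```go
package sketch

import "sync"

// scheduleAll illustrates the claim above: because the outer loop over pods is
// sequential and we wait for all per-node goroutines before moving on, at most
// one goroutine is ever working on a given (predicate, node) cache entry.
func scheduleAll(pods, nodes []string, runPredicates func(pod, node string)) {
	for _, pod := range pods { // pods: strictly sequential
		var wg sync.WaitGroup
		for _, node := range nodes { // nodes: evaluated in parallel for this pod
			wg.Add(1)
			go func(p, n string) {
				defer wg.Done()
				runPredicates(p, n) // sole goroutine for (predicate, n) right now
			}(pod, node)
		}
		wg.Wait() // the next pod starts only after every node has finished
	}
}
```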

@misterikkit

misterikkit commented Aug 13, 2018

Hi, have you confirmed that this error can happen? Bonus points if you can add a test that triggers the error.

The reason I ask is that we tried to address this specific problem in #63040. Perhaps the flaky test is due to an invalidation event that we aren't observing properly?

/cc @bsalamat

@k8s-ci-robot k8s-ci-robot requested a review from bsalamat August 13, 2018 21:59
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 14, 2018
@cofyc
Member Author

cofyc commented Aug 14, 2018

@misterikkit Added an integration test in e3f6c61d10342cd218f8741ea403bc509c0a98e2.
The test fails without this fix.

@cofyc
Member Author

cofyc commented Aug 14, 2018

/test pull-kubernetes-e2e-kops-aws

1 similar comment
@resouer
Contributor

resouer commented Aug 14, 2018

/test pull-kubernetes-e2e-kops-aws

Contributor

lookupResult is on the critical path during scheduling, but this delete operation requires mu.Lock() instead of mu.RLock().

We may need to evaluate the performance impact here.
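
For context, a small sketch of the locking trade-off being pointed out here (resultCache and its methods are simplified stand-ins, not the PR's actual code): many readers can share RLock, but a delete on the lookup path forces the exclusive Lock.

```go
package sketch

import "sync"

// Simplified stand-in for a result cache guarded by a sync.RWMutex.
type resultCache struct {
	mu sync.RWMutex
	m  map[string]bool
}

// lookup with a shared lock: many scheduling goroutines can run this at once.
func (c *resultCache) lookup(key string) (bool, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[key]
	return v, ok
}

// lookupAndClear must take the exclusive lock because delete mutates the map,
// which is the cost being asked about on the scheduling hot path.
func (c *resultCache) lookupAndClear(key string) (bool, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[key]
	delete(c.m, key)
	return v, ok
}
```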

Member Author

@cofyc cofyc Aug 18, 2018

With the new solution misterikkit suggested, the RW lock is no longer required.
It's not correct to read the generation in lookupResult, because invalidations may happen after the scheduler cache is snapshotted and before lookupResult runs.

@misterikkit

Just to add some more context, I believe you when you say this is a problem, but I don't believe this PR is the right fix. We tried to fix this exact problem in #63040, so my assumption is that something about that fix needs to be corrected.

@misterikkit

Just to add some more context, I believe you when you say this is a problem, but I don't believe this PR is the right fix.

Time to eat my words. 😄 As mentioned in #67260, this is an old bug whose fix was not 100% correct.

To merge this PR, I would like it to replace the incorrect fix from #63040. I also have some other comments on the code changes.

@misterikkit misterikkit left a comment

As I mentioned in other comments, I would like this fix to replace the fix for #62921.

In particular, I think we should (a minimal sketch follows below):

  • add atomic generation numbers to the eCache
  • snapshot the current generation numbers before snapshotting the scheduler cache
  • compare generation number in snapshot to the live version before writing to eCache
  • "undo" the fix from #63040 and #63895

Sorry that is a lot more work than your initial approach. Let me know if you need help with it. I am also available to chat if you want to discuss the fix.
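
A minimal sketch of that flow, under simplified assumptions (nodeCache, updateIfCurrent, scheduleOne are illustrative names, not what this PR actually adds):

```go
package sketch

import (
	"sync"
	"sync/atomic"
)

// Illustrative shapes only; the real types in the scheduler differ.
type nodeCache struct {
	mu         sync.Mutex
	generation int64 // bumped atomically on every invalidation
	results    map[string]bool
}

func newNodeCache() *nodeCache {
	return &nodeCache{results: map[string]bool{}}
}

// invalidate only bumps the generation; writers compare against it later.
func (n *nodeCache) invalidate() {
	atomic.AddInt64(&n.generation, 1)
}

// updateIfCurrent writes a result only if no invalidation happened since the
// snapshot, replacing the flag-based and IsUpToDate-style checks.
func (n *nodeCache) updateIfCurrent(predicate string, fits bool, snapshotted int64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if atomic.LoadInt64(&n.generation) != snapshotted {
		return // stale: an invalidation raced with this predicate run
	}
	n.results[predicate] = fits
}

// scheduleOne shows the proposed ordering: snapshot the eCache generation
// first, then snapshot the scheduler cache, then compute and conditionally
// write back.
func scheduleOne(n *nodeCache, snapshotSchedulerCache func(), runPredicate func() bool) {
	snapshotted := atomic.LoadInt64(&n.generation) // 1. snapshot eCache generation
	snapshotSchedulerCache()                       // 2. snapshot scheduler cache
	fits := runPredicate()                         // 3. compute from the snapshot
	n.updateIfCurrent("SomePredicate", fits, snapshotted) // 4. drop if stale
}
```

The key point is that step 1 happens before step 2, so any invalidation that races with the predicate run is guaranteed to change the generation compared against in step 4.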


I don't think a flag is sufficient, since lookupResult clears the flag, and I don't want us to make assumptions about when that may be called.

Instead, how about an int64 "generation number" which is atomically incremented?


Adding this check removes the need for the check on line 254, `if !cache.IsUpToDate(nodeInfo)`. I think this PR should delete that check, remove nodeInfo from the arguments, and remove all the plumbing that goes with it.

See #63040

Member Author

The `if !cache.IsUpToDate(nodeInfo)` check is removed now, but the nodeInfo argument is kept because it's used in other places.

@cofyc
Member Author

cofyc commented Aug 15, 2018

Thanks for clarification and the advice! I'll update this PR.

@bsalamat
Member

Thinking more about the solution that misterikkit outlined above last night, I am a bit worried that snapshotting the eCache might cause a noticeable slowdown. We should measure the performance impact of this solution.

@cofyc
Member Author

cofyc commented Sep 14, 2018

Any other concerns?

Member

@bsalamat bsalamat left a comment

Thanks, @cofyc and sorry for the delay. It looks generally good. I have several minor comments.
I haven't looked at the tests yet.

Member

We should add an else statement that logs an error when a predicate key is not found.

Member Author

added

Member

s/caches/cache/

Member Author

fixed

Member

If remove must happen after the invalidation, it deserves a better comment.

Member Author

@cofyc cofyc Sep 16, 2018

Sorry, I was wrong about this part. All equivalence cache invalidations should happen after resource objects are added, updated, or deleted, so that predicate functions run against the new versions of the resource objects. I'll revert this change.

By the way, there is no need to invalidate a node's predicates after the node is deleted, because no predicate functions will run on a deleted node (we iterate over the nodes in the scheduler cache); this is different from other resources, e.g. PVs.

Member

I agree with you that invalidation of the eCache should happen after CUD operations on objects. The following diagram shows a scenario where the reverse order causes an issue: if invalidation happens first, we could snapshot the eCache after the invalidation and then snapshot the scheduler cache itself; a node is then removed/added/updated before we update the eCache, so we would update the eCache with stale information based on a snapshot of the cache taken before the node removal/addition/update:

------|-----------------|------------------|----------------|---------------|-------->
Invalidate eCache  Snapshot eCache   Snapshot Cache     Remove Node    Update eCache    

I think you should add this to the comment to document why the order is important.
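
For illustration, a tiny sketch of the agreed ordering (the interfaces and onNodeUpdate below are simplified stand-ins, not the scheduler's real event handlers):

```go
package sketch

type nodeObj struct{ Name string }

// Simplified stand-ins for the scheduler cache and the equivalence cache.
type schedulerCache interface {
	UpdateNode(oldNode, newNode *nodeObj) error
}

type equivCache interface {
	InvalidatePredicatesOnNode(nodeName string)
}

// onNodeUpdate applies the object change before invalidating, so any predicate
// that runs after the invalidation is guaranteed to see the new node state;
// invalidating first reopens the window shown in the timeline above.
func onNodeUpdate(sc schedulerCache, ec equivCache, oldNode, newNode *nodeObj) error {
	if err := sc.UpdateNode(oldNode, newNode); err != nil { // 1. CUD on scheduler cache
		return err
	}
	ec.InvalidatePredicatesOnNode(newNode.Name) // 2. then invalidate eCache entries
	return nil
}
```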

Member Author

@cofyc cofyc Sep 17, 2018

Added in 2a05122. This simply updates the comments from PR https://github.com/kubernetes/kubernetes/pull/63895/files to reflect the latest code.

There is no need to revert it now; I'll rebase when the PR is done.

@cofyc cofyc force-pushed the fix67260 branch 2 times, most recently from 73b4434 to cc3b6d9 Compare September 16, 2018 13:44
@cofyc
Member Author

cofyc commented Sep 16, 2018

@bsalamat Thanks, I pushed a new commit: cc3b6d9.

Contributor

Since we init the NodeCache here, it seems we don't need to call g.equivalenceCache.GetNodeCache(nodeName) in generic_scheduler.

Instead, we could call something like g.equivalenceCache.LoadNodeCache() there, which would be a read-only operation. WDYT?
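
For illustration, a sketch of the split being suggested (the Cache/NodeCache shapes and the LoadNodeCache signature are assumptions for the example, not necessarily what the final code looks like): a create-if-missing accessor for the update path and a read-only accessor for the scheduling hot path.

```go
package sketch

import "sync"

// NodeCache would hold per-node cached predicate results.
type NodeCache struct{}

// Cache is a simplified stand-in for the equivalence cache.
type Cache struct {
	mu         sync.RWMutex
	nodeCaches map[string]*NodeCache
}

func NewCache() *Cache {
	return &Cache{nodeCaches: map[string]*NodeCache{}}
}

// GetNodeCache creates the per-node cache if it does not exist yet, so it
// needs the exclusive lock.
func (c *Cache) GetNodeCache(name string) *NodeCache {
	c.mu.Lock()
	defer c.mu.Unlock()
	n, ok := c.nodeCaches[name]
	if !ok {
		n = &NodeCache{}
		c.nodeCaches[name] = n
	}
	return n
}

// LoadNodeCache is the read-only variant for generic_scheduler: it never
// creates, so a shared RLock is enough on the hot path.
func (c *Cache) LoadNodeCache(name string) (*NodeCache, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	n, ok := c.nodeCaches[name]
	return n, ok
}
```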

Member Author

@cofyc cofyc Sep 17, 2018

I think it's a good idea! Please check 8ece5d6.

Contributor

@resouer resouer left a comment

Just added several review comments; the rest looks good. Please take a look!

Contributor

stores a *NodeCache

Member Author

fixed

Contributor

It would be better to include comments for the last three members, to make other developers' lives easier.

Member Author

done

@cofyc cofyc force-pushed the fix67260 branch 2 times, most recently from 6f266d7 to 8ece5d6 Compare September 17, 2018 11:46
Member

I don't think this comment applies here.

Member Author

After assuming a pod into the scheduler cache, the scheduler invalidates the affected predicates in the equivalence cache.
https://github.com/kubernetes/kubernetes/blob/8ece5d624ded4815655104d8a12712cfe6d8743e/pkg/scheduler/scheduler.go#L355-L363

Member

@bsalamat bsalamat left a comment

Thanks, @cofyc!

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 21, 2018
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 22, 2018
@cofyc
Member Author

cofyc commented Sep 22, 2018

Hi @bsalamat,
The old PR contained some temporary commits; I've squashed them into fewer commits now (the changes are the same). Could you please LGTM again? Thanks!

Yecheng Fu added 2 commits September 22, 2018 12:08
- snapshot equivalence cache generation numbers before snapshotting the scheduler cache
- skip update when generation does not match live generation
- keep the node and increment its generation to invalidate it instead of deletion
- use predicates order ID as key to improve performance
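
A rough sketch of the last bullet, with an illustrative ID table and types (not the scheduler's actual predicate registration): indexing a slice by a small integer order ID avoids hashing the predicate name on every lookup.

```go
package sketch

// predicateID is a small integer assigned to each registered predicate; the
// mapping below is illustrative only.
type predicateID int

var predicateIDs = map[string]predicateID{
	"PodFitsResources":  0,
	"MatchNodeSelector": 1,
	// ...one entry per registered predicate, assigned at startup
}

// nodeResults caches one result slot per predicate, indexed by order ID.
type nodeResults struct {
	results []*bool // nil slot means "no cached result"
}

func newNodeResults() *nodeResults {
	return &nodeResults{results: make([]*bool, len(predicateIDs))}
}

func (n *nodeResults) store(p predicateID, fits bool) {
	n.results[p] = &fits
}

func (n *nodeResults) load(p predicateID) (fits, ok bool) {
	if r := n.results[p]; r != nil {
		return *r, true
	}
	return false, false
}
```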
@cofyc
Member Author

cofyc commented Sep 22, 2018

/retest

Member

@bsalamat bsalamat left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, cofyc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 28d86ac into kubernetes:master Sep 25, 2018
@cofyc cofyc deleted the fix67260 branch May 4, 2019 07:19

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

TestVolumeBindingRescheduling out of sync; retrying flakes

8 participants