Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scheduler should use a shared informer, and fix broken watch behavior for cached watches #46223

Merged
merged 2 commits into from
May 23, 2017

Conversation

smarterclayton
Copy link
Contributor

@smarterclayton smarterclayton commented May 22, 2017

Can be used either from a true shared informer or a local shared
informer created just for the scheduler.

Fixes a bug in the cache watcher where we were returning the "current" object from a watch event, not the historic event. This means that we broke behavior when introducing the watch cache. This may have API implications for filtering watch consumers - but on the other hand, it prevents clients filtering from seeing objects outside of their watch correctly, which can lead to other subtle bugs.

The behavior of some watch calls to the server when filtering on fields was incorrect.  If watching objects with a filter, when an update was made that no longer matched the filter a DELETE event was correctly sent.  However, the object that was returned by that delete was not the (correct) version before the update, but instead, the newer version.  That meant the new object was not matched by the filter.  This was a regression from behavior between cached watches on the server side and uncached watches, and thus broke downstream API clients.

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels May 22, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 22, 2017
@smarterclayton
Copy link
Contributor Author

Reverts #46199

@k8s-github-robot k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels May 22, 2017
@ncdc
Copy link
Member

ncdc commented May 22, 2017

@smarterclayton do you still need the Switch the pod lister to a filtering implementation commit?

@liggitt
Copy link
Member

liggitt commented May 22, 2017

testcase?

@smarterclayton
Copy link
Contributor Author

@kubernetes/sig-api-machinery-pr-reviews we broke watch DELETE events with filters when we introduced the watch cache. This PR corrects that behavior... but it potentially has ramifications for API consumers. APIs not backed by watch caches are correct (in the strict sense that they have always behaved this way since 1.0) - APIs backed by watch caches return the most recent update, not the old update. This is an unfortunate state to be in - we are broken both ways, we regressed several releases ago, and no one noticed until I actually made code from a client that depended on the correct behavior of the system. We need to decide whether we accept the break, fix the break, or provide both modes for those who need correct behavior.

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 22, 2017
@smarterclayton
Copy link
Contributor Author

Yes, still need the filtering set for scheduler extensions that expect to get only assigned pods.

@liggitt
Copy link
Member

liggitt commented May 22, 2017

does this mean we never get a watch event containing the final etcd record, only the next-to-final record? (or is PrevObject the final etcd record?)

@smarterclayton
Copy link
Contributor Author

Not sure what it does for final delete.

@ncdc
Copy link
Member

ncdc commented May 22, 2017

@liggitt before this fix, we get (DELETED, pod.Phase=Succeeded). With the fix, we get (DELETED, pod.Phase=Running), which is correct, since the fieldSelector is for assigned, non-terminated pods. So with the fix we never get the final etcd watch record, but we shouldn't be getting that one in this case.

@smarterclayton
Copy link
Contributor Author

I think @liggitt was referring to an unfiltered watch.

@liggitt
Copy link
Member

liggitt commented May 22, 2017

I think @liggitt was referring to an unfiltered watch.

exactly

@ncdc
Copy link
Member

ncdc commented May 22, 2017

@smarterclayton would it make more sense to move the assigned pod lister functionality into pod_expansion.go?

@smarterclayton
Copy link
Contributor Author

I think we might want to consider back porting the cache watch stuff if we can decide on the API implications.

@ncdc
Copy link
Member

ncdc commented May 22, 2017

Send an email to the dev list? I doubt that would cover everyone using watches and relying on this incorrect behavior, but it's better than nothing.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 22, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 22, 2017
Can be used either from a true shared informer or a local shared
informer created just for the scheduler.
The underlying storage has always returned the old object on watch
delete events when filtering. The cache watcher does not, which means a
downsteam caller gets different behavior.

This fixes the cache watcher to be consistent with our long term
behavior for watch. It may result in a behavior change (the filter
becomes more precise) but this was a regression in behavior.
@smarterclayton smarterclayton changed the title Scheduler should use a shared informer Scheduler should use a shared informer, and fix broken watch behavior for cached watches May 22, 2017
@smarterclayton
Copy link
Contributor Author

Consensus seems to be merge and backport.

Reviewers, do your worst

@k82cn
Copy link
Member

k82cn commented May 23, 2017

@kubernetes/sig-scheduling-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label May 23, 2017
c.scheduledPodsHasSynced = podInformer.Informer().HasSynced
// scheduled pod cache
podInformer.Informer().AddEventHandler(
cache.FilteringResourceEventHandler{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the informer cache all Pods in local (scheduler) and filter? or handled in apiserver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the informer's job. The informer in this package is filtered to non-terminal pods, and then split in half into the pod cache (assigned) and queue (unassigned). The global informer in the controller manager (which this PR does not use, but an extension scheduler or hyper kube could use) would be global, and this filters down to that set.

@wojtek-t
Copy link
Member

Consensus seems to be merge and backport.

Yeah - I would recommend this too.

@smarterclayton - thanks a lot for debugging and fixing it. It's unfortunate that we didn't find this bug for 5 releases...

@wojtek-t
Copy link
Member

This LGTM

@deads2k
Copy link
Contributor

deads2k commented May 23, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 23, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, smarterclayton

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Copy link
Contributor

@smarterclayton: The following test(s) failed:

Test name Commit Details Rerun command
pull-kubernetes-federation-e2e-gce e9e6935 link @k8s-bot pull-kubernetes-federation-e2e-gce test this
pull-kubernetes-e2e-gce-etcd3 e9e6935 link @k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 45766, 46223)

@k8s-github-robot k8s-github-robot merged commit 8e07e61 into kubernetes:master May 23, 2017
@smarterclayton
Copy link
Contributor Author

@kubernetes/kubernetes-release-managers this is something we want to backport to 1.6, 1.5, and any older release that we release a patch for.

@k8s-cherrypick-bot
Copy link

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@wojtek-t
Copy link
Member

@mwielgus ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.