Send correct resource version for delete events from watch cache #58547

Merged

Conversation

liggitt
Member

@liggitt liggitt commented Jan 19, 2018

Fixes #58545
Fixes kubernetes/client-go#338

The watch cache filtering returns the previous object's content intact, including its resource version. This is the logic the watch cache uses:

switch {
case curObjPasses && !oldObjPasses:
	watchEvent = watch.Event{Type: watch.Added, Object: event.Object.DeepCopyObject()}
case curObjPasses && oldObjPasses:
	watchEvent = watch.Event{Type: watch.Modified, Object: event.Object.DeepCopyObject()}
case !curObjPasses && oldObjPasses:
	watchEvent = watch.Event{Type: watch.Deleted, Object: event.PrevObject.DeepCopyObject()}
}

When processing a delete event, we should send the old object's content, but with the event's resource version set on it. Corresponding logic exists in the uncached stores:

if err := w.versioner.UpdateObject(oldObj, res.Node.ModifiedIndex); err != nil {
	utilruntime.HandleError(fmt.Errorf("failure to version api object (%d) %#v: %v", res.Node.ModifiedIndex, oldObj, err))
}

// Note that this sends the *old* object with the etcd revision for the time at
// which it gets deleted.
oldObj, err = decodeObj(wc.watcher.codec, wc.watcher.versioner, data, e.rev)
if err != nil {
	return nil, nil, err
}
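
For illustration only, here is a minimal sketch of the behavior described above, using simplified stand-in types rather than the real watch cache and versioner types. The names `object`, `cacheEvent`, `watchEvent`, and `toWatchEvent` are hypothetical and not taken from the Kubernetes code base. The delete case copies the previous object and stamps the copy with the resource version of the event that removed it:

```go
package main

import "fmt"

// object is a hypothetical stand-in for an API object with metadata.
type object struct {
	Name            string
	ResourceVersion uint64
}

// cacheEvent is a simplified stand-in for the watch cache's internal event.
type cacheEvent struct {
	Object          object // state after the change
	PrevObject      object // state before the change
	ResourceVersion uint64 // version of the change itself
}

// watchEvent is what a watcher would receive.
type watchEvent struct {
	Type   string
	Object object
}

// toWatchEvent applies the three-way filtering switch. The boolean result is
// false when neither the old nor the new state passes the filter.
func toWatchEvent(e cacheEvent, curObjPasses, oldObjPasses bool) (watchEvent, bool) {
	switch {
	case curObjPasses && !oldObjPasses:
		return watchEvent{Type: "ADDED", Object: e.Object}, true
	case curObjPasses && oldObjPasses:
		return watchEvent{Type: "MODIFIED", Object: e.Object}, true
	case !curObjPasses && oldObjPasses:
		// Send the old content, but versioned at the point of removal.
		old := e.PrevObject // struct copy, so the cached object is untouched
		old.ResourceVersion = e.ResourceVersion
		return watchEvent{Type: "DELETED", Object: old}, true
	}
	return watchEvent{}, false
}

func main() {
	e := cacheEvent{
		Object:          object{Name: "pod-a", ResourceVersion: 7},
		PrevObject:      object{Name: "pod-a", ResourceVersion: 5},
		ResourceVersion: 7,
	}
	ev, _ := toWatchEvent(e, false, true)
	fmt.Printf("%s %s rv=%d\n", ev.Type, ev.Object.Name, ev.Object.ResourceVersion)
}
```

The copy is modified rather than the cached previous object, so other watchers reading the same cache entry would not see a mutated version.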

Fixes an issue where the resourceVersion of an object in a DELETE watch event was not the resourceVersion of the delete itself, but of the last update to the object. This could disrupt the ability of clients to re-establish watches properly.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 19, 2018
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 19, 2018
@liggitt liggitt force-pushed the watch-cache-delete-resourceversion branch from e081817 to 57998d2 Compare January 19, 2018 23:08
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 19, 2018
@liggitt liggitt changed the title WIP - Send correct resource version for delete events from watch cache Send correct resource version for delete events from watch cache Jan 19, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 19, 2018
@liggitt liggitt added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jan 19, 2018
@liggitt liggitt added this to the v1.9 milestone Jan 19, 2018
@liggitt
Member Author

liggitt commented Jan 19, 2018

@kubernetes/sig-api-machinery-bugs @kubernetes/sig-api-machinery-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 19, 2018
@liggitt
Member Author

liggitt commented Jan 19, 2018

cc @smarterclayton @wojtek-t

@liggitt
Member Author

liggitt commented Jan 19, 2018

/status approved-for-milestone

@@ -73,79 +77,88 @@ func TestCacheWatcherHandlesFiltering(t *testing.T) {
{
events: []*watchCacheEvent{
Member Author

@liggitt liggitt Jan 19, 2018

this diff is best viewed without whitespace:
https://github.com/kubernetes/kubernetes/pull/58547/files?w=1

This makes the test data accurate (events have resource versions) and actually tests for the result we want: the delete event is delivered with the resource version of the event that caused it, not the last-modified resourceVersion of the object. Otherwise, watch events could carry resourceVersions that go back in time, causing a re-watch to start from an old point in history.
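
As a rough sketch of why this matters to clients (the `event` type and the `resumeVersion` and `wentBackwards` helpers are hypothetical, not client-go code): a watcher that remembers the resource version of the last event it saw will resume from a stale point if the delete event carries the object's last-modified version instead of the version of the delete.

```go
package main

import "fmt"

// event is a hypothetical, simplified watch event.
type event struct {
	Type            string
	ResourceVersion uint64
}

// resumeVersion returns the version a client would re-watch from after
// processing the events in order: the version of the last event seen.
func resumeVersion(events []event) uint64 {
	var last uint64
	for _, e := range events {
		last = e.ResourceVersion
	}
	return last
}

// wentBackwards reports whether any event carries a lower version than an
// earlier event in the same stream.
func wentBackwards(events []event) bool {
	var highest uint64
	for _, e := range events {
		if e.ResourceVersion < highest {
			return true
		}
		highest = e.ResourceVersion
	}
	return false
}

func main() {
	// An object last modified at version 5 is deleted at version 9, after an
	// unrelated object was modified at version 8.
	stale := []event{
		{Type: "MODIFIED", ResourceVersion: 8},
		{Type: "DELETED", ResourceVersion: 5}, // last-modified version of the object
	}
	fixed := []event{
		{Type: "MODIFIED", ResourceVersion: 8},
		{Type: "DELETED", ResourceVersion: 9}, // version of the delete itself
	}
	fmt.Println(resumeVersion(stale), wentBackwards(stale))
	fmt.Println(resumeVersion(fixed), wentBackwards(fixed))
}
```

In the stale stream the client would re-watch from version 5 and could see the change at version 8 a second time; in the fixed stream the versions only move forward.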

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@lavalamp
Member

cc @jpbetz @wenjiaswe

@tnozicka
Contributor

/retest

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@wanghaoran1988
Contributor

/test pull-kubernetes-e2e-kops-aws

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@liggitt
Member Author

liggitt commented Jan 23, 2018

/retest

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 58547, 57228, 58528, 58499, 58618). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 619305f into kubernetes:master Jan 23, 2018
@k8s-ci-robot
Contributor

@liggitt: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-e2e-kops-aws | 57998d2 | link | /test pull-kubernetes-e2e-kops-aws |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-github-robot pushed a commit that referenced this pull request Jan 23, 2018
…7-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #58547: Send correct resource version for delete events from watch

Cherry pick of #58547 on release-1.8.

#58547: Send correct resource version for delete events from watch
openshift-merge-robot added a commit to openshift/origin that referenced this pull request Jan 23, 2018
Automatic merge from submit-queue (batch tested with PRs 18233, 18068, 18228, 18227).

UPSTREAM: 58547: Send correct resource version for delete events from watch cache

Backport of kubernetes/kubernetes#58547

Watch cache was returning incorrect (old) ResourceVersion on "deleted" events breaking informers that were going back in time. This fixes it.

/assign @liggitt 
/cc @mfojtik 

Fixes #17581 #16003 and likely others
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Jan 25, 2018
…e-fix-58547

Automatic merge from submit-queue (batch tested with PRs 18233, 18068, 18228, 18227).

UPSTREAM: 58547: Send correct resource version for delete events from watch cache

Backport of kubernetes#58547

Watch cache was returning incorrect (old) ResourceVersion on "deleted" events breaking informers that were going back in time. This fixes it.

/assign @liggitt
/cc @mfojtik

Fixes openshift/origin#17581 openshift/origin#16003 and likely others

Origin-commit: 042a63f8c1effc2fb911ce2cf494458872e9f8a3
k8s-github-robot pushed a commit that referenced this pull request Jan 26, 2018
…7-upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #58547: Send correct resource version for delete events from watch

Cherry pick of #58547 on release-1.9.

#58547: Send correct resource version for delete events from watch
@k8s-cherrypick-bot

Commit found in the "release-1.9" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@liggitt liggitt deleted the watch-cache-delete-resourceversion branch January 26, 2018 04:57
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Jan 30, 2018
…ect resource version for delete events from watch

Origin-commit: 346464d764b9f9a96ea29d41a8a3e3ca1b219468
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Jan 30, 2018
…e-fix-58547-release-3.8

Automatic merge from submit-queue.

[3.8] UPSTREAM: 58571: Automated cherry pick of kubernetes#58547: Send correct resource version for delete events from watch

Backport of openshift/origin#18233 via kubernetes#58571

/assign @liggitt

Origin-commit: 360fac5be27896518ad01b63a0ea132174af23e8
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Feb 27, 2018
…e-fix-58547

Automatic merge from submit-queue (batch tested with PRs 18233, 18068, 18228, 18227).

UPSTREAM: 58547: Send correct resource version for delete events from watch cache

Backport of kubernetes#58547

Watch cache was returning incorrect (old) ResourceVersion on "deleted" events breaking informers that were going back in time. This fixes it.

/assign @liggitt
/cc @mfojtik

Fixes openshift/origin#17581 openshift/origin#16003 and likely others

Origin-commit: 042a63f8c1effc2fb911ce2cf494458872e9f8a3
openshift-publish-robot pushed a commit to openshift/kubernetes-apiserver that referenced this pull request Feb 28, 2018
Automatic merge from submit-queue (batch tested with PRs 18233, 18068, 18228, 18227).

UPSTREAM: 58547: Send correct resource version for delete events from watch cache

Backport of kubernetes/kubernetes#58547

Watch cache was returning incorrect (old) ResourceVersion on "deleted" events breaking informers that were going back in time. This fixes it.

/assign @liggitt
/cc @mfojtik

Fixes openshift/origin#17581 openshift/origin#16003 and likely others

Origin-commit: 042a63f8c1effc2fb911ce2cf494458872e9f8a3


Kubernetes-commit: b1d49808af3db35be42e4b705953d656a21bc201
sycki added a commit to goodrain/kubernetes that referenced this pull request Jul 4, 2018
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.