Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event aggregation: include latest event message in aggregate event #46034

Merged

Conversation

kensimon
Copy link

@kensimon kensimon commented May 18, 2017

What this PR does / why we need it:

This changes the event aggregation behavior so that, when multiple events are deduplicated, the aggregated event includes the message of the latest related event.

This fixes an issue where the original event expires due to TTL, and the aggregate event doesn't contain any useful message.

Which issue this PR fixes:

fixes #45971

Duplicate recurring Events now include the latest event's Message string

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 18, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 18, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @kensimon. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kensimon kensimon force-pushed the canonical-aggregate-events branch from f206ff0 to 482c09d Compare May 18, 2017 15:02
@liggitt
Copy link
Member

liggitt commented May 18, 2017

/unassign
/assign @derekwaynecarr

@derekwaynecarr
Copy link
Member

i like the goal of not losing the source message due to ttl concerns. if you can add a test case as this evolves, it would help me know if this will not cause painful event spam issues to recur (which was extremely painful 2 yrs ago).

@kensimon
Copy link
Author

Yeah I'll definitely add the test case to prove the point a bit. I've tested this manually with local-up-cluster and deployed a broken container, and it would create the first 10 events as independent events, but after the 10th event, the first event would be modified (the count and lastseen bumped, and the message string changed to the latest event's string).

The only thing that's a bit confusing is that there's no clear sign that an event is being re-used as a canonical event. As an example, 15 events come in that are the same Reason but different Messages... For the first 10 you would see:

event1{count: 1}
event2{count: 1}
...
event10{count: 1}

and when the 11th event comes in, you'll see:

event2{count: 1}
event3{count: 1}
...
event1{count: 2}

And if the other events fall off from TTL (I simulated this by setting the event-ttl to 60s), and 4 more events come in, you'll eventually only see:

event1{count: 6}

which is a bit confusing, since the event actually happened a total of 15 times, but there's 9 events that expired off and only the first one was re-used. I think it's not worse than we have now, but it's still not ideal.

@derekwaynecarr
Copy link
Member

i would prefer a new event, and a message prefix that says something like "combined from similar : last-message known here"

@kensimon kensimon force-pushed the canonical-aggregate-events branch from 482c09d to c83e261 Compare May 19, 2017 17:01
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 19, 2017
@kensimon
Copy link
Author

kensimon commented May 19, 2017

@derekwaynecarr See update, this now preserves the behavior of still creating an aggregate event, but keeping its message up-to-date. The PR is now quite a bit simpler, and the tests basically work as-is (all I had to do was change the TestEventAggregatorByReasonMessageFunc test to check for a prefix instead of a complete match.)

I'm accomplishing this by having the EventAggregate function responsible for deciding what cache key to use for the event as it gets passed to observeEvent. If the event shouldn't be aggregated, it returns the complete getEventKey result. If the event should be aggregated, it returns the EventAggregatorByReasonFunc instead.

Since observeEvent now takes the cache key as a parameter, it is only responsible for "is there already an event I've seen with this cache key, if so replace, if not create".

This should now be ok to test (PR subject has been updated)

@kensimon kensimon changed the title WIP/RFC: Event aggregation: re-use original events Event aggregation: include latest event message in aggregate event May 19, 2017
@kensimon kensimon force-pushed the canonical-aggregate-events branch from c83e261 to b7c6a6d Compare May 19, 2017 17:48
@kensimon
Copy link
Author

Removed some dead code from when the PR was more complex, should be a much simpler change now.

@kensimon
Copy link
Author

@derekwaynecarr poke

@timothysc timothysc added kind/bug Categorizes issue or PR as related to a bug. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 23, 2017
@timothysc timothysc added this to the v1.7 milestone May 23, 2017
@timothysc
Copy link
Member

@pmorie @derekwaynecarr PTAL.

@timothysc
Copy link
Member

@k8s-bot test this #IGNORE

This changes the event aggregation behavior so that, when multiple events are
deduplicated, the aggregated event includes the message of the latest related
event.

This fixes an issue where the original event expires due to TTL, and the
aggregate event doesn't contain any useful message.
@kensimon kensimon force-pushed the canonical-aggregate-events branch from b7c6a6d to 6ada269 Compare May 25, 2017 01:10
@derekwaynecarr
Copy link
Member

@k8s-bot ok to test

@kensimon
Copy link
Author

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

I've seen these failures before, not sure if it's issues with the environment or just flakes...

@derekwaynecarr
Copy link
Member

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 31, 2017
@derekwaynecarr
Copy link
Member

hmm, seems i cant approve this. i did verify the behavior works as expected on a run-away controller, and the new experience is preferred.

/assign deads2k

@derekwaynecarr
Copy link
Member

/assign @deads2k

@kensimon
Copy link
Author

build failing due to gcloud's api quota getting exceeded, I'll try and trigger a re-test once things cool down a bit

@deads2k
Copy link
Contributor

deads2k commented Jun 1, 2017

/approve

@kensimon
Copy link
Author

kensimon commented Jun 1, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, derekwaynecarr, kensimon

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2017
@kensimon
Copy link
Author

kensimon commented Jun 2, 2017

@deads2k is this going to get merged for 1.7, or?

@deads2k
Copy link
Contributor

deads2k commented Jun 2, 2017

@deads2k is this going to get merged for 1.7, or?

yeah, its in queue https://submit-queue.k8s.io/#/queue

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 46726, 41912, 46695, 46034, 46551)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aggregated events can outlive the events they duplicate
8 participants