etcd lease auto-renewal can extend event TTL indefinitely #65497

Per the [etcd ops guide](https://coreos.com/etcd/docs/latest/op-guide/failures.html#majority-failure):

> Once a majority of members works, the etcd cluster elects a new leader automatically and returns to a healthy state. The new leader extends timeouts automatically for all leases. This mechanism ensures no lease expires due to server side unavailability.

For events, this means that if leader elections occur more often than the kube-apiserver `--event-ttl` (1h by default), event leases are renewed indefinitely. The number of events stored in etcd, and the number of open leases, then grows until the etcd storage space limit is exceeded or some other limit is hit (e.g. so many leases that revoke operations become excessively expensive).
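To illustrate why elections that arrive more often than the TTL pin events forever, here is a toy simulation. It is a sketch under stated assumptions, not etcd's actual lessor logic: time advances in whole minutes, each election resets every live lease to expire one full TTL after the election (following the ops-guide passage quoted above, and ignoring any extra grace period etcd may add), and one event lease is created per minute. The `live_leases` function and the `auto_renew` flag are hypothetical; the flag only approximates proposal (1).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    expiry: int              # minute at which the lease expires
    auto_renew: bool = True  # hypothetical flag approximating proposal (1)


def live_leases(duration: int, ttl: int, election_interval: Optional[int],
                event_interval: int = 1, auto_renew: bool = True) -> int:
    """Return how many event leases are still live at the last simulated minute.

    Assumption: an election resets each live lease to now + ttl.
    """
    leases: List[Lease] = []
    for now in range(duration):
        # 1. Drop leases whose expiry has been reached.
        leases = [lease for lease in leases if lease.expiry > now]
        # 2. A new leader "extends timeouts automatically for all leases".
        if election_interval and now > 0 and now % election_interval == 0:
            for lease in leases:
                if lease.auto_renew:
                    lease.expiry = now + ttl
        # 3. A new event is written with a fresh lease of `ttl` minutes.
        if now % event_interval == 0:
            leases.append(Lease(expiry=now + ttl, auto_renew=auto_renew))
    return len(leases)


if __name__ == "__main__":
    day, event_ttl = 24 * 60, 60  # one simulated day, 1h event TTL
    print("no elections:", live_leases(day, event_ttl, None))
    print("election every 45 min:", live_leases(day, event_ttl, 45))
    print("election every 45 min, auto_renew=False:",
          live_leases(day, event_ttl, 45, auto_renew=False))
```

Without elections the model settles at one TTL's worth of live leases (60). With an election every 45 minutes, no lease ever reaches its expiry, so all 1440 leases created during the day are still live at the end. Disabling renewal brings the count back to 60.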

Possible fixes:

1. Add an option to etcd lease creation that disables auto-renewal, and set it on the event leases that k8s creates
2. Add a remediation routine (in kube-apiserver?) that removes old events (feels like an ugly hack)
3. Switch to a different mechanism for expiring events in k8s (GC?)
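Proposal (2) could look roughly like a periodic sweep that deletes events by creation time, independent of whether their lease is still alive. This is a hypothetical sketch: `Event`, `sweep_expired_events` and the in-memory dict standing in for etcd are illustrative names, not kube-apiserver or etcd APIs, and the one-hour default only mirrors the event-ttl default mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class Event:
    key: str           # illustrative storage key for the event
    created: datetime  # when the event was first written


def sweep_expired_events(store: Dict[str, Event], now: datetime,
                         event_ttl: timedelta = timedelta(hours=1)) -> List[str]:
    """Delete events created more than event_ttl ago; return the deleted keys.

    The dict `store` is a stand-in for etcd. A real routine would issue
    deletes (and could revoke the associated leases) instead.
    """
    cutoff = now - event_ttl
    # Collect keys first so the dict is not mutated while iterating over it.
    stale = sorted(key for key, ev in store.items() if ev.created < cutoff)
    for key in stale:
        del store[key]
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    store = {
        "old": Event("old", now - timedelta(hours=3)),
        "new": Event("new", now - timedelta(minutes=5)),
    }
    print(sweep_expired_events(store, now))
    print(sorted(store))
```

Keying off creation time is what makes this a workaround rather than a fix: it ignores lease state entirely, so it still works when leases are being renewed indefinitely, but it duplicates the expiry logic that leases were meant to provide.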

We're currently considering (1) as a short-term fix for this issue.

@gyuho, @xiang90: do you have data or intuition about how often leader elections typically occur, or is this too dependent on the environment to say?

cc @wenjiaswe @wojtek-t @mborsz 
