Description
Per the etcd ops guide: "Once a majority of members works, the etcd cluster elects a new leader automatically and returns to a healthy state. The new leader extends timeouts automatically for all leases. This mechanism ensures no lease expires due to server side unavailability."
For events, if leader elections occur more often than the event-ttl (which defaults to 1 hour), event leases will be renewed indefinitely. The number of events stored in etcd (and the number of open leases) will then grow until either the etcd storage space limit is exceeded or some other limit is hit (e.g. the lease count makes revoke operations excessively expensive).
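To illustrate the growth, here is a toy model of the behavior described above. It is not etcd code: `live_leases` and its parameters are hypothetical names, time is counted in minutes, and it assumes that each leader election resets every still-live lease to its full TTL (a simplification of "extends timeouts" in the ops guide).

```python
def live_leases(ttl, election_every, create_every, horizon):
    """Count leases still alive at `horizon` (all times in minutes).

    Assumption: on each leader election, every live lease has its
    expiry reset to `now + ttl`. Pass election_every=None for a
    cluster that never re-elects.
    """
    expiries = []  # absolute expiry time of each live lease
    for now in range(horizon + 1):
        # Drop leases whose TTL has run out.
        expiries = [e for e in expiries if e > now]
        # A new leader extends every surviving lease.
        if election_every and now > 0 and now % election_every == 0:
            expiries = [now + ttl for _ in expiries]
        # One new event (and lease) is created per create interval.
        if now % create_every == 0:
            expiries.append(now + ttl)
    return len(expiries)


if __name__ == "__main__":
    ttl = 60        # event-ttl of 1 hour
    horizon = 600   # observe after 10 hours, one event per minute
    print("no elections:          ", live_leases(ttl, None, 1, horizon))
    print("election every 45 min: ", live_leases(ttl, 45, 1, horizon))
    print("election every 90 min: ", live_leases(ttl, 90, 1, horizon))
```

In this model, an election interval shorter than the TTL means no lease ever expires, so the count at the end equals the total number of events ever created. With no elections the count stays bounded by the events created within one TTL.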
Possible solutions:
1. Add an option to etcd lease creation to disable auto-renewal, and use this option to disable auto-renewal of event leases created by k8s.
2. Add a remediation routine (in kube-apiserver?) to remove old events (feels like an ugly hack).
3. Transition to a different approach to expiring events in k8s (GC?).
We're currently considering (1) as a short-term fix for this issue.
@gyuho, @xiang90 Do you have data or intuition about how often leader elections typically occur? Or is this too dependent on the environment to say?