
etcd lease auto-renewal can extend event TTL indefinitely #65497

@jpbetz

Per the etcd ops guide: "Once a majority of members works, the etcd cluster elects a new leader automatically and returns to a healthy state. The new leader extends timeouts automatically for all leases. This mechanism ensures no lease expires due to server side unavailability."

For events, if leader elections occur more often than the event-ttl (which defaults to 1h), event leases will be renewed indefinitely. The number of events stored in etcd (and the number of open leases) will then grow until either the etcd storage space limit is exceeded or some other limit is hit (e.g. the lease count makes revoke operations excessively expensive).

Possible fixes:

  1. Add an option to etcd lease creation to disable auto-renewal, and use this option to disable auto-renewal of event leases created by k8s.
  2. Add a remediation routine (in kube-apiserver?) to remove old events (feels like an ugly hack).
  3. Transition to a different approach to expiring events in k8s (GC?).

We're currently considering (1) as a short term fix for this issue.

@gyuho, @xiang90 Do you have data or intuition about how often leader elections typically occur? Or is this too dependent on the environment to say?

cc @wenjiaswe @wojtek-t @mborsz


Labels: area/etcd, sig/api-machinery
