Skip to content

Conversation

@logicalhan
Copy link
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, our reflector metrics are a memory leak. While this has always been true (for these metrics), this has been exacerbated by the increased watches introduced by #64752. For now, let's just remove these metrics since (I think) no one wants a memory leak. This should be safe because it would not have been possible to have set up reliable monitoring around these metrics since the label names are unstable (as documented).

Which issue(s) this PR fixes:

Fixes #73587

Does this PR introduce a user-facing change?:

This PR removes the following metrics:

  reflector_items_per_list
  reflector_items_per_watch
  reflector_last_resource_version
  reflector_list_duration_seconds
  reflector_lists_total
  reflector_short_watches_total
  reflector_watch_duration_seconds
  reflector_watches_total

While this is a backwards-incompatible change, it would have been impossible to setup reliable monitoring around these metrics since the labels were not stable. 

/sig api-machinery instrumentation
/cc @wojtek-t @lavalamp @liggitt

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Feb 27, 2019
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 27, 2019
@smarterclayton
Copy link
Contributor

I agree, we need to be more judicious because these are generally low value relative to other metrics and are not targeted like our other metrics.

@liggitt
Copy link
Member

liggitt commented Feb 27, 2019

I'm in favor as well. If there are targeted metrics we want for fixed informers, we could add them in the future, but the impact of these (e.g. #73587) far outweighs any benefit.

cc @kubernetes/sig-instrumentation-pr-reviews

@danielqsj
Copy link
Contributor

/cc @brancz

@k8s-ci-robot k8s-ci-robot requested a review from brancz February 27, 2019 03:48
@logicalhan
Copy link
Member Author

/test pull-kubernetes-bazel-test pull-kubernetes-integration

defer r.lastSyncResourceVersionMutex.Unlock()
r.lastSyncResourceVersion = v

rv, err := strconv.Atoi(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this removed code is even worse than expected. Good call.

@lavalamp
Copy link
Contributor

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 27, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, logicalhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2019
@logicalhan
Copy link
Member Author

/test pull-kubernetes-integration

@wojtek-t
Copy link
Member

/lgtm
/retest

Thanks!

@brancz
Copy link
Member

brancz commented Feb 27, 2019

This is the second time these have caused a leak, so I welcome this 👍

@wojtek-t wojtek-t added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 27, 2019
@k8s-ci-robot k8s-ci-robot merged commit a514fa0 into kubernetes:master Feb 27, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 2, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 2, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 2, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 2, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 2, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 5, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 5, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 11, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 23, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 23, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 23, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 24, 2019
mfojtik pushed a commit to mfojtik/openshift-apiserver that referenced this pull request Sep 25, 2019
@Slutzky
Copy link

Slutzky commented Nov 19, 2019

I'm using promtheus-operator verison 5.3.8 and promethues 2.14.0 and still can see in the http://localhost:9090/status the following :

Name Count
reflector_list_duration_seconds 701745
reflector_watch_duration_seconds 701745
reflector_items_per_watch 701745
reflector_items_per_list 701745
http_server_requests_seconds_bucket 559314
hikaricp_connections_usage_seconds_bucket 312846
hikaricp_connections_creation_seconds_bucket 312846
hikaricp_connections_acquire_seconds_bucket 312846
reflector_short_watches_total 233915
reflector_items_per_list_count 233915

I thought these metrics were excluded in all versions after 2.12.6 or am I wrong? I also tried to exclude these metrics by adding, but it seems to be not working ...

additionalScrapeConfigs:
  - job_name: reflector_watches_total
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'reflector_watches_total'
      action: drop

  - job_name: reflector_items_per_list
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'reflector_items_per_list'
      action: drop

  - job_name: reflector_items_per_watch
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'reflector_items_per_watch'
      action: drop

  - job_name: reflector_list_duration_seconds
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'reflector_list_duration_seconds'
      action: drop

  - job_name: reflector_items_per_watch_sum
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'reflector_items_per_watch_sum'
      action: drop

@brancz
Copy link
Member

brancz commented Nov 21, 2019

It depends on your kubernetes version whether you may still have these, and metric relabeling would need to be done in the scrape config/ServiceMonitor that configures the target that these come from.

@Slutzky
Copy link

Slutzky commented Nov 22, 2019

hi @brancz , I'm using version v1.12.3 on k8s

@brancz
Copy link
Member

brancz commented Nov 22, 2019

Yes in 1.12 those are still present, so you need to drop the metrics on the component ServiceMonitors that expose them.

mm4tt added a commit to mm4tt/kubernetes that referenced this pull request Dec 10, 2019
The reflector metrics were removed in kubernetes#74636.
This code is not used anymore.
machadovilaca added a commit to machadovilaca/kubevirt that referenced this pull request Apr 8, 2024
Reflector metrics were removed in Kubernetes
due to them causing memory leaks:
kubernetes/kubernetes#74636

Signed-off-by: João Vilaça <[email protected]>
machadovilaca added a commit to machadovilaca/kubevirt that referenced this pull request Apr 8, 2024
Reflector metrics were removed in Kubernetes
due to them causing memory leaks:
kubernetes/kubernetes#74636

Signed-off-by: João Vilaça <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

memory leak in kubelet 1.12.5

10 participants