
PodMonitors being deleted after some amount of time #8447

@CrimsonFez

What happened?

Description

I found that my PodMonitor objects in namespaces other than the monitoring namespace are being deleted after some amount of time: in some cases after 2 hours, in others after 20 minutes. I can see when metrics start being collected and when they stop (because the PodMonitor got deleted).
I'm not using any automation on my cluster; I am the only one who does anything with it.
I looked through the documentation to see whether this is a known feature, but I wasn't able to find anything.
Any help would be greatly appreciated!
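A minimal way to catch the deletion as it happens (a sketch assuming standard kubectl access; podmonitors is the resource name from the PodMonitor CRD):

# Watch PodMonitors across all namespaces; a DELETED event shows exactly when one goes away
kubectl get podmonitors --all-namespaces --watch --output-watch-events

# Check whether anything recorded an event against a PodMonitor
kubectl get events --all-namespaces --field-selector involvedObject.kind=PodMonitor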

Prometheus Operator Version

Name:                   prometheus-operator
Namespace:              monitoring
CreationTimestamp:      Sun, 26 Oct 2025 21:38:25 -0500
Labels:                 app=kube-prometheus-stack-operator
                        app.kubernetes.io/component=prometheus-operator
                        app.kubernetes.io/instance=prometheus
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=kube-prometheus-stack-prometheus-operator
                        app.kubernetes.io/part-of=kube-prometheus-stack
                        app.kubernetes.io/version=82.4.0
                        chart=kube-prometheus-stack-82.4.0
                        heritage=Helm
                        release=prometheus
Annotations:            deployment.kubernetes.io/revision: 3
                        meta.helm.sh/release-name: prometheus
                        meta.helm.sh/release-namespace: monitoring
Selector:               app=kube-prometheus-stack-operator,release=prometheus
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=kube-prometheus-stack-operator
                    app.kubernetes.io/component=prometheus-operator
                    app.kubernetes.io/instance=prometheus
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=kube-prometheus-stack-prometheus-operator
                    app.kubernetes.io/part-of=kube-prometheus-stack
                    app.kubernetes.io/version=82.4.0
                    chart=kube-prometheus-stack-82.4.0
                    heritage=Helm
                    release=prometheus
  Service Account:  prometheus-operator
  Containers:
   kube-prometheus-stack:
    Image:      quay.io/prometheus-operator/prometheus-operator:v0.89.0
    Port:       10250/TCP (https)
    Host Port:  0/TCP (https)
    Args:
      --kubelet-service=kube-system/prometheus-kubelet
      --kubelet-endpoints=true
      --kubelet-endpointslice=false
      --localhost=127.0.0.1
      --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.89.0
      --config-reloader-cpu-request=0
      --config-reloader-cpu-limit=0
      --config-reloader-memory-request=0
      --config-reloader-memory-limit=0
      --thanos-default-base-image=quay.io/thanos/thanos:v0.41.0
      --secret-field-selector=type!=kubernetes.io/dockercfg,type!=kubernetes.io/service-account-token,type!=helm.sh/release.v1
      --web.enable-tls=true
      --web.cert-file=/cert/cert
      --web.key-file=/cert/key
      --web.listen-address=:10250
      --web.tls-min-version=VersionTLS13
    Liveness:   http-get https://:https/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get https://:https/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      GOGC:  30
    Mounts:
      /cert from tls-secret (ro)
  Volumes:
   tls-secret:
    Type:          Secret (a volume populated by a Secret)
    SecretName:    prometheus-admission
    Optional:      false
  Node-Selectors:  <none>
  Tolerations:     <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  prometheus-operator-74fb6b5f9 (0/0 replicas created), prometheus-operator-656695c8c6 (0/0 replicas created)
NewReplicaSet:   prometheus-operator-77c45df586 (1/1 replicas created)
Events:          <none>

Kubernetes Version

clientVersion:
  buildDate: "1980-01-01T00:00:00Z"
  compiler: gc
  gitCommit: 8cc511e399b929453cd98ae65b419c3cc227ec79
  gitTreeState: archive
  gitVersion: v1.34.2
  goVersion: go1.25.4
  major: "1"
  minor: "34"
  platform: linux/amd64
kustomizeVersion: v5.7.1
serverVersion:
  buildDate: "2025-09-10T21:19:11Z"
  compiler: gc
  emulationMajor: "1"
  emulationMinor: "33"
  gitCommit: 03e764d0394bdff662e960c70d25b3c30d731666
  gitTreeState: clean
  gitVersion: v1.33.5+rke2r1
  goVersion: go1.24.6 X:boringcrypto
  major: "1"
  minCompatibilityMajor: "1"
  minCompatibilityMinor: "32"
  minor: "33"
  platform: linux/amd64

Kubernetes Cluster Type

Other (please comment)

How did you deploy Prometheus-Operator?

helm chart: prometheus-community/kube-prometheus-stack
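One thing worth ruling out before the next PodMonitor disappears: Kubernetes garbage collection deletes any object whose metadata.ownerReferences point at an owner that is gone, and a namespaced owner reference that crosses namespaces is treated as unresolvable, so the dependent eventually gets collected. A quick check on one of the affected PodMonitors (name and namespace are placeholders):

# Look for ownerReferences or Helm release annotations on the PodMonitor
kubectl get podmonitor <name> -n <namespace> -o yaml | grep -E -A6 'ownerReferences|meta\.helm\.sh'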

Manifests

prometheus-operator log output

My logs are too long to include in full.
I didn't find anything mentioning PodMonitors in them either.
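Filtered with something like this (deployment name and namespace taken from the describe output above):

# Grep the operator's logs for any PodMonitor mention
kubectl logs -n monitoring deploy/prometheus-operator | grep -i podmonitor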

Here are the first few log lines:
ts=2026-02-26T01:48:18.311904789Z level=info caller=/workspace/cmd/operator/main.go:286 msg="Starting Prometheus Operator" version="(version=0.89.0, branch=, revision=e13fb15)" build_context="(go=go1.25.6, platform=linux/amd64, user=, date=20260206-08:22:55, tags=unknown)" feature_gates="PrometheusAgentDaemonSet=false,PrometheusShardRetentionPolicy=false,PrometheusTopologySharding=false,StatusForConfigurationResources=false"
ts=2026-02-26T01:48:18.312284985Z level=info caller=/workspace/cmd/operator/main.go:287 msg="Operator's configuration" watch_referenced_objects_in_all_namespaces=false controller_id="" enable_config_reloader_probes=false
ts=2026-02-26T01:48:18.312732401Z level=info caller=/workspace/internal/goruntime/cpu.go:27 msg="Leaving GOMAXPROCS=32: CPU quota undefined"
ts=2026-02-26T01:48:18.312806353Z level=info caller=/workspace/cmd/operator/main.go:301 msg="Namespaces filtering configuration " config="{allow_list=\"\",deny_list=\"\",prometheus_allow_list=\"\",alertmanager_allow_list=\"\",alertmanagerconfig_allow_list=\"\",thanosruler_allow_list=\"\"}"
ts=2026-02-26T01:48:18.326729699Z level=info caller=/workspace/cmd/operator/main.go:342 msg="connection established" kubernetes_version=1.33.5+rke2r1
ts=2026-02-26T01:48:18.346247938Z level=info caller=/workspace/cmd/operator/main.go:427 msg="Kubernetes API capabilities" endpointslices=true
ts=2026-02-26T01:48:18.408010001Z level=warn caller=/workspace/pkg/server/server.go:158 msg="server TLS client verification disabled" client_ca_file=/etc/tls/private/tls-ca.crt err="stat /etc/tls/private/tls-ca.crt: no such file or directory"
ts=2026-02-26T01:48:18.408879843Z level=info caller=/workspace/pkg/kubelet/controller.go:229 msg="Starting controller" component=kubelet_endpoints kubelet_object=kube-system/prometheus-kubelet
ts=2026-02-26T01:48:18.408937785Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/pkg/server/dynamiccertificates/dynamic_serving_content.go:135 msg="Starting controller" name=servingCert::/cert/cert::/cert/key
ts=2026-02-26T01:48:18.408990492Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/pkg/server/dynamiccertificates/tlsconfig.go:243 msg="Starting DynamicServingCertificateController"
ts=2026-02-26T01:48:18.408983516Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:349 msg="Waiting for caches to sync" controller=prometheusagent
ts=2026-02-26T01:48:18.408973967Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:349 msg="Waiting for caches to sync" controller=alertmanager
ts=2026-02-26T01:48:18.408890166Z level=info caller=/workspace/pkg/server/server.go:295 msg="starting secure server" address=[::]:10250 http2=false
ts=2026-02-26T01:48:18.411060987Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:349 msg="Waiting for caches to sync" controller=prometheus
ts=2026-02-26T01:48:18.408978135Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:349 msg="Waiting for caches to sync" controller=thanos
ts=2026-02-26T01:48:18.453428167Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/rest/warnings.go:110 msg="Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice"
ts=2026-02-26T01:48:18.458981135Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/rest/warnings.go:110 msg="Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice"
ts=2026-02-26T01:48:18.510149179Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:356 msg="Caches are synced" controller=alertmanager

The rest is filled with repetitions of this warning:
ts=2026-02-26T04:18:18.427913063Z level=info caller=/go/pkg/mod/k8s.io/[email protected]/rest/warnings.go:110 msg="Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice"

Anything else?

No response
