Skip to content

Add metrics for authentication config reload#1

Closed
enj wants to merge 16 commits intoenj:enj/f/authn_config_reloadfrom
aramase:aramase/f/authn_config_reload_metrics
Closed

Add metrics for authentication config reload#1
enj wants to merge 16 commits intoenj:enj/f/authn_config_reloadfrom
aramase:aramase/f/authn_config_reload_metrics

Conversation

@enj
Copy link
Owner

@enj enj commented Mar 9, 2024

No description provided.

tenzen-y and others added 13 commits March 9, 2024 06:09
…b-unit

Job: Use the fake clock in TestTrackJobStatusAndRemoveFinalizers
The map is changed to an array so as to retain the order of the original array
propagated from the CRI runtime.

Signed-off-by: Akihiro Suda <[email protected]>
For KEP-3857: Recursive Read-only (RRO) mounts

Signed-off-by: Akihiro Suda <[email protected]>
For KEP-3857: Recursive Read-only (RRO) mounts

Signed-off-by: Akihiro Suda <[email protected]>
This commit modifies the following files:

- pkg/apis/core/types.go
- staging/src/k8s.io/api/core/v1/types.go

Other changes were auto-generated by running `make update`.

Signed-off-by: Akihiro Suda <[email protected]>
For KEP-3857: Recursive Read-only (RRO) mounts

Signed-off-by: Akihiro Suda <[email protected]>
For KEP-3857: Recursive Read-only (RRO) mounts

Signed-off-by: Akihiro Suda <[email protected]>
See <https://kep.k8s.io/3857>.

An example manifest:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rro
spec:
  volumes:
    - name: mnt
      hostPath:
        # tmpfs is mounted on /mnt/tmpfs
        path: /mnt
  containers:
    - name: busybox
      image: busybox
      args: ["sleep", "infinity"]
      volumeMounts:
        # /mnt-rro/tmpfs is not writable
        - name: mnt
          mountPath: /mnt-rro
          readOnly: true
          mountPropagation: None
          recursiveReadOnly: IfPossible
        # /mnt-ro/tmpfs is writable
        - name: mnt
          mountPath: /mnt-ro
          readOnly: true
        # /mnt-rw/tmpfs is writable
        - name: mnt
          mountPath: /mnt-rw
```

Requirements:
- Feature gate "RecursiveReadOnlyMounts" to be enabled
- Linux kernel >= 5.12
- runc >= 1.1

Signed-off-by: Akihiro Suda <[email protected]>
Usage:
```
make test-e2e-node \
  TEST_ARGS='--service-feature-gates=RecursiveReadOnlyMounts=true --kubelet-flags="--feature-gates=RecursiveReadOnlyMounts=true"' \
  FOCUS="Mount recursive read-only" SKIP=""
```

Signed-off-by: Akihiro Suda <[email protected]>
Copy link
Owner Author

@enj enj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor stuff.

klog.ErrorS(err, "failed to validate authentication config")
// this config is not structurally valid and never will be, update the tracker so we stop retrying
trackedAuthenticationConfigData = authConfigData
authenticationconfigmetrics.RecordAuthenticationConfigAutomaticReloadFailure(apiServerID)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need this for the load above too? Maybe this should be done via a defer so it is easier to track?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, I suggested it not be a defer so it was obvious all the places we trip a failure metric ... I only care about 3 out of 10 but I found the defer harder to reason about (as well as counting on err not being masked through the whole function)

)

var registerMetrics sync.Once
var hashPool *sync.Pool
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets drop all the hashPool stuff, we have never actually seen any performance justification to do this (and probably should undo this pattern elsewhere).

})
}

func ResetMetricsForTest() {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do this Reset right before the swap of the new authenticator?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind on this, I am thinking about the JWKs metrics that are specific to issuer.

waitAfterConfigSwap: true,
wantMetricStrings: []string{
`apiserver_authentication_config_controller_automatic_reload_last_timestamp_seconds{apiserver_id_hash="sha256:3c607df3b2bf22c9d9f01d5314b4bbf411c48ef43ff44ff29b1d55b41367c795",status="failure"} FP`,
`apiserver_authentication_config_controller_automatic_reloads_total{apiserver_id_hash="sha256:3c607df3b2bf22c9d9f01d5314b4bbf411c48ef43ff44ff29b1d55b41367c795",status="failure"} 1`,
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this exact counter won't flake on CI? i.e. could it be 2?

KEP-3857: Recursive Read-only (RRO) mounts
@aramase aramase force-pushed the aramase/f/authn_config_reload_metrics branch from b073083 to 8a2eaae Compare March 9, 2024 21:30
@enj enj changed the base branch from enj/f/authn_config_reload_review to enj/f/authn_config_reload March 9, 2024 21:50
Add dynamic reload support for authentication configuration
@aramase aramase force-pushed the aramase/f/authn_config_reload_metrics branch from 8a2eaae to 94571d2 Compare March 9, 2024 22:36
@aramase aramase force-pushed the aramase/f/authn_config_reload_metrics branch from 94571d2 to 62ac88b Compare March 9, 2024 22:40
@enj enj closed this Mar 9, 2024
@aramase aramase deleted the aramase/f/authn_config_reload_metrics branch March 9, 2024 23:59
enj pushed a commit that referenced this pull request Oct 22, 2025
Instead of creating a new test case, the permutation is passed down. This
enables adding the event numbers to the log output, which is useful to
understand better which output belongs to which input:

    === RUN   TestListPatchedResourceSlices/update-patch/2_3_0_1
    tracker.go:396: I0929 14:28:40.032318] event #1: ResourceSlice add slice="s1"
    tracker.go:581: I0929 14:28:40.032404] event #1: syncing ResourceSlice resourceslice="s1"
    tracker.go:659: I0929 14:28:40.032446] event #1: ResourceSlice synced resourceslice="s1" change="add"
    tracker.go:396: I0929 14:28:40.032502] event #2: ResourceSlice add slice="s2"
    tracker.go:581: I0929 14:28:40.032536] event #2: syncing ResourceSlice resourceslice="s2"
    tracker.go:659: I0929 14:28:40.032568] event #2: ResourceSlice synced resourceslice="s2" change="add"
    tracker.go:463: I0929 14:28:40.032609] event #0/#0: DeviceTaintRule add patch="rule"
    tracker.go:581: I0929 14:28:40.032639] event #0/#0: syncing ResourceSlice resourceslice="s1"
    tracker.go:703: I0929 14:28:40.032675] event #0/#0: processing DeviceTaintRule resourceslice="s1" deviceTaintRule="rule"
    tracker.go:807: I0929 14:28:40.032712] event #0/#0: applying matching DeviceTaintRule resourceslice="s1" deviceTaintRule="rule" device="driver1.example.com/pool-1/device-1"
    tracker.go:868: I0929 14:28:40.032780] event #0/#0: Assigned new taint ID, no matching taint resourceslice="s1" deviceTaintRule="rule" device="driver1.example.com/pool-1/device-1" taintID=0 taint="example.com/taint=tainted:NoExecute"
    tracker.go:654: I0929 14:28:40.033023] event #0/#0: ResourceSlice synced resourceslice="s1" change="update" diff=<
        	@@ -23,7 +23,32 @@
        	     "BindingConditions": null,
        	     "BindingFailureConditions": null,
        	     "AllowMultipleAllocations": null,
        	-    "Taints": null
        	+    "Taints": [
        	+     {
        	+      "Rule": {
        	+       "metadata": {
        	+        "name": "rule"
        	+       },
        	+       "spec": {
        	+        "deviceSelector": {
        	+         "pool": "pool-1"
        	+        },
        	+        "taint": {
        	+         "key": "example.com/taint",
        	+         "value": "tainted",
        	+         "effect": "NoExecute",
        	+         "timeAdded": "2006-01-02T15:04:05Z"
        	+        }
        	+       },
        	+       "status": {}
        	+      },
        	+      "ID": 1,
        	+      "key": "example.com/taint",
        	+      "value": "tainted",
        	+      "effect": "NoExecute",
        	+      "timeAdded": "2006-01-02T15:04:05Z"
        	+     }
        	+    ]
        	    }
        	   ],
        	   "Taints": null,
         >
    tracker.go:482: I0929 14:28:40.033224] event #0/#1: DeviceTaintRule update patch="rule" diff=<
        	@@ -4,7 +4,7 @@
        	  },
        	  "spec": {
        	   "deviceSelector": {
        	-   "pool": "pool-1"
        	+   "pool": "pool-2"
        	   },
        	   "taint": {
        	    "key": "example.com/taint",
         >
    tracker.go:581: I0929 14:28:40.033285] event #0/#1: syncing ResourceSlice resourceslice="s1"
    tracker.go:703: I0929 14:28:40.033319] event #0/#1: processing DeviceTaintRule resourceslice="s1" deviceTaintRule="rule"
    tracker.go:654: I0929 14:28:40.033478] event #0/#1: ResourceSlice synced resourceslice="s1" change="update" diff=<
        	@@ -23,32 +23,7 @@
        	     "BindingConditions": null,
        	     "BindingFailureConditions": null,
        	     "AllowMultipleAllocations": null,
        	-    "Taints": [
        	-     {
        	-      "Rule": {
        	-       "metadata": {
        	-        "name": "rule"
        	-       },
        	-       "spec": {
        	-        "deviceSelector": {
        	-         "pool": "pool-1"
        	-        },
        	-        "taint": {
        	-         "key": "example.com/taint",
        	-         "value": "tainted",
        	-         "effect": "NoExecute",
        	-         "timeAdded": "2006-01-02T15:04:05Z"
        	-        }
        	-       },
        	-       "status": {}
        	-      },
        	-      "ID": 1,
        	-      "key": "example.com/taint",
        	-      "value": "tainted",
        	-      "effect": "NoExecute",
        	-      "timeAdded": "2006-01-02T15:04:05Z"
        	-     }
        	-    ]
        	+    "Taints": null
        	    }
        	   ],
        	   "Taints": null,
         >
    tracker.go:581: I0929 14:28:40.033601] event #0/#1: syncing ResourceSlice resourceslice="s2"
    tracker.go:703: I0929 14:28:40.033633] event #0/#1: processing DeviceTaintRule resourceslice="s2" deviceTaintRule="rule"
    ...

Disabling event checking only worked when actually running all sub-tests. When
selectively running only one permutation with -run, the boolean variable was
wrong:

    $ go test -run='.*/^update-patch$' ./staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/
    ok  	k8s.io/dynamic-resource-allocation/resourceslice/tracker

    $ go test -run='.*/^update-patch$/3_2_0_1' ./staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/
    --- FAIL: TestListPatchedResourceSlices (0.01s)
        --- FAIL: TestListPatchedResourceSlices/update-patch (0.00s)
            --- FAIL: TestListPatchedResourceSlices/update-patch/3_2_0_1 (0.00s)

                tracker_test.go:762:
                     	Error Trace:	/nvme/gopath/src/k8s.io/kubernetes/staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker_test.go:762
                     	            				/nvme/gopath/src/k8s.io/kubernetes/staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker_test.go:856
                    	Error:      	Not equal:
                    	            	expected: []tracker.handlerEvent{tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000346000)}}
                    	            	actual  : []tracker.handlerEvent{tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc0001f9ba0)}, tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc000301d40), newObj:(*api.ResourceSlice)(0xc0003dba00)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc0003dba00), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc0001f9ba0), newObj:(*api.ResourceSlice)(0xc0003dbba0)}}

Now permutations are detected automatically based on the indices.

While at it, documentation gets moved around a bit to make reading test cases
easier without going to the implementation.
enj pushed a commit that referenced this pull request Jan 27, 2026
DRA depends on the assume cache having invoked all event handlers before
Assume() returns, because DRA maintains state that is relevant for scheduling
through those event handlers.

This log snippet shows how this went wrong during PreBind:

    dynamicresources.go:1150: I0115 10:35:29.264437] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636"
    dra_manager.go:198: I0115 10:35:29.264448] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 version="287"
    dynamicresources.go:1157: I0115 10:35:29.264463] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-kqjpj/my-pod-0091" claim="testdra-all-usesallresources-kqjpj/claim-0091" uid=<types.UID>: 516f274f-e1a9-4a4b-b7d2-bb86138e4240 resourceVersion="5636" allocation=<
    ...
    allocateddevices.go:189: I0115 10:35:29.267315] scheduler: Observed device allocation device="testdra-all-usesallresources-kqjpj.driver/worker-1/worker-1-device-096" claim="testdra-all-usesallresources-kqjpj/claim-0091"

- goroutine #1: UpdateStatus result delivered via informer.
  AssumeCache updates cache, pushes event A, emitEvents pulls event A from queue.
  *Not* done with delivering it yet!
- goroutine #2: AssumeCache.Assume called. Updates cache, pushes event B, emits it.
  Old and new claim have allocation, so no "Observed device allocation".
- goroutine #3: Schedules next pod, without considering device as allocated (not in the log snippet).
- goroutine #1: Finally delivers event A: "Observed device allocation", but too late.

Also, events are delivered out-of-order.

The fix is to let emitEvents when called by Assume wait for a potentially
running emitEvents in some other goroutine, thus ensuring that an event pulled
out of the queue by that other goroutine got delivered before Assume itself
checks the queue one more time and then returns.

The time window were things go wrong is small. An E2E test covering this only
flaked rarely, and only in the CI. An integration test (separate commit) with
higher number of pods finally made it possible to reproduce locally. It also
uncovered a second race (fix in separate commit).

The unit test fails without the fix:

    === RUN   TestAssumeConcurrency
        assume_cache_test.go:311: FATAL ERROR:
            	Assume should have blocked and didn't.
    --- FAIL: TestAssumeConcurrency (0.00s)
enj pushed a commit that referenced this pull request Jan 27, 2026
GatherAllocatedState and ListAllAllocatedDevices need to collect information
from different sources (allocated devices, in-flight claims), potentially even
multiple times (GatherAllocatedState first gets allocated devices, then the
capacities).

The underlying assumption that nothing bad happens in parallel is not always
true. The following log snippet shows how an update of the assume
cache (feeding the allocated devices tracker) and in-flight claims lands such
that GatherAllocatedState doesn't see the device in that claim as allocated:

    dra_manager.go:263: I0115 15:11:04.407714      18778] scheduler: Starting GatherAllocatedState
    ...
    allocateddevices.go:189: I0115 15:11:04.407945      18066] scheduler: Observed device allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-094" claim="testdra-all-usesallresources-hvs5d/claim-0553"
    dynamicresources.go:1150: I0115 15:11:04.407981      89109] scheduler: Claim stored in assume cache pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680"
    dra_manager.go:201: I0115 15:11:04.408008      89109] scheduler: Removed in-flight claim claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 version="1211"
    dynamicresources.go:1157: I0115 15:11:04.408044      89109] scheduler: Removed claim from in-flight claims pod="testdra-all-usesallresources-hvs5d/my-pod-0553" claim="testdra-all-usesallresources-hvs5d/claim-0553" uid=<types.UID>: a84d3c4d-f752-4cfd-8993-f4ce58643685 resourceVersion="5680" allocation=<
        	{
        	  "devices": {
        	    "results": [
        	      {
        	        "request": "req-1",
        	        "driver": "testdra-all-usesallresources-hvs5d.driver",
        	        "pool": "worker-5",
        	        "device": "worker-5-device-094"
        	      }
        	    ]
        	  },
        	  "nodeSelector": {
        	    "nodeSelectorTerms": [
        	      {
        	        "matchFields": [
        	          {
        	            "key": "metadata.name",
        	            "operator": "In",
        	            "values": [
        	              "worker-5"
        	            ]
        	          }
        	        ]
        	      }
        	    ]
        	  },
        	  "allocationTimestamp": "2026-01-15T14:11:04Z"
        	}
         >
    dra_manager.go:280: I0115 15:11:04.408085      18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-095" claim="testdra-all-usesallresources-hvs5d/claim-0086"
    dra_manager.go:280: I0115 15:11:04.408137      18778] scheduler: Device is in flight for allocation device="testdra-all-usesallresources-hvs5d.driver/worker-5/worker-5-device-096" claim="testdra-all-usesallresources-hvs5d/claim-0165"
    default_binder.go:69: I0115 15:11:04.408175      89109] scheduler: Attempting to bind pod to node pod="testdra-all-usesallresources-hvs5d/my-pod-0553" node="worker-5"
    dra_manager.go:265: I0115 15:11:04.408264      18778] scheduler: Finished GatherAllocatedState allocatedDevices=<map[string]interface {} | len:2>: {

Initial state: "worker-5-device-094" is in-flight, not in cache
- goroutine #1: starts GatherAllocatedState, copies cache
- goroutine #2: adds to assume cache, removes from in-flight
- goroutine #1: checks in-flight

=> device never seen as allocated

This is the second reason for double allocation of the same device in two
different claims. The other was timing in the assume cache. Both were
tracked down with an integration test (separate commit). It did not fail
all the time, but enough that regressions should show up as flakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants