Change in semantics for the device plugin registration process

### What happened?

The following refactoring of the kubelet's devicemanager was done as part of the 1.25 release: https://github.com/kubernetes/kubernetes/commit/db88676c20629354d8eeb4b7014ae7884481ffca

As part of this, a change in semantics was (unintentionally) introduced which is breaking at least one third-party plugin. The logic required to go back to the old semantics is fairly straightforward. However, it's not obvious that we should actually go back, because (at least in my opinion) the new semantics are more correct.

Previously, the kubelet's devicemanager would do the following upon receiving a plugin registration request:
1. Register the plugin
2. Launch a goroutine to connect to the gRPC service being served by the plugin (with a timeout of 10s)
3. Return

This gave the plugin the opportunity to first register itself with the kubelet and then (within a 10s time window) start its gRPC server.

The new semantics are similar, except that no goroutine is launched to connect to the gRPC service. The connection is attempted synchronously, as part of handling the registration request. This means that the plugin must now start serving its gRPC server *before* registering itself with the kubelet. Otherwise the registration call will fail.

As mentioned before, this change was not necessarily intentional, but it's also not clear if it should be reverted. It brings the device plugin's custom registration process more in line with the kubelet's standard registration process and the order is actually a bit more intuitive. Why bother registering if you aren't able to start serving your API?

The first plugin to report this problem was the KubeVirt plugin and a discussion of the issue can be found here:
https://groups.google.com/g/kubevirt-dev/c/HE2lVvsLd3Y

The question now is: should we revert the kubelet code to the old semantics to avoid breaking plugins that rely on this (somewhat awkward) ordering, or do we leave the new semantics in place and send out an announcement of this (potentially) breaking change?

One thing to keep in mind: if we go with the second option (i.e. keep the new semantics and force plugins to update), the updated plugins will continue to work on older versions of Kubernetes, since the ordering required by the new semantics (serve first, then register) has always been accepted.

### What did you expect to happen?

Plugins relying on the old semantics would continue to work on Kubernetes 1.25.

### How can we reproduce it (as minimally and precisely as possible)?

Write a plugin that:
1. Registers itself with the kubelet
2. Only then starts to serve the plugin API via a gRPC server

### Anything else we need to know?

_No response_

### Kubernetes version

<details>
1.25
</details>


### Cloud provider

<details>

</details>


### OS version

<details>

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

</details>


### Install tools

<details>

</details>


### Container runtime (CRI) and version (if applicable)

<details>

</details>


### Related plugins (CNI, CSI, ...) and versions (if applicable)

<details>

</details>

