Description
What happened?
The following refactoring of the kubelet's devicemanager was done as part of the 1.25 release: db88676
As part of this, a change in semantics was (unintentionally) introduced that breaks at least one third-party plugin. The logic required to restore the old semantics is fairly straightforward. However, it's not obvious that we should actually go back to the old semantics because (at least in my opinion) the new semantics are more correct.
Previously, the kubelet's devicemanager would do the following upon receiving a plugin registration request:
- Register the plugin
- Launch a goroutine to connect to the gRPC service served by the plugin (with a timeout of 10s)
- Return
This gave the plugin the opportunity to first register itself with the kubelet and then (within a 10s time window) start its gRPC server.
The new semantics are similar, except that no goroutine is launched to connect to the gRPC service; the connection is attempted synchronously. This means that the plugin must now start its gRPC server before registering itself with the kubelet. Otherwise, the registration call will fail.
As mentioned before, this change was not necessarily intentional, but it's also not clear that it should be reverted. It brings the device plugin's custom registration process more in line with the kubelet's standard registration process, and the order is actually a bit more intuitive: why bother registering if you aren't able to start serving your API?
The first plugin to report this problem was the KubeVirt plugin and a discussion of the issue can be found here:
https://groups.google.com/g/kubevirt-dev/c/HE2lVvsLd3Y
The question now is: should we revert the kubelet code to the old semantics to avoid breaking plugins that rely on this (somewhat awkward) ordering, or do we leave the new semantics in place and send out an announcement of this (potentially) breaking change?
One thing to keep in mind: if we go with the second option (i.e. keep the new semantics and force plugins to update), the updated plugins will continue to work on older versions of Kubernetes, since the ordering implied by the new semantics (serve first, then register) has always been valid.
What did you expect to happen?
Plugins relying on the old semantics would continue to work on Kubernetes 1.25
How can we reproduce it (as minimally and precisely as possible)?
Write a plugin that:
- Registers itself with the kubelet
- Only then starts to serve the plugin API via a gRPC server
Anything else we need to know?
No response
Kubernetes version
1.25
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)