-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Juju: Enable GPU mode if GPU hardware detected #43467
Conversation
Hi @tvansteenburgh. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
9bb0e37
to
35b6f1b
Compare
@k8s-bot ok to test |
@@ -0,0 +1,48 @@ | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure we have the right headers in place for these new files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, right. Good catch, will fix.
/approve |
@k8s-bot cvm gce e2e test this |
layer-nvidia-cuda does the hardware detection and sets a state that the worker can react to. When gpu is available, worker updates config and restarts kubelet to enable gpu mode. Worker then notifies master that it's in gpu mode via the kube-control relation. When master sees that a worker is in gpu mode, it updates to privileged mode and restarts kube-apiserver. The kube-control interface has subsumed the kube-dns interface functionality. An 'allow-privileged' config option has been added to both worker and master charms. The gpu enablement respects the value of this option; i.e., we can't enable gpu mode if the operator has set allow-privileged="false".
78a6951
to
c87ac5e
Compare
# Not sure why this is necessary, but if you don't run this, k8s will | ||
# think that the node has 0 gpus (as shown by the output of | ||
# `kubectl get nodes -o yaml` | ||
check_call(['nvidia-smi']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open a bug for this and cc @cmluciano
remove_state('kubernetes-worker.kubelet.restart') | ||
|
||
|
||
@when('cuda.installed') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only check for detecting if GPUs are exposed? I may be missing logic somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the actual logic for detection is in a separate layer that this charm uses (https://github.com/juju-solutions/layer-nvidia-cuda). If GPUs are detected, the nvidia-cuda layer sets the 'cuda.installed' state so other layers can react to it.
+1 LGTM /approve |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chuckbutler, marcoceppi, tvansteenburgh
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 44047, 43514, 44037, 43467) |
What this PR does / why we need it:
Automatically configures kubernetes-worker node to utilize GPU hardware when such hardware is detected.
layer-nvidia-cuda does the hardware detection, installs CUDA and Nvidia
drivers, and sets a state that the k8s-worker can react to.
When gpu is available, worker updates config and restarts kubelet to
enable gpu mode. Worker then notifies master that it's in gpu mode via
the kube-control relation.
When master sees that a worker is in gpu mode, it updates to privileged
mode and restarts kube-apiserver.
The kube-control interface has subsumed the kube-dns interface
functionality.
An 'allow-privileged' config option has been added to both worker and
master charms. The gpu enablement respects the value of this option;
i.e., we can't enable gpu mode if the operator has set
allow-privileged="false".
Special notes for your reviewer:
Quickest test setup is as follows:
kube-control interface: https://github.com/juju-solutions/interface-kube-control
nvidia-cuda layer: https://github.com/juju-solutions/layer-nvidia-cuda
(Both are registered on http://interfaces.juju.solutions/)
Release note: