-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for Accelerators to GCE clusters. #45130
Conversation
vishh
commented
Apr 28, 2017
Signed-off-by: Vishnu kannan <[email protected]>
@dashpole will you be able to review this PR? |
local attempt=1 | ||
while true; do | ||
echo "Attempt ${attempt} to create ${1}" >&2 | ||
if ! ${gcloud} compute instance-templates create \ | ||
if ! ${gcloud} beta compute instance-templates create \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedanese @zmerlynn any objections towards switching OSS to using beta APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do similar to this PR in gke? cc @roberthbailey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a change list out for GKE already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedanese can I assume no objections towards this change then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not from me. Thanks.
@@ -55,6 +58,11 @@ if [[ "${NODE_OS_DISTRIBUTION}" == "cos" ]]; then | |||
NODE_OS_DISTRIBUTION="gci" | |||
fi | |||
|
|||
# GPUs supported in GCE do not have compatible drivers in Debian 7. | |||
if [[ "${NODE_OS_DISTRIBUTION}" == "debian" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are blacklisting debian for GPU support. Should we instead whitelist gci? May be better if we want to add more OS distributions (ubuntu?) in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ubuntu should work as well. Hence the blacklist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in that case this is a good approach.
# VMs with Accelerators cannot be live migrated. | ||
# More details here - https://cloud.google.com/compute/docs/gpus/add-gpus#create-new-gpu-instance | ||
if [[ ! -z "${NODE_ACCELERATORS}" ]]; then | ||
accelerator_args="--maintenance-policy TERMINATE --restart-on-failure --accelerator ${NODE_ACCELERATORS}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it matter that terminate is all caps? Looks like the documentation has it lowercase...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jk, found the full example here
code changes lgtm. feel free to add the label once you figure out beta apis question. |
/approve |
adding lgtm tag based on #45130 (comment) /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mikedanese, vishh
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 44830, 45130) |