The accelerator role allows users to set up the AMD ROCm platform or the NVIDIA CUDA toolkit. These tools allow users to run GPU-accelerated workloads on the installed GPUs.
Enter all required parameters in input/accelerator_config.yml.
| Parameters | Details |
|---|---|
|  | This variable accepts the AMD GPU version for the specific RHEL OS version. Verify that the provided version is present in the repository for the OS version on your node. Compatible versions are listed at https://repo.radeon.com/amdgpu/ . For example, if 'latest' is provided and the compute OS version is RHEL 8.5, the URL becomes https://repo.radeon.com/amdgpu/latest/rhel/8.5/main/x86_64/ . Default values: |
|  | Required AMD ROCm driver version. Ensure that the Red Hat subscription is enabled before installing ROCm, because the ROCm packages are in the CodeReady Builder repository for RHEL. If 'latest' is provided, the URL becomes https://repo.radeon.com/rocm/centos8/latest/main/ . Omnia supports only a single instance of ROCm. Default values: |
| cuda_toolkit_version | Required CUDA toolkit version. By default, the latest CUDA toolkit is installed unless cuda_toolkit_path is specified. Default: latest (11.8.0). |
| cuda_toolkit_path | If the latest CUDA toolkit is not required, provide an offline copy of the toolkit installer at the specified path. (Take an RPM copy of the toolkit from here.) If cuda_toolkit_version is not latest, cuda_toolkit_path is mandatory. |
|  | A stream in CUDA is a sequence of operations that execute on the device in the order in which they are issued by the host code. Default values: |
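To check a version before running the playbook, the URL rule described for the AMD GPU and ROCm parameters can be reproduced in a few lines of shell. This is a minimal sketch: the function names amdgpu_repo_url, rocm_repo_url, and repo_exists are hypothetical, and it assumes the URL pattern shown for 'latest' also holds for other versions.

```shell
#!/bin/sh
# Hypothetical helpers; the URL patterns are extrapolated from the 'latest'
# examples in the parameter table, not taken from Omnia's source.

# AMD GPU driver repository for a version and RHEL release (e.g. latest, 8.5).
amdgpu_repo_url() {
  printf 'https://repo.radeon.com/amdgpu/%s/rhel/%s/main/x86_64/\n' "$1" "$2"
}

# ROCm repository for a version (the documented example uses the centos8 path).
rocm_repo_url() {
  printf 'https://repo.radeon.com/rocm/centos8/%s/main/\n' "$1"
}

# Succeeds if a yum/dnf repository is served at the URL, by probing its
# repodata/repomd.xml index. Needs curl and network access.
repo_exists() {
  curl --silent --head --fail --output /dev/null "${1}repodata/repomd.xml"
}

amdgpu_repo_url latest 8.5
rocm_repo_url latest
```

For example, `repo_exists "$(amdgpu_repo_url latest 8.5)" && echo found` reports whether the AMD GPU repository for RHEL 8.5 is reachable from the node.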
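The rule linking cuda_toolkit_version and cuda_toolkit_path can be checked before the playbook runs. The sketch below is hypothetical: check_cuda_config is not part of Omnia, it takes the two values as arguments instead of parsing input/accelerator_config.yml, and it assumes the path only needs to exist (file or directory).

```shell
#!/bin/sh
# Hypothetical pre-flight check for the CUDA parameters; values are passed as
# arguments instead of being parsed from input/accelerator_config.yml.
# Usage: check_cuda_config <cuda_toolkit_version> [<cuda_toolkit_path>]
check_cuda_config() {
  cuda_version="$1"
  cuda_path="${2:-}"

  # A non-latest version can only be installed from an offline copy.
  if [ "$cuda_version" != "latest" ] && [ -z "$cuda_path" ]; then
    echo "error: cuda_toolkit_path is mandatory when cuda_toolkit_version is not latest" >&2
    return 1
  fi

  # If a path is given, the offline installer must actually be there.
  if [ -n "$cuda_path" ] && [ ! -e "$cuda_path" ]; then
    echo "error: cuda_toolkit_path '$cuda_path' does not exist" >&2
    return 1
  fi
  return 0
}

check_cuda_config latest && echo "ok: latest CUDA toolkit"
```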
Note
- Nodes provisioned using the Omnia provision tool do not require a Red Hat subscription to run accelerator.yml on RHEL target nodes.
- For RHEL target nodes not provisioned by Omnia, ensure that a Red Hat subscription is enabled on every target node.
- If cuda_toolkit_path is provided in input/provision_config.yml and NVIDIA GPUs are available on the target nodes, CUDA packages are deployed after provisioning, during the execution of provision.yml, without user intervention.
To install all the latest GPU drivers and toolkits, run:
cd accelerator
ansible-playbook accelerator.yml -i inventory
(where inventory contains the manager, compute, and login nodes)
- The following configurations take place when running accelerator.yml:
  - Servers with AMD GPUs are identified, and the latest GPU drivers and ROCm platform are downloaded and installed.
  - Servers with NVIDIA GPUs are identified, and the specified CUDA toolkit is downloaded and installed.
  - On the rare servers with both NVIDIA and AMD GPUs installed, all of the above packages are installed.
  - Servers with neither GPU type are skipped.
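The per-server decision above amounts to two independent checks. A minimal shell sketch, in which plan_installs and its 0/1 flags are hypothetical and GPU detection is assumed to have happened elsewhere (it is not modeled on Omnia's own detection):

```shell
#!/bin/sh
# Hypothetical sketch of the per-server decision made by accelerator.yml.
# Inputs are flags (1 = present, 0 = absent); GPU detection itself is out of
# scope and is not modeled on Omnia's implementation.
# Usage: plan_installs <has_amd_gpu> <has_nvidia_gpu>
plan_installs() {
  has_amd="$1"
  has_nvidia="$2"

  if [ "$has_amd" -eq 0 ] && [ "$has_nvidia" -eq 0 ]; then
    echo "skip: no AMD or NVIDIA GPU"
    return 0
  fi
  # Both branches can run, which covers servers with both GPU vendors.
  if [ "$has_amd" -eq 1 ]; then
    echo "install: AMD GPU driver and ROCm"
  fi
  if [ "$has_nvidia" -eq 1 ]; then
    echo "install: CUDA toolkit"
  fi
}

plan_installs 1 1
```

Because the two checks are independent rather than mutually exclusive, a server with both GPU types receives both installs without needing a special case.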