Accelerator

The accelerator role sets up the AMD ROCm platform or the NVIDIA CUDA toolkit so that applications can use the GPUs installed on the target nodes.

Enter all required parameters in input/accelerator_config.yml.
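As a minimal sketch, the helper below writes an input/accelerator_config.yml populated with the default values listed in the parameter details. The function name write_accelerator_config is hypothetical. The file shipped with your Omnia release may contain additional comments or keys, so if it differs, edit that file instead of replacing it.

```shell
# Hypothetical helper: write accelerator_config.yml using the documented
# defaults. Key names come from this README; the layout of the file in your
# Omnia release may differ.
write_accelerator_config() {
  local dir="${1:-input}"   # target directory, ./input by default
  mkdir -p "$dir"
  cat > "$dir/accelerator_config.yml" <<'EOF'
# AMD GPU driver version (see https://repo.radeon.com/amdgpu/)
amd_gpu_version: "22.20.3"
# AMD ROCm version
amd_rocm_version: "latest/main"
# CUDA toolkit version; anything other than "latest" needs cuda_toolkit_path
cuda_toolkit_version: "latest"
# Path to an offline copy (RPM) of the CUDA toolkit installer
cuda_toolkit_path: ""
# cuda_stream value (default: latest-dkms)
cuda_stream: "latest-dkms"
EOF
}
```

Running `write_accelerator_config input` from the directory that contains input/ creates or overwrites input/accelerator_config.yml.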

Parameter details
amd_gpu_version
string Optional

The AMD GPU driver version for the RHEL release installed on the node. Verify that this version is present in the repository for your node's OS version; compatible versions are listed at https://repo.radeon.com/amdgpu/ . For example, if 'latest' is provided and the compute OS version is RHEL 8.5, the URL becomes https://repo.radeon.com/amdgpu/latest/rhel/8.5/main/x86_64/

Default values: 22.20.3
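The URL rule for amd_gpu_version can be sketched as a small shell function. It assumes that the layout of the 'latest' example (`<version>/rhel/<os version>/main/x86_64/`) also holds for pinned versions such as 22.20.3, and that each directory is a standard yum repository. Both function names are hypothetical.

```shell
# Hypothetical helper; the URL layout is assumed from the 'latest' example.
amdgpu_repo_url() {
  local version="$1" os_version="$2"
  printf 'https://repo.radeon.com/amdgpu/%s/rhel/%s/main/x86_64/\n' "$version" "$os_version"
}

# Returns 0 if the repository metadata for the version/OS pair can be fetched.
# Assumes the directory is a standard yum repository (repodata/repomd.xml).
amdgpu_version_available() {
  local url
  url="$(amdgpu_repo_url "$1" "$2")repodata/repomd.xml"
  curl -sfI -o /dev/null "$url"
}
```

`amdgpu_repo_url latest 8.5` prints https://repo.radeon.com/amdgpu/latest/rhel/8.5/main/x86_64/, and `amdgpu_version_available 22.20.3 8.5 && echo found` checks a pinned version before you set it (this requires network access).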
amd_rocm_version
string Optional

The AMD ROCm version to install. On RHEL, make sure the Red Hat subscription is enabled before installing ROCm, because the installation requires packages from the CodeReady Builder repository. If 'latest' is provided (the default is latest/main), the URL becomes https://repo.radeon.com/rocm/centos8/latest/main/. Omnia supports only a single ROCm instance.

Default values: latest/main
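The amd_rocm_version value appears to map onto the repository path in a similar way. A minimal sketch, assuming the value is appended as-is after https://repo.radeon.com/rocm/centos8/ (which reproduces the 'latest' URL above for the default latest/main); rocm_repo_url is a hypothetical name.

```shell
# Hypothetical helper: the amd_rocm_version value is assumed to be appended
# directly after .../rocm/centos8/, as in the 'latest' example.
rocm_repo_url() {
  local version="${1:-latest/main}"   # documented default
  version="${version#/}"              # drop a leading slash, if any
  version="${version%/}"              # drop a trailing slash, if any
  printf 'https://repo.radeon.com/rocm/centos8/%s/\n' "$version"
}
```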
cuda_toolkit_version
string Optional

The CUDA toolkit version to install. By default, the latest CUDA toolkit (11.8.0 at the time of writing) is installed unless cuda_toolkit_path is specified.

Default values: latest
cuda_toolkit_path
string Optional

If the latest CUDA toolkit is not required, provide the path to an offline copy of the toolkit installer (take an RPM copy of the toolkit from here). If cuda_toolkit_version is not latest, cuda_toolkit_path is mandatory.
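The rule that a non-latest cuda_toolkit_version needs cuda_toolkit_path can be checked before running the playbook. The function below is a hypothetical pre-flight check, not part of Omnia; it also assumes that cuda_toolkit_path must already exist on the machine where the check runs.

```shell
# Hypothetical pre-flight check mirroring the cuda_toolkit_path rule.
# Usage: check_cuda_params <cuda_toolkit_version> <cuda_toolkit_path>
check_cuda_params() {
  local version="$1" path="$2"
  # A pinned (non-latest) version is only possible with an offline installer.
  if [ "$version" != "latest" ] && [ -z "$path" ]; then
    echo "error: cuda_toolkit_path is mandatory when cuda_toolkit_version is '$version'" >&2
    return 1
  fi
  # Assumption: a given path must point at an existing file or directory.
  if [ -n "$path" ] && [ ! -e "$path" ]; then
    echo "error: cuda_toolkit_path '$path' does not exist" >&2
    return 1
  fi
  return 0
}
```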
cuda_stream
string Optional

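The NVIDIA driver module stream (a DNF module stream such as latest-dkms) to install with the CUDA toolkit. It is not a CUDA execution stream, which is a sequence of device operations that execute in the order issued by the host.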

Default values: latest-dkms
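Assuming cuda_stream selects a stream of the NVIDIA driver DNF module (latest-dkms is one of the stream names NVIDIA publishes for its RHEL driver packages), the value would be combined into a module spec as sketched below. The helper name is hypothetical, and Omnia's own tasks may use the value differently.

```shell
# Hypothetical helper: turn a cuda_stream value into a DNF module spec.
nvidia_driver_module() {
  local stream="${1:-latest-dkms}"   # documented default
  printf 'nvidia-driver:%s\n' "$stream"
}

# On a RHEL node with NVIDIA's CUDA repository configured, this spec could be
# installed manually with:
#   sudo dnf module install "$(nvidia_driver_module latest-dkms)"
```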

Note

  • Nodes provisioned using the Omnia provision tool do not require a Red Hat subscription to run accelerator.yml on RHEL target nodes.
  • For RHEL target nodes not provisioned by Omnia, ensure that a Red Hat subscription is enabled on every target node.
  • If cuda_toolkit_path is provided in input/provision_config.yml and NVIDIA GPUs are available on the target nodes, CUDA packages are deployed automatically after provisioning, during the execution of provision.yml.
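On RHEL nodes that were not provisioned by Omnia, the subscription and the CodeReady Builder repository needed for ROCm can be checked with subscription-manager. A minimal sketch; crb_repo_id is a hypothetical helper that builds the standard CodeReady Builder repository ID for a RHEL release and architecture.

```shell
# Hypothetical helper: CodeReady Builder repo ID for a RHEL version/arch.
# Accepts a full version such as 8.5 and keeps only the major number.
crb_repo_id() {
  local major="${1%%.*}" arch="${2:-x86_64}"
  printf 'codeready-builder-for-rhel-%s-%s-rpms\n' "$major" "$arch"
}

# Typical manual steps on a target node (require root and a registered system):
#   subscription-manager status
#   subscription-manager repos --enable "$(crb_repo_id 8.5)"
```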

To install the GPU drivers and toolkits configured in input/accelerator_config.yml, run:

cd accelerator
ansible-playbook accelerator.yml -i inventory

(where inventory consists of manager, compute and login nodes)
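A sketch of such an inventory, written as an Ansible INI file. The group names manager, compute, and login_node and all host names are assumptions for illustration; check the sample inventory of your Omnia release for the exact group names.

```shell
# Hypothetical helper: write an example INI inventory. Group and host names
# are placeholders, not taken from Omnia.
write_inventory() {
  local file="${1:-inventory}"
  cat > "$file" <<'EOF'
[manager]
manager.example.com

[compute]
compute01.example.com
compute02.example.com

[login_node]
login01.example.com
EOF
}
```

Running `write_inventory inventory` and then `ansible-inventory -i inventory --graph` shows how Ansible groups the hosts before you run accelerator.yml.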

The following configuration takes place when accelerator.yml runs:
  1. Servers with AMD GPUs are identified, and the specified AMD GPU driver and ROCm platform are downloaded and installed.
  2. Servers with NVIDIA GPUs are identified, and the specified CUDA toolkit is downloaded and installed.
  3. On the rare servers with both NVIDIA and AMD GPUs installed, all of the above components are installed.
  4. Servers with neither GPU type are skipped.
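The per-server decision above can be sketched from `lspci` output. This is an illustration, not Omnia's detection code: it assumes GPUs appear as VGA, 3D, display controller, or processing accelerator PCI entries, with NVIDIA devices labelled "NVIDIA" and AMD GPUs labelled "[AMD/ATI]". classify_gpus is a hypothetical name.

```shell
# Hypothetical sketch: classify a server from lspci output read on stdin.
# Prints amd, nvidia, both, or none.
classify_gpus() {
  local devices has_amd=0 has_nvidia=0
  # Keep only GPU-class PCI devices (ignores host bridges, NICs, and so on).
  devices=$(grep -E 'VGA compatible controller|3D controller|Display controller|Processing accelerators' || true)
  if printf '%s\n' "$devices" | grep -q 'NVIDIA'; then has_nvidia=1; fi
  if printf '%s\n' "$devices" | grep -q 'AMD/ATI'; then has_amd=1; fi
  if [ "$has_amd" -eq 1 ] && [ "$has_nvidia" -eq 1 ]; then
    echo "both"      # install ROCm stack and CUDA toolkit
  elif [ "$has_amd" -eq 1 ]; then
    echo "amd"       # install AMD GPU driver and ROCm
  elif [ "$has_nvidia" -eq 1 ]; then
    echo "nvidia"    # install CUDA toolkit
  else
    echo "none"      # skip this server
  fi
}
```

On a target node, `lspci | classify_gpus` prints the category. A server whose only display device is a BMC graphics chip (for example, a Matrox controller) is classified as none.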