Description
With the recent CCCL update (rapidsai/rapids-cmake#607), we should now be able to build RAPIDS with CUDA versions 12.5 and older.
We have CUDA driver R550 in CI now, which only supports up to CUDA 12.4, so that's the latest version we could adequately test. CUDA 12.5 needs driver R555, which does not yet have a production branch (PB) or long-term support (LTS) release.
edit: R550 is a Production Branch driver, and therefore supports CUDA Forward Compatibility with CUDA 12.5 containers. This means we are able to support CUDA 12.5 (the latest version at the time of writing).
I propose updating CI images, shared workflows, devcontainers, etc. to replace CUDA 12.2 with CUDA 12.5. We would retain CI testing for CUDA 12.0 as the lower bound for 12.x. This will also align with PyTorch's upcoming CUDA 12.4 support (a series of PRs, such as pytorch/builder#1720, has been adding it). edit: We will upgrade to the latest CUDA, 12.5, instead of 12.4. I will separately address CUDA compatibility questions between RAPIDS and PyTorch by working on our docs and release selector (see also: https://github.com/rapidsai/build-infra/issues/55).
Tasks
- Add 12.5.0 miniforge-cuda images: "Add CUDA 12.5.0." (miniforge-cuda#67)
- Add 12.5.0 CI images: "Add CUDA 12.5.0." (ci-imgs#153)
- Update miniforge-cuda to 12.5.1: "Update to CUDA 12.5.1" (miniforge-cuda#71)
  - Depends on https://gitlab.com/nvidia/container-images/cuda being updated to 12.5.1
- Update CI images to 12.5.1 (after miniforge-cuda update): "Use CUDA 12.5.1 instead" (ci-imgs#159)
We can start this work now (not blocked by 12.5.1 updates above):
- Open PR to shared-workflows: "Update CUDA version to 12.5.1" (shared-workflows#229)
  - Similar to prior work on CUDA 12.2, except we're just replacing 12.2.2 with 12.5.x
- Open PRs for every RAPIDS repository to update 12.2.2 to 12.5.x (xref'd below)
  - See past CUDA 12.2 migration PRs for reference.
  - Most of the items below can be automated:
    - Add `cuda-version` matrix entry for 12.5
    - Update `.github/workflows/` to use shared-workflows branch
    - Update any `matrix_filter` entries using 12.2 to 12.5
    - Update README/CONTRIBUTING docs to use 12.5, especially for installation
    - Update devcontainers to 12.5
  - rmm
  - cudf
  - ...
- Publish new images from `rapidsai/docker`: "add CUDA 12.5 images" (docker#689)
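The mechanical part of the per-repository migration (replacing 12.2.2 with a 12.5.x release) could be scripted roughly as follows. This is a minimal sketch, not the tooling actually used for the migration PRs: the names `bump_cuda` and `migrate_files`, the default glob pattern, and the choice of 12.5.1 as the target patch release are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed target versions; the task list only says "12.5.x".
NEW_FULL = "12.5.1"   # replaces the full version 12.2.2
NEW_MINOR = "12.5"    # replaces the bare minor version 12.2

# Match "12.2" or "12.2.2" only when it is a standalone version string,
# i.e. not part of "112.2", "1.12.2", "12.25", or "12.2.0".
_OLD_CUDA = re.compile(r"(?<![\d.])12\.2(\.2)?(?!\.?\d)")


def bump_cuda(text: str) -> str:
    """Return text with CUDA 12.2 / 12.2.2 replaced by 12.5 / 12.5.1."""
    return _OLD_CUDA.sub(
        lambda m: NEW_FULL if m.group(1) else NEW_MINOR, text
    )


def migrate_files(root: Path, patterns=(".github/workflows/*.yaml",)) -> list[Path]:
    """Rewrite matching files under root in place; return the files changed.

    The default pattern is a hypothetical example; a real repository may
    also need README, CONTRIBUTING, and devcontainer files covered.
    """
    changed = []
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            original = path.read_text()
            updated = bump_cuda(original)
            if updated != original:
                path.write_text(updated)
                changed.append(path)
    return changed
```

A plain substitution like this only covers the "replace 12.2 with 12.5" items. Pointing `.github/workflows/` at the shared-workflows branch, and reviewing each change before opening a PR, would still need separate handling.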
Once all repos are migrated, merge the `shared-workflows` PR and then revert to the current default `shared-workflows` branch.
Docs changes (wait until all repos are migrated):
- Update release selector for nightlies (past example): "Add CUDA 12.5 to conda nightlys in the selector" (docs#522)
- Update release selector for stable builds at release time (past example): "Updates for RAPIDS 24.08" (docs#529)
- Update rapids.ai at release time (past example): "Updates for RAPIDS 24.08" (rapids.ai#389)