Skip to content

Conversation

@jameslamb
Copy link
Member

Contributes to rapidsai/build-planning#73

Proposes the following:

  • adding CUDA 12.5 images
  • PR builds:
    • +1 test job covering (cuda=12.5, arch=arm64)
  • branch builds:
    • +1 test job covering (cuda=12.5, arch=x86_64)
    • -1 test job covering (cuda=12.2, arch=arm64)

context: rapidsai/build-planning#73 (comment)

Notes for Reviewers

Per offline discussion with @raydouglass , this would be accompanied by a deprecation notice in the RAPIDS 24.08 release stating that the CUDA 12.2 images will be removed in some future release (future release not yet determined).

@jameslamb jameslamb added the 2 - In Progress Currenty a work in progress label Jul 16, 2024
@jameslamb jameslamb requested a review from a team as a code owner July 16, 2024 20:30
Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks James! 🙏

Am curious what you think about swapping where the x86_64 and aarch64 jobs are

For more context CUDA 12.2 added some interesting things like cuFile (for ARM) and HMM, which are helpful to test on CUDA 12.2 ARM specifically. Am not aware of a similar change in CUDA 12.5 (though could be missing it)

Given there are fewer ARM machines than x86_64, it may make sense to use x86_64 primarily in PRs and run ARM more on branches. That would mean less ARM jobs are needed during PR development

Anyways please let me know what you think 🙂

@jameslamb
Copy link
Member Author

Thanks @jakirkham ! That's helpful context I didn't have. I just applied your suggestions.

@jakirkham
Copy link
Member

Updating to pull in PR ( #690 ) now that is in branch-24.08

@jameslamb
Copy link
Member Author

See this

9.818 Running dfg on cuspatial
9.939 Traceback (most recent call last):
9.939   File "/opt/conda/bin/rapids-dependency-file-generator", line 10, in <module>
9.939     sys.exit(main())
9.939              ^^^^^^
9.939   File "/opt/conda/lib/python3.11/site-packages/rapids_dependency_file_generator/_cli.py", line 121, in main
9.939     make_dependency_files(
9.939   File "/opt/conda/lib/python3.11/site-packages/rapids_dependency_file_generator/_rapids_dependency_file_generator.py", line 421, in make_dependency_files
9.940     raise ValueError(f"No matching matrix found in '{include}' for: {matrix_combo}")
9.940 ValueError: No matching matrix found in 'cuda_version' for: {'cuda': '12.5', 'arch': 'x86_64', 'py': '3.9'}

(build link)

This will be broken until rapidsai/cuspatial#1405 is merged. And that is blocked until CI in cuspatial is fixed (which could be done by merging rapidsai/cuspatial#1407).

Copy link
Member

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one quick question. Approved so it doesn't block merging.

Comment on lines 5 to 6
ARG LINUX_DISTRO=ubuntu
ARG LINUX_DISTRO_VER=22.04
Copy link
Member

@jakirkham jakirkham Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This need not be done in this PR. Just the discussion above ( #689 (comment) ) made me wonder

Should we do the same thing for the Linux version?

Suggested change
ARG LINUX_DISTRO=ubuntu
ARG LINUX_DISTRO_VER=22.04
ARG LINUX_DISTRO=unset
ARG LINUX_DISTRO_VER=unset

If this sounds reasonable, happy to send a 2nd PR with this change. Also ok not doing this if we don't think it makes sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in my opinion, this is a pattern that should be used in any Dockerfile where it's unacceptable to silently fall back to default values if configuration is not passed.

I just didn't want to go any further with it in this PR and grow the diff here.

I'd support a 2nd PR with this change.

@jakirkham
Copy link
Member

jakirkham commented Jul 19, 2024

One of the 11.8.0 tests failed on CI. Restarted it

That said, we are not changing 11.8.0 builds here and all other builds pass

Edit: Actually that won't work since the images get deleted after

@jakirkham
Copy link
Member

Going to try restarting from scratch. If that doesn't work, we can discuss next steps on Monday

@raydouglass
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit d240fc3 into rapidsai:branch-24.08 Jul 20, 2024
@jakirkham
Copy link
Member

Thanks all! 🙏

rapids-bot bot pushed a commit that referenced this pull request Jan 8, 2026
Contributes to #832 

Removes reliance on https://hub.docker.com/r/rapidsai/ci-conda in this repo. It was only being used in 2 places, both of which could be run in simpler environments:

* in a `compute-matrix` job that just needs `bash`, `jq` and `yq` --> regular old `ubuntu-latest` GitHub Actions VM is fine
* in a part of a multi-stage build that just needs `bash`, `git`, `pip`, and a Python interpreter --> `python:{version}` container is fine

Other changes based on PR reviewers:

* puts versions to be managed by renovate in a `versions.yaml` ([borrowing the approach from rapidsai/ci-imgs](https://github.com/rapidsai/ci-imgs/blob/28b8b5ac127bcf5cd6679cd8ee40a53ce7240421/ci/compute-build-args.sh#L36))
* updates shell code in Dockerfiles to match our preferred style (including the array format for package lists from rapidsai/ci-imgs#343)
* sets all empty arg defaults to `notset` (ref: #689 (comment))
  - *weakly prefer `notset`  to `unset` because `unset` is a valid shell command*
* alphabetizes build args in Dockerfiles

## Notes for Reviewers

### This is only the beginning

To be clear, this project still heavily relies on `rapidsai/miniforge-cuda` (the base image for `rapidsai/ci-conda`. So there's more work to do for #832.

But this at least means we no longer need to care about anything in these lines breaking the things in this repo 😁 :

https://github.com/rapidsai/ci-imgs/blob/55b24734b7d594e768599771dcae2e718af52f9a/ci-conda.Dockerfile#L183-L313

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #834
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2 - In Progress Currenty a work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants