TL;DR I want to support a build of the libc++
library targeting AMDGPU and NVPTX devices in the same vein as libc for GPUs — The LLVM C Library. This will make standard C++ runtime features available on the GPU.
Motivation
Currently, offloading languages like CUDA / HIP / OpenMP can only support C++ features on the GPU that are marked constexpr
. Providing a functioning C++ standard library on the GPU would allow more C++ programs to be trivially run on the GPU without modification. This is also attractive for efforts like CUDA or hip stdpar
that blur the line between GPU and CPU code.
Obviously, many features of the C++ standard library aren’t overly valuable when put on the GPU. But, if we can compile it we may as well provide it.
Overview
The proposed build process is to use the same support we use for the LLVM C library for GPUs. This is mostly focused around treating GPU libraries as cross-compiling targets, for example Compiler Explorer. This allows us to compile C++ for the GPU with minimal modifications and distribute it as a standard LTO enabled static library.
Enabling the build will be done using the LLVM_RUNTIME_TARGETS
support for cross-compiling to multiple target architectures. This option will create separate CMake jobs for each target architecture. Individual arguments can then be passed to these builds using the RUNTIMES_<target>_<option>
syntax. This allows us to stack libc++
on top of the existing libc
and compiler-rt
projects that can also be built for the GPU. The current invocation currently looks something like this:
$ cmake ../llvm -GNinja -DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS='clang;lld' -C ../libcxx/cmake/caches/GPU.cmake \
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx \
-DLLVM_RUNTIME_TARGETS=default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda
This will use the LLVM_ENABLE_PER_RUNTIME_DIR
option to install the libraries into a separate directory so they do not conflict with each other, for example,
<install>/lib/amdgcn-amd-amdhsa/libc.a
<install>/lib/amdgcn-amd-amdhsa/libc++.a
<install>/lib/clang/<major>/lib/amdgcn-amd-amdhsa/libclang_rt.builtins.a
Expected Support
The GPU build will include a config that disables C++ library features that are not easily implemented on the GPU. This currently excludes exceptions, RTTI, filesystem, threads, random device, and locales. Features like streams currently do not work due to locale.h
not being implemented in the LLVM libc.
Required changes
The libc++
project uses the __config
header to control which features are available depending on the target. We simply need to provide checks on the __NVPTX__
and __AMDGPU__
macros, similar to how things like BSD or musl are handled. So far, most of the complexity is hidden in the libc
project and I have only needed to disable a handful of features. I have made a series of patches that allow us to build a functional libc++
on the GPU awaiting the response to this RFC, [libcxx] Add cache file for the GPU build by jhuber6 · Pull Request #99348 · llvm/llvm-project · GitHub.
We will need a new configuration to tell libc++
to build using the LLVM C library in-tree, similar to libunwind
or libc++abi
. This is currently being handled by @petrhosek.
Testing
The goal is to set up a pre-commit CI interface for building and testing the GPU support and add it to the existing builds of the GPU C library. The first step will be to make a CI interface that simply builds the project and then work on getting tests to run.
Testing will be done using the same interface the LLVM C library uses in Testing the GPU C library — The LLVM C Library. To summarize, we use a minimal GPU runtime and startup utilities to pretend like the GPU is a normal target and execute a main
function. This will allow us to compile and run trivial unit tests.
The libc++
project has a similar interface using the -Dexecutor
option that uses a separate executable to launch the tests. We can simply pass this to the amdhsa-loader
after linking in the crt1.o
file that contains the _start
definition from the C library.
It’s likely that there will be tests that must be disabled, we could add a // UNSUPPORTED: gpu
option for the lit tests as-needed. I will need to investigate the test suite thoroughly to see if there are any unit tests that depend on external libraries.
Future Work
Ideally this project will be merged into existing offloading languages. This will may require modifying headers to declare certain things as available with __device__
or similar. The locale.h
interface will need to be implemented at least for the default locale in the LLVM C library. Allocation functions like aligned_alloc
also need to be added to the LLVM C library’s GPU build.
@jdoerfert @Artem-B @yxsamliu @ldionne @philnik @JonChesterfield @mordante @petrhosek @michaelrj-google