
Allow shared object loading across layers #38

Open
ncoghlan opened this issue Oct 23, 2024 · 0 comments
Labels: Affects: Spec Format, Category: Bug, Category: Enhancement

ncoghlan commented Oct 23, 2024

Environment stacks on platforms other than Windows currently don't correctly support shared object (aka dynamic library) loading across different layers (Windows is different due to its reliance on os.add_dll_directory even within a single virtual environment).

It should be possible to resolve this limitation by:

  1. Adding a new share/venv/dynlib folder within each non-Windows environment layer which contains symlinks to all of the shared objects found under the site-packages directory that aren't specifically marked as being Python extension modules
  2. Replacing the symlink to the underlying Python implementation in each non-Windows layered environment with a wrapper script that sets the shared object loading environment variables appropriately, and then uses exec -a to invoke the underlying base Python runtime while still having sys.executable refer to the wrapper script inside the virtual environment

Additional implementation notes:

  • https://github.com/lmstudio-ai/venvstacks/blob/main/misc/find_shared_libs.py contains an example script for
    finding shared objects in a directory tree, while excluding files that specifically match the suffixes defined for Python
    binary extension modules. Note that these extensions are version dependent, so any scan should be executed with the
    layered environment's base Python runtime, not with the Python version that happens to be running venvstacks itself.
  • the folder of symlinks will be generated in all environment layers (even application environments) as part of building the
    environment. This avoids some potential future problems if it is ever decided to move beyond the strict three-tier layering and
    instead allow applications to depend on other applications and frameworks to depend on other frameworks
    (as Allow framework layers to form a directed acyclic graph #18 proposes for framework layers)
  • to manage false positives in the shared object scanning and to resolve naming conflicts,
    a library symlink exclusion mechanism will need to be defined (attempting to symlink multiple
    dynamic libraries with the same name in the same layer will cause a fatal build error, with the
    exclusion mechanism used to ensure at most one of the conflicting libraries gets symlinked)
  • the wrapper scripts will be generated as part of linking the environment layers together
    (i.e. at the same time sitecustomize.py is generated)
  • Python venv implementations and versions can vary as to which of the python, python3, and python3.x
    symlinks is the one that actually links to the base runtime environment, and which are just internal symlinks
    within the virtual environment. Rather than making assumptions, the link replacement logic will be:
    • links within the environment are left alone
    • links to targets outside the environment are renamed with a leading underscore,
      and a copy of the wrapper injected using their original name (executing the new name)
  • to allow embedding apps to impose their own dynamic library loading preferences,
    the wrapper scripts will put the paths they add after any existing entries
  • when multiple frameworks are referenced from an application environment,
    the order of listing in the layer spec determines the order of the shared folder lookup path priority
    (using the same linearisation rules as sys.path, assuming Allow framework layers to form a directed acyclic graph #18 is implemented)
  • the wrapper scripts don't need to help with locating shared libraries from their own environment
    (as the relative paths injected at library build time should suffice for that),
    they're specifically for finding shared libraries published in lower layers
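Taken together, the scan-and-link step described in the notes above might look something like the following sketch. The `share/venv/dynlib` location, the exclusion mechanism, and all the names here are proposals from this issue rather than existing venvstacks APIs:

```python
from pathlib import Path


class LayerBuildError(Exception):
    """Fatal build error, e.g. conflicting shared object names."""


def link_shared_objects(
    shared_objects: list[Path],
    dynlib_dir: Path,
    excluded: frozenset[Path] = frozenset(),
) -> dict[str, Path]:
    """Symlink each discovered shared object into *dynlib_dir*.

    Two libraries claiming the same name is a fatal build error unless
    the exclusion mechanism drops all but one of them from the scan.
    """
    dynlib_dir.mkdir(parents=True, exist_ok=True)
    linked: dict[str, Path] = {}
    for so_path in shared_objects:
        if so_path in excluded:
            continue  # explicitly excluded in the layer spec
        name = so_path.name
        if name in linked:
            raise LayerBuildError(
                f"{name!r} provided by both {linked[name]} and {so_path}"
            )
        (dynlib_dir / name).symlink_to(so_path.resolve())
        linked[name] = so_path
    return linked
```

Keying the exclusion set by full path rather than by file name is deliberate: a name-based exclusion would drop *all* of the conflicting libraries, whereas the proposal is to ensure at most one of them gets symlinked.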

Background

Consider the following virtual environment with pytorch installed from PyPI:

(dynlib_example) ~/devel/dynlib_example$ pip list | grep torch
torch                    2.5.0
torchaudio               2.5.0
torchvision              0.20.0

The libtorch.so shared library within that environment includes relative load paths (an RPATH entry) for several potential nVidia dependencies:

(dynlib_example) ~/devel/dynlib_example$ readelf -d lib/python3.12/site-packages/torch/lib/libtorch.so  | grep 'R.*PATH'
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../../nvidia/cublas/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_runtime/lib:$ORIGIN/../../nvidia/cudnn/lib:$ORIGIN/../../nvidia/cufft/lib:$ORIGIN/../../nvidia/curand/lib:$ORIGIN/../../nvidia/cusolver/lib:$ORIGIN/../../nvidia/cusparse/lib:$ORIGIN/../../nvidia/nccl/lib:$ORIGIN/../../nvidia/nvtx/lib:$ORIGIN]

This works because those nvidia libraries are installed into the same virtual environment:

(dynlib_example) ~/devel/dynlib_example$ ls lib/python3.12/site-packages/nvidia
cublas      cuda_nvrtc    cudnn  curand    cusparse     nccl       nvtx
cuda_cupti  cuda_runtime  cufft  cusolver  __init__.py  nvjitlink

In the context of venvstacks, this means that pytorch and the nVidia libraries must be installed as part of the same layer definition. Attempting to move the nVidia libraries lower in the stack (either to the base runtime layer, or to a separate framework layer if #18 is implemented) will fail, since the dynamic library loading will fail.

This is a reasonably common pattern, and one of the main reasons folks point out that the Python environment layering pattern implemented by venvstacks doesn't work in the general case. Whereas Python extension module DLLs on Windows can make themselves dynamically discoverable with os.add_dll_directory, POSIX shared objects rely more heavily on relative paths that are fixed at module build time (and hence are only correct when a library and its dependencies are installed into the same target environment) and on the LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on macOS) setting, which must be configured prior to application startup (it can't be manipulated at runtime the way the Windows DLL search path can be).

If you're aware of the problem, it can be managed. If you're not aware of the possibility, though, the consequences of running into it can be utterly baffling to debug: all you have to work with is a cryptic shared object loading failure when Python attempts to import an extension module with a dynamically linked dependency that can't be resolved.

Finding shared objects to symlink

Simply searching for and symlinking all .so objects in a layered environment would result in a lot of pointless symlinks to Python binary extension modules that are only loaded directly after the interpreter finds them via sys.path.

https://github.com/lmstudio-ai/venvstacks/blob/main/misc/find_shared_libs.py proposes a better algorithm for that, which filters out the shared objects that specifically look like Python extension modules:

(dynlib_example) ~/devel/dynlib_example$ find . -name '*.so' | wc -l
61
(dynlib_example) ~/devel/dynlib_example$ ../venvstacks/misc/find_shared_libs.py . | wc -l
32
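The filtering the script performs can be sketched roughly as follows. This is a simplified illustration, not the script itself, and as noted above the suffixes in EXTENSION_SUFFIXES are version dependent, so a real scan must be executed with the layer's base runtime rather than the Python running venvstacks:

```python
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def find_shared_libs(env_root: Path) -> list[Path]:
    """Collect *.so files that don't look like Python extension modules."""
    candidates = []
    for so_path in env_root.rglob("*.so"):
        name = so_path.name
        # Skip files matching a *versioned* extension module suffix
        # (e.g. ".cpython-312-x86_64-linux-gnu.so" or ".abi3.so");
        # a bare ".so" alone doesn't mark a Python extension module.
        if any(name.endswith(sfx) for sfx in EXTENSION_SUFFIXES if sfx != ".so"):
            continue
        candidates.append(so_path)
    return candidates
```

This matches the listing below: files like torchvision/_C.so survive the filter (their suffix is just ".so"), which is exactly why an explicit exclusion mechanism is still needed on top of the scan.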

For this example environment:

(dynlib_example) ~/devel/dynlib_example$ ../venvstacks/misc/find_shared_libs.py .
lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-ff651d7f.so
lib/python3.12/site-packages/torchaudio/lib/_torchaudio_sox.so
lib/python3.12/site-packages/torchaudio/lib/_torchaudio.so
lib/python3.12/site-packages/torchaudio/lib/pybind11_prefixctc.so
lib/python3.12/site-packages/torchaudio/lib/libtorchaudio_sox.so
lib/python3.12/site-packages/torchaudio/lib/libctc_prefix_decoder.so
lib/python3.12/site-packages/torchaudio/lib/libtorchaudio.so
lib/python3.12/site-packages/pillow.libs/libopenjp2-05423b53.so
lib/python3.12/site-packages/torio/lib/_torio_ffmpeg6.so
lib/python3.12/site-packages/torio/lib/_torio_ffmpeg5.so
lib/python3.12/site-packages/torio/lib/_torio_ffmpeg4.so
lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg4.so
lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so
lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg5.so
lib/python3.12/site-packages/nvidia/cuda_cupti/lib/libnvperf_target.so
lib/python3.12/site-packages/nvidia/cuda_cupti/lib/libpcsamplingutil.so
lib/python3.12/site-packages/nvidia/cuda_cupti/lib/libcheckpoint.so
lib/python3.12/site-packages/nvidia/cuda_cupti/lib/libnvperf_host.so
lib/python3.12/site-packages/triton/_C/libproton.so
lib/python3.12/site-packages/triton/_C/libtriton.so
lib/python3.12/site-packages/torchvision/image.so
lib/python3.12/site-packages/torchvision/_C.so
lib/python3.12/site-packages/torch/lib/libc10_cuda.so
lib/python3.12/site-packages/torch/lib/libtorch_cuda.so
lib/python3.12/site-packages/torch/lib/libtorch_python.so
lib/python3.12/site-packages/torch/lib/libtorch.so
lib/python3.12/site-packages/torch/lib/libcaffe2_nvrtc.so
lib/python3.12/site-packages/torch/lib/libtorch_cuda_linalg.so
lib/python3.12/site-packages/torch/lib/libshm.so
lib/python3.12/site-packages/torch/lib/libtorch_cpu.so
lib/python3.12/site-packages/torch/lib/libc10.so
lib/python3.12/site-packages/torch/lib/libtorch_global_deps.so

The torchvision case highlights the need for a library symlink exclusion mechanism in the layer specification syntax: the _C.so file is loaded via an explicit library loading call (relative to the Python file), so it shouldn't be symlinked into the dynamic library loading location. The generically named image.so shared library also illustrates why it may be necessary to resolve shared object naming conflicts between packages installed into the same layer (the initial proposal is to have naming conflicts trigger a fatal build error for that environment, with the exclusion mechanism then being used to pick which one gets linked).

Wrapping the Python runtime invocation

Both Linux and macOS should support the -a option to exec, which allows the correct Python binary to be executed while having sys.executable point at the wrapper script (note that exec -a is a bash/zsh/ksh extension rather than part of POSIX sh, so the generated wrappers may need a shebang line that accounts for that):

(dynlib_example) ~/devel/dynlib_example$ cat pyexec.sh
#!/bin/sh
exec -a "$PWD/pyexec.sh" bin/python3 "$@"
(dynlib_example) ~/devel/dynlib_example$ ./pyexec.sh -c "import sys; print(sys.executable)"
/home/acoghlan/devel/dynlib_example/pyexec.sh

The real script will do the full "get the absolute path to this running script" dance rather than using $PWD, but this short snippet still illustrates the general approach needed to ensure invoked Python subprocesses get the library path environment variable adjustments even if the parent process environment isn't passed to the subprocess. (To avoid an ever-growing environment variable, the adjustments will need to check that the directory of interest isn't already present.)

For Linux, the search path environment variable to adjust is LD_LIBRARY_PATH, while on macOS it is DYLD_LIBRARY_PATH.
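The path merging logic the wrapper scripts need (append new entries after existing ones, skip directories already present) could be sketched as follows; the function name and directory values are illustrative only:

```python
import os


def extended_library_path(existing_value: str, layer_dynlib_dirs: list[str]) -> str:
    """Merge lower-layer dynlib dirs into an existing search path value.

    New directories go *after* existing entries (so embedding apps keep
    their own loading preferences) and entries already present are
    skipped (so repeated wrapper invocations don't grow the variable).
    """
    entries = [entry for entry in existing_value.split(os.pathsep) if entry]
    for dynlib_dir in layer_dynlib_dirs:
        if dynlib_dir not in entries:
            entries.append(dynlib_dir)
    return os.pathsep.join(entries)
```

The generated shell wrapper would embed the equivalent logic inline before its exec call, with the variable name (LD_LIBRARY_PATH or DYLD_LIBRARY_PATH) chosen at generation time based on the target platform.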

ncoghlan added the Category: Bug, Category: Enhancement, and Affects: Spec Format labels on Oct 23, 2024