Allow shared object loading across layers #38
Labels: Affects: Spec Format, Category: Bug, Category: Enhancement
Environment stacks on platforms other than Windows currently don't correctly support shared object (aka dynamic library) loading across different layers (Windows is different due to its reliance on os.add_dll_directory even within a single virtual environment).
It should be possible to resolve this limitation by:

- adding a `share/venv/dynlib` folder within each non-Windows environment layer which contains symlinks to all of the shared objects found under the `site-packages` directory that aren't specifically marked as being Python extension modules
- wrapping the Python runtime invocation in a script that uses `exec -a` to invoke the underlying base Python runtime while still having `sys.executable` refer to the wrapper script inside the virtual environment

Additional implementation notes:
- populating the folder means finding shared objects in a directory tree, while excluding files that specifically match the suffixes defined for Python binary extension modules. Note that these extensions are version dependent, so any scan should be executed with the layered environment's base Python runtime, not with the Python version that happens to be running `venvstacks` itself.
- [...] environment. This avoids some potential future problems if it is ever decided to move beyond the strict three-tier layering and instead allow applications to depend on other applications and frameworks to depend on other frameworks (as #18, "Allow framework layers to form a directed acyclic graph", proposes for framework layers)
- a library symlink exclusion mechanism will need to be defined (attempting to symlink multiple dynamic libraries with the same name in the same layer will cause a fatal build error, with the exclusion mechanism used to ensure at most one of the conflicting libraries gets symlinked)
- [...] (i.e. at the same time `sitecustomize.py` is generated)
- `venv` implementations and versions can vary as to which of the `python`, `python3`, and `python3.x` symlinks is the one that actually links to the base runtime environment, and which are just internal symlinks within the virtual environment. Rather than making assumptions, the link replacement logic will be: [...] and a copy of the wrapper injected using their original name (executing the new name)
- the wrapper scripts will put the paths they add after any existing entries
- the order of listing in the layer spec determines the order of the shared folder lookup path priority (using the same linearisation rules as `sys.path`, assuming #18, "Allow framework layers to form a directed acyclic graph", is implemented)
- [...] (as the relative paths injected at library build time should suffice for that), they're specifically for finding shared libraries published in lower layers
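The conflict rule from the implementation notes (same-named libraries in one layer are a fatal build error unless exclusions leave at most one) could be checked along these lines. This is only a sketch: `plan_dynlib_symlinks`, `DynlibConflictError`, and the idea of expressing exclusions as exact relative paths are all hypothetical, since the exclusion syntax hasn't been defined yet.

```python
from pathlib import Path
from typing import Iterable


class DynlibConflictError(Exception):
    """Raised when two shared objects in one layer share a file name (hypothetical)."""


def plan_dynlib_symlinks(
    candidates: Iterable[Path], excluded: Iterable[str] = ()
) -> dict[str, Path]:
    """Map each symlink name to the single library it should point at.

    `excluded` is assumed to hold paths (relative to the scanned tree)
    that the layer spec has asked not to symlink.
    """
    excluded_paths = {Path(entry) for entry in excluded}
    by_name: dict[str, list[Path]] = {}
    for candidate in map(Path, candidates):
        if candidate in excluded_paths:
            continue  # explicitly excluded in the layer spec
        by_name.setdefault(candidate.name, []).append(candidate)
    conflicts = {name: paths for name, paths in by_name.items() if len(paths) > 1}
    if conflicts:
        details = "; ".join(
            f"{name}: {', '.join(str(p) for p in sorted(paths))}"
            for name, paths in sorted(conflicts.items())
        )
        raise DynlibConflictError(f"Conflicting shared object names: {details}")
    return {name: paths[0] for name, paths in by_name.items()}
```

With two packages both shipping an `image.so`, the call fails until one of the two paths is listed in `excluded`, at which point the remaining one is the symlink target.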
Background
Consider a virtual environment with `pytorch` installed from PyPI. The `libtorch.so` extension module within that environment includes relative load paths for several potential nVidia dependencies. This works because those `nvidia` libraries are installed into the same virtual environment.

In the context of `venvstacks`, this means that `pytorch` and the nVidia libraries must be installed as part of the same layer definition. Attempting to move the nVidia libraries lower in the stack (either to the base runtime layer, or to a separate framework layer if #18 is implemented) will fail, because the dynamic loader can no longer resolve those libraries.

This is a reasonably common pattern, and one of the main reasons folks point out that the Python environment layering pattern implemented by `venvstacks` doesn't work in the general case: whereas Python extension module DLLs on Windows are able to make themselves dynamically discoverable with `os.add_dll_directory`, POSIX shared objects rely more heavily on relative paths that are fixed at module build time (and hence are only correct when the library and its dependencies are installed into the same target environment) and the `LD_LIBRARY_PATH` (or `DYLD_LIBRARY_PATH` on macOS) setting, which needs to be configured prior to application startup (it can't be manipulated at runtime the way the Windows DLL search path can be).

If you're aware of the problem, it can be managed, but if you're not already aware of the possibility, the consequences of running into it can be utterly baffling to debug when all you have to work with is a cryptic shared object loading failure when Python attempts to import an extension module with a dynamically linked dependency that can't be resolved.
Finding shared objects to symlink
Simply searching for and symlinking all `.so` objects in a layered environment would result in a lot of pointless symlinks to Python binary extension modules that are only loaded directly after the interpreter finds them via `sys.path`.

https://github.com/lmstudio-ai/venvstacks/blob/main/misc/find_shared_libs.py proposes a better algorithm for that, which filters out the shared objects that specifically look like Python extension modules.

In the example environment, the `torchvision` case highlights the need for a library symlink exclusion mechanism in the layer specification syntax: the `_C.so` file is loaded via an explicit library loading call (relative to the Python file), so it shouldn't be symlinked into the dynamic library loading location. The generically named `image.so` shared library in that case also serves as an example of a case where it may be necessary to resolve shared object naming conflicts between packages that are installed into the same layer (the initial proposal is to have naming conflicts trigger a fatal build error for that environment, with the exclusion mechanism then being used to pick which one gets linked).
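As a rough illustration of the filtering idea (not a copy of the linked script), the scan could look something like the following. The function names are made up for this sketch, and it assumes the bare `.so` suffix has to be dropped from `importlib.machinery.EXTENSION_SUFFIXES` before filtering, since otherwise every shared object would be excluded. That assumption is also why files like `_C.so` still show up and need the separate exclusion mechanism.

```python
import importlib.machinery
from pathlib import Path
from typing import Iterable, Iterator, Optional


def extension_module_suffixes() -> tuple[str, ...]:
    """Suffixes that specifically mark Python extension modules.

    The bare ".so" suffix is dropped (an assumption of this sketch), as it
    can't distinguish extension modules from ordinary shared objects.
    These values depend on the running interpreter, so a real scan should
    run under the layer's base Python runtime.
    """
    return tuple(s for s in importlib.machinery.EXTENSION_SUFFIXES if s != ".so")


def is_shared_object(name: str) -> bool:
    """Match `libfoo.so`, versioned `libfoo.so.1`, and macOS `libfoo.dylib`."""
    return name.endswith((".so", ".dylib")) or ".so." in name


def find_shared_libs(
    root: Path, ext_suffixes: Optional[Iterable[str]] = None
) -> Iterator[Path]:
    """Yield shared objects under `root` that don't look like extension modules."""
    if ext_suffixes is None:
        ext_suffixes = extension_module_suffixes()
    suffixes = tuple(ext_suffixes)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or not is_shared_object(path.name):
            continue
        if path.name.endswith(suffixes):
            continue  # tagged as a Python extension module, found via sys.path
        yield path
```

Passing `ext_suffixes` explicitly is a convenience for scanning on behalf of a different Python version than the one running the scan.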
Wrapping the Python runtime invocation
Both Linux and macOS should support the `-a` option to `exec` that allows execution of the correct Python binary while having `sys.executable` point at the wrapper script.

The real script will do the full required "get the absolute path to this running script" dance rather than relying on `$PWD`. The general approach ensures that invoked Python subprocesses still get the library path environment variable adjustments even if the parent process environment isn't passed to the subprocess (to avoid an ever growing environment variable, the environment variable adjustments will need to check that the directory of interest isn't already present).

For Linux, the search path environment variable to adjust is `LD_LIBRARY_PATH`, while on macOS it is `DYLD_LIBRARY_PATH`.
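The actual wrapper is expected to be a shell script, but the logic it needs can be sketched in Python for clarity. Everything named here (`dynlib_env_var`, `append_search_dir`, `exec_base_python`, and their parameters) is hypothetical. The sketch appends rather than prepends, matching the note that wrapper scripts put their paths after any existing entries, and it skips directories that are already present.

```python
import os
import sys


def dynlib_env_var(platform: str = sys.platform) -> str:
    """Name of the library search path variable for the given platform."""
    return "DYLD_LIBRARY_PATH" if platform == "darwin" else "LD_LIBRARY_PATH"


def append_search_dir(current: str, new_dir: str) -> str:
    """Append `new_dir` to a path list unless it is already listed.

    Existing entries keep their order and priority, and repeated wrapper
    invocations don't make the variable grow.
    """
    entries = current.split(os.pathsep) if current else []
    if new_dir not in entries:
        entries.append(new_dir)
    return os.pathsep.join(entries)


def exec_base_python(
    base_python: str, wrapper_path: str, args: list[str], dynlib_dir: str
) -> None:
    """Replace this process with the base runtime (never returns on success).

    argv[0] is set to the wrapper's path, which is what `exec -a` does in a
    shell script.
    """
    env = dict(os.environ)
    var = dynlib_env_var()
    env[var] = append_search_dir(env.get(var, ""), dynlib_dir)
    os.execve(base_python, [wrapper_path, *args], env)
```

Because the adjustment happens in the wrapper itself, a subprocess started via `sys.executable` goes through the same steps even when it is launched with a fresh environment.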