Goal: Make julia a more first-class unix language by making it possible to write separately-compiled shared libraries just like C++.
This is currently possible by building and linking against a system image, but you can only load one per process. Instead, it should be possible to build multiple shared libraries from julia source, have them share libjulia, and also allow them to link to each other.
- Build libraries to be used from C, C++, Python, and other languages, without thinking about what language they're written in (except that they will depend on libjulia). For example, imagine a Python program using julia's DiffEq libraries, a BLAS written in julia, julia's special functions library, etc. all at once, via separate build artifacts.
- Write larger applications and systems consisting of loosely-coupled components that can be compiled in parallel via make.
- Factor more of the julia compiler and run-time system into separate libraries. Example 1: Parser. Switch between the legacy flisp parser and the fancier one written in julia by linking to a different library. Example 2: Type inference and optimizer (Core.Compiler). Could compile it to a native library instead of using a pinned world age.
Comparing to C++ is useful, since julia has the potential to handle separate compilation in a similar way. In this model, all julia source files are like C++ header-only libraries. Julia load/include time is like C++ compile time. Once the output object file is generated, no new method definitions are visible to the library code, just as no new template instantiations can be made via the C++ ABI.
The basic idea is that each compilation unit corresponds to our existing system image concept, and we "just" need to make it possible to load multiple system images in the same process (hereafter referred to as libraries, compilation units, or units).
Each unit contains a global context object that identifies it to the run-time system. Several items that are currently global variables in libjulia will move into this object. Happily, a pointer to this context is really just a generalization of the world age mechanism: anywhere we currently pass a world age, we will pass a context object instead.
The only other new primitive needed for a basic version of this feature is invoke_in(context, f, args...), similar to invokelatest.
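For comparison, the existing world-age primitives already behave like a restricted form of this. The following is a minimal sketch using only Base.get_world_counter, Base.invoke_in_world, and Base.invokelatest as they exist today; the function name greet is made up for illustration, and invoke_in itself does not exist yet.

```julia
# `greet` is a made-up function used only for illustration.
greet() = "old"

# Capture the current world age (a UInt counter).
old_world = Base.get_world_counter()

# Redefining the method advances the world age.
greet() = "new"

# Resolve the call as it would have been resolved in the captured world...
result_old = Base.invoke_in_world(old_world, greet)

# ...and in the newest world.
result_new = Base.invokelatest(greet)
```

Under the proposal, the context object would take the place of the old_world argument here.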
With that, you can write or generate a "header file" that is just a julia source
file containing small wrapper functions around invoke_in.
Here is an example header file (all syntax is provisional):
const Parser = @unit "libjulia_parser"
parse(args...) = invoke_in(Parser, Parser.parse, args...)::Expr
Note this is very similar to implementing julia bindings for native libraries,
and indeed this could use ccall if the called function does not use the julia
run-time at all.
Here Parser is a Module, and I'm imagining that a Module has a reference to
the context it belongs to, so passing a Module to invoke_in is sufficient.
But the context could also be exposed as a separate first-class object.
The ::Expr type declaration here of course is not strictly correct, but I
add it to point out that we will typically want type assertions on these calls.
In most cases header files for use from julia will probably be generated by a macro.
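To make the wrapper pattern concrete, here is a runnable mock-up. Everything in it is an assumption made for illustration: invoke_in is defined as a plain forwarding function, an ordinary module named ParserUnit stands in for `@unit "libjulia_parser"`, and the wrapper is named parse_expr to avoid clashing with Base.parse. It shows only the shape of a header file and the effect of the type assertion, not the real context switch.

```julia
# Hypothetical stand-in for the proposed primitive. A real `invoke_in` would
# run `f` inside the unit's context; here we simply forward the call.
invoke_in(unit::Module, f, args...) = Base.invokelatest(f, args...)

# Stand-in for `@unit "libjulia_parser"`: an ordinary module playing the
# role of a separately compiled unit.
module ParserUnit
parse(str::AbstractString) = Base.Meta.parse(str)
end

const Parser = ParserUnit

# The "header file": a thin wrapper with a type assertion on the result.
parse_expr(args...) = invoke_in(Parser, Parser.parse, args...)::Expr

ex = parse_expr("1 + 2")

# The assertion is not strictly correct: parsing "x" yields a Symbol,
# so the wrapper throws a TypeError instead of returning it.
rejected = try
    parse_expr("x")
    false
catch err
    err isa TypeError
end
```

The assertion gives callers a known return type at the unit boundary, at the cost of rejecting valid results such as a bare Symbol.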
It is clear that the lowest-level features of the run-time system can be shared by all units: builtin functions, method lookup, GC, and the compiler and interpreter. Units can share object references, so they need a common GC heap, and can also schedule tasks to a common scheduler.
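As a rough illustration of sharing object references and a scheduler, the sketch below uses two ordinary modules as stand-ins for units (ProducerUnit and ConsumerUnit are made-up names). Both operate on the same Channel object and their work is scheduled by the same task scheduler, which is what separately compiled units would also need.

```julia
# Two ordinary modules stand in for separately compiled units.
module ProducerUnit
# Push a few values into a channel owned by the caller, then close it.
function produce!(ch::Channel{Int})
    for i in 1:3
        put!(ch, i^2)
    end
    close(ch)
end
end

module ConsumerUnit
# Drain the channel and sum what was received.
function consume(ch::Channel{Int})
    total = 0
    for x in ch
        total += x
    end
    return total
end
end

ch = Channel{Int}(1)                          # one object on the shared heap
producer = @async ProducerUnit.produce!(ch)   # scheduled on the one scheduler
total = ConsumerUnit.consume(ch)
wait(producer)
```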
However, some subsystems that should be shared are currently implemented inside Base. Those will have to be factored out, since in general each unit will want to include a copy of (most of) Base. This can be thought of as separating the "template libraries" in Base (functions that benefit from extension) from the separately-compilable code. In most cases the shared components can be identified by looking for places in the libjulia C/C++ code where we call into julia functions in jl_base_module.
The components implemented in Base that need to be shared are:
- Task scheduler: This contains global queues, and the run-time expects only one copy of those queues. The minimal scheduler code (Task, schedule, wait, poptask) should be moved.
- Code loading (loading.jl)
Ideally, those would be compiled into libjulia itself. It should also work to move them into Core, and compile that to its own unit.
There are some other components we may want to explore separating from Base in the future:
- Libuv I/O and external processes
- Possibly some reflection functions
Base also includes some global state that is used e.g. by interactive front-ends, and we will have to figure out where to move most of it.
We currently save certain mandatory objects (e.g. jl_any_type) to every system image,
and assume that a system image contains them.
Instead, it needs to be possible to save only what is different about all the "user"
code currently loaded.
There are several ways to do that.
We should probably build a libjulia-core image that contains boot.jl plus those
builtin objects, and consider it essentially part of libjulia.
Each unit needs its own methods for everything.
This is currently not possible for constructors, in particular, leading to some ugly
if nameof(@__MODULE__) === :Base checks to avoid conflicts.
For example, Vector{T}(::UndefInitializer, m::Integer) calls convert(Int, m),
but that can only call Base.convert in the current unit.
A different unit can have a different Base.convert. For example, a unit
might want to use convert methods without InexactError checks.
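The sketch below illustrates only the last point, with two ordinary modules standing in for units that carry different conversion behavior. CheckedUnit, UncheckedUnit, and to_index are made-up names, and the example does not reproduce the constructor conflict itself.

```julia
module CheckedUnit
# Uses the checking conversion: non-integral inputs throw InexactError.
to_index(x::Float64) = convert(Int, x)
end

module UncheckedUnit
# A unit that opts out of the check and truncates instead.
to_index(x::Float64) = unsafe_trunc(Int, x)
end

checked_ok = CheckedUnit.to_index(3.0)
unchecked = UncheckedUnit.to_index(3.7)

# The checked variant refuses to drop the fractional part.
checked_failed = try
    CheckedUnit.to_index(3.7)
    false
catch err
    err isa InexactError
end
```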
A few types in Base are specially known to the run-time system; e.g. Complex
gets a special ABI on x86. That needs to be generalized, so that any type definition
can be marked as using that ABI.
IdDict and BigInt are also special. It is not clear how to handle those, but some
kind of registration mechanism could work, e.g. you say "this is an IdDict type"
next to the definition.
Base has various global state that assumes julia is in control of the process, and/or that there is interactive use. Some of these are harmless, since multiple instances of Base can in some respects act like multiple concurrent "sessions", but it would still be nice to make it easier for stand-alone libraries to exclude that sort of code.
Here is a list of most of the global state in Base, with some mitigation ideas.
- setting OPENBLAS-related env vars --- move to BLAS.jl?
- Libc.srand() --- move to julia_init?
- Multimedia display stack
- LOAD_PATH, DEPOT_PATH, ACTIVE_PROJECT
- DL_LOAD_PATH --- move to Core?
- PROGRAM_FILE --- move to Core?
- active_repl --- generalize ^C interface
- creating_sysimg --- check output flags instead?
- have_color, is_interactive --- maybe not a problem
- library_threading_enabled
- stdin, stdout, stderr --- move to Core, or separate I/O unit?
- update_stackframes_callback
- ARGS
- atexit_hooks
- libblas_name, liblapack_name --- move to appropriate package
- methodloc_callback
- repl_hooks
- current logger: via task-local state, a logger could be specified in a different unit
- code coverage recognizes Base as "non-user" code; needs to be generalized
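The logger item relies on behavior that already exists: the current logger is carried by the task, not stored in a Base global. A minimal sketch using the Logging standard library follows; library_work is a made-up function standing in for code that could live in another unit.

```julia
using Logging

buf = IOBuffer()
logger = SimpleLogger(buf)

# Code that could live in any unit; it logs to whatever logger the
# current task carries and reports which logger it saw.
function library_work()
    @info "from library code"
    return current_logger()
end

# Install the logger for the duration of the call, for this task only.
seen = with_logger(logger) do
    library_work()
end

output = String(take!(buf))
```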
This is best thought of as an entirely new feature, orthogonal to the existing package mechanism. This is not intended as a general solution to caching native code for packages, since the entire point of most packages is to be composed with other packages at run time, and so they are not amenable to separate compilation. However, in a small number of cases a package might be self-contained enough to be separately compiled. Such a package could potentially be distributed via BinaryBuilder like other native libraries. An example candidate is SpecialFunctions.jl. It mostly accepts and returns Float64s, so its interface is easily described by the C ABI. However, it also throws exceptions, might print warning messages, and might want to allocate objects as part of the implementation of some functions. So, it needs to be possible to load a shared library containing arbitrary julia code, by linking to those entry points in libjulia.
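As a small illustration of an interface that is "easily described by the C ABI", the sketch below exposes a Float64-to-Float64 function through @cfunction and calls it back through ccall. The function halfsqrt and the helper call_entry are made up for this example. It runs inside a normal julia session, so it shows only the calling convention, not loading from a separately built library.

```julia
# Made-up stand-in for a special function with a Float64 -> Float64 interface.
# Like any julia code, it may throw, and could allocate internally.
function halfsqrt(x::Float64)::Float64
    x < 0 && throw(DomainError(x, "halfsqrt requires x >= 0"))
    return sqrt(x) / 2
end

# A C-callable entry point with the signature `double (*)(double)`.
entry = @cfunction(halfsqrt, Cdouble, (Cdouble,))

# Call through the C ABI, as a foreign caller would.
call_entry(ptr::Ptr{Cvoid}, x::Float64) = ccall(ptr, Cdouble, (Cdouble,), x)

y = call_entry(entry, 16.0)
```

The throw in halfsqrt is the reason such a library still needs libjulia: error handling and allocation go through the run-time even when the signature is plain C.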
This feature is orthogonal to the JIT --- it makes sense either with or without it. By default, we will still use the JIT, and we will still be able to generate new specializations of code in separately-compiled libraries. We could even potentially inline code from one library into another. The only difference is that everything in a given library happens within its isolated context. So, for example, new specializations are less likely to be useful since code in a library will tend to throw method errors when novel types are passed to it.
JuliaLang/julia#49586 proposes another mutable global dict of font faces.