Support external linkage in "sysimages" #44527
Superficially and IIRC, this seems related to Jeff's separate compilation idea gist. Have you seen it?
Yes, at one point, though if you have a reference it would be helpful (Slack ate my reference to it and I'm not seeing it in https://gist.github.com/JeffBezanson).
Here it is, courtesy of jar: https://gist.github.com/JeffBezanson/dd86043ef867954bd7e2163ab66f8b4e
```c
jl_fielddescdyn_t * desc =
    (jl_fielddescdyn_t *) ((char *)layout + sizeof(*layout));
```

For consistency with code elsewhere perhaps

```diff
-jl_fielddescdyn_t * desc =
-    (jl_fielddescdyn_t *) ((char *)layout + sizeof(*layout));
+const jl_datatype_layout_t *layout = dt->layout;
+jl_fielddescdyn_t * desc = (jl_fielddescdyn_t*)jl_dt_layout_fields(layout);
```

and then the same in `jl_new_foreign_type`?
Yet another option would be a dedicated access macro...
Of course this is very unimportant, just wanted to mention it.
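The accessor idea raised in this review can be sketched in C as follows. The types `layout_t` and `fielddesc_t` and the helpers `layout_fields` and `new_layout` are hypothetical stand-ins, not Julia's actual `jl_datatype_layout_t`, `jl_fielddescdyn_t`, or `jl_dt_layout_fields` definitions; the sketch only shows why keeping the "descriptors start right after the header" arithmetic in one helper keeps call sites consistent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header that is immediately followed in memory by an
   array of field descriptors (not Julia's real jl_datatype_layout_t). */
typedef struct {
    uint32_t nfields;
    uint32_t size;
} layout_t;

/* Hypothetical field descriptor (not Julia's real jl_fielddescdyn_t). */
typedef struct {
    uint32_t offset;
} fielddesc_t;

/* Dedicated accessor: the descriptors start right after the header, so
   the (char *)layout + sizeof(*layout) arithmetic lives in one place. */
static inline void *layout_fields(const layout_t *layout)
{
    assert(layout != NULL);
    return (char *)layout + sizeof(*layout);
}

/* Allocate a header plus nfields trailing descriptors; each field gets
   an 8-byte slot purely for illustration. */
static layout_t *new_layout(uint32_t nfields)
{
    layout_t *layout = malloc(sizeof(*layout) + (size_t)nfields * sizeof(fielddesc_t));
    if (layout == NULL)
        return NULL;
    layout->nfields = nfields;
    layout->size = 0;
    fielddesc_t *desc = (fielddesc_t *)layout_fields(layout);
    for (uint32_t i = 0; i < nfields; i++) {
        desc[i].offset = layout->size;
        layout->size += 8;
    }
    return layout;
}
```

A macro would work equally well; a `static inline` function has the small advantage of type-checking its argument.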
Quick heads-up: I suspect that this didn't trigger the nanosoldier since the reference is part of the code block.
@nanosoldier
Looks like nanosoldier didn't start either?
There might be something wrong, because I saw the nanosoldier tab in the checks list. @maleadt
LGTM
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
Note to self: should either be backported to 1.9 or be undone during the backport.
This unifies two serializers, dump.c (used for packages) and staticdata.c (used for system images). It adopts the staticdata strategy, adding support for external linkage, uniquing of MethodInstances & types, method extensions, external specializations, and invalidation. This lays the groundwork for native code caching as done with system images.

Co-authored-by: Valentin Churavy
Co-authored-by: Jameson Nash
Co-authored-by: Tim Holy

Co-authored-by: Tim Holy
Co-authored-by: Max Horn
Wooohooooo!!!!!!
🍾
🥳 🍻 🍾
🎆
Overview
This aims to establish the foundation needed to save native code in "package images." From a technical standpoint, all it does is replace our current mix of serializers (`dump.c` for `*.ji` files, `staticdata.c` for system images) with a unified serializer based on `staticdata.c`. Necessary functionality, like uniquing types and MethodInstances, supporting external CodeInstances and new method roots, and load-time invalidations, is now supported by `staticdata.c`. A key feature is the ability to link externally: the serialization format defines a tag for an external object, which is linked after loading by pointer relocation.

The core system image and all "package images" are stored as contiguous blobs in memory, each identified by a pair of `*begin, *end` pointers stored in `jl_linkage_blobs`. Each of these images is identified by the module `build_id`, and this value is used in encoding external references. For individual objects, "ownership" is decided by pointer address, i.e., which pointer pair encloses the given object. Some objects, like external MethodInstances (new specializations of callables owned by other packages), are deliberately created with pointer addresses that fall outside these ranges to ensure that they go through the uniquing pipeline.

Compared to the original version of this PR, we've temporarily stripped all the work devoted to native code support. That will be restored in a future PR expected to land shortly after this merges. The goal here is to transition to one-serializer-to-rule-them-all without breaking Julia. It completes the goal of the original PR and items 2 & 4 listed in "Future work (TODOs)". Item 3 is already merged to master, so overall it has become far more ambitious than its original scope despite having taken a step backwards with respect to native code.
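As an illustration of ownership by pointer address, here is a minimal C sketch under stated assumptions: `image_t`, `find_owner`, and `classify` are hypothetical names, the image table is assumed sorted by start address with non-overlapping `[begin, end)` ranges, and a plain `uint64_t` stands in for the `build_id`. It is not the actual layout of `jl_linkage_blobs`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one loaded image occupying [begin, end),
   tagged with the build id of its top-level module. */
typedef struct {
    const char *begin;
    const char *end;
    uint64_t build_id;
} image_t;

typedef enum {
    REF_IN_IMAGE,       /* enclosed by some image: can be referenced by build id */
    REF_NEEDS_UNIQUING  /* outside every image: serialize it, unique on load */
} ref_kind_t;

/* Return the index of the image whose range encloses p, or -1 if none.
   Assumes images are sorted by begin and do not overlap. Addresses are
   compared as uintptr_t to avoid relational comparisons between
   pointers into unrelated objects. */
static ptrdiff_t find_owner(const image_t *images, size_t n, const void *p)
{
    uintptr_t a = (uintptr_t)p;
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a < (uintptr_t)images[mid].begin)
            hi = mid;
        else if (a >= (uintptr_t)images[mid].end)
            lo = mid + 1;
        else
            return (ptrdiff_t)mid;
    }
    return -1;
}

/* Decide ownership purely from the address; on success, report the
   owning image's build id through *build_id. */
static ref_kind_t classify(const image_t *images, size_t n, const void *p,
                           uint64_t *build_id)
{
    assert(build_id != NULL);
    ptrdiff_t i = find_owner(images, n, p);
    if (i < 0)
        return REF_NEEDS_UNIQUING;
    *build_id = images[i].build_id;
    return REF_IN_IMAGE;
}
```

Because the ranges are sorted and disjoint, a binary search finds the enclosing image in O(log n). Anything outside every range falls into the "needs uniquing" case, which is the property the description relies on for external MethodInstances.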
This has become a 3-way collaboration (@vchuravy, @vtjnash, and @timholy), with the recent participation of @vtjnash having added enormously to our progress.
For reference, the original version of this post is included below, but note that several points no longer apply.
Overview (original)
This pull request is the first fruit of a tight collaboration between @vchuravy and @timholy. Our hope is that this is the inaugural PR in a series whose ultimate goal is to allow packages to save & reuse their precompiled native code. A second outcome might be to enable (or contribute to enabling) StaticCompiler.jl to be implemented with few external dependencies.
The journey is long, and this first pull request is intended to have no user-visible consequences (neither good nor bad). But we believe it establishes many of the necessary fundamentals.
Background
Caching code requires serialization and deserialization. Julia has several code (de)serializers, but here our focus is on two, the one in `dump.c` and the one in `staticdata.c`. `dump.c` writes the `.ji` files that we currently use for packages and an intermediate stage of building Julia, whereas `staticdata.c` creates the object files (`.so` on Linux) that serve as Julia's system image. `dump.c` can (now, after #43990) save almost all supportable objects except native code. Conversely, `staticdata.c` can save native code and has a more streamlined design, but currently is only useful for writing monolithic system images. Part of the ultimate goal of the series of PRs is to blend the best of both (de)serializers together.

In addition, substantial changes will be required in Julia's codegen/LLVM infrastructure. The core issue is that caching native code across multiple files is a lot like building a C application from a bunch of separate `.o` files: you need a linking step to get them to work together. Currently, Julia has no real mechanisms to perform this linking. Interaction among packages has to persist after the LLVM modules used to assemble them have been discarded.

Details
The only way to engage the functionality here is to launch Julia with command-line arguments

which is not a supported combination on `master`. Hence this will not be used in package precompilation until we switch `base/loading.jl` to issue this combination of command-line arguments. Consequently, we can develop the required infrastructure, get all the pieces working, and then turn it on for regular usage.

This first PR aims to allow system-image-like blobs ("package images"?) to encode external references and perform most of the necessary linking to connect them. It consists of a new "tagged" serialization enum, `ExternalLinkage`, used to encode these links. It also includes much of the LLVM functionality needed for successful linkage, with one major exception described below.

Fundamentally, cross-references among Julia internal structures are made by pointers. External linkage is therefore achieved by pointer relocation. During serialization, package contents are copied into a single contiguous blob of memory. To make pointers relocatable, an external reference is decomposed into two pieces:

- the `build_id` of the toplevel module in the precompilation `worklist`
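The list of pieces above is incomplete, so the following C sketch simply assumes, for illustration, that the second piece is a byte offset into the owning image. `ext_ref_t`, `loaded_image_t`, `encode_ref`, and `decode_ref` are hypothetical names; the point is only to show how a (build id, offset) pair survives the image being mapped at a different base address in a later session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoded external reference. */
typedef struct {
    uint64_t build_id;  /* identifies the owning image */
    uint64_t offset;    /* ASSUMED second piece: byte offset within that image */
} ext_ref_t;

/* Hypothetical record of where an image landed in the current session. */
typedef struct {
    uint64_t build_id;
    char *base;
    size_t size;
} loaded_image_t;

/* Serialization side: express p relative to the image that contains it.
   Returns 1 on success, 0 if p is not inside this image. */
static int encode_ref(const loaded_image_t *img, const void *p, ext_ref_t *out)
{
    assert(img != NULL && out != NULL);
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)img->base;
    if (a < b || a - b >= img->size)
        return 0;
    out->build_id = img->build_id;
    out->offset = (uint64_t)(a - b);
    return 1;
}

/* Load side: find the image by build id and rebase the offset onto its
   current address. Returns NULL if no matching image is loaded. */
static void *decode_ref(const loaded_image_t *imgs, size_t n, ext_ref_t ref)
{
    for (size_t i = 0; i < n; i++)
        if (imgs[i].build_id == ref.build_id && ref.offset < imgs[i].size)
            return imgs[i].base + ref.offset;
    return NULL;
}
```

A failed lookup in `decode_ref` corresponds to a missing dependency; a real loader would presumably need to report an error rather than silently return `NULL`.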
An introduction to details of the (de)serialization mechanisms used in `staticdata.c` can be found in the extensive comment at the top of the file.

What this does

This successfully serializes and deserializes external links, and implements much of the functionality needed for "partial" LLVM modules. In particular, we support:
It also introduces a "stub" implementation of a new standard library, `LLD_jll`, used for performing some of the linkage.

What this doesn't do
Currently, the decision about whether a reference is internal vs external is deliberately over-simplified, and arrives at the wrong answer in important cases, such as when PkgB triggers novel specialization of a method in PkgA. Because of some challenges involving exported names and/or the need for a trampoline, it also duplicates native code in the `.text` section of the object file (written by LLVM) rather than linking to a unique implementation.

Future work (TODOs)
We expect this PR to be followed by at least four more PRs:

- `dump.c` to `staticdata.c` (adding methods to external functions, compiling novel specializations of external methods, uniquing compilations of the same `MethodInstance` by multiple downstream packages, managing backedges and invalidation, etc.)
- `LLD_jll` a "real" standard library

We welcome participation by others in these future developments.
Future prospects
If all this works as well as we hope, we expect to see dramatic decreases in latency for precompiled workloads. Indeed, in favorable cases with little or no invalidation, compilation time may be nearly eliminated.
In such cases, the majority of Julia's remaining latency problem will be due to package loading. We do not expect this sequence of PRs to make load times worse, but despite efficiencies in the `staticdata.c` representation, we also don't expect there to be much improvement: raw deserialization is likely to become faster, but on current master load time is already dominated by the cost of method insertion and invalidation, and that won't change in this sequence of PRs. In the future, there are a number of possible ways to improve load times (perhaps dramatically), but we plan to get this whole series merged first before even beginning to contemplate tackling load times.

Ideal schedule
We're well aware that important work remains to finalize Julia 1.8, and that work should take priority. However, once that ramps down it would be great to get this reviewed and merged fairly early in the 1.9 cycle. There's a long way yet to go, and we'll need time if we are to get the entire sequence merged for 1.9.