Support external linkage in "sysimages" #44527

Merged (2 commits) on Nov 30, 2022

Member

@timholy timholy commented Mar 8, 2022

Overview

This aims to establish the foundation needed to save native code in "package images." From a technical standpoint, all it does is replace our current mix of serializers (dump.c for *.ji files, staticdata.c for system images) with a unified serializer based on staticdata.c. Necessary functionality like uniquing types and MethodInstances, supporting external CodeInstances and new method roots, and load-time invalidations are now supported by staticdata.c. A key feature is the ability to link externally: the serialization format defines a tag for an external object, which is linked after loading by pointer relocation.

The core system image and all "package images" are stored as contiguous blobs in memory, each identified by a pair of *begin, *end pointers stored in jl_linkage_blobs. Each of these images is identified by the module build_id, and this value is used in encoding external references. For individual objects, "ownership" is decided by pointer address, i.e., which pointer pair encloses the given object. Some objects, like external MethodInstances (new specializations of callables owned by other packages), are deliberately created with pointer addresses that fall outside these ranges to ensure that they go through the uniquing pipeline.
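
As a rough illustration of ownership-by-address (a sketch under stated assumptions, not the code in staticdata.c): the struct `blob_range_t`, the function `owner_of`, and the half-open `[begin, end)` convention are all hypothetical; only the idea of begin/end pointer pairs comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one image's entry: an address range plus the
   build_id of the module that owns it. Half-open [begin, end) is assumed. */
typedef struct {
    const char *begin;
    const char *end;
    uint64_t build_id;
} blob_range_t;

/* Return the index of the blob whose range encloses `p`, or -1 when `p`
   lies outside every blob (e.g. an object allocated outside all images
   so that it still has to go through uniquing). */
int owner_of(const blob_range_t *blobs, int nblobs, const void *p)
{
    assert(blobs != NULL || nblobs == 0);
    uintptr_t addr = (uintptr_t)p;
    for (int i = 0; i < nblobs; i++) {
        if (addr >= (uintptr_t)blobs[i].begin && addr < (uintptr_t)blobs[i].end)
            return i;
    }
    return -1;
}
```

A linear scan is used here only for brevity; with many images a sorted table and binary search would be the more likely choice.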

Compared to the original version of this PR, we've temporarily stripped all the work devoted to native code support. That will be restored in a future PR expected to land shortly after this merges. The goal here is to transition to one-serializer-to-rule-them-all without breaking Julia. It completes the goal of the original PR and items 2 & 4 listed under "Future work (TODOs)" below. Item 3 is already merged to master, so overall this PR has become far more ambitious than its original scope despite having taken a step backwards with respect to native code.

This has become a 3-way collaboration (@vchuravy, @vtjnash, and @timholy), with the recent participation of @vtjnash having added enormously to our progress.

For reference, the original version of this post is included below, but note that several points no longer apply.


Overview (original)

This pull request is the first fruit of a tight collaboration between @vchuravy and @timholy. Our hope is that this is the inaugural PR in a series whose ultimate goal is to allow packages to save & reuse their precompiled native code. A second outcome might be to enable (or contribute to enabling) StaticCompiler.jl to be implemented with few external dependencies.

The journey is long, and this first pull request is intended to have no user-visible consequences (neither good nor bad). But we believe it establishes many of the necessary fundamentals.

Background

Caching code requires serialization and deserialization. Julia has several code (de)serializers, but here our focus is on two, the one in dump.c and the one in staticdata.c. dump.c writes the .ji files that we currently use for packages and an intermediate stage of building Julia, whereas staticdata.c creates the object files (.so on Linux) that serve as Julia's system image. dump.c can (now, after #43990) save almost all supportable objects except native code. Conversely, staticdata.c can save native code and has a more streamlined design, but currently is only useful for writing monolithic system images. Part of the ultimate goal of the series of PRs is to blend the best of both (de)serializers together.

In addition, substantial changes will be required in Julia's codegen/LLVM infrastructure. The core issue is that caching native code across multiple files is a lot like building a C application from a bunch of separate .o files: you need a linking step to get them to work together. Currently, Julia has no real mechanisms to perform this linking. Interaction among packages has to persist after the LLVM modules used to assemble them have been discarded.

Details

The only way to engage the functionality here is to launch Julia with command-line arguments

--output-o $output --output-incremental=yes

which is not a supported combination on master. Hence this will not be used in package precompilation until we switch base/loading.jl to issue this combination of command-line arguments. Consequently, we can develop the required infrastructure, get all the pieces working, and then turn it on for regular usage.

This first PR aims to allow system-image-like blobs ("package images"?) to encode external references and perform most of the necessary linking to connect them. It consists of a new "tagged" serialization enum, ExternalLinkage, used to encode these links. It also includes much of the LLVM functionality needed for successful linkage, with one major exception described below.
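
To make the idea of a "tagged" reference concrete, here is a minimal sketch, assuming a reference is a 32-bit word whose high bits hold the tag and whose low bits hold an offset. The tag set, the 29-bit split, and the helper names (`encode_ref`, `ref_tag_of`, `ref_offset_of`) are illustrative assumptions; only the name ExternalLinkage is taken from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag set; only ExternalLinkage is named in the PR text. */
enum ref_tag { LocalRef = 0, ExternalLinkage = 1 };

/* Assumed layout: the bits above TAG_SHIFT hold the tag, the rest an offset. */
#define TAG_SHIFT 29
#define OFFSET_MASK ((UINT32_C(1) << TAG_SHIFT) - 1)

/* Pack a tag and an offset into one 32-bit reference word. */
uint32_t encode_ref(enum ref_tag tag, uint32_t offset)
{
    assert(offset <= OFFSET_MASK); /* the offset must fit below the tag bits */
    return ((uint32_t)tag << TAG_SHIFT) | offset;
}

/* Recover the tag from a reference word. */
enum ref_tag ref_tag_of(uint32_t ref)
{
    return (enum ref_tag)(ref >> TAG_SHIFT);
}

/* Recover the offset from a reference word. */
uint32_t ref_offset_of(uint32_t ref)
{
    return ref & OFFSET_MASK;
}
```

On load, a deserializer following this scheme would dispatch on `ref_tag_of(ref)` and treat ExternalLinkage entries as links into another image rather than into the blob being loaded.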

Fundamentally, cross-references among Julia internal structures are made by pointers. External linkage is therefore achieved by pointer relocation. During serialization, package contents are copied into a single contiguous blob of memory. To make pointers relocatable, an external reference is decomposed into two pieces:

  • to encode "which blob are we linking against?", we use the build_id of the toplevel module in the precompilation worklist
  • within a blob, identity is determined by the offset from the blob's base pointer.
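
The two-part decomposition above can be sketched as follows. This is an illustration under assumptions rather than the PR's implementation: `image_t`, `ext_ref_t`, `make_ext_ref`, and `resolve_ext_ref` are hypothetical names, and the lookup by linear scan is chosen only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one image: its build_id and where it lives. */
typedef struct {
    uint64_t build_id;
    char *base;
    size_t size;
} image_t;

/* A position-independent external reference: which blob, and where in it. */
typedef struct {
    uint64_t build_id;
    size_t offset;
} ext_ref_t;

/* Serialization side: turn a raw pointer into (build_id, offset). */
ext_ref_t make_ext_ref(const image_t *img, const void *obj)
{
    uintptr_t p = (uintptr_t)obj, b = (uintptr_t)img->base;
    assert(p >= b && p < b + img->size); /* obj must live inside img */
    ext_ref_t ref = { img->build_id, (size_t)(p - b) };
    return ref;
}

/* Load side: find the image with the matching build_id and add the offset
   to its current base address. Returns NULL if no loaded image matches. */
void *resolve_ext_ref(const image_t *loaded, int nloaded, ext_ref_t ref)
{
    for (int i = 0; i < nloaded; i++) {
        if (loaded[i].build_id == ref.build_id && ref.offset < loaded[i].size)
            return loaded[i].base + ref.offset;
    }
    return NULL;
}
```

Because only the build_id and offset are stored, the reference stays valid even when the target image is mapped at a different base address in the loading session.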

An introduction to details of the (de)serialization mechanisms used in staticdata.c can be found in the extensive comment at the top of the file.

What this does

This successfully serializes and deserializes external links, and implements much of the functionality needed for "partial" LLVM modules. In particular, we support:

  • saving lowered, type-inferred, and native code for methods defined in the package
  • accessing global variables from the same package, even from compiled code
  • calling compiled functions in other package images (partial, see below)
  • accessing global variables defined in other package images, even from compiled code

It also introduces a "stub" implementation of a new standard library, LLD_jll, used for performing some of the linkage.

What this doesn't do

Currently, the decision about whether a reference is internal vs external is deliberately over-simplified, and arrives at the wrong answer in important cases, such as when PkgB triggers novel specialization of a method in PkgA. Because of some challenges involving exported names and/or the need for a trampoline, it also duplicates native code in the .text section of the object file (written by LLVM) rather than linking to a unique implementation.

Future work (TODOs)

We expect this PR to be followed by at least four more PRs:

  1. one that implements de-duplication of the native code (likely via implementation of a trampoline)
  2. one that expands/migrates functionality from dump.c to staticdata.c (adding methods to external functions, compiling novel specializations of external methods, uniquing compilations of the same MethodInstance by multiple downstream packages, managing backedges and invalidation, etc.)
  3. one that makes LLD_jll a "real" standard library
  4. one that makes this the default (or only) mechanism for precompiling packages

We welcome participation by others in these future developments.

Future prospects

If all this works as well as we hope, we expect to see dramatic decreases in latency for precompiled workloads. Indeed, in favorable cases with little or no invalidation, compilation time may be nearly eliminated.

In such cases, the majority of Julia's remaining latency problem will be due to package loading. We do not expect this sequence of PRs to make load times worse, but despite efficiencies in the staticdata.c representation we also don't expect there to be much improvement: raw deserialization is likely to become faster, but on current master it is already dominated by the cost of method insertion and invalidation, and that won't change in this sequence of PRs. In the future, there are a number of possible ways to improve load times (perhaps dramatically), but we plan to get this whole series merged first before even beginning to contemplate tackling load times.

Ideal schedule

We're well aware that important work remains to finalize Julia 1.8, and that work should take priority. However, once that ramps down it would be great to get this reviewed and merged fairly early in the 1.9 cycle. There's a long ways yet to go, and we'll need time if we are to get the entire sequence merged for 1.9.

@timholy timholy force-pushed the teh-vc/serialize_partial branch from c7fc1a2 to 4698e4b on March 8, 2022 22:21
@jpsamaroo jpsamaroo added the compiler:latency and compiler:precompilation labels Mar 9, 2022
@timholy timholy force-pushed the teh-vc/serialize_partial branch 2 times, most recently from 7e285eb to 22124da on March 9, 2022 10:29
@AriMKatz

AriMKatz commented Mar 22, 2022

Superficially and IIRC, this seems related to Jeff's separate compilation idea gist. Have you seen it?

@timholy
Member Author

timholy commented Mar 23, 2022

> Superficially and IIRC, this seems related to Jeff's separate compilation idea gist. Have you seen it?

Yes, at one point, though if you have a reference it would be helpful (slack ate my reference to it and I'm not seeing it in https://gist.github.com/JeffBezanson).

@AriMKatz

deps/llvm.mk (resolved)
@vchuravy vchuravy force-pushed the teh-vc/serialize_partial branch 2 times, most recently from cb392b4 to 92ebe50 on October 3, 2022 18:37
deps/Makefile (outdated, resolved)
@vchuravy

This comment was marked as outdated.

@timholy timholy force-pushed the teh-vc/serialize_partial branch 2 times, most recently from c0fd7a2 to 8595aea on October 4, 2022 19:20
@timholy timholy force-pushed the teh-vc/serialize_partial branch from ba5b2b2 to 636a945 on October 16, 2022 10:08
src/staticdata.c (outdated, resolved)
@timholy timholy force-pushed the teh-vc/serialize_partial branch 2 times, most recently from 4b17542 to 0a11b40 on October 16, 2022 14:13
Comment on lines +836 to +837
jl_fielddescdyn_t * desc =
(jl_fielddescdyn_t *) ((char *)layout + sizeof(*layout));

For consistency with code elsewhere perhaps

Suggested change
- jl_fielddescdyn_t * desc =
-     (jl_fielddescdyn_t *) ((char *)layout + sizeof(*layout));
+ const jl_datatype_layout_t *layout = dt->layout;
+ jl_fielddescdyn_t * desc = (jl_fielddescdyn_t*)jl_dt_layout_fields(layout);

and then the same in jl_new_foreign_type ?

Yet another option would be a dedicated access macro...

Of course this is very unimportant, just wanted to mention it

@lassepe
Contributor

lassepe commented Nov 28, 2022

@nanosoldier runtests()

Quick heads-up: I suspect that this didn't trigger the nanosoldier since the reference is part of the code block.

@gbaraldi
Member

@nanosoldier runtests()

@vchuravy
Member

Looks like nanosoldier didn't start either?

@gbaraldi
Member

There might be something wrong, because I saw the nanosoldier tab in the checks list. @maleadt

Member

@vchuravy vchuravy left a comment

LGTM

@maleadt
Member

maleadt commented Nov 29, 2022

> There might be something wrong, because I saw the nanosoldier tab in the checks list. @maleadt

It is running, the status is hidden because you pushed another commit afterwards. If you click on the mark next to c591e39 above, you can see the PkgEval status.

@nanosoldier
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@vchuravy vchuravy force-pushed the teh-vc/serialize_partial branch from 80f7706 to a397a1d on November 29, 2022 18:20
@vchuravy
Member

Note to self:

Should either be backported to 1.9 or need to be undone during the backport.

@timholy
Member Author

timholy commented Nov 29, 2022

Personally I'd favor backporting #44478 but not #46825.

If we do that, #46825 will make merging this a bit awkward. My suggestion is that we have a separate commit to port this branch to #46825, and then omit that commit when we backport to 1.9.

timholy and others added 2 commits November 29, 2022 15:56
This unifies two serializers, `dump.c` (used for packages)
and `staticdata.c` (used for system images). It adopts the
`staticdata` strategy, adding support for external linkage,
uniquing of MethodInstances & types, method extensions,
external specializations, and invalidation. This lays the
groundwork for native code caching as done with system images.

Co-authored-by: Valentin Churavy
Co-authored-by: Jameson Nash
Co-authored-by: Tim Holy
@vchuravy vchuravy force-pushed the teh-vc/serialize_partial branch from a397a1d to e06a591 on November 29, 2022 20:56
@vchuravy vchuravy merged commit 70bda2c into master Nov 30, 2022
@vchuravy vchuravy deleted the teh-vc/serialize_partial branch November 30, 2022 01:44
@timholy
Member Author

timholy commented Nov 30, 2022

Wooohooooo!!!!!!

@timholy
Member Author

timholy commented Nov 30, 2022

🍾

@gbaraldi
Member

🥳 🍻 🍾

@oschulz
Contributor

oschulz commented Nov 30, 2022

🎆
