Cache external CodeInstances #43990
Conversation
The regression for Plots.jl looks quite serious. Have you looked into that? |
Is the header wrong? The current header does not make me think it's a clear win, almost the opposite. Time to finish task presumably includes loading the package as well, and all but one package takes longer to load, some by quite a lot. I can imagine that many Julia sessions include loading several packages that aren't used, and an increase in load time might then hurt more than the saving in TTFT. |
That's a pretty clear win @baggepinnen, no? |
Not yet. It's fresh off the bus, there might indeed be some stuff that can be made better. |
Here's a really fun one: with a package

```julia
module LV

export filter2davx

using LoopVectorization

function filter2davx!(out::AbstractMatrix, A::AbstractMatrix, kern::AbstractMatrix)
    @turbo for J in CartesianIndices(out)
        tmp = zero(eltype(out))
        for I ∈ CartesianIndices(kern)
            tmp += A[I + J] * kern[I]
        end
        out[J] = tmp
    end
    out
end

function filter2davx(A::AbstractMatrix, kern::AbstractMatrix)
    out = similar(A, size(A) .- size(kern) .+ 1)
    return filter2davx!(out, A, kern)
end

# precompilation
A = rand(Float64, 512, 512)
kern = [0.1 0.3 0.1;
        0.3 0.5 0.3;
        0.1 0.3 0.1];
filter2davx(A, kern)

end
```

and task (this is what the other tasks look like)

```julia
@time using LV
A = rand(Float64, 512, 512)
kern = [0.1 0.3 0.1;
        0.3 0.5 0.3;
        0.1 0.3 0.1];
tstart = time(); filter2davx(A, kern); println(time() - tstart) # avoid @time to prevent compilation of the task
```

we get for nightly

```
julia> include("tasks_LV.jl")
  2.463157 seconds (6.24 M allocations: 408.848 MiB, 3.33% gc time, 18.92% compilation time)
8.572684049606323
```

and for this PR

```
julia> include("tasks_LV.jl")
  2.932197 seconds (6.09 M allocations: 408.457 MiB, 1.98% gc time, 18.04% compilation time)
0.23031187057495117
```
|
I dug into the Plots regression a bit. For the loading, it can almost entirely be explained by the time to load a single dependency. My guess is we can prevent the compilation by doing a little more precompilation in the relevant package. |
Just want to add a datapoint where this PR does really well: I have a branch of ACME lying around where I've added some more precompilation. EDIT: Darn, I did |
That's not good, I presume the loading time of |
@giordano, I've finally got most of them stomped (locally), but the annoying bit is compilations like these:

```
...
structdiff(NamedTuple{(:validate_strict, :libc, :call_abi, :libgfortran_version, :cxxstring_abi, :libstdcxx_version, :os_version), Tuple{Bool, Nothing, Nothing, Nothing, String, Nothing, Nothing}}, Type{NamedTuple{(:validate_strict, :compare_strategies), T} where T<:Tuple}) from structdiff(NamedTuple{an, T} where T<:Tuple, Union{Type{NamedTuple{bn, T} where T<:Tuple}, NamedTuple{bn, T} where T<:Tuple}) where {an, bn}
  from module Base
pairs(NamedTuple{(:libc, :cxxstring_abi), Tuple{String, String}}) from pairs(NamedTuple{names, T} where T<:Tuple where names)
  from module Base.Iterators
_pairs_elt(Base.Pairs{Symbol, String, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:libc, :cxxstring_abi, :call_abi), Tuple{String, String, String}}}, Symbol) from _pairs_elt(Base.Pairs{K, V, I, A} where A where I, Any) where {K, V}
  from module Base.Iterators
Type##kw(NamedTuple{(:cxxstring_abi,), Tuple{String}}, Type{Base.BinaryPlatforms.Platform}, String, String) from Type##kw(Any, Type{Base.BinaryPlatforms.Platform}, String, String)
  from module Base.BinaryPlatforms
...
```

I've added a block at the end that exercises these:

```julia
Platform("x86_64", "linux"; validate_strict=true, libc="glibc")
for nt in (
        (libc="glibc", cxxstring_abi=""),
        (libc="glibc", cxxstring_abi="", call_abi=""),
        (libc=nothing, call_abi=nothing, libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
        (libc="glibc", call_abi=nothing, libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
        (libc="glibc", call_abi="", libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
        (libc=nothing, call_abi=nothing, libgfortran_version=nothing, cxxstring_abi="cxx11", libstdcxx_version=nothing, os_version=nothing),
        (libc="glibc", call_abi="", libgfortran_version=nothing, cxxstring_abi="cxx11", libstdcxx_version=nothing, os_version=nothing),
        (validate_strict=false, libc=nothing, call_abi=nothing, libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
        (validate_strict=false, libc="glibc", call_abi=nothing, libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
        (validate_strict=false, libc="glibc", call_abi="", libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
        (validate_strict=false, libc=nothing, call_abi=nothing, libgfortran_version=nothing, cxxstring_abi="cxx11", libstdcxx_version=nothing, os_version=nothing),
        (validate_strict=false, libc="glibc", call_abi="", libgfortran_version=nothing, cxxstring_abi="cxx11", libstdcxx_version=nothing, os_version=nothing),
    )
    pairs(nt)
    iterate(pairs(nt))
    iterate(pairs(nt), haskey(nt, :libgfortran_version) ? (nt.libgfortran_version === nothing) + 1 : 1)
    merge(nt, NamedTuple())
    merge(nt, Pair{Symbol,String}[])
    Base.structdiff(nt, (validate_strict=false, compare_strategies=Dict{String,Function}()))
    Platform("x86_64", "linux"; nt...)
end
```

but it's annoying and so far isn't complete. Can you think of a better approach? |
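A lighter-weight alternative (a sketch, not a suggestion from this thread) is to issue explicit `precompile` directives for the offending signatures instead of executing representative calls; `precompile(f, argtypes)` triggers inference without running `f`. The signature below is one of those from the log; the full list would still have to be harvested:

```julia
# Ask the compiler to infer Base.structdiff for one of the problematic
# signatures without executing it. `precompile` returns true on success.
ok = precompile(Base.structdiff,
                (NamedTuple{(:libc,), Tuple{String}},
                 Type{NamedTuple{(:validate_strict, :compare_strategies), T} where T<:Tuple}))
```

The tradeoff is verbosity: every concrete signature must be spelled out by hand, which is exactly the annoyance described above.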
Is there any way that can be prevented from compiling at all? Compiling it seems unlikely to be useful. |
Motivated by JuliaLang/julia#43990 (comment), but probably not a bad idea in its own right.
This is motivated by JuliaLang/julia#43990 (comment), but doesn't seem likely to hurt anything even if that never merges.
OK, with:
I've got the time to load |
I guess if we reordered the file order in |
That seems to have helped the gap in load times substantially (updated numbers posted above), although it's still not zero. I'm not sure we should expect it to be: if packages are using precompilation, you're storing more stuff in the file, so loading might be expected to take longer. It's not really so simple, since the load time is actually dominated by backedge insertion and other method-table perusals, but certainly it wouldn't be surprising to see a load-time regression for some packages. |
Ok, except for a backwards-compatible shim that nothing calls, I just got rid of kwargs in the constructor. With this and JuliaPackaging/Preferences.jl#28 we're basically tied with master wrt the time to load GR_jll. |
With the latest changes,
so we're definitely at parity. The time to actually plot seems to vary a lot, and of course some packages might have been updated since I first started running this, but the comparisons here use the same Manifest file and were run moments apart from one another, so the comparison between them is valid. |
And now with MakieOrg/Makie.jl#1634 we get
With those changes, the majority of Makie's inference time has been wiped out (~27s is not inference). |
I noticed one problem & fixed it (see the commit just above this). Assuming CI passes, I think this is ready to go? But in contrast with the two predecessor PRs, I don't feel comfortable merging this without a review from someone who knows more than me. |
This pull request reduced time-to-first-X for QuantumClifford from 11 seconds to 0.5 seconds. This is amazing! Thank you! Hopefully this is the correct venue to share this and I am not causing noise. |
Understood, but do keep in mind that typically many more people use a package than develop it, so there's a bit of a tradeoff between developer convenience and user convenience. You may be able to have both, though: it should be possible to set up a compile-time Preference that controls whether you precompile locally with generated code, e.g., toggled by a local environment variable. That said, there are some things Julia could do better here. We recompile all dependents whenever a package changes, and perhaps we could be more selective (kind of like linking a bunch of .o files with a C compiler). But that's actually a pretty hard problem, so don't expect it to be solved anytime soon. |
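As a sketch of the environment-variable approach (the variable name `MYPKG_SKIP_PRECOMPILE` and the module are invented for illustration, not part of any real API):

```julia
module MyPkg  # hypothetical package
# Read at precompile time; changing the variable only takes effect
# the next time the package is (re-)precompiled.
const SKIP_PRECOMPILE = get(ENV, "MYPKG_SKIP_PRECOMPILE", "false") == "true"

# Stand-in for a real precompile workload that seeds inference results.
expensive_workload() = sum(rand(64, 64) * rand(64, 64))

if !SKIP_PRECOMPILE
    expensive_workload()
end
end
```

A developer could then export `MYPKG_SKIP_PRECOMPILE=true` to skip the workload while iterating, while users get the full precompile cache by default.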
I also recently thought about the issue of the cost of precompilation for developers. Couldn't you choose whether or not to run precompilation code based on whether the package is checked out for development? Something like the following pseudocode:

```julia
if !is_developed(@__MODULE__)
    include("precompile.jl")
end
```

My initial experiments with this appeared to work well. My implementation of `is_developed`:

```julia
const __CONTROLSYSTEMS_SOURCE_DIR__ = dirname(Base.source_path())
# devved checkouts live under ~/.julia/dev
is_developed() = occursin(joinpath(".julia", "dev"), __CONTROLSYSTEMS_SOURCE_DIR__)
```
|
That's a pretty clever approach. I'd want that shielded behind a flag so that I could easily check the consequences of precompilation even when I have it devved, but there's a lot to like about your suggestion. |
Prior to this PR, Julia's precompiled `*.ji` files saved just two categories of code: unspecialized method definitions and type-specialized code for the methods defined by the package. Any novel specializations of methods from Base or previously-loaded packages were not saved, and therefore effectively thrown away. This PR caches all the code---internal or external---called during package definition that hadn't been previously inferred, as long as there is a backedge linking it back to a method owned by a module being precompiled. (The latter condition ensures it will actually be called by package methods, and not merely transiently generated for the purpose of, e.g., metaprogramming or variable initialization.) This makes precompilation more intuitive (now it saves all relevant inference results), and substantially reduces latency for inference-bound packages. Closes JuliaLang#42016 Fixes JuliaLang#35972 Issue JuliaLang#35972 arose because codegen got started without re-inferring some discarded CodeInstances. This forced the compiler to insert a `jl_invoke`. This PR fixes the issue because needed CodeInstances are no longer discarded by precompilation.
Prior to this PR, Julia's precompiled `*.ji` files saved just two categories of code: unspecialized method definitions and type-specialized code for the methods defined by the package. Any novel specializations of methods from Base or previously-loaded packages were not saved, and therefore effectively thrown away. This PR caches all the code---internal or external---called during package definition that hadn't been previously inferred, as long as there is a backedge linking it back to a method owned by a module being precompiled. (The latter condition ensures it will actually be called by package methods, and not merely transiently generated for the purpose of, e.g., metaprogramming or variable initialization.) This makes precompilation more intuitive (now it saves all relevant inference results), and substantially reduces latency for inference-bound packages. Closes #42016 Fixes #35972 Issue #35972 arose because codegen got started without re-inferring some discarded CodeInstances. This forced the compiler to insert a `jl_invoke`. This PR fixes the issue because needed CodeInstances are no longer discarded by precompilation. (cherry picked from commit df81bf9)
This fixes a separate bug in which #43990 should have checked that the method had not been deleted. Tests passed formerly simply because we weren't caching external CodeInstances that inferred down to a `Const`; fixing that exposed the bug. This bug has been exposed since merging #43990 for non-`Const` inference, and would affect Revise etc.
This allows a MethodInstance to store dispatch tuples as well as other MethodInstances among their backedges. The motivation is to properly handle invalidation with `invoke`, which allows calling a less-specific method and thus runs afoul of our standard mechanisms to detect "outdated" dispatch. Additionally: - provide backedge-iterators for both C and Julia that abstracts the specific storage mechanism. - fix a bug in the CodeInstance `relocatability` field, where methods that only return a constant (and hence store `nothing` for `inferred`) were deemed non-relocatable. - fix a bug in which #43990 should have checked that the method had not been deleted. Tests passed formerly simply because we weren't caching external CodeInstances that inferred down to a `Const`; fixing that exposed the bug. This bug has been exposed since merging #43990 for non-`Const` inference, and would affect Revise etc.
This fixes a long-standing issue with how we've handled `invoke` calls with respect to method invalidation. When we load a package, we need to ask whether a given MethodInstance would be compiled in the same way now (aka, in the user's running session) as when the package was precompiled; in practice, the way we do that is to test whether the dispatches would be to the same methods in the current world-age. `invoke` presents special challenges because it allows the coder to deliberately select a different method than the one that would be chosen by ordinary dispatch; if there is no record of how this choice was made, it can look like it resolves to the wrong method and this can trigger invalidation. This allows a MethodInstance to store dispatch tuples as well as other MethodInstances among their backedges. Additionally: - provide backedge-iterators for both C and Julia that abstracts the specific storage mechanism. - fix a bug in the CodeInstance `relocatability` field, where methods that only return a constant (and hence store `nothing` for `inferred`) were deemed non-relocatable. - fix a bug in which #43990 should have checked that the method had not been deleted. Tests passed formerly simply because we weren't caching external CodeInstances that inferred down to a `Const`; fixing that exposed the bug. This bug has been exposed since merging #43990 for non-`Const` inference, and would affect Revise etc. Co-authored-by: Jameson Nash <[email protected]>
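A minimal illustration (not from the commit) of why `invoke` edges need special handling: the selected method is deliberately *not* the one ordinary dispatch would choose, so validating the edge by redoing dispatch would give the wrong answer. Recording the dispatch tuple (`Tuple{Number}` below) alongside the backedge preserves the coder's choice:

```julia
f(x::Number) = :number
f(x::Int) = :int

@assert f(1) === :int                             # ordinary dispatch: most specific method
@assert invoke(f, Tuple{Number}, 1) === :number   # invoke: deliberately less specific
```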
This includes only the changes to `dump.c` for this change, but excludes the functional part of the change (except for the additional bugfixes mentioned below). ORIGINAL COMMIT TEXT: This fixes a long-standing issue with how we've handled `invoke` calls with respect to method invalidation. When we load a package, we need to ask whether a given MethodInstance would be compiled in the same way now (aka, in the user's running session) as when the package was precompiled; in practice, the way we do that is to test whether the dispatches would be to the same methods in the current world-age. `invoke` presents special challenges because it allows the coder to deliberately select a different method than the one that would be chosen by ordinary dispatch; if there is no record of how this choice was made, it can look like it resolves to the wrong method and this can trigger invalidation. This allows a MethodInstance to store dispatch tuples as well as other MethodInstances among their backedges. Additionally: - provide backedge-iterators for both C and Julia that abstracts the specific storage mechanism. - fix a bug in the CodeInstance `relocatability` field, where methods that only return a constant (and hence store `nothing` for `inferred`) were deemed non-relocatable. - fix a bug in which #43990 should have checked that the method had not been deleted. Tests passed formerly simply because we weren't caching external CodeInstances that inferred down to a `Const`; fixing that exposed the bug. This bug has been exposed since merging #43990 for non-`Const` inference, and would affect Revise etc. Co-authored-by: Jameson Nash <[email protected]> (cherry picked from commits dd375e1, cb0721b, 9e39fe9)
Builds on #43793 and #43881. Tested each package with a task for which it was precompiled (I've devved most of these, see diffs that force precompilation here; ModelingToolkit is here). TTFT = Time To Finish Task
EDIT: latest benchmarks are in #43990 (comment)
"Load update" includes the changes in #43990 (comment). (If not reported, they seemed unchanged from the previous measurement.)
Looks like a clear win. (It's not obvious the TTFT for Plots is a significant difference, I've never replicated that gap.)
Closes #42016
Fixes #35972 and possibly the other issues listed in JuliaGraphics/Colors.jl#496 (I haven't checked yet).
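To make "external CodeInstances" concrete, here is an illustrative (hypothetical) package: the `sort` specialization below is owned by Base, not by the package, so prior to this PR it was discarded at precompile time and re-inferred on every `using MyPkg`:

```julia
module MyPkg  # hypothetical example, not from the PR
struct Point
    x::Float64
end
Base.isless(a::Point, b::Point) = isless(a.x, b.x)

# Running this during precompilation makes inference specialize
# Base.sort (an *external* method) for Vector{Point}. With this PR,
# that specialization is saved in MyPkg's cache file because a backedge
# links it to MyPkg-owned code (the isless method above).
lowest(v) = first(sort(v))
lowest([Point(2.0), Point(1.0)])
end
```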