
Insert eager calls to finalize for otherwise-dead finalizeable objects #44056

Closed

Conversation

jpsamaroo
Member

Finalizers are a great way to defer the freeing side of memory management until some later point; however, they can have unpredictable behavior when the data that they free is not fully known to the GC (e.g. GPU allocations, or distributed references). This can result in out-of-memory situations, excessive memory usage, and sometimes more costly freeing (in the event that locks need to be taken).

This seems like a bad situation, but there is a silver lining: some code patterns which allocate such objects don't actually need the allocations to stick around very long, and the lifetime of the object could (in theory) be statically determined by the compiler. Thankfully, with the ongoing work of integrating EscapeAnalysis.jl into the optimizer, we can use the generated escape information to improve this situation.

This PR uses escape info from EA to determine when an object has an attached finalizer, and when its lifetime is provably finite (i.e. the object does not escape the analyzed scope). For such objects, we can insert an early call to finalize(obj) at the end of obj's lifetime, which will allow the object's finalizer to be enqueued for execution immediately, minimizing how long finalizable objects stay live in the GC.
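To make the intent concrete, here is a hypothetical example (not from this PR) of the kind of code the pass targets; `GPUBuffer` and `scratch_sum` are made-up stand-ins for a finalizable resource and a non-escaping use of it:

```julia
# Illustrative only: `GPUBuffer` stands in for a resource the GC cannot see
# (e.g. a GPU allocation), with a finalizer that frees it.
mutable struct GPUBuffer
    ptr::Ptr{UInt8}
    len::Int
    function GPUBuffer(n::Integer)
        buf = new(Ptr{UInt8}(Libc.calloc(n, 1)), n)
        finalizer(b -> Libc.free(b.ptr), buf)   # freeing normally waits for GC
        return buf
    end
end

function scratch_sum(n)
    buf = GPUBuffer(n)        # buf provably does not escape this function
    s = 0
    for i in 1:buf.len
        s += unsafe_load(buf.ptr, i) % Int
    end
    # The pass would insert `finalize(buf)` here, just before the return, so the
    # finalizer is enqueued immediately instead of waiting for the next GC cycle.
    return s
end
```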

@jpsamaroo jpsamaroo added GC Garbage collector compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) labels Feb 6, 2022
@jpsamaroo jpsamaroo requested a review from aviatesk February 6, 2022 22:23
@jpsamaroo jpsamaroo force-pushed the jps/finalizer-elision branch from 17791bb to 1d03b6d Compare February 7, 2022 19:45
@jpsamaroo jpsamaroo changed the base branch from avi/EscapeAnalysis to avi/EASROA February 7, 2022 19:46
Member

@aviatesk aviatesk left a comment


I rebased avi/EASROA. Maybe rebasing this branch against it would fix the build error?

base/compiler/optimize.jl (outdated review thread)
@jpsamaroo
Member Author

Latest push post-rebase still spams UndefRefError() in adce_pass!

@jpsamaroo
Member Author

Error:

```
Internal error: encountered unexpected error in runtime:
UndefRefError()
getindex at ./array.jl:921 [inlined]
getindex at ./compiler/ssair/ir.jl:238 [inlined]
is_union_phi at ./compiler/ssair/passes.jl:1186 [inlined]
adce_pass! at ./compiler/ssair/passes.jl:1240
run_passes at ./compiler/optimize.jl:606
optimize at ./compiler/optimize.jl:585 [inlined]
_typeinf at ./compiler/typeinfer.jl:253
typeinf at ./compiler/typeinfer.jl:209
typeinf_edge at ./compiler/typeinfer.jl:831
abstract_call_method at ./compiler/abstractinterpretation.jl:561
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:114
abstract_call_known at ./compiler/abstractinterpretation.jl:1475
unknown function (ip: 0x7f13e64d468d)
_jl_invoke at /home/jpsamaroo/julia-fin-el/src/gf.c:2311
ijl_invoke at /home/jpsamaroo/julia-fin-el/src/gf.c:2337
unknown function (ip: 0x7f13e6aceb4c)
unknown function (ip: 0x7f13e6aceaad)
```

@jpsamaroo
Member Author

Per discussion: the issue here is probably that we don't check that the return dominates the allocation passed to finalize, so we'll need to query the domtree as well.
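As a hypothetical illustration of why the domtree is needed (not code from this PR): if the allocation only happens on one branch, it does not dominate the function's return, so placing `finalize` at the return would be invalid:

```julia
# Illustrative only: the allocation does not dominate the return, so a naively
# inserted `finalize(r)` at the return could reference an undefined `r`.
function maybe_alloc(cond::Bool)
    local r
    if cond
        r = Ref{Any}(nothing)          # allocated (and finalizer-attached) on this branch only
        finalizer(_ -> nothing, r)
    end
    # Inserting `finalize(r)` here would throw an UndefVarError when `cond` is false,
    # which is why the pass has to consult dominance information.
    return cond
end
```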

This change will also potentially cause extended lifetimes for some allocations, but that's apparently a general issue that needs resolving.

@aviatesk aviatesk force-pushed the avi/EASROA branch 4 times, most recently from ef8f7be to 04f2d48 Compare February 10, 2022 15:44
@jpsamaroo jpsamaroo force-pushed the jps/finalizer-elision branch from 9dff04d to 9714afc Compare February 11, 2022 15:16
@jpsamaroo jpsamaroo marked this pull request as ready for review February 11, 2022 15:25
@jpsamaroo jpsamaroo added the needs tests Unit tests are required for this change label Feb 11, 2022
@jpsamaroo
Member Author

We're now factoring in dominance information (hopefully correctly), which appears to have fixed the errors!

This commit ports [EscapeAnalysis.jl](https://github.com/aviatesk/EscapeAnalysis.jl) into Julia base.
You can find the documentation of this escape analysis at [this GitHub page](https://aviatesk.github.io/EscapeAnalysis.jl/dev/)[^1].

[^1]: The same documentation will be included into Julia's developer
      documentation by this commit.

This escape analysis will hopefully be an enabling technology for various
memory-related optimizations in Julia's high-level compilation pipeline.
Possible target optimizations include alias-aware SROA (JuliaLang#43888),
array SROA (JuliaLang#43909), the `mutating_arrayfreeze` optimization (JuliaLang#42465),
stack allocation of mutables, finalizer elision, and so on[^2].

[^2]: It would also be interesting if LLVM-level optimizations could consume
      IPO information derived by this escape analysis to broaden
      optimization possibilities.

The primary motivation for porting EA in this PR is to check its impact
on latency as well as to get feedback from a broader range of developers.
The plan is to first introduce EA in this commit, and then merge the
dependent PRs built on top of it, like JuliaLang#43888, JuliaLang#43909 and JuliaLang#42465.

This commit simply defines and runs EA inside the Julia base compiler and
enables the existing test suite with it. In this commit, we just run EA
before inlining to generate the IPO cache. In the dependent PRs, EA will be
invoked again after inlining to be used for various local optimizations.

Enhances SROA of mutables using the novel Julia-level escape analysis (on top of JuliaLang#43800):
1. alias-aware SROA, mutable ϕ-node elimination
2. `isdefined` check elimination
3. load-forwarding for non-eliminable but analyzable mutables

---

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutable allocations
pretty well. Now we can eliminate the heap allocations completely from
this insanely nested example with a single analysis/optimization pass:
```julia
julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0
```

EA can also analyze escape of ϕ-nodes as well as their aliasing.
Mutable ϕ-nodes can be eliminated even in very tricky cases like:
```julia
julia> code_typed((Bool,String,)) do cond, x
           # these allocations form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto JuliaLang#3 if not cond
2 ─     goto JuliaLang#4
3 ─     nothing::Nothing
4 ┄     return x
) => Any
```

Combined with the alias analysis and ϕ-node handling above,
allocations in the following "realistic" examples will be optimized:
```julia
julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y);

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end;

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @allocated compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)
0

julia> @allocated compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)
0

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @allocated compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))
0

julia> @allocated compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))
0

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @allocated compute_point!(10000, af, bf)
0

julia> @allocated compute_point!(10000, ac, bc)
0
```

2. `isdefined` check elimination

This commit also implements a simple optimization to eliminate
`isdefined` calls by checking load-forwardability.
This optimization may be especially useful for eliminating the extra allocation
involved with a capturing closure, e.g.:
```julia
julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto JuliaLang#3 if not true
2 ─      goto JuliaLang#4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}
```

3. load-forwarding for non-eliminable but analyzable mutables

EA also allows us to forward loads even when the mutable allocation
can't be eliminated but its fields are still known precisely.
This load forwarding might be useful since it may derive new type information
that succeeding optimization passes can use (or just because it allows
simpler code transformations down the road):
```julia
julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto JuliaLang#3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}
```

---

Please refer to the newly added test cases for more examples.
Also, EA's alias analysis already succeeds in reasoning about arrays, so
this EA-based SROA will hopefully be generalized to array SROA as well.

Co-authored-by: Shuhei Kadowaki <[email protected]>
@jpsamaroo jpsamaroo force-pushed the jps/finalizer-elision branch from 9714afc to ea0aaab Compare March 1, 2022 17:25
@yuyichao
Contributor

yuyichao commented Mar 4, 2022

Note that calling finalize is expensive since it's not designed to be used this way. This is especially true in code that uses finalizers a lot, which seems to be what this is targeting. In other words, this transformation will make your code run slower if it wasn't running out of memory in the first place.

If you can prove that the object has only known finalizers, and you know what those are, you should be able to call the finalizer directly without going through the normal GC logic. If you can't prove that exceptions won't occur, which you probably won't be able to prove at this level, you can just set a flag in the GC to tell it not to run any currently registered finalizers (finalize may need to check this flag as well). If that's too much code to generate, you can simply add a new C API that passes in the finalizer directly, so that you can skip the scan of the finalizer list and have that C API set the appropriate flags for the GC.
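A minimal sketch of the trade-off described above, assuming the finalizer is statically known (`Handle` and `release` are illustrative names, not an existing API): `finalize` has to go through the runtime's finalizer machinery, whereas a direct call skips it but is only sound if the compiler can prove no other finalizer is attached:

```julia
# Illustrative sketch only.
mutable struct Handle
    id::Int
end

release(h::Handle) = nothing            # the statically known finalizer body

function via_finalize(n)
    for i in 1:n
        h = Handle(i)
        finalizer(release, h)
        finalize(h)                     # scans and runs the registered finalizers via the runtime
    end
end

function via_direct_call(n)
    for i in 1:n
        h = Handle(i)
        release(h)                      # skips the runtime entirely; only legal if `release`
                                        # is provably the object's sole finalizer
    end
end
```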

Also, as I've said many times before, it is fairly easy to effectively let the GC know about these objects. In most cases, all you need to do is call GC.gc() when your allocation/file opening fails.

@AriMKatz

AriMKatz commented Mar 7, 2022

@chflood any thoughts on this? (Particularly w.r.t. GPUs)

@maleadt
Member

maleadt commented Mar 8, 2022

The counter-argument against directly calling the finalizer thunk is that users may not have written their finalizers such that they can be safely called in the same scope as the allocation (for example, if the finalizer takes a non-reentrant lock that is already held in the allocation scope).

FWIW, for CUDA.jl it'd be advantageous to free in the same scope, because IIUC that means using the same task, which has the effect that memory operations can be ordered against the task-local stream. If they get executed in the finalizer's task, that means using a global stream and ordering against all streams.

Are packages currently doing such resource handling in finalizers, given we don't have #35689?

@jpsamaroo
Member Author

> Are packages currently doing such resource handling in finalizers, given we don't have #35689?

Yes, MemPool.jl uses a global non-reentrant lock (taken from CUDAdrv/CUDAnative originally), which is acquired during allocation and during finalization. I wouldn't mind changing it to a regular ReentrantLock if this is considered an unsupported pattern (I'm not sure it really needs to be non-reentrant anymore, now that we disable finalizers while taking locks; @krynju).
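For concreteness, a sketch of the pattern under discussion (illustrative names, not MemPool.jl's actual code): the finalizer acquires the same non-reentrant lock that the allocation path holds, so running it eagerly inside the allocation scope would self-deadlock:

```julia
# Illustrative only; not MemPool.jl's implementation.
const datastore_lock = Base.Threads.SpinLock()   # non-reentrant

mutable struct Block
    data::Vector{UInt8}
end

function free_block(b::Block)
    lock(datastore_lock) do
        empty!(b.data)                  # stand-in for returning memory to a pool
    end
end

function alloc_block(n)
    lock(datastore_lock) do
        b = Block(Vector{UInt8}(undef, n))
        finalizer(free_block, b)
        # An eagerly inserted `finalize(b)` anywhere inside this `lock` block would
        # re-enter the non-reentrant lock that is already held and deadlock.
        b                               # the do-block's value is returned to the caller
    end
end
```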

@chflood
Member

chflood commented Mar 8, 2022

In principle, I love the idea of escape analysis stack-allocating objects that go away without garbage collector intervention; however, objects with finalizers pose special issues that I don't fully understand.

I'm still learning Julia so please indulge my questions. Does Julia have a rule about finalizers running exactly once like Java does? Can a finalizer bring an object back to life by stashing it somewhere? My concern is that the GC might accidentally run a finalizer again on a zombie object.

There are also memory model issues in Java, as detailed here, which may or may not be applicable to Julia. If you run the finalizer without some sort of memory barrier, is it possible that instructions may be reordered in incorrect ways? GC provides that memory barrier.

@oscardssmith
Member

We don't appear to document such a property (or really anything about how finalizers are run). https://docs.julialang.org/en/v1.9-dev/base/base/#Base.finalizer and https://docs.julialang.org/en/v1.9-dev/manual/multi-threading/#Safe-use-of-Finalizers are the only places where they are documented at all. We should probably figure out what properties we want to guarantee and document them.

@jpsamaroo
Member Author

It appears to me that the current implementation of finalize ends up calling all finalizers for the object directly (instead of queuing them for later), meaning that finalizers must be safe to execute immediately in allocation scope if this pass calls finalize. Is this something that we want to assume for finalizers? Or do we want to assume that finalizers must be executed outside of allocation scope, and thus change this PR to a delayed approach?
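For reference, `finalize(x)` immediately runs the finalizers registered for `x`, synchronously in the caller, which is what makes this question matter; a quick illustrative REPL check:

```julia
julia> mutable struct Obj end

julia> o = Obj(); finalizer(_ -> println("finalizer ran"), o);

julia> finalize(o)   # runs the registered finalizer right here, in the calling scope
finalizer ran
```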

@jpsamaroo jpsamaroo marked this pull request as draft March 8, 2022 19:08
@jpsamaroo jpsamaroo removed the needs tests Unit tests are required for this change label Mar 8, 2022
@jpsamaroo jpsamaroo force-pushed the jps/finalizer-elision branch from 25e4b37 to e708e23 Compare March 8, 2022 20:57
@yuyichao
Contributor

yuyichao commented Mar 9, 2022

> I agree with adding this fast-path; would it be reasonable to punt that to a future PR, or do you want to see that done here before this is considered for merge?

The C API should be fairly straightforward as well. The issue with the PR as-is is that it will introduce a regression.

> Can a finalizer bring an object back to life by stashing it somewhere?

Yes.

> My concern is that the GC might accidentally run a finalizer again on a zombie object.

No, that is not supposed to happen; each finalizer will run only once. It is removed from the list before being called.

> Is this something that we want to assume for finalizers?

Not just that: finalizers can run at any time that a GC can run. There were proposals about running them on a separate thread, but it is not done and still has issues regarding GC triggered on the finalizer thread.

@aviatesk aviatesk force-pushed the avi/EASROA branch 2 times, most recently from 9c84ddc to cdef102 Compare March 23, 2022 07:11
Keno added a commit that referenced this pull request May 11, 2022
This is a variant of the eager-finalization idea
(e.g. as seen in #44056), but with a focus on the mechanism
of finalizer insertion, since I need a similar pass downstream.
Integration of EscapeAnalysis is left to #44056.

My motivation for this change is somewhat different. In particular,
I want to be able to insert finalize calls such that I can
subsequently SROA the mutable object. This requires a couple of
design points that are more stringent than the pass from #44056,
so I decided to prototype them as an independent PR. The primary
things I need here that are not seen in #44056 are:

- The ability to forgo finalizer registration with the runtime
  entirely (requires additional legality analysis)
- The ability to inline the registered finalizer at the deallocation
  point (to enable subsequent SROA)

To this end, adding a finalizer is promoted to a builtin
that is recognized by inference and inlining (such that inference
can produce an inferred version of the finalizer for inlining).

The current status is that this fixes the minimal example I wanted
to have work, but does not yet extend to the motivating case I had.
Nevertheless, I felt that this was a good checkpoint to synchronize
with other efforts along these lines.

Currently working demo:

```
julia> const total_deallocations = Ref{Int}(0)
Base.RefValue{Int64}(0)

julia> mutable struct DoAlloc
           function DoAlloc()
               this = new()
               Core._add_finalizer(this, function(this)
                   global total_deallocations[] += 1
               end)
               return this
           end
       end

julia> function foo()
           for i = 1:1000
               DoAlloc()
           end
       end
foo (generic function with 1 method)

julia> @code_llvm foo()
;  @ REPL[3]:1 within `foo`
define void @julia_foo_111() #0 {
top:
  %.promoted = load i64, i64* inttoptr (i64 140370001753968 to i64*), align 16
;  @ REPL[3]:2 within `foo`
  %0 = add i64 %.promoted, 1000
;  @ REPL[3] within `foo`
  store i64 %0, i64* inttoptr (i64 140370001753968 to i64*), align 16
;  @ REPL[3]:4 within `foo`
  ret void
}
```
@Keno Keno mentioned this pull request May 11, 2022
@jpsamaroo
Member Author

Abandoned in favor of #45272

@jpsamaroo jpsamaroo closed this Jul 14, 2022