Remove typeinfer lock altogether #46825
Conversation
GPUCompiler was using these. Can you explain if and how we need to do locking from there now?
If GPUCompiler is using the typeinf lock for type inference alone, it should no longer need to do that. If it was using it as a proxy for the codegen lock, there currently isn't a replacement, so I can just add the definitions back in (perhaps with a rename to make it more obvious?)
We were also using this to lock codegen. But aren't the thread-safe context/modules enough for that?
Once the LLVM IR has been generated, there should be no need for locks besides the context lock itself. I can't guarantee that all accesses to codeinst fields are safe outside a lock though.
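To illustrate the point about the context lock (a hedged sketch, not code from this PR or from GPUCompiler): once codegen has produced a module tied to an LLVM orc::ThreadSafeContext, walking the IR only requires that per-context lock, which ThreadSafeModule::withModuleDo takes for the duration of the callback.

// Hedged sketch: reading LLVM IR under only the context lock.
// withModuleDo locks the underlying orc::ThreadSafeContext while the
// callback runs, so no typeinf/codegen lock is needed at this point.
#include <llvm/ExecutionEngine/Orc/ThreadSafeModule.h>
#include <llvm/IR/Module.h>

size_t count_defined_functions(llvm::orc::ThreadSafeModule &TSM)
{
    size_t count = 0;
    TSM.withModuleDo([&](llvm::Module &M) {
        for (llvm::Function &F : M)
            if (!F.isDeclaration())
                ++count;
    });
    return count;
}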
Yep, it is a simple change needed here now, almost NFC. This makes the code look cleaner though.
@maleadt Do you still want the lock methods present, or should we merge this as-is? Hopefully in a couple of PRs the codegen lock will dissipate entirely, so it would be good to identify any external dependencies on it.
You tell me, I really don't know anymore at this point what we should and shouldn't lock in GPUCompiler.jl. FWIW, we used to use the typeinf lock to protect everything involving type inference (jl_type_intersection_with_env, jl_specializations_get_linfo) and codegen (jl_create_native). IIUC the former isn't needed anymore, and the latter takes an OrcThreadSafeModule.
But that's
I also don't think this is necessary anymore, but it might be safer to drop the lock in sync with Julia dropping the lock in codegen around these functions (#46836), rather than as part of this PR. So I've added the typeinf lock functions back to avoid breaking GPUCompiler here, though they're uncalled by base.
Internally, create_native will take the lock, but the lock doesn't need to be held while calling the function. Also, the lock is released before the function returns, and the context lock is sufficient to protect the LLVM IR from the return onwards.
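For reference, a hedged sketch of the calling pattern these re-added exports support; the wrapper and helper names below are placeholders, not an actual Base or GPUCompiler API:

// Hedged sketch: an external consumer (a C shim, or a Julia package via
// ccall) bracketing a runtime entry point that historically expected the
// typeinf lock. do_inference_query() is a placeholder, not a real API.
#include <julia.h>

static void do_inference_query(void)
{
    // placeholder for e.g. the reflection/codegen calls named above
    // (jl_type_intersection_with_env, jl_specializations_get_linfo, ...)
}

void do_inference_query_locked(void)
{
    jl_typeinf_lock_begin();
    do_inference_query();
    jl_typeinf_lock_end();
}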
@vtjnash Could you review this PR again? I've added some code to handle type-inference recursion, which should probably be backported to 1.9.
JL_DLLEXPORT void jl_typeinf_lock_begin(void)
{
    JL_LOCK(&typeinf_lock);
    //Although this is claiming to be a typeinfer lock, it is actually
    //affecting the codegen lock count, not type inference's inferencing count
    jl_task_t *ct = jl_current_task;
    ct->reentrant_codegen++;
}

JL_DLLEXPORT void jl_typeinf_lock_end(void)
{
    jl_task_t *ct = jl_current_task;
    ct->reentrant_codegen--;
    JL_UNLOCK(&typeinf_lock);
}
can we instead just remove this now?
(and related exports in jl_exported_funcs.inc)
Since GPUCompiler is depending on those for type inference, I'd rather keep them around until we completely drop locking around our type inference, just in case we locate any hidden bugs along the way.
SGTM
But apparently this broke CI in many "interesting" ways?
I think most of them are now fixed, but why is the analyzegc check newly failing? I don't see any particular reason that the changes would cause a value to become unrooted.
It is a heuristic-driven pass, so unrelated changes can result in it exploring a portion of the execution graph that it previously skipped (or exploring it differently).
It looks like this PR might have broken CI on master due to a "semantic conflict"? CI was all green on this PR, but when I look at 113efb6 on master, I see failures on i686-w64-mingw32, x86_64-w64-mingw32, and x86_64-apple-darwin. On each platform, the failure is in the
I have triggered a re-run of each of the failed jobs; let's see if the test failures persist on the re-run.
Seems like PkgEval should have been run on this before merging... and we probably should not be breaking GPUCompiler willy-nilly these days, even if this didn't also break CI.
If type inference is now thread-safe, then we shouldn't need to be accessing this lock; we only need to hold the codegen lock or module lock as needed. @pchintalapudi does that sound correct?
That failure looks like a real intermittent failure (I probably got the timer reporting logic wrong), which should be a quick fix.
At least for me, the GPUCompiler tests seem to be passing on master? Also, none of the changes that I'd expect to affect GPUCompiler were included in the final PR.
As per #46825 (comment), if you're just accessing the LLVM IR then yes, the context lock is sufficient, but if you're accessing the codeinst as well, some of those fields might be modified unsafely. Incidentally, the context lock is probably also necessary, in addition to being sufficient, if the context is being provided by the Julia compiler.
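A hedged sketch of the distinction being drawn here (field and macro names follow julia.h for this era of the runtime; whether every codeinst field is safe to read without a lock is exactly the open question):

// Hedged sketch: the IR is protected by the LLVM context lock, but
// code-instance fields are a separate concern. `invoke` is declared _Atomic
// in julia.h, so a relaxed load is at least data-race free; other fields
// may still require the codegen lock (or whatever replaces it).
#include <julia.h>

int has_native_entry(jl_code_instance_t *codeinst)
{
    jl_callptr_t fptr = jl_atomic_load_relaxed(&codeinst->invoke);
    return fptr != NULL;
}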
I've opened an issue for the CI failure: #47710
* Remove typeinfer lock altogether
* Don't remove the typeinf lock functions
* Track reentrancy in current task state
* Fix up some git status
* Initialize task variables
* Promise that jl_typeinf_func is rooted somewhere
(cherry picked from commit 113efb6)
This is the last section of code that refers to the old type inference lock, so now it gets its own mutex to itself.
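In the runtime's own idiom that change looks roughly like the sketch below; the mutex name is illustrative, not quoted from the commit.

// Hedged sketch (runtime-internal code): the remaining consumer gets a
// dedicated lock instead of sharing the removed typeinf lock. The name
// `inference_results_mutex` is illustrative; initialization would happen
// alongside the runtime's other locks.
#include "julia.h"
#include "julia_internal.h"

static jl_mutex_t inference_results_mutex;

static void record_inference_result(void)
{
    JL_LOCK(&inference_results_mutex);
    // ... update the shared state previously guarded by typeinf_lock ...
    JL_UNLOCK(&inference_results_mutex);
}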