fix(sigaction): prevent infinite loop in signal handler chaining#437
fix(sigaction): prevent infinite loop in signal handler chaining#437
Conversation
When intercepting sigaction calls from other libraries (e.g., wasmtime), we were returning our handler as oldact. This caused infinite loops: profiler -> wasmtime -> profiler -> wasmtime -> ... Fix: Save original (JVM's) handlers in protectSignalHandlers() BEFORE installing ours, then return those saved handlers as oldact. Now the chain is: profiler -> wasmtime -> JVM Also: - Add sigaction_interception_ut.cpp test to catch this bug - Wire gtest to run before Java tests in testdebug - Remove broken test_tlsPriming.cpp (referenced removed APIs) - Add getSigactionHook() stub for macOS Co-Authored-By: Claude Opus 4.5 <[email protected]>
There was a problem hiding this comment.
Pull request overview
Fixes a SIGSEGV/SIGBUS sigaction-interposition bug that could cause infinite recursion when chaining signal handlers (e.g., profiler ↔ wasmtime), by ensuring intercepted sigaction(..., oldact) returns the original JVM handlers rather than the profiler’s handlers.
Changes:
- Save original (JVM) SIGSEGV/SIGBUS handlers before installing profiler handlers, and return those originals as
oldactfrom the Linuxsigactionhook. - Refactor crash-handler path to use an internal “handled vs chain” return and ensure chaining occurs cleanly when not handled.
- Add a focused gtest to catch signal-handler chaining regressions; adjust Gradle wiring so gtest runs before Java tests; remove a broken TLS priming test.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| ddprof-lib/src/main/cpp/os_linux.cpp | Save original JVM handlers and return them in sigaction_hook to prevent recursion loops. |
| ddprof-lib/src/main/cpp/profiler.cpp | Adjust SIGSEGV/SIGBUS handlers to use crashHandlerInternal and chain correctly when unhandled; reorder protectSignalHandlers(). |
| ddprof-lib/src/main/cpp/profiler.h | Add crashHandlerInternal declaration. |
| ddprof-lib/src/main/cpp/os_macos.cpp | Provide getSigactionHook() stub returning null on macOS. |
| ddprof-lib/src/test/cpp/sigaction_interception_ut.cpp | Add unit tests validating oldact correctness and no infinite-loop chaining. |
| build-logic/conventions/.../ProfilerTestPlugin.kt | Make Java test* tasks depend on corresponding gtest* tasks (run C++ tests first). |
| ddprof-lib/src/test/cpp/test_tlsPriming.cpp | Remove broken/obsolete test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Add missing <cstring> include for memset - Fix comment to match int return type (0 = not handled, non-zero = handled) Co-Authored-By: Claude Opus 4.5 <[email protected]>
CI Test ResultsRun: #23652258968 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-03-27 15:21:55 UTC |
Extend sigaction interception to cover query-only calls (sigaction(SIGSEGV/SIGBUS, nullptr, &oldact)): return the saved JVM handler instead of ours, so callers that store oldact and later chain to it don't loop back into our handler. Fix intermittent SIGABRT in debug builds on aarch64 GraalVM: extend crashProtectionActive() with a VMThread::isExceptionActive() fallback so the cast_to() assert no longer fires for threads without ProfiledThread TLS when setjmp crash protection is already active in walkVM. Add OS::resetSignalHandlersForTesting() to prevent static state from leaking between sigaction interception unit tests. Add ASCII flow diagram to sigaction_hook documenting the full handler chain and interception cases. Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
* fix(sigaction): prevent infinite loop in signal handler chaining When intercepting sigaction calls from other libraries (e.g., wasmtime), we were returning our handler as oldact. This caused infinite loops: profiler -> wasmtime -> profiler -> wasmtime -> ... Fix: Save original (JVM's) handlers in protectSignalHandlers() BEFORE installing ours, then return those saved handlers as oldact. Now the chain is: profiler -> wasmtime -> JVM Also: - Add sigaction_interception_ut.cpp test to catch this bug - Wire gtest to run before Java tests in testdebug - Remove broken test_tlsPriming.cpp (referenced removed APIs) - Add getSigactionHook() stub for macOS Co-Authored-By: Claude Opus 4.5 <[email protected]> * fix: address PR review comments - Add missing <cstring> include for memset - Fix comment to match int return type (0 = not handled, non-zero = handled) Co-Authored-By: Claude Opus 4.5 <[email protected]> * fix: close query-only sigaction loop and fix debug-mode SIGABRT Extend sigaction interception to cover query-only calls (sigaction(SIGSEGV/SIGBUS, nullptr, &oldact)): return the saved JVM handler instead of ours, so callers that store oldact and later chain to it don't loop back into our handler. Fix intermittent SIGABRT in debug builds on aarch64 GraalVM: extend crashProtectionActive() with a VMThread::isExceptionActive() fallback so the cast_to() assert no longer fires for threads without ProfiledThread TLS when setjmp crash protection is already active in walkVM. Add OS::resetSignalHandlersForTesting() to prevent static state from leaking between sigaction interception unit tests. Add ASCII flow diagram to sigaction_hook documenting the full handler chain and interception cases. Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.5 <[email protected]> (cherry picked from commit 312fc25)
|
Backported to |
… (#447) * fix(sigaction): prevent infinite loop in signal handler chaining When intercepting sigaction calls from other libraries (e.g., wasmtime), we were returning our handler as oldact. This caused infinite loops: profiler -> wasmtime -> profiler -> wasmtime -> ... Fix: Save original (JVM's) handlers in protectSignalHandlers() BEFORE installing ours, then return those saved handlers as oldact. Now the chain is: profiler -> wasmtime -> JVM Also: - Add sigaction_interception_ut.cpp test to catch this bug - Wire gtest to run before Java tests in testdebug - Remove broken test_tlsPriming.cpp (referenced removed APIs) - Add getSigactionHook() stub for macOS * fix: address PR review comments - Add missing <cstring> include for memset - Fix comment to match int return type (0 = not handled, non-zero = handled) * fix: close query-only sigaction loop and fix debug-mode SIGABRT Extend sigaction interception to cover query-only calls (sigaction(SIGSEGV/SIGBUS, nullptr, &oldact)): return the saved JVM handler instead of ours, so callers that store oldact and later chain to it don't loop back into our handler. Fix intermittent SIGABRT in debug builds on aarch64 GraalVM: extend crashProtectionActive() with a VMThread::isExceptionActive() fallback so the cast_to() assert no longer fires for threads without ProfiledThread TLS when setjmp crash protection is already active in walkVM. Add OS::resetSignalHandlersForTesting() to prevent static state from leaking between sigaction interception unit tests. Add ASCII flow diagram to sigaction_hook documenting the full handler chain and interception cases. --------- (cherry picked from commit 312fc25) Co-authored-by: Claude Opus 4.5 <[email protected]>
What does this PR do?:
Fix infinite loop bug in signal handler chaining when wasmtime (or similar libraries) is present.
When intercepting sigaction calls from other libraries, we were returning our handler as
oldact, causing an infinite loop:Now we save and return the original (JVM's) handler, so the chain terminates correctly:
Also fixes an intermittent SIGABRT in debug builds on aarch64 GraalVM:
crashProtectionActive()now falls back toVMThread::isExceptionActive()for threads withoutProfiledThreadTLS, so thecast_to()debug assert no longer fires when setjmp crash protection is already active inwalkVM.Motivation:
Gradle builds were hanging with 94% CPU usage in the dd-trace-processor thread, stuck in an infinite signal handler loop when wasmtime was present. Intermittent SIGABRT was seen on
test-linux-glibc-aarch64 (17-graal, debug)CI jobs.Additional Notes:
Changes:
protectSignalHandlers()BEFORE installing oursoldactinsigaction_hook()for both install and query-only callscrashProtectionActive()withVMThread::isExceptionActive()fallbacksigaction_interception_ut.cpptest to catch this bugOS::resetSignalHandlersForTesting()for test isolationtestDebugtest_tlsPriming.cpp(referenced removed APIs)getSigactionHook()stub for macOSsigaction_hookdocumenting the handler chainHow to test the change?:
sigaction_interception_utvalidates the fix./gradlew :ddprof-lib:gtestDebug- all tests pass./gradlew :ddprof-test:testDebug- gtest now runs before Java testsFor Datadog employees:
credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance.🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 [email protected]