
fix(sigaction): prevent infinite loop in signal handler chaining#437

Merged
jbachorik merged 5 commits into main from jb/sigaction_patch_1
Mar 27, 2026


@jbachorik jbachorik commented Mar 24, 2026

What does this PR do?:
Fix infinite loop bug in signal handler chaining when wasmtime (or similar libraries) is present.

When intercepting sigaction calls from other libraries, we were returning our handler as oldact, causing an infinite loop:

profiler -> wasmtime -> profiler -> wasmtime -> ...

Now we save and return the original (JVM's) handler, so the chain terminates correctly:

profiler -> wasmtime -> JVM
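
The two chains above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the names (`chain_demo`, `sigactionHook`, `g_chained`, and so on) are hypothetical, no real signals are involved, and the depth limit exists solely so the buggy variant terminates in the demo.

```cpp
#include <cassert>
#include <string>

namespace chain_demo {

// A "handler" here just appends its name to a trace; no real signals are raised.
using Handler = void (*)(std::string& trace, int depth);

constexpr int kDemoDepthLimit = 3;  // demo-only guard against runaway recursion

Handler g_jvm_original = nullptr;     // saved before the profiler installs itself
Handler g_chained = nullptr;          // handler a third party tried to install
Handler g_third_party_old = nullptr;  // the oldact the third party received

void jvmHandler(std::string& trace, int) { trace += "JVM"; }

// Stand-in for the profiler's handler: it does not handle the fault and chains on.
void profilerHandler(std::string& trace, int depth) {
  trace += "profiler -> ";
  if (depth > kDemoDepthLimit) { trace += "..."; return; }
  Handler next = g_chained ? g_chained : g_jvm_original;
  next(trace, depth + 1);
}

// Stand-in for the intercepted sigaction call: the profiler stays installed,
// the new handler is remembered for chaining, and an "old" handler is reported.
Handler sigactionHook(Handler new_handler, bool return_saved_original) {
  Handler old = return_saved_original ? g_jvm_original : profilerHandler;
  if (new_handler) g_chained = new_handler;
  return old;
}

// Stand-in for a library such as wasmtime: it forwards to whatever oldact it got.
void thirdPartyHandler(std::string& trace, int depth) {
  trace += "wasmtime -> ";
  g_third_party_old(trace, depth + 1);
}

// Builds the chain and returns the trace for the fixed or the buggy behaviour.
std::string run(bool fixed) {
  g_jvm_original = jvmHandler;
  g_chained = nullptr;
  g_third_party_old = sigactionHook(thirdPartyHandler, fixed);
  std::string trace;
  profilerHandler(trace, 0);
  return trace;
}

}  // namespace chain_demo
```

Calling `chain_demo::run(true)` yields the terminating chain, while `chain_demo::run(false)` keeps bouncing between the two handlers until the demo guard cuts it off.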

Also fixes an intermittent SIGABRT in debug builds on aarch64 GraalVM: crashProtectionActive() now falls back to VMThread::isExceptionActive() for threads without ProfiledThread TLS, so the cast_to() debug assert no longer fires when setjmp crash protection is already active in walkVM.
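
The fallback described above might look roughly like the following. It is a sketch, not the project's code: `ProfiledThread` and `VMThread` are reduced to hypothetical structs with a single flag each, and the thread-local pointers are assumptions made for illustration.

```cpp
#include <cassert>

namespace crash_sketch {

// Hypothetical stand-ins; the real ProfiledThread and VMThread classes differ.
struct ProfiledThread { bool crash_protection = false; };
struct VMThread { bool exception_active = false; };

thread_local ProfiledThread* tls_profiled = nullptr;  // unset on some threads
thread_local VMThread* tls_vm = nullptr;

bool crashProtectionActive() {
  // Preferred source: the per-thread profiler state, when it exists.
  if (tls_profiled != nullptr) return tls_profiled->crash_protection;
  // Fallback for threads without ProfiledThread TLS: ask the VM thread.
  return tls_vm != nullptr && tls_vm->exception_active;
}

}  // namespace crash_sketch
```

With this shape, a thread that has no profiler TLS but is already inside setjmp-style crash protection still reports protection as active, which is the case the debug assert tripped over.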

Motivation:
Gradle builds were hanging with 94% CPU usage in the dd-trace-processor thread, stuck in an infinite signal handler loop when wasmtime was present. Intermittent SIGABRT was seen on test-linux-glibc-aarch64 (17-graal, debug) CI jobs.

Additional Notes:
Changes:

  • Save original handlers in protectSignalHandlers() BEFORE installing ours
  • Return saved JVM handlers as oldact in sigaction_hook() for both install and query-only calls
  • Extend crashProtectionActive() with VMThread::isExceptionActive() fallback
  • Add sigaction_interception_ut.cpp test to catch this bug
  • Add OS::resetSignalHandlersForTesting() for test isolation
  • Wire gtest to run before Java tests in testDebug
  • Remove broken test_tlsPriming.cpp (referenced removed APIs)
  • Add getSigactionHook() stub for macOS
  • Add ASCII flow diagram to sigaction_hook documenting the handler chain
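
The first and fifth items (save before install, and reset for test isolation) can be illustrated with plain POSIX `sigaction`. This is a minimal sketch with hypothetical names (`protectSignalHandler`, `resetForTesting`, `g_saved`); it is not the code in `os_linux.cpp`, and the usage below deliberately targets `SIGUSR2` rather than SIGSEGV/SIGBUS.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>

namespace protect_sketch {

struct sigaction g_saved;    // original handler, captured before ours goes in
bool g_saved_valid = false;

void originalHandler(int) {}  // stands in for the JVM's handler
void ourHandler(int) {}       // stands in for the profiler's handler

// Save first, install second: querying after installing would return ourselves.
bool protectSignalHandler(int signum) {
  if (!g_saved_valid) {
    if (sigaction(signum, nullptr, &g_saved) != 0) return false;
    g_saved_valid = true;
  }
  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_handler = ourHandler;
  sigemptyset(&sa.sa_mask);
  return sigaction(signum, &sa, nullptr) == 0;
}

// Test helper: restore the saved handler and clear the static state so that
// one test cannot leak its handlers into the next.
void resetForTesting(int signum) {
  if (g_saved_valid) sigaction(signum, &g_saved, nullptr);
  std::memset(&g_saved, 0, sizeof(g_saved));
  g_saved_valid = false;
}

}  // namespace protect_sketch
```

A test would install `originalHandler`, call `protectSignalHandler(SIGUSR2)`, check that `g_saved.sa_handler` is `originalHandler`, and finish with `resetForTesting(SIGUSR2)`.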

How to test the change?:

  • New unit test sigaction_interception_ut validates the fix
  • Run ./gradlew :ddprof-lib:gtestDebug - all tests pass
  • Run ./gradlew :ddprof-test:testDebug - gtest now runs before Java tests

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-14144

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 [email protected]

When intercepting sigaction calls from other libraries (e.g., wasmtime),
we were returning our handler as oldact. This caused infinite loops:
  profiler -> wasmtime -> profiler -> wasmtime -> ...

Fix: Save original (JVM's) handlers in protectSignalHandlers() BEFORE
installing ours, then return those saved handlers as oldact. Now the
chain is: profiler -> wasmtime -> JVM

Also:
- Add sigaction_interception_ut.cpp test to catch this bug
- Wire gtest to run before Java tests in testDebug
- Remove broken test_tlsPriming.cpp (referenced removed APIs)
- Add getSigactionHook() stub for macOS

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Copilot AI left a comment


Pull request overview

Fixes a SIGSEGV/SIGBUS sigaction-interposition bug that could cause infinite recursion when chaining signal handlers (e.g., profiler ↔ wasmtime), by ensuring intercepted sigaction(..., oldact) returns the original JVM handlers rather than the profiler’s handlers.

Changes:

  • Save original (JVM) SIGSEGV/SIGBUS handlers before installing profiler handlers, and return those originals as oldact from the Linux sigaction hook.
  • Refactor crash-handler path to use an internal “handled vs chain” return and ensure chaining occurs cleanly when not handled.
  • Add a focused gtest to catch signal-handler chaining regressions; adjust Gradle wiring so gtest runs before Java tests; remove a broken TLS priming test.
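
The "handled vs chain" return mentioned in the second item can be sketched as follows. Only the name `crashHandlerInternal` and the int convention (0 = not handled, non-zero = handled) come from this PR; the `recoverable` parameter, `segvHandler`, and `g_next_handler` are hypothetical simplifications.

```cpp
#include <cassert>

namespace chain_sketch {

using Handler = void (*)(int);

Handler g_next_handler = nullptr;  // where unhandled signals are forwarded
int g_chained_calls = 0;           // counts forwards, for demonstration only

void nextHandler(int) { ++g_chained_calls; }

// Returns non-zero if the fault was handled here, 0 if it should be chained.
// The 'recoverable' flag replaces the real decision logic in this sketch.
int crashHandlerInternal(int signum, bool recoverable) {
  (void)signum;
  return recoverable ? 1 : 0;
}

// Outer handler: stop when handled, otherwise forward to the next handler.
void segvHandler(int signum, bool recoverable) {
  if (crashHandlerInternal(signum, recoverable) != 0) return;
  if (g_next_handler != nullptr) g_next_handler(signum);
}

}  // namespace chain_sketch
```

Keeping the decision in one internal function means the forwarding code exists in a single place, so a handled fault can never also be chained.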

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| ddprof-lib/src/main/cpp/os_linux.cpp | Save original JVM handlers and return them in sigaction_hook to prevent recursion loops. |
| ddprof-lib/src/main/cpp/profiler.cpp | Adjust SIGSEGV/SIGBUS handlers to use crashHandlerInternal and chain correctly when unhandled; reorder protectSignalHandlers(). |
| ddprof-lib/src/main/cpp/profiler.h | Add crashHandlerInternal declaration. |
| ddprof-lib/src/main/cpp/os_macos.cpp | Provide getSigactionHook() stub returning null on macOS. |
| ddprof-lib/src/test/cpp/sigaction_interception_ut.cpp | Add unit tests validating oldact correctness and no infinite-loop chaining. |
| build-logic/conventions/.../ProfilerTestPlugin.kt | Make Java test* tasks depend on corresponding gtest* tasks (run C++ tests first). |
| ddprof-lib/src/test/cpp/test_tlsPriming.cpp | Remove broken/obsolete test. |


- Add missing <cstring> include for memset
- Fix comment to match int return type (0 = not handled, non-zero = handled)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@jbachorik jbachorik marked this pull request as ready for review March 25, 2026 13:10
@jbachorik jbachorik requested a review from a team as a code owner March 25, 2026 13:10

@rkennke rkennke left a comment


Looks good.

@dd-octo-sts

dd-octo-sts bot commented Mar 27, 2026

CI Test Results

Run: #23652258968 | Commit: aad1b86 | Duration: 26m 9s (longest job)

All 32 test jobs passed

Status Overview

Per-JDK results table (per-cell status icons not preserved).
Configurations: glibc-aarch64/debug, glibc-amd64/debug, musl-aarch64/debug, musl-amd64/debug
JDKs: 8, 8-ibm, 8-j9, 8-librca, 8-orcl, 11, 11-j9, 11-librca, 17, 17-graal, 17-j9, 17-librca, 21, 21-graal, 21-librca, 25, 25-graal, 25-librca

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-03-27 15:21:55 UTC

@pr-commenter

pr-commenter bot commented Mar 27, 2026

Integration Tests

All 40 integration tests passed

📊 Dashboard · 👷 Pipeline · 📦 d6623689

jbachorik and others added 2 commits March 27, 2026 10:35
Extend sigaction interception to cover query-only calls
(sigaction(SIGSEGV/SIGBUS, nullptr, &oldact)): return the saved JVM
handler instead of ours, so callers that store oldact and later chain
to it don't loop back into our handler.
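
A hook covering both the install and the query-only case could be shaped like this. It is a sketch under assumptions: `sigactionHook`, `g_jvm_action`, `g_chained_action`, and `g_protected` are hypothetical names, and the real sigaction_hook in os_linux.cpp may differ in how it stores and forwards handlers.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>

namespace hook_sketch {

struct sigaction g_jvm_action;      // saved before the profiler installed itself
struct sigaction g_chained_action;  // what a third party asked to install
bool g_protected = false;           // true once the profiler's handlers are in

void jvmHandler(int) {}  // stands in for the JVM's handler

int sigactionHook(int signum, const struct sigaction* act,
                  struct sigaction* oldact) {
  bool guarded = g_protected && (signum == SIGSEGV || signum == SIGBUS);
  if (!guarded) return sigaction(signum, act, oldact);  // pass through untouched

  // Install and query-only calls alike report the saved JVM handler, never
  // ours, so a caller that later chains to oldact cannot loop back into us.
  if (oldact != nullptr) *oldact = g_jvm_action;

  // A new handler is remembered for chaining instead of replacing ours.
  if (act != nullptr) g_chained_action = *act;
  return 0;
}

}  // namespace hook_sketch
```

A query-only call then looks like `hook_sketch::sigactionHook(SIGSEGV, nullptr, &old)`, after which `old.sa_handler` is the saved JVM handler.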

Fix intermittent SIGABRT in debug builds on aarch64 GraalVM: extend
crashProtectionActive() with a VMThread::isExceptionActive() fallback
so the cast_to() assert no longer fires for threads without ProfiledThread
TLS when setjmp crash protection is already active in walkVM.

Add OS::resetSignalHandlersForTesting() to prevent static state from
leaking between sigaction interception unit tests.

Add ASCII flow diagram to sigaction_hook documenting the full handler
chain and interception cases.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
@jbachorik jbachorik merged commit 312fc25 into main Mar 27, 2026
84 checks passed
@jbachorik jbachorik deleted the jb/sigaction_patch_1 branch March 27, 2026 15:38
@github-actions github-actions bot added this to the 1.41.0 milestone Mar 27, 2026
jbachorik added a commit that referenced this pull request Mar 27, 2026
* fix(sigaction): prevent infinite loop in signal handler chaining
* fix: address PR review comments
* fix: close query-only sigaction loop and fix debug-mode SIGABRT

Co-authored-by: Claude Opus 4.5 <[email protected]>
(cherry picked from commit 312fc25)
@jbachorik

Backported to release/1.40._ via #447

jbachorik added a commit that referenced this pull request Mar 27, 2026
… (#447)

* fix(sigaction): prevent infinite loop in signal handler chaining
* fix: address PR review comments
* fix: close query-only sigaction loop and fix debug-mode SIGABRT


(cherry picked from commit 312fc25)

Co-authored-by: Claude Opus 4.5 <[email protected]>