
[WIP] Add events for critical path analysis#440

Draft
kaahos wants to merge 3 commits into main from paul.fournillon/critical_path



@kaahos kaahos commented Mar 25, 2026

What does this PR do?:

Adds the data needed to reconstruct the span dependency graph and run critical path analysis on it.

Changes:

  • Add parentSpanId and real duration to the existing datadog.Endpoint JFR event, enabling reconstruction of the span dependency graph from JFR recordings
  • Add new datadog.SpanNode event that records per-span DAG nodes with spanId, parentSpanId, rootSpanId, timing, and encoded operation/resource names
  • Add new datadog.TaskBlock event that records blocking intervals (start/end ticks, blocker identity, which span was blocked, and which span unblocked it)
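To illustrate how the new fields could be consumed, here is a minimal sketch of rebuilding parent-child edges from already-parsed SpanNode events. The `SpanNode` class below is a hypothetical in-memory holder, not the profiler's event class. Its field names mirror the list above, and treating `parentSpanId == 0` as "no parent" is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for one parsed datadog.SpanNode event (not the profiler's own type). */
final class SpanNode {
    final long spanId;
    final long parentSpanId;
    final long rootSpanId;
    final long startTicks;
    final long endTicks;

    SpanNode(long spanId, long parentSpanId, long rootSpanId, long startTicks, long endTicks) {
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.rootSpanId = rootSpanId;
        this.startTicks = startTicks;
        this.endTicks = endTicks;
    }
}

final class SpanGraph {
    /**
     * Groups spans by parent id so the children of any span can be looked up.
     * Assumption: parentSpanId == 0 means the span has no parent.
     */
    static Map<Long, List<SpanNode>> childrenByParent(List<SpanNode> nodes) {
        Map<Long, List<SpanNode>> children = new HashMap<>();
        for (SpanNode node : nodes) {
            if (node.parentSpanId != 0) {
                children.computeIfAbsent(node.parentSpanId, k -> new ArrayList<>()).add(node);
            }
        }
        return children;
    }
}
```

Calling `SpanGraph.childrenByParent(nodes)` on the spans of one request gives the adjacency map that a critical path walk would start from.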

Motivation:

The profiler links wall-clock and CPU samples to spans via spanId, showing where time is spent within each task. This answers which code is hot, but not which code contributes to request latency. Correctly attributing latency to the right spans requires knowing the dependency structure, which then allows identifying the critical path through it.

Critical path analysis requires two things: the span dependency DAG (parent-child edges between spans) and accurate span durations. With both, it is possible to reconstruct the critical path per request, attribute latency to the spans that actually determine it, and predict which operations would yield the most latency improvement if optimized.
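As a rough illustration of what the DAG plus durations make possible, the sketch below walks a span tree backward from the end of the root: it follows the child that finished last, then the latest child that finished before that one started, and so on. This is one common heuristic for trace critical paths, not necessarily the analysis this PR targets. The `Span` and `CriticalPath` names are hypothetical, and the span graph is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal span: ids plus start/end timestamps in the same unit. */
final class Span {
    final long id;
    final long parentId;
    final long start;
    final long end;

    Span(long id, long parentId, long start, long end) {
        this.id = id;
        this.parentId = parentId;
        this.start = start;
        this.end = end;
    }
}

final class CriticalPath {
    /** Returns the ids of spans on the critical path, in visit order (root first). */
    static List<Long> compute(List<Span> spans, long rootId) {
        Map<Long, Span> byId = new HashMap<>();
        Map<Long, List<Span>> children = new HashMap<>();
        for (Span s : spans) {
            byId.put(s.id, s);
            children.computeIfAbsent(s.parentId, k -> new ArrayList<>()).add(s);
        }
        List<Long> path = new ArrayList<>();
        Span root = byId.get(rootId);
        if (root != null) {
            walk(root, children, path);
        }
        return path;
    }

    private static void walk(Span span, Map<Long, List<Span>> children, List<Long> path) {
        path.add(span.id);
        List<Span> kids = new ArrayList<>(children.getOrDefault(span.id, Collections.emptyList()));
        // Consider children from latest-finishing to earliest-finishing.
        kids.sort((a, b) -> Long.compare(b.end, a.end));
        long cursor = span.end;
        for (Span kid : kids) {
            // A child is on the path only if it finished before the current cursor;
            // children overlapping an already-chosen child are skipped.
            if (kid.end <= cursor) {
                walk(kid, children, path);
                cursor = kid.start;
            }
        }
    }
}
```

With a root (0 to 100) whose children run over 0 to 40, 10 to 90, and 50 to 95, the walk picks the 50 to 95 child and then the 0 to 40 child. The 10 to 90 child overlaps the chosen one, so shortening it alone would not shorten the request.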

Additional Notes:

  • Old recordTraceRoot overloads (without parentSpanId/startTicks) are marked @deprecated but remain functional for backward compatibility. They now delegate to the new native method with zeroed-out values, so existing dd-trace-java callers continue to work until they adopt the new signature.
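The delegation pattern described above could look roughly like the following. This is a sketch only: the class name, parameter lists, and the `recordTraceRoot0` stand-in are hypothetical and do not reproduce the profiler's actual signatures. The native call is replaced by a plain Java method so the example runs on its own.

```java
/** Hypothetical stand-in for the profiler entry point; real signatures may differ. */
final class TraceRootRecorder {
    // Captured only so the example can show what the stub received.
    long lastParentSpanId = -1;
    long lastStartTicks = -1;

    /** Old-style overload: kept for existing callers, forwards zeroed values. */
    @Deprecated
    boolean recordTraceRoot(long rootSpanId, String endpoint) {
        return recordTraceRoot(rootSpanId, 0L, 0L, endpoint);
    }

    /** New-style overload carrying the parent span id and start ticks. */
    boolean recordTraceRoot(long rootSpanId, long parentSpanId, long startTicks, String endpoint) {
        return recordTraceRoot0(rootSpanId, parentSpanId, startTicks, endpoint);
    }

    // Stand-in for the native method (assumption: the real one is a JNI call).
    private boolean recordTraceRoot0(long rootSpanId, long parentSpanId, long startTicks, String endpoint) {
        lastParentSpanId = parentSpanId;
        lastStartTicks = startTicks;
        return true;
    }
}
```

A consumer reading events produced through the old overload would therefore see a zero parentSpanId and zero start ticks, which it needs to treat as "unknown" rather than as real values.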

How to test the change?:

  • ./gradlew buildDebug: compiles cleanly
  • ./gradlew testDebug: existing tests pass (deprecated overloads are backward-compatible)
  • Verify JFR output contains parentSpanId and non-zero duration in datadog.Endpoint events
  • Verify datadog.SpanNode and datadog.TaskBlock events appear when called from a test tracer
  • End-to-end: run with dd-trace-java (the critical_path branch), parse JFR, reconstruct span DAG and verify parent-child edges

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: [JIRA-XXXX]

Unsure? Have a question? Request a review!


dd-octo-sts bot commented Mar 25, 2026

CI Test Results

Run: #23557709532 | Commit: 0f27579 | Duration: 12m 5s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-03-25 18:49:54 UTC


pr-commenter bot commented Mar 25, 2026

Integration Tests

All 40 integration tests passed

📊 Dashboard · 👷 Pipeline · 📦 63b44a5f

@kaahos changed the title from "Add events for critical path analysis" to "[WIP] Add events for critical path analysis" on Mar 27, 2026