
feat(opentelemetry): Stop looking at propagation context for span creation #14481

Merged: 5 commits merged into develop on Dec 3, 2024

Conversation

mydea (Member) commented on Nov 26, 2024:

This PR changes the behavior of the OTEL-based Node SDK to ignore the propagation context when starting spans.

Previously, when you called startSpan and there was no incoming trace, we would ensure that the new span has the trace ID + span ID from the propagation context.

This has a few problems:

  1. Multiple parallel root spans will continue the same virtual trace, instead of having separate traces.
  2. This is really invalid in OTEL, as we have to provide a span ID and cannot really tell it to use a specific trace ID out of the box. Because of this, we had to add a bunch of special handling to ensure we can differentiate real and fake parent span IDs properly.

This PR fixes this by simply not looking at the propagation context anymore when starting spans. For TWP and error marking, the propagation context is still used as before; only new spans behave differently.

I also added docs explaining how trace propagation in node works now:

[Diagram: node-sdk-trace-propagation-3]
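To make the behavioral change concrete, here is a minimal sketch (assuming `@sentry/node` v8+ with tracing enabled; the span names are illustrative and not taken from this PR's tests):

```ts
import * as Sentry from '@sentry/node';

// Two root spans started in parallel, with no incoming trace.
// Before this PR, both would pick up the trace ID from the scope's
// propagation context and thus continue the same virtual trace.
// After this PR, each root span starts its own trace.
const traceIdA = Sentry.startSpan({ name: 'task-a' }, span => span.spanContext().traceId);
const traceIdB = Sentry.startSpan({ name: 'task-b' }, span => span.spanContext().traceId);

// With this change: traceIdA !== traceIdB (separate traces).
// The propagation context is still used for TWP and for marking errors
// captured outside of any active span, as before.
```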

mydea self-assigned this on Nov 26, 2024
github-actions bot (Contributor) commented on Nov 26, 2024:

size-limit report 📦

| Path | Size | % Change | Change |
| --- | --- | --- | --- |
| @sentry/browser | 23.12 KB | +0.12% | +27 B 🔺 |
| @sentry/browser - with treeshaking flags | 21.85 KB | +0.07% | +15 B 🔺 |
| @sentry/browser (incl. Tracing) | 35.51 KB | +0.03% | +8 B 🔺 |
| @sentry/browser (incl. Tracing, Replay) | 72.4 KB | +0.02% | +10 B 🔺 |
| @sentry/browser (incl. Tracing, Replay) - with treeshaking flags | 62.88 KB | +0.02% | +11 B 🔺 |
| @sentry/browser (incl. Tracing, Replay with Canvas) | 76.71 KB | +0.02% | +9 B 🔺 |
| @sentry/browser (incl. Tracing, Replay, Feedback) | 89.17 KB | +0.01% | +8 B 🔺 |
| @sentry/browser (incl. Feedback) | 39.86 KB | +0.05% | +17 B 🔺 |
| @sentry/browser (incl. sendFeedback) | 27.74 KB | +0.07% | +19 B 🔺 |
| @sentry/browser (incl. FeedbackAsync) | 32.55 KB | +0.07% | +20 B 🔺 |
| @sentry/react | 25.81 KB | +0.06% | +15 B 🔺 |
| @sentry/react (incl. Tracing) | 38.41 KB | +0.02% | +6 B 🔺 |
| @sentry/vue | 27.26 KB | +0.05% | +12 B 🔺 |
| @sentry/vue (incl. Tracing) | 37.31 KB | +0.03% | +10 B 🔺 |
| @sentry/svelte | 23.27 KB | +0.08% | +19 B 🔺 |
| CDN Bundle | 24.32 KB | +0.02% | +3 B 🔺 |
| CDN Bundle (incl. Tracing) | 37.21 KB | +0.04% | +13 B 🔺 |
| CDN Bundle (incl. Tracing, Replay) | 72.09 KB | +0.02% | +14 B 🔺 |
| CDN Bundle (incl. Tracing, Replay, Feedback) | 77.43 KB | +0.02% | +15 B 🔺 |
| CDN Bundle - uncompressed | 71.45 KB | +0.01% | +2 B 🔺 |
| CDN Bundle (incl. Tracing) - uncompressed | 110.48 KB | -0.03% | -26 B 🔽 |
| CDN Bundle (incl. Tracing, Replay) - uncompressed | 223.55 KB | -0.02% | -26 B 🔽 |
| CDN Bundle (incl. Tracing, Replay, Feedback) - uncompressed | 236.77 KB | -0.02% | -26 B 🔽 |
| @sentry/nextjs (client) | 38.72 KB | +0.02% | +7 B 🔺 |
| @sentry/sveltekit (client) | 36.06 KB | +0.02% | +5 B 🔺 |
| @sentry/node | 134.83 KB | -0.19% | -260 B 🔽 |
| @sentry/node - without tracing | 96.84 KB | -0.31% | -300 B 🔽 |
| @sentry/aws-serverless | 109.16 KB | -0.25% | -275 B 🔽 |

codecov bot commented on Nov 26, 2024:

❌ 6 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 657 | 6 | 651 | 31 |

Top 3 failed tests by shortest run time:

- errors.test.ts:74:5 "Sends graphql exception to Sentry" (0.05s run time)
- errors.test.ts:76:5 "Sends unexpected exception to Sentry if thrown in module that was registered before Sentry" (0.102s run time)
- errors.test.ts:40:5 "Sends unexpected exception to Sentry if thrown in module with local filter" (0.149s run time)

mydea force-pushed the fn/ignorePropagationContextOtelSpans branch from 9e88097 to 4b096ed on November 26, 2024, 14:57
mydea added a commit that referenced this pull request on Nov 27, 2024:

Noticed this while working on
#14481.

The way we try-catched the Astro server request code led to the http.server span not being attached to errors correctly: we had a try-catch block _outside_ of the `startSpan` call, where we sent caught errors to Sentry. But any error caught this way would not have an active span (because by the time the `catch` branch triggers, `startSpan` is already over), and thus the http.server span would not be attached to the error. By moving this try-catch inside of the `startSpan` call, we can correctly assign the span to errors. I also added some tests for this; there is still a problem in there which the tests show, which I'll look at afterwards (and/or it may get fixed by #14481).
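A rough sketch of the fix described in that commit message (illustrative only, not the actual Astro middleware code): moving the try-catch inside the `startSpan` callback keeps the http.server span active when the error is captured.

```ts
import * as Sentry from '@sentry/node';

async function handleRequest(render: () => Promise<Response>): Promise<Response> {
  return Sentry.startSpan({ name: 'http.server', op: 'http.server' }, async () => {
    try {
      return await render();
    } catch (error) {
      // Captured inside the callback, so the http.server span is still the
      // active span and gets attached to the error's trace context.
      Sentry.captureException(error);
      throw error;
    }
  });
}
```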
mydea force-pushed the fn/ignorePropagationContextOtelSpans branch from 4b096ed to 0362bd0 on November 28, 2024, 09:53
// Also ensure sampling decision is correctly inferred
// In core, we use `spanIsSampled`, which just looks at the trace flags
// but in OTEL, we use slightly more complex logic to be able to differentiate between unsampled and deferred sampling
if (hasTracingEnabled()) {
mydea (Member Author) commented:
This is not exactly what the name implies it does, but I think it's OK to add it here too. Otherwise, we'd have to export another, new method and use that in the Node SDK initOtel, which does not seem worth it here 🤔

Member:
I think we should rename hasTracingEnabled anyway, which we can do during v9
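For context, a minimal sketch of the distinction the code comment above refers to (an illustration, not the SDK's actual implementation):

```ts
import { TraceFlags, type SpanContext } from '@opentelemetry/api';

// What the core-level check boils down to: only the sampled bit of the
// trace flags is consulted.
function isSampledByFlags(ctx: SpanContext): boolean {
  return (ctx.traceFlags & TraceFlags.SAMPLED) === TraceFlags.SAMPLED;
}

// With flags alone, a context with TraceFlags.NONE is ambiguous: it could
// mean "upstream explicitly decided not to sample" or "no sampling decision
// has been made yet" (deferred). The OTEL layer needs additional state to
// tell these two cases apart, which is why the check is more involved there.
```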

mydea force-pushed the fn/ignorePropagationContextOtelSpans branch 4 times, most recently from fee464e to dae27c9, on November 29, 2024, 08:57
@@ -26,6 +26,7 @@ test('Sends exception to Sentry', async ({ baseURL }) => {
expect(errorEvent.contexts?.trace).toEqual({
trace_id: expect.stringMatching(/[a-f0-9]{32}/),
span_id: expect.stringMatching(/[a-f0-9]{16}/),
parent_span_id: expect.stringMatching(/[a-f0-9]{16}/),
mydea (Member Author) commented:
I do not really know why this was not here before, but IMHO it was incorrect? These errors should have a parent_span_id (same for the other Nest E2E tests), as you'll usually have the http.server span and inside it some route handler span or similar, so the active span at the time of the error should usually have a parent. So I'd say this fixes previously incorrect behavior 🤔
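As an illustration of the span structure this assertion relies on (hypothetical names, not the actual Nest test app):

```ts
import * as Sentry from '@sentry/node';

Sentry.startSpan({ name: 'GET /test-exception', op: 'http.server' }, () => {
  Sentry.startSpan({ name: 'route handler', op: 'function' }, () => {
    // The error is captured while the nested handler span is active, so its
    // contexts.trace contains span_id (the handler span) and parent_span_id
    // (pointing at the http.server span).
    Sentry.captureException(new Error('boom'));
  });
});
```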

// But our default node-fetch spans are not emitted
expect(scopeSpans.length).toEqual(2);
expect(scopeSpans.length).toEqual(3);
mydea (Member Author) commented:
This was also incorrect before: spans from startSpan() did not end up here, although they should have. This was probably because of incorrect propagation in this scenario.

@@ -26,6 +26,6 @@ describe('awsIntegration', () => {
});

test('should auto-instrument aws-sdk v2 package.', done => {
createRunner(__dirname, 'scenario.js').expect({ transaction: EXPECTED_TRANSCATION }).start(done);
createRunner(__dirname, 'scenario.js').ignore('event').expect({ transaction: EXPECTED_TRANSCATION }).start(done);
mydea (Member Author) commented:
Unrelated, but I saw this flaking every now and then, so I decided to just ignore events here.

mydea force-pushed the fn/ignorePropagationContextOtelSpans branch from b7f5861 to c0e8566 on November 29, 2024, 12:11
mydea marked this pull request as ready for review on November 29, 2024, 12:11
mydea force-pushed the fn/ignorePropagationContextOtelSpans branch from a2cbf94 to dee24ea on November 29, 2024, 13:18
mydea force-pushed the fn/ignorePropagationContextOtelSpans branch from dee24ea to f61a993 on December 2, 2024, 08:05
mydea merged commit c8e81d5 into develop on Dec 3, 2024
153 checks passed
mydea deleted the fn/ignorePropagationContextOtelSpans branch on December 3, 2024, 08:02