Fix Netty HTTP span lifecycle for chunked/streaming responses#10656

Closed
gtukmachev wants to merge 2 commits into DataDog:master from gtukmachev:gsc-577-fix-ktor-streaming-instrumentation

@gtukmachev gtukmachev commented Feb 22, 2026

Caution

This PR is closed and superseded. Please use the correct PR from the company fork:

#10734 — Fix Netty HTTP span lifecycle for chunked/streaming responses


Original description below for reference only.

What Does This Do

Fixes HttpServerResponseTracingHandler to correctly handle chunked/streaming HTTP responses. The span now covers the full response duration (from headers to LastHttpContent) instead of closing at header-send time.
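
One detail matters when routing by message type: in Netty, FullHttpResponse extends HttpResponse and, through FullHttpMessage, also LastHttpContent, so a full response has to be matched first or it would be mistaken for the start of a stream. The sketch below mirrors that hierarchy with empty stand-in interfaces (not Netty's real types) so it runs without Netty on the classpath. The ResponseRouting class, the Action enum and the classify method are hypothetical names, and the WebSocket and bodyless-status special cases are deliberately left out.

```java
/** Empty stand-ins mirroring Netty's HTTP message hierarchy (hypothetical, illustration only). */
final class ResponseRouting {
    interface HttpObject {}
    interface HttpMessage extends HttpObject {}
    interface HttpResponse extends HttpMessage {}
    interface HttpContent extends HttpObject {}
    interface LastHttpContent extends HttpContent {}
    interface FullHttpMessage extends HttpMessage, LastHttpContent {}
    interface FullHttpResponse extends HttpResponse, FullHttpMessage {}

    enum Action { FINISH_NOW, DEFER_TO_LAST_CONTENT, FINISH_DEFERRED, IGNORE }

    static Action classify(HttpObject msg) {
        // A FullHttpResponse is also an HttpResponse and a LastHttpContent,
        // so it has to be matched before either of them.
        if (msg instanceof FullHttpResponse) {
            return Action.FINISH_NOW;
        }
        // Headers of a chunked/streaming response: keep the span open.
        if (msg instanceof HttpResponse) {
            return Action.DEFER_TO_LAST_CONTENT;
        }
        // End of the stream: finish the span that was deferred at header time.
        if (msg instanceof LastHttpContent) {
            return Action.FINISH_DEFERRED;
        }
        // Intermediate chunks leave the span untouched.
        return Action.IGNORE;
    }

    public static void main(String[] args) {
        System.out.println(classify(new FullHttpResponse() {})); // FINISH_NOW
        System.out.println(classify(new HttpResponse() {}));     // DEFER_TO_LAST_CONTENT
        System.out.println(classify(new HttpContent() {}));      // IGNORE
        System.out.println(classify(new LastHttpContent() {}));  // FINISH_DEFERRED
    }
}
```

Swapping the first two checks would send every FullHttpResponse down the deferred path, which is exactly the kind of mistake the ordering guards against.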

@gtukmachev gtukmachev marked this pull request as ready for review February 22, 2026 17:55
@gtukmachev gtukmachev requested a review from a team as a code owner February 22, 2026 17:55

gtukmachev commented Feb 22, 2026

If needed, I can attach an archive with a Kotlin/Gradle Ktor test project and a script.

The fix was written by the AI agent Claude Code under my careful guidance.

@gtukmachev gtukmachev closed this Feb 22, 2026
@gtukmachev gtukmachev reopened this Feb 23, 2026

@ygree ygree left a comment

Thanks for the contribution! Could you please add tests? Also, I noticed there's a possibility of a span leak when LastHttpContent is not received during chunked/streaming responses, which needs to be addressed.

@gtukmachev
Author

ℹ️ This PR has been superseded

This PR was originally created from my personal fork. I've since moved the fork under my company's managed GitHub account (GoodNotes) and recreated the PR from there.

Please follow the new PR to continue: #10734

@gtukmachev
Author

@ygree

🔄 Re: tests and span leak

I've recreated this PR from our company fork (GoodNotes/dd-trace-java): #10734

Tests will be added there shortly. I'll also look into the span leak scenario you mentioned for cases when LastHttpContent is not received.

Please follow the new PR for further updates. Thank you for the review!

@gtukmachev gtukmachev closed this Mar 4, 2026
@gtukmachev gtukmachev reopened this May 13, 2026
@gtukmachev gtukmachev marked this pull request as draft May 13, 2026 20:57
@gtukmachev gtukmachev force-pushed the gsc-577-fix-ktor-streaming-instrumentation branch from 14b968a to bca88d1
@gtukmachev
Author

Hi @ygree, thanks for the review feedback! I've addressed both items:

Tests added — new JUnit 5 test class NettyChunkedResponseTest with 4 test cases that exercise the chunked response path (HttpResponse → HttpContent* → LastHttpContent) using a real Netty server. Tests are structured as red-green: they fail without the fix and pass with it.

Span leak fix — addressed the scenario you flagged where LastHttpContent is not received. channelInactive in HttpServerRequestTracingHandler now checks STREAMING_CONTEXT_KEY and finishes the span with an error when the channel closes mid-stream.
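
A minimal sketch of that cleanup, assuming the streaming span is reachable through a per-channel attribute map: when the channel goes inactive while a span is still parked under the streaming key, the span is finished and flagged as an error instead of leaking. The Span class, the string key and the HashMap standing in for Channel.attr(...) are hypothetical stand-ins, not dd-trace-java's actual types.

```java
import java.util.HashMap;
import java.util.Map;

final class StreamingLeakGuard {
    /** Minimal stand-in for a tracer span (hypothetical). */
    static final class Span {
        boolean finished;
        String error; // null unless the span ended abnormally

        void finish() {
            finished = true;
        }
    }

    // Hypothetical stand-in for the STREAMING_CONTEXT_KEY channel attribute.
    static final String STREAMING_CONTEXT_KEY = "streaming-context";

    /** Per-channel attribute storage, standing in for Channel.attr(...). */
    final Map<String, Span> attrs = new HashMap<>();

    /** Called when the channel closes, like Netty's channelInactive callback. */
    void channelInactive() {
        Span streaming = attrs.remove(STREAMING_CONTEXT_KEY);
        if (streaming != null && !streaming.finished) {
            // Headers went out but LastHttpContent never did:
            // close the span and flag it instead of leaking it.
            streaming.error = "connection closed before LastHttpContent";
            streaming.finish();
        }
    }

    public static void main(String[] args) {
        StreamingLeakGuard channel = new StreamingLeakGuard();
        Span span = new Span();
        channel.attrs.put(STREAMING_CONTEXT_KEY, span); // response headers were written
        channel.channelInactive();                      // client drops mid-stream
        System.out.println(span.finished + " / " + span.error);
        // prints "true / connection closed before LastHttpContent"
    }
}
```

Removing the attribute before finishing makes a second close on the same channel a no-op.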

Additional fix — websocket upgrade responses going through handleHttpResponse (non-full) now finish the span immediately instead of deferring to LastHttpContent (which never arrives for websockets).
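
The comment does not say how an upgrade is recognized. One plausible check, assumed here, is a 101 Switching Protocols status carrying an Upgrade: websocket header; after that response the connection stops carrying HTTP messages, so no LastHttpContent follows. The UpgradeCheck class and the plain Map of headers are hypothetical and are not Netty's HttpHeaders API.

```java
import java.util.Locale;
import java.util.Map;

final class UpgradeCheck {
    static final int SWITCHING_PROTOCOLS = 101;

    /**
     * True when a response completes a WebSocket handshake under the assumed
     * criterion: status 101 plus "Upgrade: websocket". The header name and the
     * token are both compared case-insensitively.
     */
    static boolean isWebSocketUpgrade(int status, Map<String, String> headers) {
        if (status != SWITCHING_PROTOCOLS) {
            return false;
        }
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String value = header.getValue();
            if (header.getKey().equalsIgnoreCase("Upgrade")
                    && value != null
                    && value.trim().toLowerCase(Locale.ROOT).equals("websocket")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isWebSocketUpgrade(101, Map.of("upgrade", "WebSocket"))); // true
        System.out.println(isWebSocketUpgrade(200, Map.of("Upgrade", "websocket"))); // false
    }
}
```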

Verified end to end with a Ktor test server reporting to a real Datadog backend (14,680 requests): all streaming endpoints report correct durations, with no near-zero outliers.

Ready for re-review.

@gtukmachev gtukmachev marked this pull request as ready for review May 13, 2026 22:45

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bca88d15ba

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +158 to +159
```java
ctx.channel().attr(STREAMING_CONTEXT_KEY).set(storedContext);
ctx.channel().attr(CONTEXT_ATTRIBUTE_KEY).remove();
```

P2: Finish bodyless HTTP responses immediately

When a handler writes a non-FullHttpResponse for a final status that cannot have a body (for example 204, 205, or 304) and then closes the connection after the headers, there will be no subsequent LastHttpContent for this branch to observe. This stores the span in STREAMING_CONTEXT_KEY and removes the normal context, so the request span stays open until channelInactive and is then reported as an incomplete streaming response/error even though the response completed successfully. Please avoid deferring for content-always-empty statuses and finish them the same way as non-streaming responses.
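
The immediate-finish condition requested here can be sketched as a status-code predicate. RFC 9110 states that 204 and 304 responses carry no content and that a server must not generate content in a 205 response, so waiting for LastHttpContent gains nothing for them. The class and method names are hypothetical, and the 1xx range (including the WebSocket 101 case) is assumed to be handled separately.

```java
final class BodylessStatus {
    /**
     * Final status codes whose responses never carry content:
     * 204 and 304 have none by definition, and a server must not send content with 205.
     */
    static boolean isContentAlwaysEmpty(int status) {
        return status == 204 || status == 205 || status == 304;
    }

    /** Hypothetical routing decision made when response headers are written. */
    static boolean deferToLastContent(boolean isFullResponse, int status) {
        // Full responses and bodyless statuses finish at header time;
        // everything else waits for LastHttpContent.
        return !isFullResponse && !isContentAlwaysEmpty(status);
    }

    public static void main(String[] args) {
        System.out.println(deferToLastContent(false, 200)); // true: a streamed body may follow
        System.out.println(deferToLastContent(false, 304)); // false: nothing will follow
        System.out.println(deferToLastContent(true, 200));  // false: already complete
    }
}
```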

Useful? React with 👍 / 👎.

Author

Good catch — fixing this now. Will finish the span immediately for bodyless status codes (204, 205, 304) alongside the websocket upgrade check.

Author

Fixed in the latest push — bodyless responses (204, 205, 304) now finish the span immediately alongside websocket upgrades.

@gtukmachev gtukmachev force-pushed the gsc-577-fix-ktor-streaming-instrumentation branch from bca88d1 to 33d0201

New JUnit 5 test class NettyChunkedResponseTest with a real Netty server
that writes chunked responses manually (HttpResponse + HttpContent* +
LastHttpContent), exercising the code path that HttpObjectAggregator-based
tests never reach.

Four test cases:
- chunkedResponseSpanIncludesFullStreamDuration: span covers full stream
  time (~1s for 5 chunks x 200ms), not just header-send time (~0ms)
- fullResponseStillFinishesSpanImmediately: FullHttpResponse regression
- keepAliveSequentialChunkedRequestsGetCorrectSpans: STREAMING_CONTEXT_KEY
  lifecycle across back-to-back keep-alive requests
- connectionDropDuringChunkedResponseFinishesSpan: span finished with
  error when client disconnects mid-stream
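
The ~1s expectation in the first test is simple arithmetic: assuming the handler pauses 200 ms before each of the five chunks, LastHttpContent is written at least 5 × 200 = 1000 ms after the headers, while a span closed at header-send time measures close to 0 ms. A duration check of that shape might look like the following; the class, the method names and the slack parameter are hypothetical and not taken from NettyChunkedResponseTest, and the measured values in main are illustrative only.

```java
final class StreamDurationCheck {
    /** Lower bound on the stream duration when the handler pauses before each chunk. */
    static long minimumStreamMillis(int chunks, long pausePerChunkMs) {
        return chunks * pausePerChunkMs;
    }

    /**
     * True if a measured span duration plausibly covers the whole stream.
     * slackMs absorbs coarse clock granularity.
     */
    static boolean coversFullStream(long measuredMs, int chunks, long pausePerChunkMs, long slackMs) {
        return measuredMs + slackMs >= minimumStreamMillis(chunks, pausePerChunkMs);
    }

    public static void main(String[] args) {
        System.out.println(minimumStreamMillis(5, 200));        // 1000
        // Illustrative value for a span that ends at LastHttpContent.
        System.out.println(coversFullStream(1012, 5, 200, 50)); // true
        // Illustrative value for a span that ends at header send.
        System.out.println(coversFullStream(3, 5, 200, 50));    // false
    }
}
```
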

HttpServerResponseTracingHandler: route by message type (FullHttpResponse,
HttpResponse, LastHttpContent) instead of finishing every span on
HttpResponse. FullHttpResponse finishes immediately; HttpResponse defers
to LastHttpContent via STREAMING_CONTEXT_KEY to avoid keep-alive race.
WebSocket upgrades and bodyless responses (204, 205, 304) finish
immediately since they never produce LastHttpContent.

HttpServerRequestTracingHandler: channelInactive now checks
STREAMING_CONTEXT_KEY and finishes leaked spans when channel closes
mid-stream.

AttributeKeys: added STREAMING_CONTEXT_KEY for chunked response context.
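
The commit message says only that deferring through STREAMING_CONTEXT_KEY avoids a keep-alive race. One reading, assumed here, is that a follow-up request on the same connection can overwrite CONTEXT_ATTRIBUTE_KEY before the previous stream has written LastHttpContent; moving the span into its own slot at header time keeps the two apart. The sketch models spans as strings and channel attributes as a HashMap, and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StreamingHandoff {
    static final String CONTEXT_ATTRIBUTE_KEY = "context";
    static final String STREAMING_CONTEXT_KEY = "streaming-context";

    final Map<String, String> attrs = new HashMap<>(); // stands in for Channel.attr(...)
    final List<String> finished = new ArrayList<>();   // spans finished, in order

    void onRequest(String span) {
        attrs.put(CONTEXT_ATTRIBUTE_KEY, span);
    }

    void onStreamingHeaders() {
        // Move the span out of the slot the next keep-alive request will overwrite.
        String span = attrs.remove(CONTEXT_ATTRIBUTE_KEY);
        if (span != null) {
            attrs.put(STREAMING_CONTEXT_KEY, span);
        }
    }

    void onLastContent() {
        String span = attrs.remove(STREAMING_CONTEXT_KEY);
        if (span != null) {
            finished.add(span);
        }
    }

    public static void main(String[] args) {
        StreamingHandoff ch = new StreamingHandoff();
        ch.onRequest("req-1");
        ch.onStreamingHeaders();
        ch.onRequest("req-2");   // next request read before req-1's stream ends
        ch.onLastContent();      // finishes req-1 and leaves req-2 untouched
        System.out.println(ch.finished + " " + ch.attrs.get(CONTEXT_ATTRIBUTE_KEY));
        // prints "[req-1] req-2"
    }
}
```

With a single slot, the second onRequest would have replaced req-1's context, and the LastHttpContent for req-1 would have finished the wrong span.
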

@gtukmachev gtukmachev force-pushed the gsc-577-fix-ktor-streaming-instrumentation branch from 33d0201 to 77cc414
@gtukmachev
Author

@codex review

@gtukmachev gtukmachev requested a review from ygree May 13, 2026 23:02

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77cc414ea5

Comment on lines +162 to +163
```java
ctx.channel().attr(STREAMING_CONTEXT_KEY).set(storedContext);
ctx.channel().attr(CONTEXT_ATTRIBUTE_KEY).remove();
```

P2: Finish header-only responses without waiting for chunks

When a handler writes a header-only DefaultHttpResponse (for example a HEAD response or a Content-Length: 0 200) and then closes the channel rather than emitting LastHttpContent, this branch stores the span in STREAMING_CONTEXT_KEY and removes CONTEXT_ATTRIBUTE_KEY, so the successful response is not reported until channelInactive marks it as an error. The old path finished these non-full header-only responses at the header write; only responses that are actually followed by content should be deferred to LastHttpContent.

Author

Investigated — this is not an issue in practice. Netty's HttpObjectEncoder emits LastHttpContent.EMPTY_LAST_CONTENT after both HEAD responses and Content-Length: 0 responses, so handleLastHttpContent will fire and finish the span correctly. The edge case of a handler closing the channel without emitting LastHttpContent is already covered by the channelInactive fix. The truly bodyless status codes (204, 205, 304) are already handled by the fix in the previous review cycle.

@gtukmachev
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77cc414ea5

Comment on lines +162 to +163
```java
ctx.channel().attr(STREAMING_CONTEXT_KEY).set(storedContext);
ctx.channel().attr(CONTEXT_ATTRIBUTE_KEY).remove();
```

P2: Finish spans for header-only responses

For a HEAD response or a raw DefaultHttpResponse with Content-Length: 0 that keeps the connection open, this branch moves the context to STREAMING_CONTEXT_KEY and removes CONTEXT_ATTRIBUTE_KEY, but no LastHttpContent is required to be written for a headers-only message. The span therefore never finishes until channelInactive (or is effectively leaked on long-lived keep-alive connections), whereas these HttpResponse messages were finished immediately before this change. Please treat HEAD/zero-length headers-only responses like the existing 204/205/304 immediate-finish path.

Author

Same point as the previous cycle — already addressed. Netty's HttpObjectEncoder.encode() always writes LastHttpContent.EMPTY_LAST_CONTENT after a non-full HttpResponse, regardless of Content-Length or request method. The handleLastHttpContent path will fire. This is standard Netty pipeline behavior — the codec is responsible for message framing, not application handlers. No change needed.

@gtukmachev
Author

@ygree Could you please add the required labels? As an external contributor I don't have permission to assign them:

  • type: bug
  • inst: netty
  • tag: ai generated

Thanks!

@gtukmachev
Author

Apologies for the confusion — this PR was from my personal fork and should have been closed when I moved to the company fork. The correct PR with the latest changes (tests, span leak fix, bodyless response handling) is:

#10734

Please disregard this PR. Sorry for the noise!

@gtukmachev gtukmachev closed this May 14, 2026

Labels: None yet

Projects: None yet

2 participants