fix(provider): skip empty reasoning_content to preserve KV cache hits #28352

Open
nilo85 wants to merge 2 commits into anomalyco:dev from nilo85:feature/fix-reasoning-content-cache

Conversation

@nilo85 nilo85 commented May 19, 2026

Issue for this PR

Closes #19081

Type of change

  • Bug fix

What does this PR do?

Historical assistant messages without reasoning were getting reasoning_content: "" forwarded to the LLM API. An empty string changes the token stream, which breaks KV cache prefix matching. Production captures showed 0% cache hits on 196K-token prompts.

The interleaved reasoning block in normalizeMessages always set providerOptions.openaiCompatible[field] to reasoningText even when empty. The fix adds a guard so the field is only set when there's actual reasoning text. Messages without reasoning now keep the field absent, preserving KV cache prefix matching.
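
A minimal sketch of the guard described above, not the actual diff. The `Part` and `AssistantMessage` shapes and the helper name `applyInterleavedReasoning` are hypothetical simplifications; only `reasoningText`, `field`, and `providerOptions.openaiCompatible` come from the description.

```typescript
// Hypothetical, simplified message shapes; the real types in the codebase may differ.
type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }

interface AssistantMessage {
  role: "assistant"
  content: Part[]
  providerOptions?: { openaiCompatible?: Record<string, string> }
}

// Hypothetical helper: moves reasoning parts into the interleaved field
// (for example "reasoning_content"), but only when they contain text.
function applyInterleavedReasoning(msg: AssistantMessage, field: string): AssistantMessage {
  const reasoningText = msg.content
    .filter((part) => part.type === "reasoning")
    .map((part) => part.text)
    .join("")
  const content = msg.content.filter((part) => part.type !== "reasoning")

  // The guard: with no reasoning text, leave the field absent instead of
  // forwarding an empty string. A message that had no reasoning parts at all
  // is returned as-is.
  if (!reasoningText) {
    return content.length === msg.content.length ? msg : { ...msg, content }
  }

  return {
    ...msg,
    content,
    providerOptions: {
      ...msg.providerOptions,
      openaiCompatible: { ...msg.providerOptions?.openaiCompatible, [field]: reasoningText },
    },
  }
}

// Usage: an empty reasoning part no longer produces reasoning_content: "".
const result = applyInterleavedReasoning(
  { role: "assistant", content: [{ type: "reasoning", text: "" }, { type: "text", text: "done" }] },
  "reasoning_content",
)
console.log(result.providerOptions) // undefined
```

The truthiness check treats a missing reasoning part and an empty one the same way, so both serialize identically on replay.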

How did you verify your code works?

  • Unit tests added (226 pass) covering empty reasoning and no-reasoning cases
  • Direct API test against llama-swap: WITH reasoning_content → 97.2% cache hit, WITHOUT → 0% cache hit
  • Multi-turn KV cache test: Turn 1 (0%), Turn 2 (52.6%), Turn 3 (91.0%)

Screenshots / recordings

N/A — backend-only change, no UI.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

nilo85 added 2 commits May 19, 2026 15:56
Historical assistant messages without reasoning were getting
reasoning_content: "" forwarded to the LLM API, breaking KV cache
prefix matching (0% cache hits on 196K token prompts).

Only set the interleaved field when reasoningText is non-empty.
Adds two tests:
- empty reasoning part does not set reasoning_content field
- assistant without reasoning parts keeps message unchanged
@github-actions github-actions Bot added the needs:compliance (this means the issue will auto-close after 2 hours) and needs:issue labels May 19, 2026
@github-actions
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@nilo85 nilo85 marked this pull request as ready for review May 19, 2026 14:24
@github-actions github-actions Bot removed the needs:compliance and needs:issue labels May 19, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@github-actions
Contributor

The following comment was made by an LLM, it may be inaccurate:

Potential Duplicate Found:

PR #28346: fix(llm): forward reasoning_content in experimental OpenAI Chat assistant messages

Why it's related: This PR directly addresses the inverse scenario of the current PR. While #28352 fixes the issue of empty reasoning_content breaking cache by skipping it, #28346 involves forwarding reasoning_content in assistant messages. These PRs are closely related and likely touch the same code paths in message normalization and provider handling. Verify these aren't addressing conflicting concerns or if one supersedes the other.

@nilo85
Author

nilo85 commented May 19, 2026

Both fixes should be relevant: they address the same issue but reach it through different code paths.

nilo85 added a commit to nilo85/opencode that referenced this pull request May 19, 2026
Add detection for when messages sent to the LLM differ between turns,
which breaks KV cache prefix matching. Logs a warning with a formatted
diff showing what changed and why.

Catches issues like empty reasoning_content being added to messages
that previously had none (PR anomalyco#28352), as well as any other message
mutations during transformation.
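
A rough sketch of the kind of check this commit message describes, not the code in the referenced commit. The `WireMessage` shape and the `findPrefixBreak` name are assumptions made for illustration. The idea is that every message sent in the previous turn should reappear unchanged at the same position in the next turn.

```typescript
// Hypothetical wire-level message shape, used for illustration only.
interface WireMessage {
  role: string
  content: string
  reasoning_content?: string
}

// Returns the index of the first previously sent message that is missing or
// different in the current turn, or -1 when the previous turn is still an
// intact prefix of the current one.
function findPrefixBreak(previous: WireMessage[], current: WireMessage[]): number {
  for (let i = 0; i < previous.length; i++) {
    if (i >= current.length) return i
    // JSON.stringify is key-order sensitive, which is acceptable here because
    // both turns are assumed to be built by the same serialization code.
    if (JSON.stringify(previous[i]) !== JSON.stringify(current[i])) return i
  }
  return -1
}

const turn1: WireMessage[] = [
  { role: "user", content: "hi" },
  { role: "assistant", content: "hello" },
]
const turn2: WireMessage[] = [
  { role: "user", content: "hi" },
  { role: "assistant", content: "hello", reasoning_content: "" }, // mutated on replay
  { role: "user", content: "next question" },
]

const breakIndex = findPrefixBreak(turn1, turn2)
if (breakIndex !== -1) {
  console.warn(`message ${breakIndex} changed between turns; KV cache prefix will not match`)
}
```

Comparing serialized messages is only a proxy for the token stream the server actually caches, but it is enough to flag a field such as reasoning_content appearing on a message that previously had none.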

Development

Successfully merging this pull request may close these issues.

reasoning_content stripped from assistant messages on replay, causing KV cache invalidation on local inference

1 participant