Skip to content

fix: rune-safe truncation and dead-code cleanup#2920

Merged
dgageot merged 12 commits into
docker:mainfrom
dgageot:board/fix-issues
May 29, 2026
Merged

fix: rune-safe truncation and dead-code cleanup#2920
dgageot merged 12 commits into
docker:mainfrom
dgageot:board/fix-issues

Conversation

@dgageot

@dgageot dgageot commented May 29, 2026

Copy link
Copy Markdown
Member

Summary

A focused pass of small, high-confidence bug fixes found while reviewing the codebase. Two categories:

Logic / dead code

  • evaluation: progress bar width was always 10min(max(termWidth-80, 10), 10) made the max dead, so the bar never scaled with the terminal. Restored a sensible upper clamp.
  • wasm: removed an if/else with identical branches in runAgentLoop's default-agent selection (the len(Agents)==1 check was meaningless).

UTF-8 truncation (byte-index slicing could split a multi-byte rune → / invalid bytes)

The project already standardizes on rune-aware truncation ([]rune, ansi.Truncate, truncateRunes), but several spots still sliced by byte. Fixed in:

  • evaluation running-task name
  • TUI file picker, session browser, working-dir picker, tab titles, search-query preview
  • OpenAPI operation descriptions
  • wasm tool-output preview
  • filesystem grep search-result preview window (snap to rune boundaries)
  • rag semantic-embedding input — could feed invalid UTF-8 (encoded as U+FFFD) to the embedding API; now clamped to valid UTF-8 while preserving the byte budget

Validation

  • go build ./... and GOOS=js GOARCH=wasm go build ./cmd/wasm/
  • go vet + golangci-lint: clean on all touched packages
  • Tests pass for every touched package

All changes are behavior-preserving for ASCII input and consistent with existing conventions. Each fix is an individual commit.

dgageot added 12 commits May 29, 2026 07:18
min(max(termWidth-80, 10), 10) always evaluated to 10, making the
max() dead and the bar a fixed width regardless of terminal size.
Raise the upper clamp so the bar scales with the terminal again.
Slicing the task name by byte index could split a multi-byte UTF-8
sequence and emit a replacement glyph. Truncate on rune boundaries
like the rest of the codebase.
Byte-index slicing could split a multi-byte UTF-8 character in
non-ASCII file names. Truncate on rune boundaries.
Session titles are user/LLM-generated and may contain non-ASCII
characters; byte-index slicing could split a multi-byte UTF-8
sequence. Truncate on rune boundaries.
Byte-index slicing could split a multi-byte UTF-8 character in
non-ASCII directory names. Truncate on rune boundaries.
Byte-index slicing could split a multi-byte UTF-8 character in
non-ASCII tab titles. Truncate on rune boundaries.
A search query containing non-ASCII characters could be sliced
mid-rune by the byte-index truncation. Truncate on rune boundaries.
OpenAPI descriptions can contain non-ASCII text; slicing at byte 200
could split a multi-byte UTF-8 sequence. Truncate on rune boundaries.
In runAgentLoop the default-agent selection had identical then/else
bodies, making the len(Agents)==1 check meaningless. Collapse to the
single assignment.
truncateOutput is documented to cap at maxLen characters but sliced by
byte, which could split a multi-byte UTF-8 sequence in tool output.
Truncate on rune boundaries.
The grep preview window was sliced at byte offsets (matchStart-20 /
matchEnd+20), which could split a multi-byte UTF-8 sequence and emit a
replacement glyph. Snap the window to rune boundaries.
The semantic-embeddings input was truncated at a fixed byte offset,
which could cut a multi-byte UTF-8 sequence in half and feed invalid
bytes (encoded as U+FFFD) to the embedding API. Clamp to valid UTF-8
while preserving the byte budget.
@dgageot dgageot requested a review from a team as a code owner May 29, 2026 05:27

@aheritier aheritier left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

The rune-safe truncation fixes are correct and consistent with the project's existing conventions. The dead-code removal in runAgentLoop is safe — both branches of the old if/else were identical. All high/medium findings were pre-existing and not introduced or worsened by this PR.

One low-severity edge case noted inline.

}
for end < len(preview) && !utf8.RuneStart(preview[end]) {
end++
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] End rune-boundary snap may leave an incomplete UTF-8 sequence at the tail

The end-snapping loop advances forward while the byte is a UTF-8 continuation byte, stopping when end == len(preview). If matchEnd+20 lands exactly mid-rune and the rest of the sequence is beyond the slice boundary, the loop exits at len(preview) and preview[start:end] can still end with an incomplete multi-byte sequence.

for end < len(preview) && !utf8.RuneStart(preview[end]) {
    end++
}

The start side correctly walks backward to find a rune boundary, but the end side can only walk forward up to len(preview) — it has no way to detect that it stopped mid-rune. Snapping end backward (like start) would be safer:

for end > start && !utf8.RuneStart(preview[end-1]) {
    // actually: snap end backward to last rune boundary
}

Or simply use string([]rune(preview[start:end])) to drop any trailing incomplete sequence.

Impact is low — this only occurs if the match region falls at the very end of a multi-byte character in a long line.

@docker-agent docker-agent left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

PR: fix: rune-safe truncation and dead-code cleanup

This PR correctly fixes byte-level string truncation across multiple files by converting to []rune slices before slicing, preventing splits of multi-byte UTF-8 sequences. The dead-code removal in runAgentLoop (collapsing an if/else with identical branches) is also a clean, correct simplification.

Verification summary:

  • strings.ToValidUTF8(s[:maxLen], "") in semantic_embeddings.go: ✅ Correct — dropping the partial rune at a hard byte-boundary is the only safe option; the resulting string is valid UTF-8 and within budget.
  • []rune truncation in working_dir_picker.go, file_picker.go, session_browser.go, tabbar/tab.go, searchfilescontent.go, openapi.go, runtime_wasm.go: ✅ All correctly fixed.
  • barWidth cap raised from 10→30 in progress.go: ✅ Correct — the old cap of 10 meant the bar was always clamped to 10 chars regardless of terminal width.
  • UTF-8 boundary snapping in filesystem.go preview: ✅ Correct use of utf8.RuneStart to expand/contract to valid boundaries.
  • Dead-code removal (agentName if/else with identical branches): ✅ No behavior change.

No bugs found in the changed code. All changes are improvements.

@docker-agent docker-agent left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

This PR makes three well-scoped categories of fixes:

  1. Dead-code removal (runtime_wasm.go): The redundant if/else with identical branches is correctly collapsed. The existing nil/empty-slice panic potential on rt.cfg.Agents[0] is pre-existing and not introduced here.

  2. Progress-bar clamp fix (evaluation/progress.go): Changing the upper bound from 10 to 30 in min(max(termWidth-80, 10), 30) correctly restores dynamic scaling. The formula is sound — minimum bar width remains 10 (small terminals), maximum is 30 (wide terminals).

  3. Rune-safe truncation (9 sites): All conversions from byte-index slicing to []rune slicing are correct. The strings.ToValidUTF8(s[:n], "") pattern in the RAG embedding path correctly strips incomplete trailing sequences. The utf8.RuneStart boundary-snapping in filesystem.go is safe — no out-of-bounds access is possible.

Minor observation (pkg/tools/builtin/filesystem/filesystem.go, line 1205): The end-adjustment loop (for end < len(preview) && !utf8.RuneStart(preview[end]) { end++ }) can expand the preview window slightly past matchEnd+20 when continuation bytes happen to fall exactly at the cut-point. This is harmless (no panic, no data corruption), and the alternative would be to shrink instead of expand — which would also work but could leave a dangling continuation byte in the other direction. Worth knowing but not a blocking issue.

All changes are consistent with existing project conventions and the PR description is accurate.

@aheritier aheritier added area/rag For work/issues that have to do with the RAG features area/tools For features/issues/fixes related to the usage of built-in and MCP tools area/tui For features/issues/fixes related to the TUI kind/fix PR fixes a bug (maps to fix: commit prefix) labels May 29, 2026
@dgageot dgageot merged commit 1eb93d4 into docker:main May 29, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/rag For work/issues that have to do with the RAG features area/tools For features/issues/fixes related to the usage of built-in and MCP tools area/tui For features/issues/fixes related to the TUI kind/fix PR fixes a bug (maps to fix: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants