fix: rune-safe truncation and dead-code cleanup#2920
Conversation
min(max(termWidth-80, 10), 10) always evaluated to 10, making the max() dead and the bar a fixed width regardless of terminal size. Raise the upper clamp so the bar scales with the terminal again.
Slicing the task name by byte index could split a multi-byte UTF-8 sequence and emit a replacement glyph. Truncate on rune boundaries like the rest of the codebase.
Byte-index slicing could split a multi-byte UTF-8 character in non-ASCII file names. Truncate on rune boundaries.
Session titles are user/LLM-generated and may contain non-ASCII characters; byte-index slicing could split a multi-byte UTF-8 sequence. Truncate on rune boundaries.
Byte-index slicing could split a multi-byte UTF-8 character in non-ASCII directory names. Truncate on rune boundaries.
Byte-index slicing could split a multi-byte UTF-8 character in non-ASCII tab titles. Truncate on rune boundaries.
A search query containing non-ASCII characters could be sliced mid-rune by the byte-index truncation. Truncate on rune boundaries.
OpenAPI descriptions can contain non-ASCII text; slicing at byte 200 could split a multi-byte UTF-8 sequence. Truncate on rune boundaries.
In runAgentLoop the default-agent selection had identical then/else bodies, making the len(Agents)==1 check meaningless. Collapse to the single assignment.
truncateOutput is documented to cap at maxLen characters but sliced by byte, which could split a multi-byte UTF-8 sequence in tool output. Truncate on rune boundaries.
The grep preview window was sliced at byte offsets (matchStart-20 / matchEnd+20), which could split a multi-byte UTF-8 sequence and emit a replacement glyph. Snap the window to rune boundaries.
The semantic-embeddings input was truncated at a fixed byte offset, which could cut a multi-byte UTF-8 sequence in half and feed invalid bytes (encoded as U+FFFD) to the embedding API. Clamp to valid UTF-8 while preserving the byte budget.
aheritier
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The rune-safe truncation fixes are correct and consistent with the project's existing conventions. The dead-code removal in runAgentLoop is safe — both branches of the old if/else were identical. All high/medium findings were pre-existing and not introduced or worsened by this PR.
One low-severity edge case noted inline.
| } | ||
| for end < len(preview) && !utf8.RuneStart(preview[end]) { | ||
| end++ | ||
| } |
There was a problem hiding this comment.
[LOW] End rune-boundary snap may leave an incomplete UTF-8 sequence at the tail
The end-snapping loop advances forward while the byte is a UTF-8 continuation byte, stopping when end == len(preview). If matchEnd+20 lands exactly mid-rune and the rest of the sequence is beyond the slice boundary, the loop exits at len(preview) and preview[start:end] can still end with an incomplete multi-byte sequence.
for end < len(preview) && !utf8.RuneStart(preview[end]) {
end++
}The start side correctly walks backward to find a rune boundary, but the end side can only walk forward up to len(preview) — it has no way to detect that it stopped mid-rune. Snapping end backward (like start) would be safer:
for end > start && !utf8.RuneStart(preview[end-1]) {
// actually: snap end backward to last rune boundary
}Or simply use string([]rune(preview[start:end])) to drop any trailing incomplete sequence.
Impact is low — this only occurs if the match region falls at the very end of a multi-byte character in a long line.
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
PR: fix: rune-safe truncation and dead-code cleanup
This PR correctly fixes byte-level string truncation across multiple files by converting to []rune slices before slicing, preventing splits of multi-byte UTF-8 sequences. The dead-code removal in runAgentLoop (collapsing an if/else with identical branches) is also a clean, correct simplification.
Verification summary:
strings.ToValidUTF8(s[:maxLen], "")insemantic_embeddings.go: ✅ Correct — dropping the partial rune at a hard byte-boundary is the only safe option; the resulting string is valid UTF-8 and within budget.[]runetruncation inworking_dir_picker.go,file_picker.go,session_browser.go,tabbar/tab.go,searchfilescontent.go,openapi.go,runtime_wasm.go: ✅ All correctly fixed.barWidthcap raised from 10→30 inprogress.go: ✅ Correct — the old cap of 10 meant the bar was always clamped to 10 chars regardless of terminal width.- UTF-8 boundary snapping in
filesystem.gopreview: ✅ Correct use ofutf8.RuneStartto expand/contract to valid boundaries. - Dead-code removal (
agentNameif/else with identical branches): ✅ No behavior change.
No bugs found in the changed code. All changes are improvements.
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
This PR makes three well-scoped categories of fixes:
-
Dead-code removal (
runtime_wasm.go): The redundantif/elsewith identical branches is correctly collapsed. The existing nil/empty-slice panic potential onrt.cfg.Agents[0]is pre-existing and not introduced here. -
Progress-bar clamp fix (
evaluation/progress.go): Changing the upper bound from10to30inmin(max(termWidth-80, 10), 30)correctly restores dynamic scaling. The formula is sound — minimum bar width remains 10 (small terminals), maximum is 30 (wide terminals). -
Rune-safe truncation (9 sites): All conversions from byte-index slicing to
[]runeslicing are correct. Thestrings.ToValidUTF8(s[:n], "")pattern in the RAG embedding path correctly strips incomplete trailing sequences. Theutf8.RuneStartboundary-snapping infilesystem.gois safe — no out-of-bounds access is possible.
Minor observation (pkg/tools/builtin/filesystem/filesystem.go, line 1205): The end-adjustment loop (for end < len(preview) && !utf8.RuneStart(preview[end]) { end++ }) can expand the preview window slightly past matchEnd+20 when continuation bytes happen to fall exactly at the cut-point. This is harmless (no panic, no data corruption), and the alternative would be to shrink instead of expand — which would also work but could leave a dangling continuation byte in the other direction. Worth knowing but not a blocking issue.
All changes are consistent with existing project conventions and the PR description is accurate.
Summary
A focused pass of small, high-confidence bug fixes found while reviewing the codebase. Two categories:
Logic / dead code
evaluation: progress bar width was always10—min(max(termWidth-80, 10), 10)made themaxdead, so the bar never scaled with the terminal. Restored a sensible upper clamp.wasm: removed anif/elsewith identical branches inrunAgentLoop's default-agent selection (thelen(Agents)==1check was meaningless).UTF-8 truncation (byte-index slicing could split a multi-byte rune →
�/ invalid bytes)The project already standardizes on rune-aware truncation (
[]rune,ansi.Truncate,truncateRunes), but several spots still sliced by byte. Fixed in:evaluationrunning-task namewasmtool-output previewfilesystemgrep search-result preview window (snap to rune boundaries)ragsemantic-embedding input — could feed invalid UTF-8 (encoded as U+FFFD) to the embedding API; now clamped to valid UTF-8 while preserving the byte budgetValidation
go build ./...andGOOS=js GOARCH=wasm go build ./cmd/wasm/go vet+golangci-lint: clean on all touched packagesAll changes are behavior-preserving for ASCII input and consistent with existing conventions. Each fix is an individual commit.