Enforce single-line records in sweep state CSV writers#2684
Merged
Conversation
The sweep state files use merge=union so parallel agents can append rows. Union merge is line-based, but the writers used csv.DictWriter with QUOTE_MINIMAL, which writes a newline-containing note as a quoted field spanning multiple physical lines. Concurrent appends then split those records on their internal newlines, producing phantom rows and duplicate headers (repaired once already in #2679). Add an _oneline() sanitizer to every writer (six sweep-*.md commands plus the deep-sweep reset path) that collapses embedded CR/LF to " | " right before writerow, so each record is always one physical line regardless of note content. This enforces the invariant in code instead of relying on the unenforced "single-line notes" placeholder hint.
brendancol
commented
May 29, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Enforce single-line records in sweep state CSV writers
This is a tooling change to instruction templates in .claude/commands/, so the spatial-domain checks (algorithms, backend parity, dask/cupy, NaN, benchmarks, README matrix) do not apply. The review covers the sanitizer logic and the writer change.
Blockers
None.
Suggestions
None blocking.
Nits
- The
"notes": "<single-line notes ...>"placeholder hints are now redundant, since_oneline()enforces single-line output regardless of what the agent writes. Not worth a churny edit across files, but a future pass could simplify those hints to just describe the content and drop the "replace newlines" instruction. .claude/commands/sweep-style.mdis untracked and not on this branch, so it keeps the old unguarded writer. The PR body already flags this. It needs the same_oneline()change when it is committed.
What looks good
- The replace order (
\r\nbefore\r/\n) avoids double-expanding a CRLF into" | | ". Verified. - Applying the sanitizer to every field via the dict comprehension, rather than just
notes, is cheap defense in depth. - Nice side effect: because existing rows are read back through
DictReaderand rewritten through_oneline(), the next write self-heals any rows that were already corrupted with embedded newlines. deep-sweep's reset path gets the same guard, so a reset also normalizes.
Verification performed
- A note containing
\n,\r\n, and\rwrites to one physical line and round-trips throughDictReader. - Control: the pre-fix writer produced 4 physical lines for the same note.
- A simulated
merge=unionof two concurrent single-line writes produced clean records with no phantom rows.
2 tasks
brendancol
added a commit
that referenced
this pull request
Jun 8, 2026
* Drop merge=union for sweep state CSVs (#2754) .gitattributes registered .claude/sweep-*-state.csv and .codex/sweep-*-state.csv with merge=union. These files are keyed by module (one row per module) with a single header, not an append-only log. The union driver concatenates both sides of a conflicting hunk instead of raising a conflict. When a branch was based on a stale copy, the merge silently produced duplicate header lines and duplicate module rows. It happened on main with the #2712 squash merge: the file went from 9 lines to 43. The earlier single-line-record fix (#2680/#2684) kept embedded newlines from splitting a record, but union still duplicated whole rows and still mixed formats across stale branches. Switch both paths to merge=text (git's built-in 3-way text driver) so a real divergence surfaces as a normal conflict to resolve by hand rather than corrupting the file silently. I considered -merge (always-conflict, take-one-side) but rejected it: in a git sandbox it silently dropped the other branch's rows, trading one silent-corruption mode for a silent-data-loss mode. Also updates the merge-driver references across the sweep command docs to match, and adds a git-sandbox regression test that reproduces the stale-branch merge and asserts it conflicts instead of unioning. The single-line-record writer stays (clean diffs, predictable conflicts); only its rationale comment changed to drop the removed driver name. * Address review: tighten merge regression test (#2754) The slope-row count assertion did not distinguish a conflict from a union (both leave two slope, lines), so it implied protection it did not provide. Replace it with a header-count assertion: union duplicates the header line, a conflict keeps one. Drop the unused tmp_path parameter from the gitattributes check, and note in _make_repo that merge=text is git's built-in driver rather than a custom one.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2680
What
Adds an
_oneline()sanitizer to every sweep state-CSV writer so each record is always one physical line, even when a note contains newlines.Why
The
.claude/sweep-*-state.csvfiles usemerge=unionso parallel sweep agents can append rows. Union merge is line-based, but the writers usedcsv.DictWriterwithQUOTE_MINIMAL, which is valid CSV yet writes a newline-containing note as a quoted field spanning several physical lines. Concurrent appends then split those records on their internal newlines, producing phantom rows and duplicate headers. PR #2679 repaired one such corruption by hand; this stops it recurring at the source.The previous
"single-line notes or empty"placeholder was a hint, not enforcement, so nothing prevented a multi-line note.How
Each writer (six
sweep-*.mdcommands plus thedeep-sweepreset path) now routes every field through_oneline(), which replaces CR/LF with" | "right beforewriterow.Verification
Simulated the patched writer and a union merge:
\n,\r\n, and\rwrites to exactly one physical line and round-trips throughcsv.DictReader.No pytest test is added: these are instruction templates in
.claude/commands/, not importable code, so there is no unit to import inxrspatial/tests/. Verification is the simulation above.Scope note
.claude/commands/sweep-style.mdis untracked in the working tree and not part of this branch, so it is not patched here. It needs the same_oneline()change when it is committed.