Skip to content

Drop merge=union for sweep state CSVs (#2754)#3043

Merged
brendancol merged 3 commits into
mainfrom
issue-2754
Jun 8, 2026
Merged

Drop merge=union for sweep state CSVs (#2754)#3043
brendancol merged 3 commits into
mainfrom
issue-2754

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Summary

.gitattributes registered the sweep state CSVs (.claude/sweep-*-state.csv, .codex/sweep-*-state.csv) with merge=union. These files are keyed by module (one row per module, single header), not an append-only log. The union driver concatenates both sides of a conflicting hunk instead of conflicting, so a branch based on a stale copy merges into duplicate header lines and duplicate module rows. It happened on main with the #2712 squash merge (9 lines to 43). The earlier single-line-record fix (#2680/#2684) stopped embedded newlines from splitting records, but union still duplicated whole rows and still mixed formats across stale branches.

This is Option 1 from the issue: drop merge=union.

Changes

  • .gitattributes: switch both sweep-CSV globs to merge=text (git's built-in 3-way text driver), so a real divergence surfaces as a normal conflict to resolve by hand instead of silently unioning.
  • Update the merge-driver references across the seven sweep command docs and deep-sweep so they describe the new behavior. The single-line _oneline writer stays; only its rationale comment dropped the removed driver name.
  • Add xrspatial/tests/test_sweep_state_csv_merge_2754.py: a git-sandbox regression test that copies the repo .gitattributes, reproduces the stale-branch merge from the issue, and asserts the merge conflicts (with markers, no duplicated rows) rather than unioning. Verified red under merge=union and green under merge=text.

Tradeoff considered

I tested -merge (always conflict, take one side) as an alternative. In the same sandbox it silently dropped the other branch's rows, trading silent corruption for silent data loss. merge=text keeps the divergence visible as a real conflict, which is the point.

Backend coverage

Not applicable. This is a repo-tooling / git-attributes change with no xrspatial runtime surface.

Test plan

  • pytest xrspatial/tests/test_sweep_state_csv_merge_2754.py passes
  • Confirmed the test fails when merge=union is restored (red/green)

Closes #2754

.gitattributes registered .claude/sweep-*-state.csv and
.codex/sweep-*-state.csv with merge=union. These files are keyed by
module (one row per module) with a single header, not an append-only
log. The union driver concatenates both sides of a conflicting hunk
instead of raising a conflict. When a branch was based on a stale copy,
the merge silently produced duplicate header lines and duplicate module
rows. It happened on main with the #2712 squash merge: the file went
from 9 lines to 43.

The earlier single-line-record fix (#2680/#2684) kept embedded newlines
from splitting a record, but union still duplicated whole rows and still
mixed formats across stale branches.

Switch both paths to merge=text (git's built-in 3-way text driver) so a
real divergence surfaces as a normal conflict to resolve by hand rather
than corrupting the file silently. I considered -merge (always-conflict,
take-one-side) but rejected it: in a git sandbox it silently dropped the
other branch's rows, trading one silent-corruption mode for a
silent-data-loss mode.

Also updates the merge-driver references across the sweep command docs
to match, and adds a git-sandbox regression test that reproduces the
stale-branch merge and asserts it conflicts instead of unioning. The
single-line-record writer stays (clean diffs, predictable conflicts);
only its rationale comment changed to drop the removed driver name.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 8, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Drop merge=union for sweep state CSVs (#2754)

This is a repo-tooling / git-attributes change. There's no xrspatial runtime code in the diff, so the geospatial-correctness, backend-parity, dask, and cupy sections of the checklist don't apply. I focused on whether the fix is correct and whether the regression test actually guards it.

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/tests/test_sweep_state_csv_merge_2754.py:131: the assertion merged.count("slope,") <= 2 does not distinguish a conflict from a union. I checked it: under the old merge=union driver the merged file also contains exactly two slope, lines (slope,2026-06-01 and slope,2026-01-01), so the count is 2 in both the broken and the fixed case. The real guards are the returncode != 0 check and the conflict-marker check, which are correct. Either drop the slope-count assertion or make it catch the union shape (for example, assert exactly one header line, since union duplicates the header). As written it implies protection it doesn't provide.

Nits (optional improvements)

  • xrspatial/tests/test_sweep_state_csv_merge_2754.py:84: test_gitattributes_sweep_csv_not_union(tmp_path) accepts tmp_path but never uses it. Drop the parameter.
  • The test shells out to the system git and reads the repo's real .gitattributes. If a contributor's global git config defined a custom driver literally named text, merge=text would resolve to it instead of the built-in. Remote edge case, not worth guarding, but a one-line comment noting text is git's built-in driver name would help.

What looks good

  • merge=text is the right call. The PR body documents that -merge was tested and rejected because it silently dropped a side's rows, which is the correct reason to avoid it.
  • The doc updates are thorough: every stale "auto-merge without conflict" claim across the seven sweep files and deep-sweep was corrected, and the single-line _oneline writer was kept (it's still useful) with only its comment updated.
  • Red/green was verified both ways (fails under union, passes under text), which is what makes this test worth having.

Checklist

  • Fix matches the issue's Option 1
  • Regression test reproduces the reported scenario
  • Test verified red under the old behavior
  • [n/a] Backend parity / dask / cupy / NaN handling (no runtime code)
  • [n/a] README feature matrix (no new function)
  • Docs updated to match the new merge behavior

The slope-row count assertion did not distinguish a conflict from a
union (both leave two slope, lines), so it implied protection it did not
provide. Replace it with a header-count assertion: union duplicates the
header line, a conflict keeps one. Drop the unused tmp_path parameter
from the gitattributes check, and note in _make_repo that merge=text is
git's built-in driver rather than a custom one.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 3617d3a)

All three findings from the previous pass are addressed. No new findings.

  • Suggestion (weak slope-count assertion): fixed. The assertion now checks that the merged file has exactly one header line. Union duplicates the header, a conflict keeps one, so this actually distinguishes the two cases. Re-verified red under merge=union and green under merge=text.
  • Nit (unused tmp_path): fixed. Removed from test_gitattributes_sweep_csv_not_union.
  • Nit (custom-driver edge case): addressed with a comment in _make_repo noting merge=text is git's built-in driver and the sandbox defines no custom driver of that name.

Nothing else outstanding. Only the dismissed/addressed items remain.

@brendancol brendancol merged commit 002f97a into main Jun 8, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

merge=union on .claude/sweep-*-state.csv silently corrupts the file on merge

1 participant