Problem
.gitattributes sets merge=union for .claude/sweep-*-state.csv:
.claude/sweep-*-state.csv merge=union
The union driver concatenates both sides of a conflicting hunk instead of raising a conflict. For an append-only log that would be fine, but these state files are keyed by module (one row per module) and carry multi-line quoted notes. When a sweep branch based on an older copy of the CSV is merged, union produces:
- duplicate header lines
- duplicate module rows
- multi-line notes fields split across physical lines
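Here is a toy model of the failure. It uses a simplified view of the union driver and is not git's implementation. In a hunk that a normal 3-way merge would mark as conflicted, union keeps the lines from both sides, so a module row that was rewritten on both branches appears twice in the result. If the header line also falls inside that hunk, it is duplicated as well. Differing line endings between the two sides are one plausible way for that to happen. The module names and columns below are made up.

```python
# Toy model of merge=union on a single conflicting hunk.
# Simplification: real git first runs a 3-way merge and applies the
# "keep both sides" rule only to regions that would otherwise conflict.


def union_hunk(ours: list[str], theirs: list[str]) -> list[str]:
    """In this model: our side's lines followed by theirs, no de-duplication."""
    return ours + theirs


# Hypothetical hunk covering the header and the "auth" row on both sides.
ours = ["module,status,notes", "auth,done,compacted on main"]
theirs = ["module,status,notes", "auth,done,from a stale sweep branch"]

merged = union_hunk(ours, theirs)
for line in merged:
    print(line)
```

The driver has no notion that the first column is a key. De-duplication by module therefore has to happen somewhere else.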
It happened on main with the #2712 squash merge: the file went from 9 lines to 43, with the header repeated three times and six modules duplicated. The branch was based on the older multi-line format while main had been compacted to single-line CRLF rows, so the two formats unioned into garbage. Repaired in #2753.
Because the corruption happens inside the merge driver, it cannot be prevented from the PR branch side — resolving the branch locally does not stop GitHub from re-running union at merge time.
Why it keeps recurring
Each sweep writes one row per module. Two sweep branches that both touch the CSV, or one branch based on a stale format, will union rather than conflict. The earlier newline-collapse fix (#2679 / #2684) made each record a single physical line, which helps, but union still duplicates whole rows and still mixes formats when a branch is stale.
Options
1. Drop merge=union for these files. Conflicts then surface normally and get resolved (deduped by module key) before merge. More visible conflicts, but no silent corruption.
2. Custom merge driver that merges by the module key: last-writer-wins per row, single header. More work, but keeps merges automatic and correct.
3. Post-merge normalization: a hook or CI step that re-sorts, de-dups by module, and collapses any multi-line records, so union output is always cleaned afterward.
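For Option 2, here is a minimal sketch of a module-keyed merge driver in Python. Git calls a custom driver with the ancestor, current and other versions (%O, %A, %B). It expects the driver to overwrite %A with the result and to exit 0 for a clean merge or non-zero for a conflict.

Several assumptions here are not taken from the repo:
- The key column is named module.
- All three versions share the same columns.
- When both sides changed a row, the incoming side (%B) wins. This is one reading of "last-writer-wins".
- Rows are written sorted by module with CRLF endings.
- Embedded newlines in fields are collapsed to spaces.

```python
"""Sketch: module-keyed 3-way merge driver for sweep state CSVs (Option 2).

Hypothetical assumptions, not taken from the repo: the key column is
"module", all three versions share the same columns, the incoming side
(%B) wins when both sides changed a row, and output is sorted and CRLF.
Git invokes a driver as: <script> %O %A %B and reads the result from %A.
"""
import csv
import io
import sys

KEY = "module"


def read_rows(text: str) -> tuple[list[str], dict[str, list[str]]]:
    """Parse CSV text into (header, {module: row}), flattening newlines."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return [], {}
    header = rows[0]
    idx = header.index(KEY)  # ValueError -> non-zero exit -> git reports a conflict
    keyed: dict[str, list[str]] = {}
    for row in rows[1:]:
        # Skip blank lines, header copies left by earlier unions, short rows.
        if not row or row == header or len(row) <= idx:
            continue
        # Store rows in single-line form so that a multi-line vs single-line
        # notes field does not count as a real change between versions.
        keyed[row[idx]] = [" ".join(v.splitlines()) for v in row]
    return header, keyed


def merge_csv(base_text: str, ours_text: str, theirs_text: str) -> str:
    _, base = read_rows(base_text)
    ours_header, ours = read_rows(ours_text)
    theirs_header, theirs = read_rows(theirs_text)
    header = ours_header or theirs_header
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    if header:
        writer.writerow(header)
    for key in sorted(ours.keys() | theirs.keys()):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        # If theirs left the row as it was in base, keep ours (including a
        # deletion on our side); otherwise theirs changed it and wins.
        row = o if t == b else t
        if row is not None:
            writer.writerow(row)
    return out.getvalue()


def main(base_path: str, ours_path: str, theirs_path: str) -> int:
    texts = []
    for path in (base_path, ours_path, theirs_path):
        with open(path, newline="", encoding="utf-8") as f:
            texts.append(f.read())
    merged = merge_csv(*texts)
    # Git expects the merge result to overwrite the %A file.
    with open(ours_path, "w", newline="", encoding="utf-8") as f:
        f.write(merged)
    return 0  # clean merge


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(*sys.argv[1:]))
```

Wiring it up would take two pieces. The first is a .gitattributes line such as `.claude/sweep-*-state.csv merge=sweep-csv`. The second is a git config entry such as `merge.sweep-csv.driver = python3 <script> %O %A %B`. The driver name and script path are hypothetical.

Driver definitions live in git config rather than in the repository, so every clone has to register the driver. A merge performed on GitHub's side may not run it at all. That should be checked before choosing this option.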
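For Option 3, here is a sketch of a normalization pass that a hook or CI step could run on the state files after a merge. It assumes the following:
- The first row is the header.
- The key column is module.
- The last occurrence of a duplicated module is the one to keep. Which copy is actually correct cannot be determined from the file alone.
- Quoting is still balanced, so the csv module can reassemble multi-line fields.

If a stale-format union has broken the quoting, this pass will not recover the rows correctly, and the file needs a manual repair.

```python
"""Sketch: normalize a sweep state CSV after a union merge (Option 3).

Hypothetical assumptions: first row is the header, key column "module",
the last occurrence of a duplicated module wins, and quoting is balanced.
"""
import csv
import io
import sys


def normalize(text: str, key: str = "module") -> str:
    """Return text with one header, one row per key, single-line records."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return text
    header = rows[0]
    idx = header.index(key)  # ValueError if the key column is missing
    latest: dict[str, list[str]] = {}
    for row in rows[1:]:
        # Skip blank lines, repeated headers and rows too short to have a key.
        if not row or row == header or len(row) <= idx:
            continue
        # Later duplicates overwrite earlier ones; embedded newlines become spaces.
        latest[row[idx]] = [" ".join(v.splitlines()) for v in row]
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    writer.writerow(header)
    for module in sorted(latest):
        writer.writerow(latest[module])
    return out.getvalue()


def main(paths: list[str]) -> list[str]:
    """Rewrite each file in place; return the paths that needed changes."""
    changed = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            original = f.read()
        fixed = normalize(original)
        if fixed != original:
            with open(path, "w", newline="", encoding="utf-8") as f:
                f.write(fixed)
            changed.append(path)
    return changed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Non-zero exit lets a CI step flag that a file had to be repaired.
    sys.exit(1 if main(sys.argv[1:]) else 0)
```

The pass is idempotent: running it on already-normalized output leaves the file unchanged. That makes it safe to run on every merge to main.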
Option 1 is the smallest change and removes the failure mode outright. Filing for a decision; the data repair is already in #2753.