merge=union on .claude/sweep-*-state.csv silently corrupts the file on merge #2754

@brendancol

Problem

.gitattributes sets merge=union for .claude/sweep-*-state.csv:

.claude/sweep-*-state.csv merge=union

The union driver concatenates both sides of a conflicting hunk instead of raising a conflict. For an append-only log that would be fine, but these state files are keyed by module (one row per module) and carry multi-line quoted notes. When a sweep branch based on an older copy of the CSV is merged, union produces:

  • duplicate header lines
  • duplicate module rows
  • multi-line notes fields split across physical lines
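To make the failure mode concrete, here is a minimal sketch that models a conflicting hunk spanning the whole file. It is a simplified stand-in for the union result (both sides kept, no conflict raised), not git's actual merge algorithm. The column names `module,status,notes` and the helper names are assumptions for illustration only; the real files may use different columns.

```python
import csv
import io
from collections import Counter

# Hypothetical header; the real state files may use different columns.
HEADER = "module,status,notes"


def union_conflicting_hunk(ours, theirs):
    """Simplified model of union: keep the lines from both sides of a
    conflicting hunk instead of raising a conflict."""
    return ours + theirs


def diagnose(text, key="module"):
    """Return (number of header lines, sorted list of duplicated keys)."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    header_count = sum(1 for r in rows if r == header)
    key_idx = header.index(key)
    # Count every non-empty data row by its module key.
    counts = Counter(r[key_idx] for r in rows if r != header and r)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return header_count, duplicates


ours = [HEADER, "alpha,done,ok", "beta,done,ok"]
theirs = [HEADER, "alpha,done,rechecked", "gamma,done,ok"]
merged = "\n".join(union_conflicting_hunk(ours, theirs)) + "\n"
print(diagnose(merged))
```

In this model the merged text carries two header lines and two rows for `alpha`, which is the same shape of damage described in the list above.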

It happened on main with the #2712 squash merge: the file went from 9 lines to 43, with the header repeated three times and six modules duplicated. The branch was based on the older multi-line format while main had been compacted to single-line CRLF rows, so the two formats unioned into garbage. Repaired in #2753.

Because the corruption happens inside the merge driver, it cannot be prevented from the PR branch side: resolving the branch locally does not stop GitHub from re-running union at merge time.

Why it keeps recurring

Each sweep writes one row per module. Two sweep branches that both touch the CSV, or one branch based on a stale format, will union rather than conflict. The earlier newline-collapse fix (#2679 / #2684) made each record a single physical line, which helps, but union still duplicates whole rows and still mixes formats when a branch is stale.

Options

  1. Drop merge=union for these files. Conflicts then surface normally and get resolved (deduped by module key) before merge. More visible conflicts, but no silent corruption.
  2. Custom merge driver that merges by the module key — last-writer-wins per row, single header. More work, but keeps merges automatic and correct.
  3. Post-merge normalization: a hook or CI step that re-sorts, de-dups by module, and collapses any multi-line records, so union output is always cleaned afterward.
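The per-key logic shared by options 2 and 3 could look roughly like the sketch below: parse the CSV, keep a single header, keep the last row seen for each module, collapse embedded newlines, and write single-line CRLF rows. The function name `normalize_state_csv` and the `module` key column name are assumptions, and "last row in the file wins" is only a stand-in for last-writer-wins. It also assumes the unioned text still parses as CSV; if a quoted multi-line note was interleaved badly enough to change a row's column count, the sketch raises rather than guessing.

```python
import csv
import io


def normalize_state_csv(text, key="module"):
    """Return text with one header, one row per key, single-line fields."""
    # newline="" keeps line endings untouched so the csv module can parse
    # quoted fields that contain embedded newlines.
    rows = [r for r in csv.reader(io.StringIO(text, newline="")) if r]
    if not rows:
        return ""
    header = rows[0]
    key_idx = header.index(key)

    latest = {}
    for row in rows[1:]:
        if row == header:
            continue  # drop repeated header lines
        if len(row) != len(header):
            raise ValueError(f"unexpected column count in row: {row!r}")
        # Collapse any whitespace run (including newlines) to one space.
        cleaned = [" ".join(field.split()) for field in row]
        latest[cleaned[key_idx]] = cleaned  # a later row replaces an earlier one

    out = io.StringIO(newline="")
    writer = csv.writer(out)  # default line terminator is CRLF
    writer.writerow(header)
    for module in sorted(latest):
        writer.writerow(latest[module])
    return out.getvalue()
```

Sorting by module keeps the output stable across runs, so a CI step could compare the file against its normalized form and fail when they differ.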

Option 1 is the smallest change and removes the failure mode outright. Filing for a decision; the data repair is already in #2753.
