diff --git a/.claude/skills/bug-fix/SKILL.md b/.claude/skills/bug-fix/SKILL.md index 9ef4eb95..7f2ba2c5 100644 --- a/.claude/skills/bug-fix/SKILL.md +++ b/.claude/skills/bug-fix/SKILL.md @@ -219,14 +219,19 @@ artifact first then the code. (`fix(): ` for the fix; `test(): ` for the regression test if you split commits). 7. **Tangential-observations sweep.** Per CLAUDE.md - "Tangential discoveries", capture every unrelated issue you noticed - while tracing — as `bug_` / `chore_` / `infra_` - idea files in the same branch. Bug fixes routinely surface adjacent - bugs; if you don't capture them now, they're gone. + "Tangential discoveries — fix inline by default, defer rarely", + resolve every unrelated issue you noticed while tracing — either + fix it inline on this same branch (the default, when the path is + <60 min + same subsystem + no product/operator decision needed) or, + when inline fix genuinely fails the rubric, capture as + `bug_` / `chore_` / `infra_` idea files in the + same branch. Bug fixes routinely surface adjacent bugs; the inline + fix is almost always cheaper than the deferred-and-never-fixed + idea file. Don't carry noticings in conversation memory. **Phase 5 gate:** `bug_fix.md` is written, the fix is committed, the regression test fails on `main` and passes on the branch, the test -suite is green locally, tangential issues are captured. +suite is green locally, tangential issues are resolved (inline or captured). ### Phase 6 — Handoff (default) OR auto-ship (--ship) diff --git a/.claude/skills/impl-execute/SKILL.md b/.claude/skills/impl-execute/SKILL.md index 14ebf85d..87b18ab9 100644 --- a/.claude/skills/impl-execute/SKILL.md +++ b/.claude/skills/impl-execute/SKILL.md @@ -51,7 +51,7 @@ after all stories complete: item must be tracked before any post-impl work begins, so none are implicitly skipped): - Extract deferred work (Step 1) - Documentation updates (Step 2) - - Tangential observations sweep (Step 2.5) — BLOCKING; capture every noticed-but-uncaptured issue per CLAUDE.md tangential-discoveries rule + - Tangential observations sweep (Step 2.5) — BLOCKING; fix-inline-by-default per CLAUDE.md tangential-discoveries rule (capture as idea file only when inline fix actually fails the rubric) - Guide impact assessment + guide-gen run (Step 3) — MANDATORY gate - Push + open PR (Step 4) - Monitor CI (Step 5) @@ -89,7 +89,7 @@ Ad-hoc mode (`/impl-execute --ad-hoc`) is for changes that don't warrant `/pipel - Post-implementation Step 0a (worktree pre-flight — surfaces locked sibling worktrees that would block branch ops in Step 8 finalization or Step 9 cleanup; same risk applies to ad-hoc fixes). - Post-implementation Step 0b.1 (audit-event coverage audit — small fixes can still introduce new tenant-visible mutations). - Post-implementation Step 2 (`state.md` update if completion-snapshot or active-priorities changed; only if applicable). -- Post-implementation Step 2.5 (**tangential-observations sweep — BLOCKING**). Bug fixes routinely surface unrelated bugs; capture them as idea files before push. +- Post-implementation Step 2.5 (**tangential-observations sweep — BLOCKING**). Bug fixes routinely surface unrelated bugs; fix them inline on the current branch by default (per the CLAUDE.md fix-inline-by-default rule), capture as idea files only when inline fix actually fails the rubric. - Post-implementation Step 3 (guide impact assessment — MANDATORY GATE if frontend was touched). - Post-implementation Step 4 (push + `gh pr create` with the standard Summary / Test plan template). - Post-implementation Step 5 (CI watch). @@ -576,24 +576,24 @@ Commit documentation updates. ### Step 2.5: Tangential observations sweep — BLOCKING -Per the [tangential-discoveries rule in CLAUDE.md](../../../CLAUDE.md#tangential-discoveries--capture-as-idea-files-immediately), every issue you noticed during this implementation that was NOT part of the current plan must be captured as an idea file before push — not held in conversation memory, not deferred to "later in the PR description". +Per the [tangential-discoveries rule in CLAUDE.md](../../../CLAUDE.md#tangential-discoveries--fix-inline-by-default-defer-rarely), every issue you noticed during this implementation that was NOT part of the current plan must be **resolved** before push — either fixed inline on this branch (the default) or, when inline fix genuinely fails the CLAUDE.md rubric, captured as an idea file. What you cannot do is hold the noticing in conversation memory or defer it vaguely to "later in the PR description." -This step is a safety net for Rule #3 ("Capture, don't carry"). The discipline is to capture inline as you notice; this sweep is the last chance to flush anything you missed. +This step is a safety net for Rule #3 (fix-inline-by-default). The discipline is to fix-or-capture inline as you notice; this sweep is the last chance to flush anything you missed. Walk back through this implementation session and ask: -1. **Did I `git stash` to verify a failure was pre-existing on `main`?** That's a tangential bug — it pre-existed and I confirmed it. If I didn't already file an idea, file one now. -2. **Did I see a test fail on first run, then pass on re-run, and "just re-run" without investigating?** That's a test-isolation bug. File an idea. -3. **Did I defer test coverage with "no framework available" / "manual smoke at staging" / "out of scope for a bug fix"?** That's an infra gap. File an idea. -4. **Did I read code that referenced a broken/stale path** (a TODO comment older than 6 months, a deprecated symbol still in use, an env var with no documentation)? Note it. -5. **Did I think "I should fix that someday" about anything**, even briefly? File the someday now. -6. **Did I notice during implementation any operator-judgment-shaped question that has no canonical answer in the current docs?** Examples: *"What happens if I X?"* / *"Should I trust Y in case Z?"* / *"My pipeline shows N — is that a bug or expected?"* If yes, either (a) draft the entry directly under `ui/src/lib/faq.ts` in this PR (preferred — adds the answer where operators will look for it), OR (b) file a focused `chore_faq_/idea.md` capturing the question + draft answer + the spec/AC citation that backs the answer. Tooltips and the glossary are NOT the right surface — they're definitional, not judgment-shaped. +1. **Did I `git stash` to verify a failure was pre-existing on `main`?** That's a tangential bug — it pre-existed and I confirmed it. Estimate the fix path; if <60 min and on the same subsystem, fix inline now. Else file an idea. +2. **Did I see a test fail on first run, then pass on re-run, and "just re-run" without investigating?** That's a test-isolation bug. Estimate the fix path; if <60 min, fix inline now. Else file an idea. +3. **Did I defer test coverage with "no framework available" / "manual smoke at staging" / "out of scope for a bug fix"?** Almost always wrong — first instances of common test infra are <60 min per CLAUDE.md's failure-mode words table. Try the inline fix first; only file an idea if you actually hit the 60-min ceiling. +4. **Did I read code that referenced a broken/stale path** (a TODO comment older than 6 months, a deprecated symbol still in use, an env var with no documentation)? If the fix is <30 lines and on-topic for the current PR, fix inline. Else note it. +5. **Did I think "I should fix that someday" about anything**, even briefly? Don't defer "someday" — write the implementation path now and decide: fix inline or file idea. "Someday" is the trap. +6. **Did I notice during implementation any operator-judgment-shaped question that has no canonical answer in the current docs?** Examples: *"What happens if I X?"* / *"Should I trust Y in case Z?"* / *"My pipeline shows N — is that a bug or expected?"* If yes, the default action is **add the entry to `ui/src/lib/faq.ts` in this PR** (preferred — adds the answer where operators will look for it). Only file `chore_faq_/idea.md` if the FAQ entry actually requires a product decision you can't make unilaterally. Tooltips and the glossary are NOT the right surface for judgment-shaped questions — they're definitional, not judgment-shaped. -For each, create `docs/00_overview/planned_features//_/idea.md` per [feature_templates/idea-template.md](../../../docs/00_overview/planned_features/feature_templates/idea-template.md). **`` defaults to the current active-release bucket** — read the active release from [`state.md`](../../../state.md) (today **MVP2 → `02_mvp2/`**). A bug/chore tripped over while building the current MVP is almost always that MVP's concern, so it goes there by default. File elsewhere only when the target is clearly a *different* release (`03_mvp3/`, `04_ga/`, `99_backlog/`); `00_unsure/` is reserved for the genuinely-rare "can't tell which release" case, not the default. Origin field MUST point at the PR or story that surfaced the observation, so the trace stays intact. +For each tangential discovery that survives the inline-fix gate, create `docs/00_overview/planned_features//_/idea.md` per [feature_templates/idea-template.md](../../../docs/00_overview/planned_features/feature_templates/idea-template.md). **`` defaults to the current active-release bucket** — read the active release from [`state.md`](../../../state.md) (today **MVP2 → `02_mvp2/`**). A bug/chore tripped over while building the current MVP is almost always that MVP's concern, so it goes there by default. File elsewhere only when the target is clearly a *different* release (`03_mvp3/`, `04_ga/`, `99_backlog/`); `00_unsure/` is reserved for the genuinely-rare "can't tell which release" case, not the default. Origin field MUST point at the PR or story that surfaced the observation, AND cite the specific CLAUDE.md rubric row that justified the deferral (not "out of scope" — that phrase has been the cover for too many over-deferrals). -**If you have nothing to file, state explicitly: "Tangential observations sweep: none found." in your end-of-step summary.** Silence is suspicious — the sweep is supposed to find things. +**If you have nothing to file AND nothing to fix inline, state explicitly: "Tangential observations sweep: none found." in your end-of-step summary.** Silence is suspicious — the sweep is supposed to find things. -Commit the idea files (separate `docs(planned): capture in-flight-noticed issues` commit on the same branch is acceptable). +For inline fixes during the sweep: commit on the current branch with `fix(scope): tangential — ` style messages, and mention each in the PR body's "Also fixes:" list. For captured ideas (the exception): a single `docs(planned): capture deferred discoveries` commit on the same branch is acceptable. ### Step 3: Guide impact assessment — MANDATORY GATE @@ -1031,7 +1031,7 @@ Some stories involve manual configuration outside the codebase (GitHub App regis 1. **Never commit to main.** Always use a feature branch. 2. **Never skip a verification gate.** If lint fails, fix it. If tests fail, fix them. No `--no-verify`. -3. **Never implement beyond the story scope. Capture, don't carry.** If you see a bug or improvement opportunity that's orthogonal to the current story, do NOT fix it in this story's commit AND do NOT just "note it for later" in conversation memory. **Create an idea file immediately** at `docs/00_overview/planned_features//_/idea.md` (`` defaults to the **current active-release bucket** per [`state.md`](../../../state.md) — today `02_mvp2/`; `00_unsure/` only for the genuinely-rare can't-tell-which-release case) per the [tangential-discoveries protocol in CLAUDE.md](../../../CLAUDE.md#tangential-discoveries--capture-as-idea-files-immediately). Idea files surface in `/pipeline --status` and persist across sessions; chat-noticings evaporate. Step 1.5 below ("Tangential observations sweep") flushes any uncaptured noticings before push as a safety net, but the discipline is to capture inline as you notice. +3. **Fix tangential discoveries inline by default; capture only when inline is genuinely impossible.** If you see a bug or improvement opportunity that's orthogonal to the current story, **first ask: can I fix it on this branch in <60 minutes?** If yes, fix it inline — add an adjacent commit on the current branch (or fold into the current commit if the diff is tiny and on-topic) with a message that names the tangential discovery (e.g. `fix(scope): tangential — (noticed during Story X.Y)`). Note it in the PR body's "Also fixes:" section. This is the default. Only when inline fix actually fails the rubric (cross-subsystem + >250 LOC, requires product/UX decision, requires operator-environment change) do you fall back to filing an idea file at `docs/00_overview/planned_features//_/idea.md` per the [tangential-discoveries protocol in CLAUDE.md](../../../CLAUDE.md#tangential-discoveries--fix-inline-by-default-defer-rarely). What you must NOT do: hold the noticing in conversation memory ("I'll come back to it"), or vaguely defer to "later in the PR description". The "later" agent has less context than you do right now; over-deferral has been the project's failure mode (canonical example: PR #383's first instinct was to file `infra_solr_smoke_data_dir_perms` instead of fixing the three-line YAML; the user caught it with "stop pushing off fixing these"). Step 1.5 below ("Tangential observations sweep") flushes any uncaptured noticings before push as a safety net. 4. **Always read before editing.** Never modify a file you haven't read in this session. 5. **Always use GPT-5.5 for cross-model review.** Model ID: `gpt-5.5`. Never substitute gpt-4o. 6. **Always update the plan tracker** after completing a story. diff --git a/CLAUDE.md b/CLAUDE.md index db18869b..154b62d9 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -486,13 +486,24 @@ When fixing a bug, follow this sequence: For ad-hoc fixes that don't warrant `/pipeline` scaffolding, use `/impl-execute --ad-hoc` to ship the change through the standard review/merge ceremony (pre-push gate → push → PR → CI watch → Gemini adjudication → optional GPT-5.5 review). See `.claude/skills/impl-execute/SKILL.md` "Ad-hoc mode behavior." -## Tangential discoveries — capture as idea files immediately +## Tangential discoveries — fix inline by default, defer rarely When working on any task (feature, bug fix, refactor, doc update), you will routinely notice **other** problems that aren't part of the current scope: a pre-existing test failure that you waved through, a flaky test you re-ran without investigating, an infrastructure gap that forced you to defer coverage, a stale runbook entry, a dead-code branch, etc. -**Do not** carry these in working memory or "mention them in the PR description". Conversation memory evaporates between sessions and PR descriptions don't get re-read. +**The default is: fix it inline on the current branch.** Capture-as-idea-file is the exception, reserved for cases where inline fix is genuinely impossible (different subsystem + significant LOC, requires a product decision, requires an operator-environment change you can't make). The historical failure mode at this project has been **over-deferring** — agents file an idea file when the actual fix is one line of YAML or a single-character grep pattern correction, then the idea sits in `02_mvp2/` for weeks while the bug stays live. **Don't do that.** Fix it now and note the tangential discovery in the commit body. -**Do** create an idea file the moment you notice the issue: +**The ordering of operations when you notice a tangential issue:** + +1. **Write down the implementation path in two-three sentences** (which file, which line, what edit). If you can't, you don't understand the issue well enough yet — keep reading code. +2. **Estimate the path in minutes.** If you can't estimate concretely (file count × diff size × verification cycles), the "high overhead" intuition is probably wrong; just try it and abort if you cross 60 minutes. +3. **If <60 minutes AND no cross-subsystem fork AND no product/operator decision → fix inline.** Commit on the current branch with a message that names the tangential discovery (e.g. `fix(scope): tangential — (noticed during )`). Mention it in the PR body as "Also fixes: " — that's the audit trail, not an idea file. +4. **Only if inline fix fails one of the gates above → capture as idea file** per the recipe below. Most tangential discoveries do NOT survive these gates; that's the point. + +**Canonical example of the right reflex** — during `infra_solr_smoke_stability` PR #383, the first CI run surfaced a Solr filesystem-perms crash that the spec's locked lever cascade never anticipated. The fix was three lines of YAML (`mkdir + chown + make up`). The agent's first instinct was to file `infra_solr_smoke_data_dir_perms/idea.md` as a follow-up. **Wrong reflex.** Deferring would have meant another full PR cycle (idea-preflight → spec-gen → impl-plan-gen → impl-execute) for a fix that took 30 seconds to write. The right move — and what shipped — was the inline commit with a D-log entry on the spec acknowledging the failure-mode discovery. The user's directive when they caught the deferral: "stop pushing off fixing these... fix now." + +**Anti-pattern to recognize in yourself:** "I'll just file this for later." The "later" agent has less context than you have right now. They'll have to re-derive what you already know, re-read the same code, re-run the same failing CI to confirm the symptom. That cost is real and almost always larger than the cost of just fixing it now. **Default lean: implement-now.** + +When inline fix genuinely doesn't apply (the rubric below sends you to "Idea file"), then file one: 1. Pick a folder name with the right prefix per [`docs/00_overview/planned_features/feature_templates/README.md`](docs/00_overview/planned_features/feature_templates/README.md): - `bug_` — pre-existing failure, regression, broken behavior @@ -501,16 +512,14 @@ When working on any task (feature, bug fix, refactor, doc update), you will rout 2. Write `docs/00_overview/planned_features///idea.md` following [`feature_templates/idea-template.md`](docs/00_overview/planned_features/feature_templates/idea-template.md). **`` defaults to the current active-release bucket** — read the active release from [`state.md`](state.md) ("Where the roadmap sits" / current focus; today **MVP2 → `02_mvp2/`**). A discovery tripped over while building the current MVP is almost always that MVP's concern, so it goes there by default — NOT into `00_unsure/`. Only file elsewhere when the target is clearly a *different* release: GA-hardening → `04_ga/`, Observable/MVP3 → `03_mvp3/`, defer-until-incident → `99_backlog/`. `00_unsure/` is reserved for the genuinely-rare "I truly can't tell which release" case. Include: - **Origin**: how you noticed it (which PR / phase gate / story / conversation) - **Problem**: what's wrong, with `file:line` citations where you can - - **Why deferred**: why you didn't fix it inline (almost always: "out of scope for current task") -3. Include the idea file in the same commit as the work that surfaced it (or a separate doc commit on the same branch). Don't wait for a "later cleanup PR" — the cleanup PR is the idea file. - -This rule applies even if the issue feels minor. A 3-line idea file with the right `bug_` / `chore_` / `infra_` prefix surfaces in `/pipeline status` and the next infra-sweep agent will find it. A noticing that lives only in a chat transcript is gone forever. + - **Why inline fix wasn't viable**: cite the specific rubric row below that sent you here (NOT "out of scope" — that phrase has been the cover for too many deferrals that should have been inline fixes) +3. Include the idea file in the same commit as the work that surfaced it (or a separate doc commit on the same branch). -**Anti-pattern to recognize in yourself:** "I'll just note this and come back to it later." Either fix it now (if it's truly inline-cheap) or capture the idea file now (if it's not). There is no "later" — the conversation will end. +The rule's spirit: **inline fix is the discipline; capture is the safety net.** A chat-transcript-only noticing is gone forever, so when inline fix really isn't viable, capture it — but interrogate the "really isn't viable" claim hard before accepting it. ### Inline-fix vs idea-file rubric -The historical failure mode here was *capturing too aggressively* — auto-creating idea files for medium-sized fixes that would have been 30–60 minutes inline. Apply this table when deciding, calibrated toward implement-now: +The rubric below is the authoritative decision tree. Most rows route to **inline fix** — that's intentional. Only the bottom two rows route to idea-file, and they share the same shape: the work can't be done in the current PR without breaking either the PR's review boundary or the operator's authorization scope. | Discovery shape | Action | |---|---| @@ -521,7 +530,7 @@ The historical failure mode here was *capturing too aggressively* — auto-creat | Fix requires a separate subsystem AND >250 LOC AND no immediate path to inline (different ORM model unrelated to the feature, different service entirely, different UI route family) | **Idea file.** Cross-subsystem mixing in one PR breaks reviewability. | | Fix requires a product/UX decision, third-party config, or an operator-environment change (env vars, mounted secrets, branch-protection settings, SaaS account) | **Idea file.** Can't be unilaterally implemented. | -**Default lean: implement-now, not capture-as-idea-file.** The cost of a deferred-and-never-fixed idea file is higher than the cost of a slightly mixed-scope PR. +**Default lean: implement-now, not capture-as-idea-file.** The cost of a deferred-and-never-fixed idea file is higher than the cost of a slightly mixed-scope PR. If the inline fix surfaces during CI watch (e.g. a smoke job catches a failure mode the spec didn't anticipate), apply the fix to the same branch BEFORE merge — CI will re-run automatically on the new commit. That's exactly how the PR-#383 Solr-perms fix shipped. ### Pre-defer diagnostic: write the path