From baa836dfda81e5f744b8de075e8b00f7bf798821 Mon Sep 17 00:00:00 2001
From: SoundMindsAI
Date: Mon, 25 May 2026 17:56:04 -0400
Subject: [PATCH] docs: finalize chore_dashboard_regen_quoted_pr_false_positive after PR #253

Per impl-execute Step 7 finalization pattern (matches recent main e.g. 2a24fae4 docs: finalize chore_e2e_seed_acme_idea_obsolete after PR #250 (#252)).

- git mv chore folder from planned_features -> implemented_features
- implementation_plan.md status flipped + execution tracker filled
- pipeline_status.md Implementation + Done sections added with full review counts (4 cycles / 9 findings) and Gemini adjudication link
- state.md "Last updated" lead rewritten as 40th MVP1-era artifact
- Captures the follow-on chore from final review (chore_dashboard_regen_priority4_dependency_cite_false_positive)

Story 2.1 / FR-5. Closes the two-PR rollout.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/00_overview/DASHBOARD.md | 2 +-
 docs/00_overview/MVP1_DASHBOARD.md | 21 ++++----
 docs/00_overview/dashboard.html | 2 +-
 .../feature_spec.md | 0
 .../idea.md | 0
 .../implementation_plan.md | 20 ++++----
 .../pipeline_status.md | 45 +++++++++++++++++
 docs/00_overview/mvp1_dashboard.html | 49 ++++++++++---------
 .../pipeline_status.md | 29 -----------
 state.md | 2 +-
 10 files changed, 94 insertions(+), 76 deletions(-)
 rename docs/{02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive => 00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive}/feature_spec.md (100%)
 rename docs/{02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive => 00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive}/idea.md (100%)
 rename docs/{02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive => 00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive}/implementation_plan.md (96%)
 create mode 100644 
docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md delete mode 100644 docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md diff --git a/docs/00_overview/DASHBOARD.md b/docs/00_overview/DASHBOARD.md index 7cc659c8..225655c6 100644 --- a/docs/00_overview/DASHBOARD.md +++ b/docs/00_overview/DASHBOARD.md @@ -6,7 +6,7 @@ _Top-level index across MVP1 → GA v1+ as of **2026-05-25**. Click a release na | Release | Theme | Progress | Status | |---|---|---|---| -| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 79 / 81 scoped done · 17 remaining | **In progress** | +| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 80 / 81 scoped done · 16 remaining | **In progress** | | [MVP1.5 / v0.1.5](MVP1_5_DASHBOARD.md) | Real Signals | 1 item(s) queued | **Held / queued** | | [MVP2 / v0.2](MVP2_DASHBOARD.md) | Observable | 1 / 1 scoped done · 1 remaining | **In progress** | | MVP3 / v0.3 | Production Stacks | — | **Not yet scoped** | diff --git a/docs/00_overview/MVP1_DASHBOARD.md b/docs/00_overview/MVP1_DASHBOARD.md index fd876ba1..21ae1232 100644 --- a/docs/00_overview/MVP1_DASHBOARD.md +++ b/docs/00_overview/MVP1_DASHBOARD.md @@ -20,20 +20,20 @@ Implementation in progress — resume to finish | Metric | Value | |---|---| -| Scoped items done | **79 / 81** (98%) — feat_/infra_/chore_/epic_ past idea stage | -| Pending work | **18** items (every not-done feat/infra/chore/bug across all priorities) | +| Scoped items done | **80 / 81** (99%) — feat_/infra_/chore_/epic_ past idea stage | +| Pending work | **17** items (every not-done feat/infra/chore/bug across all priorities) | | → P0 — do next | **0** unblocking / paying daily cost | | → P1 | **0** high-value, ready when P0 clears | -| → P2 (default) | 17 important to file, not blocking | +| → P2 (default) | 16 important to file, not blocking | | → Backlog | 1 captured for record, not planned | | Open bugs | 8 | -| Legacy "Path to 
MVP1" | 17 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | +| Legacy "Path to MVP1" | 16 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | | Backlog ideas | 1 idea-only feat/infra (not yet scoped into MVP1) | | In flight | 1 feature(s) actively shipping | ## Pipeline -### Done (98) +### Done (99) | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---| @@ -88,6 +88,7 @@ Implementation in progress — resume to finish | [chore_create_study_modal_e2e_stability](implemented_features/2026_05_20_chore_create_study_modal_e2e_stability/idea.md) | Chore | The Playwright smoke lane runs every `ui/tests/e2e/*.spec.ts` against a real-backend stack. The create-study modal's Step-1 cluster trigger (rendered by [`EntitySelect`](../../ui/src/components/common | — | [PR #161](https://github.com/SoundMindsAI/relyloop/pull/161) merged 2026-05-20 | | [chore_create_study_wizard_polish](implemented_features/2026_05_20_chore_create_study_wizard_polish/feature_spec.md) | Chore | Step 4 auto-fills from the template's `declared_params` with conservative ranges, rejects unknown/missing params at create time with new machine-readable error codes, and surfaces four new glossary en | — | [PR #157](https://github.com/SoundMindsAI/relyloop/pull/157) merged 2026-05-20 | | [chore_dashboard_pr_extraction_from_idea](implemented_features/2026_05_23_chore_dashboard_pr_extraction_from_idea/feature_spec.md) | Chore | Extend `_extract_pr_number` to accept the idea body as a fourth argument, and have `_load_implemented` read `idea.md` and pass it through. | — | [PR #221](https://github.com/SoundMindsAI/relyloop/pull/221) merged 2026-05-23 | +| [chore_dashboard_regen_quoted_pr_false_positive](implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/feature_spec.md) | Chore | Priority-3 fuzzy match no longer matches PR-merge phrases that live inside backtick-fenced segments. 
| — | [PR #253](https://github.com/SoundMindsAI/relyloop/pull/253) merged 2026-05-25 | | [chore_data_table_columnvisibility_tanstack](implemented_features/2026_05_19_chore_data_table_columnvisibility_tanstack/idea.md) | Chore | `feat_data_table_primitive` shipped with six known non-regression follow-up items captured only in chat transcripts. None block the PR but each is a real improvement that would otherwise evaporate whe | — | Complete | | [chore_detail_page_shell_primitive](implemented_features/2026_05_19_chore_detail_page_shell_primitive/idea.md) | Chore | Six of the seven `/{entity}/[id]` detail routes hand-roll the same three-state scaffold around their data query. The structure is **identical** down to the className strings, with two minor copy varia | — | Complete | | [chore_digest_worker_narrow_except](implemented_features/2026_05_14_chore_digest_worker_narrow_except/idea.md) | Chore | … | — | Complete | @@ -142,11 +143,9 @@ Implementation in progress — resume to finish |---|---|---|---|---|---|---| | 1 | P2 | [infra_agent_sibling_worktree_isolation](../02_product/planned_features/infra_agent_sibling_worktree_isolation/feature_spec.md) | Infra | Add a tight "Working in sibling worktrees" section to `CLAUDE.md` between `## Common Pitfalls` and `## Bug Fix Protocol` that catalogs which host paths are bind-mounted by the Compose stack (and there | — | [PR #249](https://github.com/SoundMindsAI/relyloop/pull/249) merged 2026-05-25 | -### Plan (1) +### Plan (0) -| # | Priority | Feature | Type | One-liner | Depends on | Status | -|---|---|---|---|---|---|---| -| 1 | P2 | [chore_dashboard_regen_quoted_pr_false_positive](../02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/feature_spec.md) | Chore | Priority-3 fuzzy match no longer matches PR-merge phrases that live inside backtick-fenced segments. 
| — | [PR #221](https://github.com/SoundMindsAI/relyloop/pull/221) merged 2026-05-25 | +_None._ ### Spec (0) @@ -184,8 +183,6 @@ graph LR classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e; classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af; classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; - chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"] - class chore_dashboard_regen_quoted_pr_false_positive plan; infra_agent_sibling_worktree_isolation["agent sibling worktree isolation"] class infra_agent_sibling_worktree_isolation implement; infra_foundation["foundation"] @@ -338,6 +335,8 @@ graph LR class feat_digest_executable_followups_swap_template done; feat_home_demo_reseed_endpoint["home demo reseed endpoint"] class feat_home_demo_reseed_endpoint done; + chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"] + class chore_dashboard_regen_quoted_pr_false_positive done; chore_e2e_seed_acme_idea_obsolete["e2e seed acme idea obsolete"] class chore_e2e_seed_acme_idea_obsolete done; feat_study_baseline_trial["study baseline trial"] diff --git a/docs/00_overview/dashboard.html b/docs/00_overview/dashboard.html index 1b5081d6..dddb2f23 100644 --- a/docs/00_overview/dashboard.html +++ b/docs/00_overview/dashboard.html @@ -384,7 +384,7 @@

Releases

The Loop
-
79 / 81 scoped done · 17 remaining
+
80 / 81 scoped done · 16 remaining
In progress
diff --git a/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/feature_spec.md b/docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/feature_spec.md similarity index 100% rename from docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/feature_spec.md rename to docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/feature_spec.md diff --git a/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/idea.md b/docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/idea.md similarity index 100% rename from docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/idea.md rename to docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/idea.md diff --git a/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/implementation_plan.md b/docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/implementation_plan.md similarity index 96% rename from docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/implementation_plan.md rename to docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/implementation_plan.md index 2cb34b21..5cab7a0c 100644 --- a/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/implementation_plan.md +++ b/docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/implementation_plan.md @@ -1,7 +1,7 @@ # Implementation Plan — chore_dashboard_regen_quoted_pr_false_positive **Date:** 2026-05-25 -**Status:** Ready for Execution +**Status:** Complete (PR #253 merged 2026-05-25 as squash `20bcb36d`; PR B finalization in flight) **Primary spec:** [`feature_spec.md`](feature_spec.md) **Policy 
source(s):** [`CLAUDE.md`](../../../../CLAUDE.md) — two-PR finalization pattern, never `--no-verify`; [`impl-execute SKILL.md`](../../../../.claude/skills/impl-execute/SKILL.md) — Step 7 finalization @@ -319,14 +319,16 @@ No new test files (additive to the existing `test_dashboard_pr_extraction.py`). | Story | Status | Commit SHA | |---|---|---| -| 1.1 — Add `_strip_backtick_quoted_segments` helper | [ ] | — | -| 1.2 — Wire helper into `_extract_pr_number` priority-3 | [ ] | — | -| 1.3 — Add `TestBacktickStripPriority3` class (7 methods) | [ ] | — | -| 1.4 — Insert docstring note about backtick strip | [ ] | — | -| Epic 1 phase gate | [ ] | — | -| **PR A** | [ ] | — | -| 2.1 — git mv + state.md update | [ ] | — | -| **PR B** | [ ] | — | +| 1.1 — Add `_strip_backtick_quoted_segments` helper | [x] | `9d127fb9` | +| 1.2 — Wire helper into `_extract_pr_number` priority-3 | [x] | `65db54e2` | +| 1.3 — Add `TestBacktickStripPriority3` class (8 methods — AC-13 added per phase-gate) | [x] | `80190f43` + `f5dd98b3` (AC-13 + backref regex) | +| 1.4 — Insert docstring note about backtick strip | [x] | `e50537f5` | +| Epic 1 phase gate | [x] | GPT-5.5 cumulative review: 1 Medium accepted (`f5dd98b3` backref + AC-13) | +| Final GPT-5.5 review | [x] | 1 Medium accepted in part: spec/plan rewrite (`a18aba19`) + filed follow-on `chore_dashboard_regen_priority4_dependency_cite_false_positive` | +| Gemini Code Assist | [x] | 1 High accepted: double-backtick fix (`5b595bc9`) | +| **PR A** | [x] | [#253](https://github.com/SoundMindsAI/relyloop/pull/253) merged 2026-05-25T21:53:09Z as squash `20bcb36d` | +| 2.1 — git mv + state.md update | [x] | this finalization commit | +| **PR B** | [ ] | in flight | ## 8) Open questions diff --git a/docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md b/docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md new file mode 100644 
index 00000000..74c307f0
--- /dev/null
+++ b/docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md
@@ -0,0 +1,45 @@
+# Pipeline Status — chore_dashboard_regen_quoted_pr_false_positive
+
+## Idea
+- Status: Complete
+- File: idea.md
+- /idea-preflight verdict (2026-05-25): Ready after 3-edit patch (line 572→581 drift, PR-TBD→PR #221 for sibling chore, "Why deferred" status clarification)
+
+## Spec
+- Status: Approved
+- Date: 2026-05-25
+- File: feature_spec.md
+- Cross-model review: GPT-5.5 converged after 3 cycles
+  - Cycle 1: 1 Low finding (AC-7 missed single-line triple-backtick fences) — accepted, added AC-12
+  - Cycle 2: 2 Low findings (regex hint `+` would skip empty spans; 4 stale "6 tests" residuals) — both accepted, patched
+  - Cycle 3: 0 findings → stop rule satisfied
+- Phases: 1 (single phase, two-PR rollout — see §3 Phase boundaries)
+- FRs: 5 (FR-1 helper, FR-2 wire-in, FR-3 test class, FR-4 docstring, FR-5 post-merge finalization)
+- ACs: 7 (AC-6 through AC-12)
+
+## Plan
+- Status: Approved
+- Date: 2026-05-25
+- File: implementation_plan.md
+- Cross-model review: GPT-5.5 cycle 1 produced 2 findings (1 Low, 1 Medium); both accepted and patched (gate arithmetic 6→7; regex `\`{3,}` for spec-compliance with "3-or-more" fence delimiter). Cycle 2 = 0 findings → stop rule satisfied.
+- Stories: 5 total across 2 epics (Epic 1 = Stories 1.1–1.4 in PR A; Epic 2 = Story 2.1 in PR B)
+- Phases covered: single phase (two-PR rollout per spec §3)
+
+## Implementation
+- Status: Complete
+- Date: 2026-05-25
+- PR A (content): [#253](https://github.com/SoundMindsAI/relyloop/pull/253) — merged 2026-05-25T21:53:09Z as squash `20bcb36d`
+- Stories completed: 4 (1.1 helper, 1.2 wire-in, 1.3 test class, 1.4 docstring) + Epic 1 phase-gate fix
+- Tests: 8 in TestBacktickStripPriority3 (AC-6..AC-13); 36 total in test_dashboard_pr_extraction.py; 1434+ in full backend unit suite
+- CI: 6/7 green; 1 pre-existing failure (`smoke (operator-path tutorial flow)` — captured in `bug_smoke_dashboard_demo_state_locator_missing`; same as PR #250)
+- Cross-model reviews:
+  - spec-gen: 3 GPT-5.5 cycles converged
+  - impl-plan-gen: 2 GPT-5.5 cycles converged
+  - Epic 1 phase-gate: 1 Medium finding (naive `{3,}` regex would miss 4-backtick outer with inner 3-backtick) → accepted, backref `(`{3,}).*?\1` + new AC-13 test
+  - Final review: 1 Medium finding (self-triggering spec/plan examples + remaining priority-4 false positive) → accepted in part (spec/plan rewritten); priority-4 deferred as follow-on chore [`chore_dashboard_regen_priority4_dependency_cite_false_positive`](../chore_dashboard_regen_priority4_dependency_cite_false_positive/idea.md)
+- Gemini Code Assist: 1 High finding (double-backtick inline spans missed by Pass-B regex) → accepted, fix shipped in `5b595bc9` using Gemini's suggested backref `(`{1,2})[^\n]*?\1`
+- PR B (finalization): in flight — this branch (`docs/finalize-chore-dashboard-regen-quoted-pr-false-positive`)
+
+## Done
+- Status: Pending (PR B merge)
+- Folder moved to `docs/00_overview/implemented_features/2026_05_25_chore_dashboard_regen_quoted_pr_false_positive/` via FR-5 in this PR B.
diff --git a/docs/00_overview/mvp1_dashboard.html b/docs/00_overview/mvp1_dashboard.html index 3eaaec66..c1098f25 100644 --- a/docs/00_overview/mvp1_dashboard.html +++ b/docs/00_overview/mvp1_dashboard.html @@ -397,13 +397,13 @@

MVP1 Progress

Scoped items done
-
79 / 81
-
98% of feat_/infra_/chore_/epic_ items past idea stage
-
+
80 / 81
+
99% of feat_/infra_/chore_/epic_ items past idea stage
+
Pending work
-
18
+
17
every not-done feat/infra/chore/bug across all priorities
@@ -425,7 +425,7 @@

MVP1 Progress

P2 (default)
-
17
+
16
important to file, not blocking
@@ -435,7 +435,7 @@

MVP1 Progress

Legacy "Path to MVP1"
-
17
+
16
scoped not-done + bugs + chore-ideas only (excludes feat/infra ideas)
@@ -680,19 +680,7 @@

Spec 0

-

Plan 1

- -
- -
- Chore - P2 - PR #221 merged 2026-05-25 -
-
Priority-3 fuzzy match no longer matches PR-merge phrases that live inside backtick-fenced segments.
- - -
+

Plan 0

@@ -714,7 +702,7 @@

Implementing 1

-

Done 98

+

Done 99

@@ -1379,6 +1367,19 @@

Done 98

+
+ +
+ Chore + + PR #253 merged 2026-05-25 +
+
Priority-3 fuzzy match no longer matches PR-merge phrases that live inside backtick-fenced segments.
+ + +
+ +
@@ -2002,8 +2003,6 @@

Dependency graph (feat_ + infra_)

classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e; classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af; classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; - chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"] - class chore_dashboard_regen_quoted_pr_false_positive plan; infra_agent_sibling_worktree_isolation["agent sibling worktree isolation"] class infra_agent_sibling_worktree_isolation implement; infra_foundation["foundation"] @@ -2156,6 +2155,8 @@

Dependency graph (feat_ + infra_)

class feat_digest_executable_followups_swap_template done; feat_home_demo_reseed_endpoint["home demo reseed endpoint"] class feat_home_demo_reseed_endpoint done; + chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"] + class chore_dashboard_regen_quoted_pr_false_positive done; chore_e2e_seed_acme_idea_obsolete["e2e seed acme idea obsolete"] class chore_e2e_seed_acme_idea_obsolete done; feat_study_baseline_trial["study baseline trial"] @@ -2217,8 +2218,6 @@

Dependency graph (feat_ + infra_)

classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e; classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af; classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; - chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"] - class chore_dashboard_regen_quoted_pr_false_positive plan; infra_agent_sibling_worktree_isolation["agent sibling worktree isolation"] class infra_agent_sibling_worktree_isolation implement; infra_foundation["foundation"] @@ -2371,6 +2370,8 @@

Dependency graph (feat_ + infra_)

class feat_digest_executable_followups_swap_template done; feat_home_demo_reseed_endpoint["home demo reseed endpoint"] class feat_home_demo_reseed_endpoint done; + chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"] + class chore_dashboard_regen_quoted_pr_false_positive done; chore_e2e_seed_acme_idea_obsolete["e2e seed acme idea obsolete"] class chore_e2e_seed_acme_idea_obsolete done; feat_study_baseline_trial["study baseline trial"] diff --git a/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md b/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md deleted file mode 100644 index 1d60ca91..00000000 --- a/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md +++ /dev/null @@ -1,29 +0,0 @@ -# Pipeline Status — chore_dashboard_regen_quoted_pr_false_positive - -## Idea -- Status: Complete -- File: idea.md -- /idea-preflight verdict (2026-05-25): Ready after 3-edit patch (line 572→581 drift, PR-TBD→PR #221 for sibling chore, "Why deferred" status clarification) - -## Spec -- Status: Approved -- Date: 2026-05-25 -- File: feature_spec.md -- Cross-model review: GPT-5.5 converged after 3 cycles - - Cycle 1: 1 Low finding (AC-7 missed single-line triple-backtick fences) — accepted, added AC-12 - - Cycle 2: 2 Low findings (regex hint `+` would skip empty spans; 4 stale "6 tests" residuals) — both accepted, patched - - Cycle 3: 0 findings → stop rule satisfied -- Phases: 1 (single phase, two-PR rollout — see §3 Phase boundaries) -- FRs: 5 (FR-1 helper, FR-2 wire-in, FR-3 test class, FR-4 docstring, FR-5 post-merge finalization) -- ACs: 7 (AC-6 through AC-12) - -## Plan -- Status: Approved -- Date: 2026-05-25 -- File: implementation_plan.md -- Cross-model review: GPT-5.5 cycle 1 produced 2 findings (1 Low, 1 Medium); both accepted and patched (gate arithmetic 6→7; regex `\`{3,}` for spec-compliance 
with "3-or-more" fence delimiter). Cycle 2 = 0 findings → stop rule satisfied. -- Stories: 5 total across 2 epics (Epic 1 = Stories 1.1–1.4 in PR A; Epic 2 = Story 2.1 in PR B) -- Phases covered: single phase (two-PR rollout per spec §3) - -## Implementation -- Status: Not started diff --git a/state.md b/state.md index 8eae0a99..0352e375 100644 --- a/state.md +++ b/state.md @@ -2,7 +2,7 @@ > Read this first. Snapshots the active branch, what just shipped, what's in flight, what's queued, and where the project currently sits in the MVP1 → GA roadmap. Updated whenever a feature lands or a priority shifts. -**Last updated:** 2026-05-25 (after `chore_e2e_seed_acme_idea_obsolete` admin-merged into `main` as PR #250 squash `05f3d486` — 39th MVP1-era artifact. Doc-only chore that closes the OBE'd `chore_e2e_seed_acme_helper_dead` idea (Option A — in-place `**Status:**` line edit at line 4, dashboard regen picks up the closure via `_extract_status_line` at `scripts/build_mvp1_dashboard.py:213`) and refreshes `ui/tests/e2e/helpers/coverage-audit.md` to 9-of-9 helper coverage (the `seedAcmeProductsChain` "0 specs — currently uncalled" framing was OBE'd by commit `2cbcb93b` which wired the helper into `ui/tests/e2e/guides/06_create_and_monitor_study.spec.ts`). **5-cycle / 13-finding cross-model review**: spec-gen 3 cycles (`**Status (...):**` regex mismatch, state.md scope drift, residual "prepend" contradictions, two-PR rollout shape); impl-plan-gen 2 cycles (dashboard-regen anti-pattern unenforceable given pre-commit hook, story numbering, file-count, AC-2 substring checks); Epic 1 phase-gate 0 findings; final review cycle 1 caught stale-base after sibling-worktree Phase 2 merged mid-flight → rebased onto `bfa8799f` with `-X ours` for dashboard conflicts; final review cycle 2 = 0 findings. 
Gemini Code Assist posted 3 line-level findings on `feature_spec.md` claiming 3-up paths should be 4-up — all 3 rejected with empirical `ls -d` counter-evidence (hunk-isolated path-counting false positives). CI smoke failed (pre-existing 5+ main pushes failing the same way; captured in `bug_smoke_dashboard_demo_state_locator_missing`); other 6 checks green. Two-PR rollout: PR #250 = content (FRs 1–4), PR B = this finalization commit (FR-5 folder move). **Earlier today** `infra_agent_sibling_worktree_isolation` Phase 1 + Phase 2 admin-merged into `main` as PR #249 squash `22f878f` — 38th MVP1-era artifact. Adds the `## Working in sibling worktrees` section to CLAUDE.md + 5-test regression suite locking its invariants + `scripts/run-tests-in-worktree.sh` automation + `make test-worktree` + 8-test smoke + runbook. Two tangentials captured: `chore_state_md_size_compression` and `bug_dockerfile_venv_root_owned_after_user_switch`. Phase 3 deferred per `phase3_idea.md`. 11-cycle / 39-finding cross-model review; all accepted. **Earlier today** `feat_study_baseline_trial` admin-merged as PR #245 squash `53be6c63` — 36th MVP1-era artifact. Implements the deferred Phase 2 of `feat_pr_metric_confidence` (PR #180): orchestrator runs a single non-Optuna baseline trial before Optuna via the 4-tier params resolver (parent_proposal → parent_study → operator-supplied → template-defaults), persists it as a real `Trial` row with `is_baseline=TRUE` + `optuna_trial_number=-1` sentinel, stamps `studies.baseline_trial_id` + `baseline_metric` via the new `services.study_state.stamp_baseline_trial` chokepoint (FR-12). Existing data-driven consumers flip automatically from "vs runner-up" to "vs baseline" — confidence per-query outcomes (FR-4), auto-followup chain gate (FR-5, now direction-aware — closes a latent minimize-direction bug as a side effect), digest narrative framing, PR body, ConfidencePanel label. New trials-table "Show baseline trial" UI toggle (FR-9). 
Migration 0020 adds `studies.baseline_trial_id` VARCHAR(36) NULL + `trials.is_baseline` BOOLEAN NOT NULL DEFAULT FALSE + partial unique index `uq_trials_study_baseline_complete` (defense layer 2 of the 3-layer resume-race guard per D-16). **3-cycle spec + 3-cycle plan + 1 CI-fix round.** CI-fix root cause: `test_study_cancel` hung 8min in backend-full because `_wait_for_baseline_trial_by_*` didn't check `study.status` for cancel (production bug — without this, operator-initiated cancel mid-baseline would wait the full `_BASELINE_WAIT_FLOOR_S` 60s before noticing); fixed by adding `_study_cancelled()` helper to bail on every poll tick + extending `_InProcessPool.enqueue_job` to dispatch `run_baseline_trial` inline + monkeypatching `_BASELINE_WAIT_FLOOR_S` = 2.0 in `_running_orchestrator`. Gemini Code Assist: 4 findings, all rejected with cited counter-evidence (3 High duplicates were false positives on `create_trial`'s `**fields: object` signature; 1 Medium was a redundant check already enforced at `FloatParam.model_validator`). **Admin-merged** because smoke is still red on the orthogonal pre-existing dashboard-banner E2E (same `dashboard.spec.ts` + `dashboard-reseed.spec.ts` failures from PR #232/#234/#236), and `ui/tests/e2e/` is untouched by this branch. **Alembic head advanced 0019 → 0020.** Only this finalization docs PR remains. Earlier: `feat_study_clone_from_previous` merged into `main` as PR #243 squash `34118ade` — admin-merged because the pre-existing dashboard demo-state-locator smoke regression (`bug_smoke_dashboard_demo_state_locator_missing`) is still blocking the smoke gate. 
The feature ships the full clone flow end-to-end: backend `parent_study_id` field + early-placement validation (`PARENT_STUDY_NOT_FOUND` 404, `PARENT_STUDY_WRONG_CLUSTER` 422) + persistence (no migration — column was already present from `0003_study_lifecycle_schema.py`); frontend `PrefillValues` widening + `buildPrefillFromStudy` helper + "Clone study" button on `StudyActionBar` (with running-source confirmation `AlertDialog` per FR-11) + cloned-from banner in `CreateStudyModal` (UI-only `cloneSource` never reaches the wire per D-12); deep-link `?clone_from=` reader on `/studies` with one-shot `useRef` guard + automatic re-arm on `cloneFromId` change (Gemini PR #243 #1 fix); 22 new frontend vitest cases + 7 backend integration cases + 1 Playwright real-backend E2E spec. **15 FRs / 17 ACs** all covered. **3-cycle spec + 3-cycle plan + 1-cycle Epic-1 phase-gate + 1-cycle Epic-2 phase-gate + 1-cycle final-pass + 1-cycle Gemini** = all reviews adjudicated (15 findings total, 2 accepted+fixed, 1 deferred-as-non-regression, 12 rejected with cited counter-evidence or resolved-by-merge). **Two tangential bug ideas surfaced:** [`bug_datatable_col_vis_density_localstorage_undefined_jsdom`](docs/02_product/planned_features/bug_datatable_col_vis_density_localstorage_undefined_jsdom/idea.md) (pre-existing vitest localStorage failures) + [`bug_smoke_dashboard_demo_state_locator_missing`](docs/02_product/planned_features/bug_smoke_dashboard_demo_state_locator_missing/idea.md) (pre-existing smoke regression on dashboard demo-state locators — same failure reproduces on main run #26397500888). **Follow-up:** [`feat_study_clone_narrow_bounds`](docs/02_product/planned_features/feat_study_clone_narrow_bounds/idea.md) (smart-rewrite of search-space bounds around the source's winner trial) remains in `planned_features/` for future scoping. 
Earlier: after `bug_demo_clusters_unreachable_in_healthz` merged into `main` as PR #236 squash `70b2ae46` — admin-merged because the pre-existing dashboard banner E2E failure still blocks smoke. Closes BOTH smoke-cascade `/healthz` observability bugs (PR #234 + PR #236) surfaced during the PR #232 unblock. **The fix:** new `run_cluster_health_warmup_background` service module spawned from the FastAPI lifespan hook + FR-7 fix to `get_or_probe_health`'s `CredentialsMissing` branch (now writes synthetic unreachable to cache instead of returning without caching). Within ~5s of API startup, `/healthz` reports truthful `elasticsearch_clusters` aggregate counts. **3-cycle spec + 3-cycle plan + 1-cycle phase-gate + 1-cycle final cross-model review** = 4 cycles total of GPT-5.5 review (32 findings, all accepted). **3 CI fix rounds after PR open:** (1) per-page session lifecycle refactor to release asyncpg connections before HTTP probes; (2) env-var gate `RELYLOOP_DISABLE_STARTUP_WARMUP=1` for integration tests to avoid asyncio interleaving with the latent webhook merge-handler row-lock race; (3) `monkeypatch.delenv` for unit test isolation. **Notable tangential bug captured:** [`bug_webhook_concurrent_merge_race_timing_sensitive/idea.md`](docs/02_product/planned_features/bug_webhook_concurrent_merge_race_timing_sensitive/idea.md) — real production-correctness bug in the webhook merge handler's row-lock, deterministically reproducible by adding ANY second lifespan task, masked on main today by pure asyncio-scheduling luck; the next feature that adds a lifespan task will trip it. P2 next-ticket. **Architecture doc** updated with three-path cache-population subsection (registration / lazy on-demand / startup warmup) + race-window caveat. 
**No new migration; no /healthz response shape change.** Earlier: after `bug_openai_capability_check_incapable_on_valid_key` merged into `main` as PR #234 squash `d69189db` — admin-merged because the pre-existing `bug_demo_clusters_unreachable_in_healthz` failure still blocks the smoke gate at the dashboard-banner E2E layer; this PR fixes the OTHER smoke-cascade bug (the openai capability observability gap). `/healthz` `openai_capabilities` block now carries 5 required fields: the existing `chat / function_calling / structured_output` plus new `models_endpoint: Literal["ok","fail","untested"]` and required-but-nullable `models_endpoint_status_code: int | None`. `_probe_models_endpoint` return contract widened to `tuple[bool, int | None]`; status code captured only on `>= 400` HTTP failure (never on success or network errors — and the response body is NEVER captured, only the integer status, per CLAUDE.md Absolute Rule #10). Cached `CapabilityResult.models_endpoint` schema stays 2-valued — `"untested"` only widens on the response model. Backwards compat verified: pre-fix Redis cache rows deserialize cleanly via Pydantic optional-field defaulting. Spec converged at GPT-5.5 cycle 3 (13 findings, 12 accepted + 1 rejected with counter-evidence); plan at cycle 3 (17 findings all accepted); phase-gate at cycle 1 (3 findings, 2 accepted + 1 rejected — dashboard regen is the auto-run pre-commit hook); final review at cycle 1 (1 Low finding deferred as non-regression — test-helper type hints matching existing file convention). Gemini Code Assist: clean review, zero findings. Tests: +15 cases (7 in test_capability_check.py incl `TestSecurityRedaction` for AC-10 + 5 in test_health.py incl AC-10 end-to-end through `check_capabilities` → Redis JSON round-trip → /healthz + 1 defensive in test_probes.py + 2 in test_health_contract.py). Architecture doc updated with success/failure response examples + repo-secret-vs-`.env` divergence note + cascade explanation. 
Remaining smoke-cascade item: [`bug_demo_clusters_unreachable_in_healthz`](docs/02_product/planned_features/bug_demo_clusters_unreachable_in_healthz/idea.md) (P2). Earlier: after `feat_home_demo_reseed_endpoint` merged into `main` as PR #228 squash `ad6ff826`. Dev-only `POST /api/v1/_test/demo/reseed` endpoint that wipes the 10 demo Postgres tables + 4 ES/OS indices and re-seeds the 4 demo scenarios from `scripts/seed_meaningful_demos.py`. Dashboard now renders a "Reset to demo state" disclosure inside `StartHereChecklist` whenever all three first-run signals are false. Architecture: dual httpx clients (api + engine) + session-level Postgres advisory lock on a dedicated pinned `AsyncConnection` + NO outer wall-clock timeout (per-call HTTP ceiling only) + TRUNCATE-commits-before-self-call invariant (AC-13) + cleanup-on-failure pass via a fresh DB connection. 14 GPT-5.5 spec cycles + 14 plan cycles to convergence; 2 Gemini Medium findings + 1 GPT-5.5 High + 2 Medium accepted, 1 Medium rejected as stale, 1 Low deferred. The High-severity fix: the in-container OpenSearch port resolver was mapping `localhost:9201` → `opensearch:9201` but the OS container actually listens on `:9200` inside the Compose network (the host `:9201` is just the port-mapping to avoid colliding with ES on the host). Tests: backend unit+contract 1560 pass (+45 vs the prior baseline); 10 integration tests at `backend/tests/integration/test_demo_seeding{,_timeout}.py` covering AC-1..AC-5 + AC-12..AC-16 (skip outside CI service containers); 21 dashboard vitest cases; 1 Playwright spec at `ui/tests/e2e/dashboard-reseed.spec.ts`. New runbook at [`docs/03_runbooks/demo-reseed-debugging.md`](docs/03_runbooks/demo-reseed-debugging.md). 
Tangential capture: [`bug_vitest_jsdom_localstorage_failures/idea.md`](docs/02_product/planned_features/bug_vitest_jsdom_localstorage_failures/idea.md) — 31 pre-existing vitest failures in 4 files all touching `window.localStorage`, confirmed unrelated to this PR by stashing the feature branch and reproducing on baseline. **Alembic head unchanged at `0019_digests_suggested_followups_jsonb`** — no schema change.) Prior update: 2026-05-23 (after `chore_study_default_stop_conditions` merged into `main` as PR #215 squash `370c87d9` — **first MVP1.0-cleanup chore shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. Frontend-only chore (~175 LOC production code + 12 vitest cases): pre-fills `max_trials = 200` in the create-study modal's Step-5 form (FR-1, baked into `useForm` `defaultValues`), adds a 4-button Stop-condition preset selector (Focused 50 / Standard 200 / Deep 1000 / Custom) above the numeric inputs (FR-2..FR-4 + FR-7), refreshes `study.max_trials` + `study.time_budget_min` glossary copy with dimensionality-keyed framing and adds a new `study.preset` glossary entry (FR-5 + FR-8), updates the chat orchestrator system prompt to recommend `max_trials=200` by default with the dimensionality scaling guidance (FR-6), and ships a 12-case vitest suite in [`ui/src/__tests__/components/studies/create-study-modal.stop-conditions.test.tsx`](ui/src/__tests__/components/studies/create-study-modal.stop-conditions.test.tsx) covering AC-1..AC-6 + 2 bug-guards + AC-8 + AC-10 + the type="button" check (FR-9). `activePreset` is derived purely from form values via `useMemo` (no `useState` + watcher `useEffect`); Custom click is a no-op (Custom == "values don't match any preset"). 
Cross-model review: spec converged at GPT-5.5 cycle 3 (12 findings — 11 accepted across cycles 1-3, 1 rejected with cited counter-evidence at `backend/app/api/errors.py:62, 118` re: VALIDATION_ERROR envelope claim); plan converged at cycle 2 (5 cycle-1 findings, all accepted; 0 cycle-2 findings); impl-diff GPT-5.5 cycle 1 raised 1 Medium finding (modal-open form-field reset gap, Radix Dialog mount-persistence bug), addressed in subsequent fix iterations; Gemini Code Assist 1 Medium finding (Defer — pre-existing form-state persistence across modal toggles, out of scope for this chore). **Late-stage E2E regression + root-cause fix**: `studies-create-builder.spec.ts:130` + `studies-create-target-dropdown.spec.ts:48` started failing against the production UI image — Playwright's `.fill('10')` on a non-empty Max trials input (200 default) was triggering a stray form-submit event before the test's explicit submit-button click, leaving the button stuck in `Submitting…` while Playwright retried the click against a vanishing button. Seven mechanical fix attempts (drop `form` from useEffect deps, modal-overflow scrolling, prev-open `useRef` gating, in-effect setValue removal, RHF subscription watcher, useMemo-derived activePreset, Enter-key suppression) didn't move the needle. The actual fix decouples submission from the form's `onSubmit` event entirely: `<form onSubmit={(e) => e.preventDefault()}>` + the submit button changed from `type="submit"` to `type="button"` with `onClick={form.handleSubmit(onSubmit)}` — submission goes through exactly one path, the explicit submit-button click.
Reproduced locally against the rebuilt production UI image (5/5 E2E green post-fix incl. both previously-failing ones). Also bundled: `seed.ts` ES_BASE switched from `localhost` to `127.0.0.1` (Node's IPv6-first resolver was hitting `::1` against an IPv4-only ES bind). Tests: UI vitest 98/98 studies + 730+ overall green; backend full-coverage green; smoke 70+/70 Playwright. CI: 7/7 jobs green on the final SHA. **Alembic head unchanged at `0017_proposals_last_polled_at`** — frontend-only chore at the application layer. Tangential capture: none — the seven ruled-out fix attempts are documented in the merged PR's commit history rather than as a deferred chore. Earlier: after `chore_reconciler_terminal_closed_no_poll` merged into `main` as PR #216 squash `95d4c414` — **Tier A of the reconciler polling-cost polish layer shipped**; predicated on the same-day `bug_pr_reconciler_blocked_by_closed_fallback` (PR #204) that widened the candidate set to include `pr_state='closed'` rows. New nullable `proposals.last_polled_at TIMESTAMPTZ` column via migration `0017_proposals_last_polled_at`. `list_pr_opened_proposals_for_reconcile` gains a 24-hour exclusion clause: rows with `pr_state='closed' AND last_polled_at > now() - interval '24 hours'` are excluded from each tick. The reconciler's `elif state == "closed":` branch now branches on the candidate's selection-time `pr_state` to avoid the webhook-reopen clobber race — selected-as-open candidates run the legacy `mark_proposal_pr_closed` transition (no stamp); selected-as-closed candidates skip the close helper entirely and call the new `stamp_proposal_last_polled_at` helper (whose defensive `WHERE pr_state='closed'` guard returns `None` as a benign no-op if a webhook flipped the row mid-tick).
Effect: case-(b) (closed-without-merge) rows get polled at most once per 24 hours instead of once per tick — a ~288× reduction on the default 5-minute cadence under MVP1's single-worker Arq deployment. Case-(a) recovery is unaffected for first-observation rows; the narrow race where `(merged=false, closed)` was observed once before GitHub flipped to `merged=true` accepts a worst-case 24-hour latency increase (documented in spec §11 flow 3). Cross-model review: spec converged at GPT-5.5 cycle 5 (8 findings, all Accepted); plan converged at cycle 3 (3 findings — 2 Accept-Low + 1 Reject with cited counter-evidence at `proposal.py:513-523` re: pr_state filter that doesn't currently exist). Gemini Code Assist clean pass ("I have no feedback to provide"). Final GPT-5.5 review surfaced 1 Low finding (runbook path), rejected with cited counter-evidence: plan called for `pr-open-debugging.md` but the reconciler runbook is `webhook-debugging.md` (grep at implementation time confirmed zero reconciler refs in pr-open-debugging.md). Tier B (terminal `pr_closed_unmerged` status enum) explicitly out of scope per the spec — captured in [`idea.md`](docs/00_overview/implemented_features/2026_05_23_chore_reconciler_terminal_closed_no_poll/idea.md) §"Tier B" for future UX-brief gating. Tests: 16 new integration tests (9 repo + 7 worker — covering AC-2/3a/3b/4/5/6/7/8/9-race/9-reclose/10); existing `test_proposal_repo_webhook.py` + `test_pr_reconcile_config_repo_pointer.py` continue to pass; `test_migrations.py` head assertions bumped to `0017`; `test_migration_0016.py` pinned to specific revision via explicit downgrade/upgrade so it stays robust against future migrations. Migration round-trip verified clean against the shared Postgres. 
**Alembic head moved to `0017_proposals_last_polled_at`.** Tangential capture: [`chore_migration_test_head_brittleness/idea.md`](docs/02_product/planned_features/chore_migration_test_head_brittleness/idea.md) (P3) — `test_migrations.py:130,155` hardcodes the expected head version; every new migration requires a sympathy edit. Proposed fix: dynamic `_current_head()` helper reading from `alembic heads`. Earlier: after `bug_dashboard_banner_dismiss_persistence_flake` merged into `main` as PR #213 squash `a8b788c` — **fifth MVP1.0-cleanup bug shipped**; closes the last MVP1.0 `bug_*` item in the operator's stated "finish MVP1.0 before MVP1.5" sweep. The `Dismiss persists across reload (FR-7, AC-3)` Playwright test at [`ui/tests/e2e/dashboard.spec.ts:63`](ui/tests/e2e/dashboard.spec.ts#L63) flaked intermittently on CI smoke runs because it used `context.addInitScript` to clear the dismissed flag from localStorage — but init scripts run on every page initialization INCLUDING `page.reload()`, so the post-dismiss reload re-cleared the flag, racing with React hydration to decide whether the banner stayed hidden. The banner code at [`demo-data-banner.tsx:55-63`](ui/src/components/dashboard/demo-data-banner.tsx#L55-L63) was correct (conservative SSR snapshot returns `true` so banner starts hidden, then hydration reads localStorage) — the test was fighting the design. Race surfaced twice on PR #193 smoke CI on unrelated changes, confirming it was the test pattern at fault. Fix per preflight-locked Option A: replace `context.addInitScript` with a one-shot `page.evaluate` + `page.reload` sequence so the localStorage cleanup runs ONCE before the first user-facing assertion; the post-dismiss reload has no init script interfering with hydration. 
Option B (assert localStorage directly via `page.evaluate(() => window.localStorage.getItem(...))`) rejected because it would change the test's intent from "banner stays hidden" to "localStorage was written" and silently miss a future regression where storage writes succeed but the banner re-renders. Verification: 5/5 local Playwright runs of the changed test pass deterministically post-fix (vs latent race pre-fix). Cross-model review: Gemini Code Assist clean pass (0 findings); GPT-5.5 final review skipped per threshold (~15 LOC, test-only, no flagged subsystem). **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — frontend-test-only change. **MVP1.0 cleanup queue now bug-free; remaining items are 3 P2 chore_* + 1 Backlog chore_*.** Earlier: after `bug_dashboard_classifier_half_step_releases` merged into `main` as PR #211 squash `ab8674a` — **fourth MVP1.0-cleanup bug shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. Operator noticed via `/pipeline status` that `feat_ubi_judgments` (the MVP1.5 anchor) was ranking #1 in the MVP1.0 backlog despite the MVP1.5 release tier having been introduced 2026-05-23 via PR #200. Root cause: the dashboard regen script's `_target_release` classifier was never updated when MVP1.5 landed — three concrete gaps (`_RELEASE_SUFFIX_RE` integer-only `r"_mvp(\d+)$"` pattern; `_RELEASE_STATUS_RE` only matched `Held for MVPN` framing with integer N; `ROADMAP_RELEASES` had no `mvp1.5` row) plus a secondary input-scoping bug surfaced during implementation (`_load_planned` passed `status_line + " " + (idea or "")` to the classifier, so body prose that quoted release-tag phrases as documentation examples got misclassified — this bug's own idea.md was the self-collision canary). 
Fix extends `_RELEASE_SUFFIX_RE` to `r"_mvp(\d+(?:_\d+)?)$"` with `"1_5"` → `"1.5"` normalization, extends `_RELEASE_STATUS_RE` to match both `Held for MVPN` and `anchor for MVPN` / `anchor feature for MVPN` framings with integer-or-decimal captures, inserts `("mvp1.5", "MVP1.5 / v0.1.5", "Real Signals")` into `ROADMAP_RELEASES`, extracts a new `_release_filename_safe(release)` helper for dot→underscore filename normalization used by all 4 sites (file-write at `_dashboard_paths` + 3 link-render sites: `render_markdown` "rich local view" callout, `render_roadmap_html` cards, `render_roadmap_markdown` table cells), and scopes `_load_planned`'s classifier input to `status_line` only (matching the existing `_load_implemented` pattern). Folder rename mid-fix: `bug_dashboard_classifier_missing_mvp1_5` → `bug_dashboard_classifier_half_step_releases` because the original ended in `_mvp1_5` and triggered the new regex on itself. **idea.md's front-matter note documents the general rule: feature folders *about* a release shouldn't use the literal `mvp1_5` substring in their descriptive tail.** Cross-model review: Gemini Code Assist 4 findings (1 High root cause + 3 Medium consequences — all link-site normalization drift) all **accepted** in `d903558`; new test `test_no_raw_release_tag_in_link_renderers` reads the script source and forbids the drift pattern from recurring. GPT-5.5 final review 1 Low finding (helper-level test doesn't catch a regression at the caller) **accepted** in `9e3b095` — added `TestLoadPlannedReleaseScoping` class with 2 tmp_path fixtures exercising `_load_planned` directly. Tests: 1163 unit (was 1143 + 20 new in `test_dashboard_release_classifier.py` — 14 initial + 4 cycle-1 + 2 cycle-2 cases covering suffix recognition, status-line recognition, body-prose-not-matched, filename normalization, link-site drift sentinel, and caller-level scoping). 
End-to-end on the live filesystem post-fix: `mvp1: 96 features, mvp1.5: 1 features, mvp2: 5 features` (was `mvp1: 95, mvp2: 5`); new `MVP1_5_DASHBOARD.md` + `mvp1_5_dashboard.html` exist with `feat_ubi_judgments` as the only Idea row; `MVP1_DASHBOARD.md`'s Idea table dropped `feat_ubi_judgments`; the next MVP1.0 backlog item is now correctly surfaced. **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — purely a regen-script + dashboard data-layer change. Tangential capture (in-body note): folder-name convention should add a one-line rule to `feature_templates/README.md` about avoiding `_mvpN_M` in descriptive tails; deferred since this is the first instance. Earlier: after `bug_dashboard_depends_on_column_bloat` merged into `main` as PR #208 squash `8bb7148` — **third MVP1.0-cleanup bug shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. The MVP1 dashboard's "Depends on" column rendered impossibly-large lists for two shipped features (`feat_chat_agent` 46→**10** entries, `chore_tutorial_polish` 42→**11** entries) — including features that shipped weeks later and still-planned ideas. Root cause was NOT what the original idea.md claimed (the parser was already correctly scoped to the `- Depends on:` line); the actual bug lived in the `DEPS_ALL_BACKEND` sentinel-expansion block at [`scripts/build_mvp1_dashboard.py:707-714`](scripts/build_mvp1_dashboard.py), which expanded the "ALL prior backend features" / "ALL prior MVP1 features" prose marker against the **current snapshot** of all `infra_*`/`feat_*` folders without any time-ordering filter. Only two features use the marker (verified by `grep -rlE "ALL prior backend|all backend|ALL prior MVP" docs/`), so the fix surface is narrow. Fix extracted a module-level `_expand_transitive_deps(features)` helper + new `_merge_order_key()` tuple sort `(merged_date, pr_number, folder)`. 
For shipped features the expansion is filtered to peers strictly earlier in merge order; for planned features the full-snapshot expansion is preserved. Conservative-exclusion: anything with missing fields (no `merged_date` or no `pr_number`) sorts to end-of-time, so the helper excludes ambiguously-ordered peers rather than risk including post-shipment ones. Cross-model review: Gemini Code Assist 1 Medium finding **accepted** in `3261b06` — the self-dep guard `set(explicit) | scoped - {f.folder}` was Python-set-operator-precedence-bound to subtract only from `scoped`; moved the parens to `(set(explicit) | scoped) - {f.folder}` so self-refs from BOTH the explicit list and the sentinel expansion get dropped (aligned with the existing comment's stated intent — no real-data effect since no shipped feature lists itself). GPT-5.5 final review 1 Low finding **accepted** in `eecec9b` as doc tightening — bug_fix.md's "all other rows byte-identical" claim was overstated because the protocol-required tangential idea file (`chore_dashboard_pr_extraction_from_idea`) adds one Idea row. Tests: 1138 unit (was 1128 + 10 new in [`backend/tests/unit/scripts/test_dashboard_expand_transitive_deps.py`](backend/tests/unit/scripts/test_dashboard_expand_transitive_deps.py) — 6 expansion cases + 4 merge-order sort key cases); regression test fails on `main` with `ImportError: cannot import name '_expand_transitive_deps'`, passes on the branch. End-to-end regen confirmed bloat fix: feat_chat_agent's 10 deps are exactly the `infra_*`/`feat_*` folders shipped ≤2026-05-12 minus itself. Tangential capture: [`chore_dashboard_pr_extraction_from_idea`](docs/02_product/planned_features/chore_dashboard_pr_extraction_from_idea/idea.md) — `_extract_pr_number` only reads `pipe + plan + spec`, not `idea.md`, so legacy implemented features (e.g., `infra_frontend_stack_refresh`) that only have `idea.md` sort to end-of-day in `_merge_order_key` and get excluded from same-day peers' deps. 
Minor data gap, not a correctness regression. **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — purely a regen-script change at the developer-tooling layer. Earlier: after `bug_contract_test_stub_missing_target_filter_kwarg` merged into `main` as PR #206 squash `d3fbbce` — **second MVP1.0-cleanup bug shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. Two `_Stub` classes in [`backend/tests/contract/test_error_codes.py:195-206` + `:238-249`](backend/tests/contract/test_error_codes.py) crashed locally with `TypeError: _Stub.list_targets() got an unexpected keyword argument 'target_filter'` whenever Elasticsearch was reachable to the test process — pre-existing drift from `feat_cluster_target_filter` (PR #168, 2026-05-20) which extended the `SearchAdapter` Protocol + the production caller at [`clusters.py:359`](backend/app/api/v1/clusters.py#L359) without updating the contract-test stubs. CI didn't catch it because `_body()` hardcodes Compose-network DNS `base_url="http://elasticsearch:9200"` but GHA service containers bind at `localhost:9200`; cluster registration fails the verification step → both tests hit `pytest.skip("Could not register cluster — ES likely unreachable")` and the broken stubs are never exercised. Fix is 6 LOC: add `target_filter: str | None = None` to both stub signatures (the kwarg is unused — stubs raise immediately — it just needs to be accepted to match the Protocol at [`protocol.py:131-136`](backend/app/adapters/protocol.py#L131-L136)). The crash happened BEFORE the test's actual `TARGETS_FORBIDDEN` / `CLUSTER_UNREACHABLE` envelope assertions, so the tests were effectively dead — they no longer verified what they claimed to verify. 
Cross-model review: Gemini Code Assist 2 Medium findings (both `**_kwargs` suggestions for future-proofing) — both **deferred** with cited rationale (the idea's "Anti-pattern note" explicitly chose explicit named kwargs over `**kwargs` to keep drift detection LOUD; the systemic "shared `_BaseStubAdapter` synced via `typing.Protocol` + `mypy --strict`" fix is held as a future chore contingent on drift recurrence — `**_kwargs` would silence exactly the failure mode this PR catches). GPT-5.5 final review skipped per threshold (14 LOC, 2 files, no flagged subsystem). Tests: backend contract suite **291 passed** (was 282 passed + 2 failed pre-fix); the 2 previously-failing tests now exercise the actual envelope assertions and pass. **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — purely test-only change. Tangential observations sweep: none found. Earlier: after `bug_pr_reconciler_blocked_by_closed_fallback` merged into `main` as PR #204 squash `a0ca5b9` — **first MVP1.0-cleanup bug shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. The PR reconciler can now recover proposals stranded in `(pr_opened, closed)` by the webhook's `merged_at=null` eventual-consistency fallback: widened `list_pr_opened_proposals_for_reconcile` to include `pr_state='closed'` candidates, added `mark_proposal_pr_merged_from_closed` doing the atomic `(pr_opened, closed) → (pr_merged, merged)` UPDATE, branched the reconciler on `proposal.pr_state` to route to the right helper, and added a `pr_reconcile_recovered_eventual_consistency` INFO log for operator grep handles. FR-3a pointer-update fires from both paths. 
Genuinely-closed-unmerged proposals (case b — operator closed without merge) now also enter the candidate set; they become benign no-ops via the existing `mark_proposal_pr_closed` `pr_state='open'` guard but get re-polled every reconciler tick — captured as [`chore_reconciler_terminal_closed_no_poll/idea.md`](docs/02_product/planned_features/chore_reconciler_terminal_closed_no_poll/idea.md) (P2 polish, ~50 LOC Tier A: add `last_polled_at` column + exclude recently-polled closed rows). Skill chain: `/idea-preflight` rewrote the Problem section after discovering the prior diagnosis was incomplete — the actual primary blocker is the candidate-query filter, not the WHERE clause in `mark_proposal_pr_merged` (the reconciler never even sees fallback-closed proposals because they're filtered out at candidacy); `/bug-fix` produced [`bug_fix.md`](docs/00_overview/implemented_features/2026_05_23_bug_pr_reconciler_blocked_by_closed_fallback/bug_fix.md) locking Option B (new repo helper) over Option A (two-UPDATE reopen+merge) for single-conditional-UPDATE parity with every other `mark_proposal_pr_*` helper; `/impl-execute --ad-hoc` ran the standard ceremony — pre-push gate green, 7/7 CI checks pass, Gemini Code Assist clean ("I have no feedback to provide"), GPT-5.5 final review 1 Low finding (accepted-partial in `7613aab` — bug_fix.md tangential-observations section flipped from "None" to record the chore link; rejected-partial — "returns BOT" in dashboard was the regen script's standard 200-char truncation, not corruption). Regression test pivoted from negative-documentation to positive recovery + new case-(b) no-op lock: `test_reconciler_recovers_fallback_closed_proposal` + `test_reconciler_noops_on_genuinely_closed_unmerged` in [`test_pr_reconcile_config_repo_pointer.py`](backend/tests/integration/test_pr_reconcile_config_repo_pointer.py). Verified via stash-revert that the recovery test fails on `main` (`candidates=0` — fallback-closed rows invisible to candidate query). 
Runbook §8 paragraph at [`webhook-debugging.md`](docs/03_runbooks/webhook-debugging.md) flipped from "Known limitation" to "Eventual-consistency recovery" with the new helper named. **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — purely additive at the application layer. Tests: 1128 unit (unchanged), 3 reconciler-pointer integration tests passing, 54 reconciler/webhook integration sweep clean, 235 contract. Earlier: after MVP1.5 / v0.1.5 "Real Signals" tier introduced to the canonical release matrix via PR #200 squash `594f7b4` — new interstitial release between MVP1 and MVP2, anchored on **OpenSearch UBI** (User Behavior Insights — the engine-neutral standardized event-capture schema championed by OSC) as a first-class judgment source. New [`feat_ubi_judgments/idea.md`](docs/02_product/planned_features/feat_ubi_judgments/idea.md) (P1, ~3.5 KB) captures the planned-feature scope: `UbiReader` (engine-agnostic; reads `ubi_queries` + `ubi_events` via any `SearchAdapter.search_batch`) + pluggable `SignalsConverter` Protocol (position-bias-corrected CTR, dwell-time, hybrid UBI+LLM) + `POST /api/v1/judgment-lists/generate-from-ubi` + `generate_judgments_from_ubi` agent tool. No schema migration required — rides the existing `judgments.source = 'click'` enum that has shipped since MVP1. 
Spec patches in [`docs/00_overview/product/relevance-copilot-spec.md`](docs/00_overview/product/relevance-copilot-spec.md): §1 summary (5 releases → 6), §14 (rewritten with UBI as engine-neutral primary path; collapsed the prior per-engine Fusion-specific subsection into a single trailing paragraph per new `feedback_de_emphasize_fusion` memory), §19 (`generate_judgments_from_ubi` tool added; existing `pull_signals` retargeted from v1.5+ to MVP3), §27 (release-timeline table adds MVP1.5; new MVP1.5 subsection inserted between MVP1 and MVP2; post-GA `v1.5` renamed to `v1.5+` with Fusion-Signals bullet removed; one signals-reader bullet added to the MVP3 subsection). Canonical release matrix updated in both [`tech-stack.md`](docs/01_architecture/tech-stack.md) and the CLAUDE.md mirror. Origin: external review on 2026-05-22 (LinkedIn outreach to a senior search engineer at a relevance-tooling company) flagging UBI as a stronger trust anchor than LLM-as-judge for v1. Gemini Code Assist 5 Medium findings: 1 accepted in `b2d1a37` (§27 arrow-sequence fix `(MVP1 → MVP1.5 → MVP2 → MVP3 → MVP4 → GA v1)`); 4 deferred — pre-existing dashboard "Depends on" column parser bug verified by `git show main:MVP1_DASHBOARD.md` showing 45 backtick'd entries on `feat_chat_agent` PRE-PR (PR #200 added one more, bringing it to 46); captured as [`bug_dashboard_depends_on_column_bloat/idea.md`](docs/02_product/planned_features/bug_dashboard_depends_on_column_bloat/idea.md) (P2). No CI run on `pr.yml` — docs-only PR caught by `paths-ignore`; `secrets-defense` + `gitleaks` both green. **Alembic head unchanged at `0015_trials_per_query_metrics`** — planning-only change, no code. Earlier: after `infra_ir_measures_migration` merged into `main` as PR #198 squash `350b2fc` — **31st MVP1-era artifact shipped**, 8 stories across 1 epic. Swaps the IR-evaluation engine in `backend/app/eval/scoring.py` to `ir_measures` (PyTerrier team, actively maintained).
Public API of `score()` FROZEN; persisted JSONB key shape FROZEN; aggregate computed via `ir_measures.iter_calc()` + manual mean (NOT `calc_aggregate` — see plan cycle-2 C2-F4); per-query universe filtered to mirror the prior evaluator's qid set on edge cases. **No migration, no schema change** — Alembic head unchanged at `0015_trials_per_query_metrics`. Cross-model review trajectory: spec 3 GPT-5.5 cycles (11→6→1 findings, all accepted); plan 3 GPT-5.5 cycles (10→4→1 findings, 14 accepted + 1 rejected with cited counter-evidence at `scoring.py:74-78`); phase-gate cumulative-diff review (10 findings — 5 accepted + applied in `b5dbaa3`, 3 rejected with cited counter-evidence: the mypy override was correctly dropped, the gitignored release-notes file can't appear in diffs, test files enumerating forbidden tokens are semantically allowlisted; 2 deferred to post-impl); Gemini Code Assist (3 findings — 1 already-resolved by `352d60f` pre-Gemini-post, 2 accepted + applied in `90884ed`: switched `obj_repr_to_user: dict[str, str]` keyed by `repr(obj)` to `obj_to_user: dict[Measure, str]` keyed by Measure object directly — ir_measures Measure objects implement `__hash__` + `__eq__` correctly); final GPT-5.5 review (4 findings — 2 accepted + applied in `a6b954d` for CLAUDE.md/optimization.md package-name removal + parity-test docstring reword, 1 rejected with cited counter-evidence: ir_measures METADATA classifier confirms Apache 2.0 license; 1 deferred to finalization: dashboard PR# auto-fixes when folder moves). Tests: 1128 unit (was 1077 pre-migration; +51 from 30 parity cases + per-query shape + 12 regex enumeration + 9 sanity-check), 30/30 (metric, k) parity cases match the prior evaluator to 1e-6, per-query shape parity confirms outer-qid + inner-metric-key + per-(qid, metric) value parity at 1e-6, AC-12 existing-row read regression exercises all three consumers (fetch_study_confidence directly + via API + digest-worker top-trials SELECT).
Q5 perf benchmark passes under existing 100ms/query threshold; Q4 resolution: outcome (a) — default `ir_measures` provider routing produces parity, no forcing needed. Operator-visible string change: `INSUFFICIENT_JUDGMENT_OVERLAP` error message at studies.py:313 now names `ir_measures`; no API contract change. Bundled inline: `backend/app/services/test_seeding.py` `p@10` → `precision@10` (pre-existing inconsistency; spec §2 C2-F5). New permanent infrastructure: `docs/00_overview/dashboard_overrides/` directory + `scripts/build_mvp1_dashboard.py` override mechanism lets future library swaps update historical-feature dashboard rows without back-editing frozen implemented-feature specs. CI green on every push iteration (4 pushes — 2 CI fixes + 1 Gemini fix + 1 final-review fix on final SHA `a6b954d` landing as squash `350b2fc`); 5/5 jobs incl. smoke + backend full-coverage. **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Earlier: after `chore_guides_glossary_route` + `chore_guides_faq` + `chore_guide_06_screenshot_refresh_confidence_panel` bundled into `main` as PR #195 squash `ea2b242` — **28th, 29th, and 30th MVP1-era artifacts** shipped in one PR. Three siblings under `/guide/*` bundled per "one branch, one PR" memory. New `/guide/glossary` route renders the 109-entry `ui/src/lib/glossary.ts` constant with substring search + 8 prefix-derived category facet chips + deep-link anchors (`#study.metric.ndcg`); 10 walkthrough `script.md` files gain a footer link to it. New `/guide/faq` route renders a fresh 19-entry typed `ui/src/lib/faq.ts` (categories: setup-and-install/studies-and-confidence/judgments/proposals-and-prs/chat-agent) with the same search + facet + anchor contract; entries' questions self-link for sharing. 
Guide-06 demo Playwright spec waits up to 45s for `[data-testid="confidence-panel"]` before screenshotting → `04-study-detail.png` now captures the ConfidencePanel partial-shape view (headline metric without CI band, Robust plateau runner-up gap, per-query outcomes 0/4/0); script.md narrative gains a Monitoring sub-section describing the three signals with cross-links into glossary + FAQ. Five SKILL.md gate edits ship together (impl-execute Step 2.5 FAQ-shaped catch-net + Step 3 terminology/drift/decision-point bullets; spec-gen Step 3 #11 tooltip-cites-glossary-key; impl-plan-gen line 111 per-tooltip checklist gains glossary key + source-of-truth comment target) — all locked by a new `glossary-gate-skill-edits.test.ts` that reads each SKILL.md from disk and grep-asserts the enforcement clauses (same-PR default, escape-hatch gating, Step 8 blocking, no-drift-escape, the literal `// Source-of-truth:` marker). New shared `ui/src/lib/markdown-safety.ts` exports `MARKDOWN_DISALLOWED_ELEMENTS` consumed by 4 surfaces (glossary route + FAQ route + HelpPopover + MarkdownDoc) — extracted in response to a Gemini security-medium finding + earlier GPT-5.5 cycle-2 F10 spec finding. Pivoted away from `/pipeline` mid-flow: glossary went through `/spec-gen` with 3 GPT-5.5 cross-model review cycles (10 findings adjudicated and applied) producing a committed `feature_spec.md` design reference; user observed pipeline ceremony was disproportionate for ~200 LOC per item; FAQ shipped without a formal spec (the `ui/src/lib/faq.ts` JSDoc + skill edits are the design surface). 
Cross-model review: GPT-5.5 spec 3 cycles (10 findings — 6 cycle-1 + 4 cycle-2 + 0 cycle-3 convergence — all accepted: §1 outcome rewrite, scope cross-ref fix, test-name canonicalization, DoD path alignment, FR-8c lead-in fix, FR-8a escape-hatch tightening, FR-8a glossary.ts path-ref, ACs locked enforcement clauses, vitest path-resolution guidance, AC-7 source-grep → behavioral DOM assertion); Gemini Code Assist 5 Medium findings (2 accepted in `` — unused `Card*` imports + shared `MARKDOWN_DISALLOWED_ELEMENTS`; 3 rejected with cited counter-evidence — no "FR-7" in faq.ts; `feat_pr_metric_confidence` slugs are deliberate codebase-grep handles for engineer audience; `ui/src/components/ui/card.tsx:9` Card primitive uses identical hardcoded `border-gray-200 bg-white text-gray-900` — matching established precedent). Tests: UI vitest **706/706** (was 639 — +67 across 6 new test files: `app/guide/glossary/page.test.tsx` 18, `app/guide/glossary/safety-filter.test.tsx` 2 [isolated because vi.doMock leaks across tests], `app/guide/faq/page.test.tsx` 18, `app/guide/page.test.tsx` 4, `skills/glossary-gate-skill-edits.test.ts` 18, `guides/script-footer.test.ts` 12); 2 new real-backend Playwright specs (`glossary.spec.ts` 7 + `faq.spec.ts` 6); demo Playwright regen on guide-06 (4 PNGs updated). CI: 1 fix push required after first push — stale `glossary-section` data-testid in `glossary.spec.ts` after the FAQ commit renamed it to `reference-section` (vitest was updated, Playwright spec missed); fix landed in commit before merge. 5/5 jobs green on final SHA. **Alembic head unchanged at `0015_trials_per_query_metrics`** — frontend-only feature, no backend code. Earlier: after `feat_study_preflight_overlap_probe` merged into `main` as PR #193 squash `ca835e0` — **27th MVP1 feature shipped**, 3 stories across 1 epic. Tier-2 create-time guard sitting between Tier 1 (string-equality target-mismatch, PR #184) and Tier 3 (mid-flight zero-streak abort, PR #191). 
`POST /api/v1/studies` now issues a single bounded `ids`-existence search against the study's target index after `JUDGMENT_TARGET_MISMATCH` and before config-serialize. When fewer than `min(MIN_OVERLAP=3, max(judged_doc_count, 1))` judged doc IDs are present, returns 422 `INSUFFICIENT_JUDGMENT_OVERLAP`. When the cluster is unreachable / probe times out / engine rejects the bare ids body, the probe emits a `studies.preflight.overlap_probe.skipped` WARN log with `reason ∈ {unreachable, timeout, invalid_query_dsl}` and the study creates 201 — consistent with "tolerate transient adapter failures at write time." Locked decisions per spec §19: ids-existence probe (NOT template-rendered — avoids parameter-synthesis brittleness), 2-tier cap-aware threshold (Q1 → B), fall-through on cluster-unreachable (Q2 → A), `strict_errors=True` on adapter call, module-level constants `MIN_OVERLAP=3 / PROBE_TIMEOUT_S=2.0 / MAX_PROBED_DOCS=200` (no `Settings` field), single representative qid K=1, `OverlapProbeResult` frozen dataclass return type, dict-key unpacking via `result.get("overlap_probe", [])`. 
Cross-model review: spec 3 cycles (14/7/4 findings — 23 accepted + 2 rejected with cited counter-evidence: `Query.id` is `Mapped[str]` String(36) not native UUID; UNIQUE on `(judgment_list_id, query_id, doc_id)` already guarantees DISTINCT so no `DISTINCT` keyword needed); plan 3 cycles (6/4/3 findings — 13 accepted, 0 rejected); phase-gate cumulative-diff GPT-5.5 (5 findings — 2 applied in `396da73` for runbook formula + `_log_helpers.py` convention, 2 deferred as `infra_study_preflight_real_engine_integration` + `chore_studies_post_arq_spy_fixture` idea files, 1 rejected with state.md-finalization-convention counter-evidence); Gemini Code Assist (1 Medium finding rejected with cited counter-evidence — Python 3.13 pin + ruff UP041 enforce bare `TimeoutError`); final GPT-5.5 (2 Medium findings — 1 rejected via `asyncio_mode = "auto"` in pyproject.toml:165 + sibling `test_dispatch_run_query.py` precedent, 1 accepted-as-documented for the unplanned E2E seed change anticipated by plan §3.5). New backend module `backend/app/services/study_preflight.py` (~180 LOC) + 2 new repo functions in `query.py` / `judgment.py` + handler integration in `studies.py` (between JUDGMENT_TARGET_MISMATCH line 283 and config-serialize line 286) + `INSUFFICIENT_JUDGMENT_OVERLAP` row in `api-conventions.md` + recovery paragraph in `study-lifecycle-debugging.md` + source-presence ordering test in `test_studies_api_contract.py` (locks `target_pos < probe_pos < overlap_pos < config_pos`). E2E seed helper at [`ui/tests/e2e/helpers/seed.ts`](ui/tests/e2e/helpers/seed.ts) extended with `bulkIndexDocsToES()` (POSTs NDJSON `_bulk` with `refresh=wait_for` to the host-side ES at `PLAYWRIGHT_ES_BASE_URL` default `http://localhost:9200`) so synthetic `e2e-doc-N` IDs are present in the cluster's target index — without it, the new probe rejects every seeded study. 
Tests: 1044 backend unit (was 1040, +4 in `backend/tests/unit/services/test_study_preflight.py`); backend integration +14 test functions / 18 parametrized cases — AC-1..AC-4b handler-level via `probe_judgment_overlap` monkeypatches, AC-5/AC-6 spy that the probe is NOT invoked on Tier 1 fail paths, AC-7/AC-8/AC-10/AC-11/AC-13 adapter-layer via `_FakeProbeAdapter` + `_install_real_probe_with_fake_adapter` (monkeypatches `study_preflight.acquire_adapter` to bypass CI's missing `CLUSTER_CREDENTIALS_FILE`), AC-9 empty-judgments path, AC-12 read-path negative; backend contract +2 (envelope shape + source-presence ordering lock); 1 autouse fixture (`_default_overlap_probe_passes`) installs a sufficient `OverlapProbeResult` so existing happy-path tests don't 422 on the new probe. CI green on every push iteration (3 pushes — 2 failures + 1 success on final SHA `b11a13d`-equivalent landing as `ca835e0`); 5/5 jobs incl. smoke 70+/70 Playwright. **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential captures during the session: [`infra_study_preflight_real_engine_integration/idea.md`](docs/02_product/planned_features/infra_study_preflight_real_engine_integration/idea.md) (P2: real-engine AC-1..AC-4b coverage), [`chore_studies_post_arq_spy_fixture/idea.md`](docs/02_product/planned_features/chore_studies_post_arq_spy_fixture/idea.md) (P2: Arq spy fixture for "no-enqueue on rejection" — symmetric gap across all studies-POST tests), [`bug_dashboard_banner_dismiss_persistence_flake/idea.md`](docs/02_product/planned_features/bug_dashboard_banner_dismiss_persistence_flake/idea.md) (pre-existing flake in `dashboard.spec.ts:63` introduced by PR #188 — test's `addInitScript` clears localStorage on every reload). Earlier: after `feat_orchestrator_zero_streak_abort` merged into `main` as PR #191 squash `51ae4b3c` — **26th MVP1 feature shipped**, 2 stories across 1 epic. 
Tier-3 mid-flight guard: orchestrator aborts a study as `failed` with `failed_reason="no signal: 20 consecutive trials scored 0.0 — judgment overlap likely lost mid-study"` after 20 consecutive `status='complete' AND primary_metric=0.0` trials. Mirrors the existing `_last_n_all_failed` precedent at [`backend/workers/orchestrator.py`](backend/workers/orchestrator.py) exactly — same block position (after failure-streak, before max_trials/time_budget), same cancel-race handling, same WARNING/INFO structlog levels. No migration, no API surface change, no frontend code change (existing `StudyHeader.failed_reason` renderer carries the new string). Composes with the Tier 1 shipped guard (`feat_study_target_judgment_mismatch_guard` PR #184) and the still-planned Tier 2 preflight overlap probe — together they close create-time + mid-flight + adapter-driven paths to "all trials score 0". Locked decisions per spec §19: threshold=20 (10 TPE random + 10 informed phases), module-level constant (NOT `Settings`), no `STUDY_NO_SIGNAL` error code (no envelope to attach it to; the `failed_reason` string IS the stable contract). Cross-model review: spec 3 cycles (21 findings, all accepted — including SQL-WHERE-on-study_id-only semantics, AC-5 impossible-data-state rewrite, log-level taxonomy reconciliation); plan 3 cycles (7 findings, all accepted — `from backend.workers` import path bug, barrier-stub determinism for AC-2, RecordingLogger setup for AC-3); cumulative-diff GPT-5.5 2 cycles (1 finding accepted in plan-patch `d3e2ac0` re: `_stop()` INFO vs WARNING log level); Gemini Code Assist 2 Medium (both accepted, fixed in `7ebbdda` — `Sequence[NativeQuery]` typing on test stubs); final GPT-5.5 review 2 cycles (3+2 findings, 4 accepted in `6e3d2dd`+`2d0bbc4` — pipeline_status surface update, STUDY_NO_SIGNAL supersession in idea, broken relative links, contract-gate clarification; 1 deferred — blog-post Fusion mention is project-scope-consistent). 
Tests: 1040 backend unit (unchanged); 6 new integration tests in [`backend/tests/integration/test_study_lifecycle.py`](backend/tests/integration/test_study_lifecycle.py) — 5 named (AC-1 zero-streak abort with WARNING log assertion, AC-2 outlier-in-window with barrier-stub determinism, AC-3 alternating zero/failed with INFO max_trials_reached log, AC-4 cancel-race via monkeypatched `fail_study`, AC-5 precedence via mocked helpers + spy) + 1 parameterized 8-subcase boundary matrix for FR-1/FR-5 (`_last_n_all_zero` helper SQL/order/LIMIT/NULL semantics). The existing `test_ac5_five_consecutive_failures_fail_the_study` continues to pass (FR-4 precedent regression). 13 new test cases total. New `build_zero_scoring_hits_response` fixture helper in [`backend/tests/integration/fixtures/handbuilt_qrels.py`](backend/tests/integration/fixtures/handbuilt_qrels.py). CI green on every push (7/7 jobs) incl. final `2d0bbc4`. **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential captures during the session: `bug_contract_test_stub_missing_target_filter_kwarg/idea.md` (pre-existing 2 contract-test failures in `test_error_codes.py` from PR #168's adapter Protocol change; stub signature drift). Bundled into the PR per user direction: planning docs from a pre-pipeline stash — `feat_chat_last_message_preview/idea.md` (MVP2 chat polish), `infra_ranx_migration/idea.md` (P2 — IR-evaluation engine swap; subsequently renamed `infra_ir_measures_migration/` at finalization), `docs/blog/2026-05-22-elevator-pitch-search-platform.md`, plus 3 cross-reference renames where `chore_chat_last_message_preview` → `feat_chat_last_message_preview`. Earlier: after `feat_home_first_run_demo_nudge` merged into `main` as PR #188 squash `21325432` — **25th MVP1 feature shipped**, 12 stories across 4 epics. 
Frontend-only polish layer on PR #182's auto-seed: dismissable demo-data banner on `/` + JSX `` on `/clusters` + ` (Demo)` text suffix in create-study modal cluster picker + proposals fk-select cluster filter + new `verify_demo_slug_parity.sh` CI guard. Cross-model review: spec 3 cycles (13 findings, all accepted); plan 3 cycles (11 findings, all accepted); phase gates 4 findings (all accepted); Gemini Code Assist 3 Medium (2 accepted, 1 rejected with counter-evidence — useMemo would save nothing given React's render model + early-return); final GPT-5.5 2 Low (1 fixed, 1 deferred to finalization). Phase 2 split out to `feat_home_demo_reseed_endpoint/idea.md` so the deferred reseed endpoint surfaces in `/pipeline --status` as its own planned feature. UI vitest **639** across 92 files (+29 across 7 new files + 2 extensions); Playwright E2E +3 on `dashboard.spec.ts`. Alembic head unchanged at `0015_trials_per_query_metrics`. Earlier: after `chore_e2e_test_rows_isolation` merged into `main` as PR #186 squash `a444b94` — **24th MVP1 feature shipped**, 2 stories across 1 epic. Closes the operator-visible-dev-DB pollution: every Playwright E2E run now drains its seeded rows after the suite via a per-worker JSONL cleanup registry, 6 new test-only `DELETE /api/v1/_test/*` endpoints gated by `_require_development_env`, FK-safe drain order (proposals → digests → studies → judgment_lists → query_sets → query_templates → clusters), and a new `cleanup-reporter.ts` Playwright Reporter that asserts `registered_deduped == attempted == deleted + failed + skipped_404 AND failed == 0` after every run. 11 strictly-new error codes (3 `_NOT_FOUND` + 8 `_HAS_DEPENDENT_*`) documented in [`docs/01_architecture/api-conventions.md`](docs/01_architecture/api-conventions.md). Pure `cleanup-core.ts` module extracted from `global-teardown.ts` so the dedupe/order/URL-build logic is unit-testable without fs/network mocks. 
Cross-model review: GPT-5.5 — spec 3 cycles (26 findings, 25 accepted + 1 deferred to PLAYWRIGHT_CLEANUP_STRICT=1 v2), plan 3 cycles (20 findings, all accepted); Gemini Code Assist 3 Medium findings (all rejected with SQLAlchemy AsyncSession-concurrency counter-evidence — `asyncio.gather` on the same session is forbidden); final GPT-5.5 1 High finding (rejected — truncated-diff false positive on `repo/__init__.py:38–42` import block; verified empirically `from backend.app.db.repo import hard_delete_*` works for all 6). Post-merge CI fix on the same branch: `testMatch: ['**/*.spec.ts']` added to `ui/playwright.config.ts` after the smoke job tried to load vitest `.test.ts` files as Playwright specs. Tests: 1040 backend unit (unchanged); backend integration +20 cases (6 happy + 6 parameterized 404 + 8 409 — covers all 11 strictly-new + 3 reused codes); backend contract +6 env-guard cases + 2 source-presence cases + 6 OpenAPI tuples; UI vitest **630** (was 601 — +29: 19 cleanup-core + 10 global-teardown). CI green on `01acc04` (5/5 jobs incl. smoke 70/70 Playwright). **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential capture: `chore_e2e_seed_acme_helper_dead/idea.md` (Backlog) — `seedAcmeProductsChain` has no spec caller. Earlier — after `feat_study_target_judgment_mismatch_guard` merged into `main` as PR #184 squash `ce3fcf4` — **23rd MVP1 feature shipped**, 3 stories across 1 epic. Closes the literal study2 incident: `POST /api/v1/studies` now rejects two mismatch classes at create time with specific 422 codes — `JUDGMENT_CLUSTER_MISMATCH` (judgment list and study point at different physical clusters; doc IDs are cluster-scoped so same target name on two clusters still produces zero overlap) and `JUDGMENT_TARGET_MISMATCH` (same cluster but different target index/collection). Cluster fires before target. 
Both checks fire AFTER FK resolution + the existing `query_set_id` `VALIDATION_ERROR` check. New `?target=` wire filter on `GET /api/v1/judgment-lists` (min_length=1, max_length=255) + `target: str` required field on `JudgmentListSummary` (additive; OpenAPI snapshot + ui/src/lib/types.ts regenerated). Frontend create-study modal Step-2 dropdown now passes `{ query_set_id, cluster_id, target, limit: 200 }` to `useJudgmentLists`; manual-mode `` uses hoisted `targetReg.onChange(e)` (RHF register preserved) then cascade-resets `judgment_list_id`; dropdown-mode target picker mirrors the same reset; new empty-state copy substitutes the target value + CTA href="/judgments". Drive-by fix bundled: E2E seed helpers (`seedJudgmentList`, `seedFullChain`, `seedStudy`) gain optional `target` overrides; 3 specs updated to align target values so the new FR-1 validator doesn't reject chained POSTs. Cross-model review: spec 3 cycles (17 findings, all accepted, 1 rejected with cited counter-evidence at create-study-modal.tsx:508), plan 3 cycles (16 findings, all accepted, 1 rejected); Gemini Code Assist 2 findings (1 accepted in `035af0a` — IIFE → hoisted register; 1 rejected with precedent counter-evidence at `test_judgments_api_contract.py:215-234`); final GPT-5.5 10 findings (2 accepted in `a358a71` — over-bound 422 test; 8 rejected — 5 truncation false positives + 3 plan/precedent rejects). Tests: 1040 backend unit (unchanged — inline conditionals), backend integration +7 cases (target/cluster mismatch + ordering + AND-semantics + summary shape + over-bound + GET-pre-existing-200), backend contract +2 cases (firing-order lock in `test_studies_api_contract.py` + summary `target` shape lock in `test_judgments_api_contract.py`), UI vitest 567 → 572 (+5: hook wire-filter, dropdown cascade, manual cascade, cluster regression-lock, empty-state CTA). CI green on `a358a71` (5/5 jobs incl. 70/70 Playwright). 
Alembic head unchanged at `0015_trials_per_query_metrics` — feature is purely additive at the application layer. Prior — after `feat_pr_metric_confidence` merged into `main` as PR #180 squash `d0a8358` — **22nd MVP1 feature shipped**, 9 stories across 2 epics. Backend persistence (migration `0015_trials_per_query_metrics` adds nullable JSONB column behind CHECK), analytics (`backend/app/domain/study/confidence.py` — pure-Python orchestrator + bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome helpers under FR-7 graceful-degradation), and three consumer surfaces — `StudyDetail.confidence` API enrichment, `## Confidence` PR body section, and digest narrative `` + `` Jinja blocks. Frontend ships `` on `/studies/[id]` (between StudyHeader and trials Card) + 6 glossary entries (text lifted verbatim from spec §11 tooltip table) + 2 real-backend Playwright E2E cases. Cross-model review: GPT-5.5 cycle 1 (Epic 1 gate) returned 12 findings — 5 rejected with cited counter-evidence (truncated-diff false positives), 2 deferred, 5 accepted + fixed inline; Gemini Code Assist clean pass; final GPT-5.5 review 3 Low findings all accepted + fixed inline. Tests: 1039 backend unit (+5 digest + 29 confidence + 13 studies confidence + extras), 189 contract (+2 OpenAPI shape lock + 4 PR-body section + 1 endpoint guard for the extended _test seed endpoint), 527 in-container integration (+13 StudyDetail.confidence + 5 migration round-trip + 1 open_pr plumbing + 2 Story 1.2 worker), 567 UI vitest (+14 ConfidencePanel — 13 layout + 1 tooltip-trigger inventory), 10/10 Playwright E2E (+2 ConfidencePanel real-backend). Three follow-ups filed: `chore_guides_glossary_route` (render `glossary.ts` as a `/guide/glossary` route), `chore_guides_faq` (curated operator-judgment Q&A), `chore_guide_06_screenshot_refresh_confidence_panel` (regenerate guide-06 screenshots). Alembic head moves to `0015_trials_per_query_metrics`. 
Prior — after `feat_pr_metric_confidence` Epic 1 landed locally on the `feat_pr_metric_confidence` branch — backend persistence + analytics + PR-body + digest-prompt surfaces complete, Epic 2 frontend ConfidencePanel ahead. Migration `0015_trials_per_query_metrics` adds the nullable JSONB column behind a CHECK constraint; new pure-Python `backend/app/domain/study/confidence.py` owns bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome classification under FR-7's graceful-degradation contract; new `backend/app/services/study_confidence.py` glues the 4-query read pattern onto the orchestrator and is consumed from `studies._detail()`, the `open_pr` worker, and the digest worker. GPT-5.5 cycle-1 review found 12 issues — 5 rejected as truncated-diff false positives, 2 deferred (plan/code interface drift; full-worker integration test deferred to feat_github_pr_worker's existing suite), 5 accepted + fixed inline (convergence `total_trials = max_trial_number + 1` instead of count; convergence KeyError guard when winner not in summary; pre-existing-row-stays-NULL migration test; Trial model docstring drift on metric key shape; state + architecture docs). 1039 backend unit tests pass (+5 digest prompt cases, +1 convergence assertion), 189 contract, 527/527 in-container integration. Prior — after `feat_agent_propose_search_space` shipped as PR #175 squash `5d29355`). **21st MVP1 feature merged** — 10 stories across 5 epics, all complete. New read-only agent tool `propose_search_space` (the 20th in the registry) builds a deterministic starter search space from a template's `declared_params` using the same heuristic that powers the create-study wizard's auto-fill — a Python port (`backend/app/domain/study/search_space_defaults.py`) of `ui/src/lib/search-space-defaults.ts` with a TS↔Python parity test driven by a shared JSON fixture (18 rows, byte-identical assertions on both sides). 
Cap-aware overflow guard added on both Python AND TS sides (fixes a latent bug where TS silently returned an invalid space when 8+ fall-through floats blew past 10⁶). Optional `prior_study_id` arg narrows numeric bounds via `winner ± |winner| × bracket` for sign-symmetric math (Gemini #1/#2 fix) with `bracket` threaded through the linear paths (Gemini #3 fix); log-uniform stays at √2. Graceful degrade on template mismatch + missing trial row + non-numeric winner — emits WARN logs (`agent.propose_search_space.prior_template_mismatch` / `.missing_winner_trial`). `ToolContext` gained `conversation_id: str` plumbed from `orchestrator.run_turn` for paired adherence telemetry — INFO events `agent.search_space_proposed` (propose-side) + `agent.create_study.invoked` (create-side) correlate offline by conversation_id per spec FR-6 (grep recipe in `docs/03_runbooks/agent-debugging.md` §5). New `repo.get_trial(db, trial_id)` parallels `repo.get_study`. System prompt updated: 19→20 tools, "Studies (4)" with `propose_search_space` first, new chain-guidance bullet. `ProposeSearchSpaceArgs` uses `ConfigDict(extra="forbid")` (GPT-5.5 F6 fix) so hallucinated LLM args fail Pydantic validation loudly. Spec converged at GPT-5.5 cycle 3 (19 findings, all accepted); plan converged at cycle 3 (8 findings, all accepted). Post-merge review: Gemini 3 findings all fixed in `642b5b9`; GPT-5.5 final review 6 findings — 1 fixed in `945e833`, 1 deferred (structlog migration), 4 rejected with cited counter-evidence (truncated-diff false positives). Tests: 1000 backend unit pass (+87 new cases) + 19 Python parity + 19 TS parity; 38 TS lib + 66 modal still green. Alembic head unchanged at `0014_clusters_target_filter` — feature is purely additive at the application layer. Earlier 2026-05-20 (after `feat_cluster_target_filter` shipped as PR #168 squash `57d3ba0` + follow-up `chore_seed_meaningful_demos` shipped as PR #169 squash `c44d774`). 
**20th MVP1 feature merged** + demo-state durability gap closed in the same session. PR #168: 5 stories (B1 migration 0014 + ORM column; B3 Pydantic + service plumb-through + responses; B2 adapter Protocol + ElasticAdapter + StubAdapter + router; F1 register modal Target filter input; F2 create-study modal filter-aware empty-state + EntitySelect accessibility improvement). Plus 4 post-impl fix commits (test_migrations head bump, register modal overflow-y-auto, EntitySelect sr-only Gemini fix, spec drift cleanup + OpenAPI shape-lock contract test from GPT-5.5 final review). PR #169: `scripts/seed_meaningful_demos.py` + `make seed-demo` target (idempotent: TRUNCATE clusters CASCADE + DELETE matching ES/OS indices + reseed with per-cluster `target_filter` values baked in — closes the gap where integration tests kept wiping the dev DB with no durable reseed mechanism). 529/529 vitest across 79 files (was 525/78), 903 backend unit tests (was 899), 50 cluster-API integration tests (was 45) + 3 new migration round-trip tests + 7 contract validator cases + OpenAPI shape-lock test. **Alembic head moved to `0014_clusters_target_filter`.** Cross-model review pre-impl: spec + plan both converged at GPT-5.5 cycle 2 (12 findings total, all accepted). Post-impl: Gemini Code Assist 3 findings (2 accepted: EntitySelect sr-only on #168, http() auth type hint on #169; 1 rejected with cited counter-evidence: out-of-scope test file from #168). GPT-5.5 final review on #168: 2 findings, both accepted (spec drift + OpenAPI shape-lock). **Process feedback captured:** `.claude/projects/.../memory/feedback_one_branch_per_session.md` — should have bundled the seed chore into PR #168 rather than spinning a sibling PR. End-to-end smoke verified live before both merges. Earlier 2026-05-20 (after `feat_create_study_target_autocomplete` shipped as PR #165 squash commit `bd4516a` — 19th MVP1 feature. 
Bundled the `get_schema` + `explain` connect-error fix per `bug_get_schema_unhandled_connect_error` in the same PR. 525/525 vitest across 78 files, 33 adapter unit tests + contract suite + integration tests all green twice (initial + post-cycle-2). Gemini Code Assist: 1 finding rejected with cited counter-evidence (pre-existing list-shape assumption matches the wire contract). GPT-5.5 final review: 2 findings — 1 accepted in `19d9d51` (contract-layer TARGETS_FORBIDDEN + CLUSTER_UNREACHABLE envelope assertions), 1 deferred with counter-evidence (dropdown E2E `test.skip`'d; AC coverage satisfied by 8 hook unit + 6 modal unit + integration + contract tests). Two follow-up ideas filed in-PR: `bug_e2e_target_dropdown_flake` + `chore_guide_06_screenshot_refresh_target_picker`.) Earlier — same day (after `feat_create_study_search_space_builder` shipped as PR #163 squash commit `c703953`, bundling the search-space builder feature + the `bug_judgment_lists_listing_ignores_query_set_filter` backend fix surfaced during local verification. 18th MVP1 feature. The builder + bug-fix bundle reflects the single-developer series workflow: rather than spin a sibling backend PR off `main`, the bug fix landed in the same branch since the dev was already in verification mode. PR #163 went through 3 spec cycles (16 findings) + 3 plan cycles (27 findings) + 3 Gemini Code Assist findings + 2 GPT-5.5 final-review passes (1 second-pass Low finding accepted on test coverage) = 47 review findings all accepted with cited fixes. 512 vitest assertions across 77 files, 4 real-backend Playwright e2e cases against the builder, 2 new backend tests for the bundled filter fix. 
Two follow-up idea files captured during local verification: `feat_create_study_target_autocomplete` (Step-1 free-text target field has no autocomplete from cluster indexes — pre-existing UX debt deferred) and the now-closed `bug_judgment_lists_listing_ignores_query_set_filter` (bundled into this PR).) Earlier (also 2026-05-20) — PR #161 `0879df2` `chore_create_study_modal_e2e_stability` (un-skipped the deferred Playwright spec via `dispatchEvent('click')` on the Radix trigger), PR #160 `160ff6b` `bug_err_metric_frontend_backend_drift` (wire-enum trim — `err` removed from frontend + backend Literal), PR #159 `52e106d` `bug_tutorial_template_param_boost_naming` (heuristic extension for `_boost` suffix). Earlier (also 2026-05-20) — PR #157 `chore_create_study_wizard_polish` — squash commit `075c46b` — merged into `main`. Ships the 4-surface chore: backend template-mismatch validation at create time (two new error codes `SEARCH_SPACE_UNKNOWN_PARAM` + `SEARCH_SPACE_MISSING_DECLARED_PARAM`), Step-4 auto-fill via the new `ui/src/lib/search-space-defaults.ts` heuristic + cap-aware fallback + TS↔Python cardinality parity fixture, 4 new `study.search_space.*` glossary entries (one dual + three short-only) and 6 extended per-metric entries with k-tier clauses, Step-5 tri-state metric+k rendering with new `K_IGNORED` predicate, plus client-side validation mirror + zero-declared block + 404/transient template-fetch recovery + `__placeholder__` warning. 16 new test files + 2 modified + 1 shared JSON fixture across backend unit/integration/contract + frontend unit/component + 1 skipped E2E. Three follow-up ideas captured: `bug_tutorial_template_param_boost_naming` (tutorial template uses `_boost` suffix not matched by the locked heuristic), `chore_create_study_modal_e2e_stability` (re-enable the skipped Playwright spec once EntitySelect disabled gating stabilizes), `bug_err_metric_frontend_backend_drift` (`err` selectable in wizard but unsupported by `scoring.py`). 
Gemini Code Assist + GPT-5.5 final-pass both adjudicated on the PR — 2 Gemini findings + 7 GPT-5.5 findings, all addressed or filed.) Earlier 2026-05-19 (after a 4-PR shipping run drained the actionable post-MVP1 chore backlog: PR #152 `chore_ci_prettier_check` (`476db78`) + PR #153 `chore_extract_shadcn_select_test_mock` (`199e225`) + PR #154 `chore_form_dropdown_guide_screenshot_refresh` (`ed4121f`) + PR #155 `chore_detail_page_shell_primitive` (`9a72514`). PR #155 is the third primitive after `` and `` — 6 detail-page migrations + new lint guard + flattens a latent UX bug where only `proposals/[id]` discriminated 404 from network error. Earlier the same session: PR #150 (`chore_data_table_columnvisibility_tanstack`, `c1e4545`) — closes the residual DataTable follow-ups: item 5 migrates the primitive from `columns.filter(...)` to TanStack's `state.columnVisibility` API (memoized per Gemini feedback), item 3 locked the flat-prop `DataTableProps` API as canonical with a "Shipped contract addendum" on the historical implementation plan's Story 2.6. Folder renamed `chore_data_table_primitive_followups` → `chore_data_table_columnvisibility_tanstack`. Earlier 2026-05-19 PR #148 (`infra_e2e_wire_seed_helper_into_studies_spec`, squash `65f4150`) — restored the 2 digest-panel E2E tests deferred from PR #130, diagnosed and fixed the real root cause of the original smoke-lane failure (`GET /api/v1/proposals` was silently ignoring the `?study_id=` filter, returning the most-recent global pending proposal), added 5-case integration regression coverage at `backend/tests/integration/test_proposals_study_filter.py`. 
Plus: (a) earlier 2026-05-18 PR #146 (`bug_install_skip_ui_rebuild`, squash `7299fca`) made `make up` rebuild every Compose service (`docker compose build` no-args), switched `make down` to `docker compose down`, and added a `verify_install_builds_all_services.sh` CI gate to lock the contract; (b) earlier 2026-05-18 PR #147 captured `chore_detail_page_shell_primitive` idea (squash `8854e47`). Two new follow-ups filed: `chore_ci_prettier_check` (CI's frontend job has no `prettier --check` step — surfaced when PR #136 drift in 2 unrelated files blocked an unrelated commit) and the in-flight `chore_detail_page_shell_primitive` (third primitive after DataTable + EntitySelect).) +**Last updated:** 2026-05-25 (after `chore_dashboard_regen_quoted_pr_false_positive` admin-merged into `main` as PR #253 squash `20bcb36d` — 40th MVP1-era artifact. Dev-infra script chore that fixes `_extract_pr_number` priority-3 fuzzy match to stop matching backtick-quoted PR-merge phrases via a new `_strip_backtick_quoted_segments` helper. Pass A regex `(`{3,}).*?\1` strips triple-backtick fences (multi-line + single-line + empty, with backref enforcing same-width close so 4-backtick outer with inner 3-backtick is one unit); Pass B regex `(`{1,2})[^\n]*?\1` strips inline 1-or-2-backtick spans (including empty `` and Gemini-caught double-backtick `` `` ``). 8 new tests in TestBacktickStripPriority3 (AC-6..AC-13) + 28 existing regression guards = 36 PASSED. 
**4-cycle / 9-finding cross-model review**: spec-gen 3 cycles (single-line fence coverage gap, regex `+` would skip empty spans, residual count refs); impl-plan-gen 2 cycles (gate arithmetic, regex contract precision); Epic 1 phase-gate 1 Medium (backref same-width close + AC-13 added); final review 1 Medium accepted-in-part (self-triggering spec/plan examples rewritten + priority-4 follow-on bug class filed as `chore_dashboard_regen_priority4_dependency_cite_false_positive`); Gemini Code Assist 1 High (double-backtick inline-span fix, accepted using Gemini's `(`{1,2})[^\n]*?\1` suggestion). CI smoke failed (pre-existing 6+ main pushes failing the same way; `bug_smoke_dashboard_demo_state_locator_missing`); other 6/7 checks green. Two-PR rollout: PR #253 = content (FRs 1–4 + 2 review-driven fix commits), PR B = this finalization commit (FR-5 folder move). **Earlier today** `chore_e2e_seed_acme_idea_obsolete` admin-merged into `main` as PR #250 squash `05f3d486` — 39th MVP1-era artifact. Doc-only chore that closes the OBE'd `chore_e2e_seed_acme_helper_dead` idea (Option A — in-place `**Status:**` line edit at line 4, dashboard regen picks up the closure via `_extract_status_line` at `scripts/build_mvp1_dashboard.py:213`) and refreshes `ui/tests/e2e/helpers/coverage-audit.md` to 9-of-9 helper coverage (the `seedAcmeProductsChain` "0 specs — currently uncalled" framing was OBE'd by commit `2cbcb93b` which wired the helper into `ui/tests/e2e/guides/06_create_and_monitor_study.spec.ts`). 
**5-cycle / 13-finding cross-model review**: spec-gen 3 cycles (`**Status (...):**` regex mismatch, state.md scope drift, residual "prepend" contradictions, two-PR rollout shape); impl-plan-gen 2 cycles (dashboard-regen anti-pattern unenforceable given pre-commit hook, story numbering, file-count, AC-2 substring checks); Epic 1 phase-gate 0 findings; final review cycle 1 caught stale-base after sibling-worktree Phase 2 merged mid-flight → rebased onto `bfa8799f` with `-X ours` for dashboard conflicts; final review cycle 2 = 0 findings. Gemini Code Assist posted 3 line-level findings on `feature_spec.md` claiming 3-up paths should be 4-up — all 3 rejected with empirical `ls -d` counter-evidence (hunk-isolated path-counting false positives). CI smoke failed (pre-existing 5+ main pushes failing the same way; captured in `bug_smoke_dashboard_demo_state_locator_missing`); other 6 checks green. Two-PR rollout: PR #250 = content (FRs 1–4), PR B = this finalization commit (FR-5 folder move). **Earlier today** `infra_agent_sibling_worktree_isolation` Phase 1 + Phase 2 admin-merged into `main` as PR #249 squash `22f878f` — 38th MVP1-era artifact. Adds the `## Working in sibling worktrees` section to CLAUDE.md + 5-test regression suite locking its invariants + `scripts/run-tests-in-worktree.sh` automation + `make test-worktree` + 8-test smoke + runbook. Two tangentials captured: `chore_state_md_size_compression` and `bug_dockerfile_venv_root_owned_after_user_switch`. Phase 3 deferred per `phase3_idea.md`. 11-cycle / 39-finding cross-model review; all accepted. **Earlier today** `feat_study_baseline_trial` admin-merged as PR #245 squash `53be6c63` — 36th MVP1-era artifact. 
Implements the deferred Phase 2 of `feat_pr_metric_confidence` (PR #180): orchestrator runs a single non-Optuna baseline trial before Optuna via the 4-tier params resolver (parent_proposal → parent_study → operator-supplied → template-defaults), persists it as a real `Trial` row with `is_baseline=TRUE` + `optuna_trial_number=-1` sentinel, stamps `studies.baseline_trial_id` + `baseline_metric` via the new `services.study_state.stamp_baseline_trial` chokepoint (FR-12). Existing data-driven consumers flip automatically from "vs runner-up" to "vs baseline" — confidence per-query outcomes (FR-4), auto-followup chain gate (FR-5, now direction-aware — closes a latent minimize-direction bug as a side effect), digest narrative framing, PR body, ConfidencePanel label. New trials-table "Show baseline trial" UI toggle (FR-9). Migration 0020 adds `studies.baseline_trial_id` VARCHAR(36) NULL + `trials.is_baseline` BOOLEAN NOT NULL DEFAULT FALSE + partial unique index `uq_trials_study_baseline_complete` (defense layer 2 of the 3-layer resume-race guard per D-16). **3-cycle spec + 3-cycle plan + 1 CI-fix round.** CI-fix root cause: `test_study_cancel` hung 8min in backend-full because `_wait_for_baseline_trial_by_*` didn't check `study.status` for cancel (production bug — without this, operator-initiated cancel mid-baseline would wait the full `_BASELINE_WAIT_FLOOR_S` 60s before noticing); fixed by adding `_study_cancelled()` helper to bail on every poll tick + extending `_InProcessPool.enqueue_job` to dispatch `run_baseline_trial` inline + monkeypatching `_BASELINE_WAIT_FLOOR_S` = 2.0 in `_running_orchestrator`. Gemini Code Assist: 4 findings, all rejected with cited counter-evidence (3 High duplicates were false positives on `create_trial`'s `**fields: object` signature; 1 Medium was a redundant check already enforced at `FloatParam.model_validator`). 
**Admin-merged** because smoke is still red on the orthogonal pre-existing dashboard-banner E2E (same `dashboard.spec.ts` + `dashboard-reseed.spec.ts` failures from PR #232/#234/#236), and `ui/tests/e2e/` is untouched by this branch. **Alembic head advanced 0019 → 0020.** Only this finalization docs PR remains. Earlier: `feat_study_clone_from_previous` merged into `main` as PR #243 squash `34118ade` — admin-merged because the pre-existing dashboard demo-state-locator smoke regression (`bug_smoke_dashboard_demo_state_locator_missing`) is still blocking the smoke gate. The feature ships the full clone flow end-to-end: backend `parent_study_id` field + early-placement validation (`PARENT_STUDY_NOT_FOUND` 404, `PARENT_STUDY_WRONG_CLUSTER` 422) + persistence (no migration — column was already present from `0003_study_lifecycle_schema.py`); frontend `PrefillValues` widening + `buildPrefillFromStudy` helper + "Clone study" button on `StudyActionBar` (with running-source confirmation `AlertDialog` per FR-11) + cloned-from banner in `CreateStudyModal` (UI-only `cloneSource` never reaches the wire per D-12); deep-link `?clone_from=` reader on `/studies` with one-shot `useRef` guard + automatic re-arm on `cloneFromId` change (Gemini PR #243 #1 fix); 22 new frontend vitest cases + 7 backend integration cases + 1 Playwright real-backend E2E spec. **15 FRs / 17 ACs** all covered. **3-cycle spec + 3-cycle plan + 1-cycle Epic-1 phase-gate + 1-cycle Epic-2 phase-gate + 1-cycle final-pass + 1-cycle Gemini** = all reviews adjudicated (15 findings total, 2 accepted+fixed, 1 deferred-as-non-regression, 12 rejected with cited counter-evidence or resolved-by-merge). 
**Two tangential bug ideas surfaced:** [`bug_datatable_col_vis_density_localstorage_undefined_jsdom`](docs/02_product/planned_features/bug_datatable_col_vis_density_localstorage_undefined_jsdom/idea.md) (pre-existing vitest localStorage failures) + [`bug_smoke_dashboard_demo_state_locator_missing`](docs/02_product/planned_features/bug_smoke_dashboard_demo_state_locator_missing/idea.md) (pre-existing smoke regression on dashboard demo-state locators — same failure reproduces on main run #26397500888). **Follow-up:** [`feat_study_clone_narrow_bounds`](docs/02_product/planned_features/feat_study_clone_narrow_bounds/idea.md) (smart-rewrite of search-space bounds around the source's winner trial) remains in `planned_features/` for future scoping. Earlier: after `bug_demo_clusters_unreachable_in_healthz` merged into `main` as PR #236 squash `70b2ae46` — admin-merged because the pre-existing dashboard banner E2E failure still blocks smoke. Closes BOTH smoke-cascade `/healthz` observability bugs (PR #234 + PR #236) surfaced during the PR #232 unblock. **The fix:** new `run_cluster_health_warmup_background` service module spawned from the FastAPI lifespan hook + FR-7 fix to `get_or_probe_health`'s `CredentialsMissing` branch (now writes synthetic unreachable to cache instead of returning without caching). Within ~5s of API startup, `/healthz` reports truthful `elasticsearch_clusters` aggregate counts. **3-cycle spec + 3-cycle plan + 1-cycle phase-gate + 1-cycle final cross-model review** = 4 cycles total of GPT-5.5 review (32 findings, all accepted). **3 CI fix rounds after PR open:** (1) per-page session lifecycle refactor to release asyncpg connections before HTTP probes; (2) env-var gate `RELYLOOP_DISABLE_STARTUP_WARMUP=1` for integration tests to avoid asyncio interleaving with the latent webhook merge-handler row-lock race; (3) `monkeypatch.delenv` for unit test isolation. 
**Notable tangential bug captured:** [`bug_webhook_concurrent_merge_race_timing_sensitive/idea.md`](docs/02_product/planned_features/bug_webhook_concurrent_merge_race_timing_sensitive/idea.md) — real production-correctness bug in the webhook merge handler's row-lock, deterministically reproducible by adding ANY second lifespan task, masked on main today by pure asyncio-scheduling luck; the next feature that adds a lifespan task will trip it. P2 next-ticket. **Architecture doc** updated with three-path cache-population subsection (registration / lazy on-demand / startup warmup) + race-window caveat. **No new migration; no /healthz response shape change.** Earlier: after `bug_openai_capability_check_incapable_on_valid_key` merged into `main` as PR #234 squash `d69189db` — admin-merged because the pre-existing `bug_demo_clusters_unreachable_in_healthz` failure still blocks the smoke gate at the dashboard-banner E2E layer; this PR fixes the OTHER smoke-cascade bug (the openai capability observability gap). `/healthz` `openai_capabilities` block now carries 5 required fields: the existing `chat / function_calling / structured_output` plus new `models_endpoint: Literal["ok","fail","untested"]` and required-but-nullable `models_endpoint_status_code: int | None`. `_probe_models_endpoint` return contract widened to `tuple[bool, int | None]`; status code captured only on `>= 400` HTTP failure (never on success or network errors — and the response body is NEVER captured, only the integer status, per CLAUDE.md Absolute Rule #10). Cached `CapabilityResult.models_endpoint` schema stays 2-valued — `"untested"` only widens on the response model. Backwards compat verified: pre-fix Redis cache rows deserialize cleanly via Pydantic optional-field defaulting. 
Spec converged at GPT-5.5 cycle 3 (13 findings, 12 accepted + 1 rejected with counter-evidence); plan at cycle 3 (17 findings all accepted); phase-gate at cycle 1 (3 findings, 2 accepted + 1 rejected — dashboard regen is the auto-run pre-commit hook); final review at cycle 1 (1 Low finding deferred as non-regression — test-helper type hints matching existing file convention). Gemini Code Assist: clean review, zero findings. Tests: +15 cases (7 in test_capability_check.py incl `TestSecurityRedaction` for AC-10 + 5 in test_health.py incl AC-10 end-to-end through `check_capabilities` → Redis JSON round-trip → /healthz + 1 defensive in test_probes.py + 2 in test_health_contract.py). Architecture doc updated with success/failure response examples + repo-secret-vs-`.env` divergence note + cascade explanation. Remaining smoke-cascade item: [`bug_demo_clusters_unreachable_in_healthz`](docs/02_product/planned_features/bug_demo_clusters_unreachable_in_healthz/idea.md) (P2). Earlier: after `feat_home_demo_reseed_endpoint` merged into `main` as PR #228 squash `ad6ff826`. Dev-only `POST /api/v1/_test/demo/reseed` endpoint that wipes the 10 demo Postgres tables + 4 ES/OS indices and re-seeds the 4 demo scenarios from `scripts/seed_meaningful_demos.py`. Dashboard now renders a "Reset to demo state" disclosure inside `StartHereChecklist` whenever all three first-run signals are false. Architecture: dual httpx clients (api + engine) + session-level Postgres advisory lock on a dedicated pinned `AsyncConnection` + NO outer wall-clock timeout (per-call HTTP ceiling only) + TRUNCATE-commits-before-self-call invariant (AC-13) + cleanup-on-failure pass via a fresh DB connection. 14 GPT-5.5 spec cycles + 14 plan cycles to convergence; 2 Gemini Medium findings + 1 GPT-5.5 High + 2 Medium accepted, 1 Medium rejected as stale, 1 Low deferred. 
The High-severity fix: the in-container OpenSearch port resolver was mapping `localhost:9201` → `opensearch:9201` but the OS container actually listens on `:9200` inside the Compose network (the host `:9201` is just the port-mapping to avoid colliding with ES on the host). Tests: backend unit+contract 1560 pass (+45 vs the prior baseline); 10 integration tests at `backend/tests/integration/test_demo_seeding{,_timeout}.py` covering AC-1..AC-5 + AC-12..AC-16 (skip outside CI service containers); 21 dashboard vitest cases; 1 Playwright spec at `ui/tests/e2e/dashboard-reseed.spec.ts`. New runbook at [`docs/03_runbooks/demo-reseed-debugging.md`](docs/03_runbooks/demo-reseed-debugging.md). Tangential capture: [`bug_vitest_jsdom_localstorage_failures/idea.md`](docs/02_product/planned_features/bug_vitest_jsdom_localstorage_failures/idea.md) — 31 pre-existing vitest failures in 4 files all touching `window.localStorage`, confirmed unrelated to this PR by stashing the feature branch and reproducing on baseline. **Alembic head unchanged at `0019_digests_suggested_followups_jsonb`** — no schema change.) Prior update: 2026-05-23 (after `chore_study_default_stop_conditions` merged into `main` as PR #215 squash `370c87d9` — **first MVP1.0-cleanup chore shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. 
Frontend-only chore (~175 LOC production code + 12 vitest cases): pre-fills `max_trials = 200` in the create-study modal's Step-5 form (FR-1, baked into `useForm` `defaultValues`), adds a 4-button Stop-condition preset selector (Focused 50 / Standard 200 / Deep 1000 / Custom) above the numeric inputs (FR-2..FR-4 + FR-7), refreshes `study.max_trials` + `study.time_budget_min` glossary copy with dimensionality-keyed framing and adds a new `study.preset` glossary entry (FR-5 + FR-8), updates the chat orchestrator system prompt to recommend `max_trials=200` by default with the dimensionality scaling guidance (FR-6), and ships a 12-case vitest suite in [`ui/src/__tests__/components/studies/create-study-modal.stop-conditions.test.tsx`](ui/src/__tests__/components/studies/create-study-modal.stop-conditions.test.tsx) covering AC-1..AC-6 + 2 bug-guards + AC-8 + AC-10 + the type="button" check (FR-9). `activePreset` is derived purely from form values via `useMemo` (no `useState` + watcher `useEffect`); Custom click is a no-op (Custom == "values don't match any preset"). Cross-model review: spec converged at GPT-5.5 cycle 3 (12 findings — 11 accepted across cycles 1-3, 1 rejected with cited counter-evidence at `backend/app/api/errors.py:62, 118` re: VALIDATION_ERROR envelope claim); plan converged at cycle 2 (5 cycle-1 findings, all accepted; 0 cycle-2 findings); impl-diff GPT-5.5 cycle 1 raised 1 Medium finding (modal-open form-field reset gap, Radix Dialog mount-persistence bug), addressed in subsequent fix iterations; Gemini Code Assist 1 Medium finding (Defer — pre-existing form-state persistence across modal toggles, out of scope for this chore). 
**Late-stage E2E regression + root-cause fix**: `studies-create-builder.spec.ts:130` + `studies-create-target-dropdown.spec.ts:48` started failing against the production UI image — Playwright's `.fill('10')` on a non-empty Max trials input (200 default) was triggering a stray form-submit event before the test's explicit submit-button click, leaving the button stuck in `Submitting…` while Playwright retried the click against a vanishing button. Seven mechanical fix attempts (drop `form` from useEffect deps, modal-overflow scrolling, prev-open `useRef` gating, in-effect setValue removal, RHF subscription watcher, useMemo-derived activePreset, Enter-key suppression) didn't move the needle. The actual fix decouples submission from the form's `onSubmit` event entirely: `<form onSubmit={(e) => e.preventDefault()}>` + the submit button changed from `type="submit"` to `type="button"` with `onClick={form.handleSubmit(onSubmit)}` — submission goes through exactly one path, the explicit submit-button click. Reproduced locally against the rebuilt production UI image (5/5 E2E green post-fix incl. both previously-failing ones). Also bundled: `seed.ts` ES_BASE switched from `localhost` to `127.0.0.1` (Node's IPv6-first resolver was hitting `::1` against an IPv4-only ES bind). Tests: UI vitest 98/98 studies + 730+ overall green; backend full-coverage green; smoke 70+/70 Playwright. CI: 7/7 jobs green on the final SHA. **Alembic head unchanged at `0017_proposals_last_polled_at`** — frontend-only chore at the application layer. Tangential capture: none — the seven ruled-out fix attempts are documented in the merged PR's commit history rather than as a deferred chore. Earlier: after `chore_reconciler_terminal_closed_no_poll` merged into `main` as PR #216 squash `95d4c414` — **Tier A of the reconciler polling-cost polish layer shipped**; predicated on the same-day `bug_pr_reconciler_blocked_by_closed_fallback` (PR #204) that widened the candidate set to include `pr_state='closed'` rows.
New nullable `proposals.last_polled_at TIMESTAMPTZ` column via migration `0017_proposals_last_polled_at`. `list_pr_opened_proposals_for_reconcile` gains a 24-hour exclusion clause: rows with `pr_state='closed' AND last_polled_at > now() - interval '24 hours'` are excluded from each tick. The reconciler's `elif state == "closed":` branch now branches on the candidate's selection-time `pr_state` to avoid the webhook-reopen clobber race — selected-as-open candidates run the legacy `mark_proposal_pr_closed` transition (no stamp); selected-as-closed candidates skip the close helper entirely and call the new `stamp_proposal_last_polled_at` helper (whose defensive `WHERE pr_state='closed'` guard returns `None` as a benign no-op if a webhook flipped the row mid-tick). Effect: case-(b) (closed-without-merge) rows get polled at most once per 24 hours instead of once per tick — a ~288× reduction on the default 5-minute cadence under MVP1's single-worker Arq deployment. Case-(a) recovery is unaffected for first-observation rows; the narrow race where `(merged=false, closed)` was observed once before GitHub flipped to `merged=true` accepts a worst-case 24-hour latency increase (documented in spec §11 flow 3). Cross-model review: spec converged at GPT-5.5 cycle 5 (8 findings, all Accepted); plan converged at cycle 3 (3 findings — 2 Accept-Low + 1 Reject with cited counter-evidence at `proposal.py:513-523` re: pr_state filter that doesn't currently exist). Gemini Code Assist clean pass ("I have no feedback to provide"). Final GPT-5.5 review surfaced 1 Low finding (runbook path), rejected with cited counter-evidence: plan called for `pr-open-debugging.md` but the reconciler runbook is `webhook-debugging.md` (grep at implementation time confirmed zero reconciler refs in pr-open-debugging.md). 
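The ~288× figure above is the number of 5-minute ticks in a 24-hour window. The sketch below checks that arithmetic and restates the exclusion clause as a plain predicate. It is an illustration only: the function names are hypothetical, and the real filter lives in the SQL of `list_pr_opened_proposals_for_reconcile`, which is not reproduced here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EXCLUSION_WINDOW = timedelta(hours=24)
DEFAULT_CADENCE = timedelta(minutes=5)


def ticks_per_window(window: timedelta, cadence: timedelta) -> int:
    """How many reconciler ticks fit in one exclusion window."""
    return window // cadence


def is_excluded(pr_state: str, last_polled_at: Optional[datetime],
                now: datetime) -> bool:
    """Mirror of: pr_state='closed' AND last_polled_at > now() - 24 hours.

    A NULL (None) stamp never excludes a row, so first-observation rows
    are still polled.
    """
    if pr_state != "closed" or last_polled_at is None:
        return False
    return last_polled_at > now - EXCLUSION_WINDOW


if __name__ == "__main__":
    print(ticks_per_window(EXCLUSION_WINDOW, DEFAULT_CADENCE))
```

Under these assumptions a closed row that was stamped an hour ago is skipped, while the same row is eligible again once the stamp is more than 24 hours old.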
Tier B (terminal `pr_closed_unmerged` status enum) explicitly out of scope per the spec — captured in [`idea.md`](docs/00_overview/implemented_features/2026_05_23_chore_reconciler_terminal_closed_no_poll/idea.md) §"Tier B" for future UX-brief gating. Tests: 16 new integration tests (9 repo + 7 worker — covering AC-2/3a/3b/4/5/6/7/8/9-race/9-reclose/10); existing `test_proposal_repo_webhook.py` + `test_pr_reconcile_config_repo_pointer.py` continue to pass; `test_migrations.py` head assertions bumped to `0017`; `test_migration_0016.py` pinned to specific revision via explicit downgrade/upgrade so it stays robust against future migrations. Migration round-trip verified clean against the shared Postgres. **Alembic head moved to `0017_proposals_last_polled_at`.** Tangential capture: [`chore_migration_test_head_brittleness/idea.md`](docs/02_product/planned_features/chore_migration_test_head_brittleness/idea.md) (P3) — `test_migrations.py:130,155` hardcodes the expected head version; every new migration requires a sympathy edit. Proposed fix: dynamic `_current_head()` helper reading from `alembic heads`. Earlier: after `bug_dashboard_banner_dismiss_persistence_flake` merged into `main` as PR #213 squash `a8b788c` — **fifth MVP1.0-cleanup bug shipped**; closes the last MVP1.0 `bug_*` item in the operator's stated "finish MVP1.0 before MVP1.5" sweep. The `Dismiss persists across reload (FR-7, AC-3)` Playwright test at [`ui/tests/e2e/dashboard.spec.ts:63`](ui/tests/e2e/dashboard.spec.ts#L63) flaked intermittently on CI smoke runs because it used `context.addInitScript` to clear the dismissed flag from localStorage — but init scripts run on every page initialization INCLUDING `page.reload()`, so the post-dismiss reload re-cleared the flag, racing with React hydration to decide whether the banner stayed hidden. 
The banner code at [`demo-data-banner.tsx:55-63`](ui/src/components/dashboard/demo-data-banner.tsx#L55-L63) was correct (conservative SSR snapshot returns `true` so banner starts hidden, then hydration reads localStorage) — the test was fighting the design. Race surfaced twice on PR #193 smoke CI on unrelated changes, confirming it was the test pattern at fault. Fix per preflight-locked Option A: replace `context.addInitScript` with a one-shot `page.evaluate` + `page.reload` sequence so the localStorage cleanup runs ONCE before the first user-facing assertion; the post-dismiss reload has no init script interfering with hydration. Option B (assert localStorage directly via `page.evaluate(() => window.localStorage.getItem(...))`) rejected because it would change the test's intent from "banner stays hidden" to "localStorage was written" and silently miss a future regression where storage writes succeed but the banner re-renders. Verification: 5/5 local Playwright runs of the changed test pass deterministically post-fix (vs latent race pre-fix). Cross-model review: Gemini Code Assist clean pass (0 findings); GPT-5.5 final review skipped per threshold (~15 LOC, test-only, no flagged subsystem). **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — frontend-test-only change. **MVP1.0 cleanup queue now bug-free; remaining items are 3 P2 chore_* + 1 Backlog chore_*.** Earlier: after `bug_dashboard_classifier_half_step_releases` merged into `main` as PR #211 squash `ab8674a` — **fourth MVP1.0-cleanup bug shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. Operator noticed via `/pipeline status` that `feat_ubi_judgments` (the MVP1.5 anchor) was ranking #1 in the MVP1.0 backlog despite the MVP1.5 release tier having been introduced 2026-05-23 via PR #200. 
Root cause: the dashboard regen script's `_target_release` classifier was never updated when MVP1.5 landed — three concrete gaps (`_RELEASE_SUFFIX_RE` integer-only `r"_mvp(\d+)$"` pattern; `_RELEASE_STATUS_RE` only matched `Held for MVPN` framing with integer N; `ROADMAP_RELEASES` had no `mvp1.5` row) plus a secondary input-scoping bug surfaced during implementation (`_load_planned` passed `status_line + " " + (idea or "")` to the classifier, so body prose that quoted release-tag phrases as documentation examples got misclassified — this bug's own idea.md was the self-collision canary). Fix extends `_RELEASE_SUFFIX_RE` to `r"_mvp(\d+(?:_\d+)?)$"` with `"1_5"` → `"1.5"` normalization, extends `_RELEASE_STATUS_RE` to match both `Held for MVPN` and `anchor for MVPN` / `anchor feature for MVPN` framings with integer-or-decimal captures, inserts `("mvp1.5", "MVP1.5 / v0.1.5", "Real Signals")` into `ROADMAP_RELEASES`, extracts a new `_release_filename_safe(release)` helper for dot→underscore filename normalization used by all 4 sites (file-write at `_dashboard_paths` + 3 link-render sites: `render_markdown` "rich local view" callout, `render_roadmap_html` cards, `render_roadmap_markdown` table cells), and scopes `_load_planned`'s classifier input to `status_line` only (matching the existing `_load_implemented` pattern). Folder rename mid-fix: `bug_dashboard_classifier_missing_mvp1_5` → `bug_dashboard_classifier_half_step_releases` because the original ended in `_mvp1_5` and triggered the new regex on itself. **idea.md's front-matter note documents the general rule: feature folders *about* a release shouldn't use the literal `mvp1_5` substring in their descriptive tail.** Cross-model review: Gemini Code Assist 4 findings (1 High root cause + 3 Medium consequences — all link-site normalization drift) all **accepted** in `d903558`; new test `test_no_raw_release_tag_in_link_renderers` reads the script source and forbids the drift pattern from recurring. 
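The suffix-regex change and the folder-rename rationale above can be reproduced in a few lines. The two patterns are the ones quoted in the text; the helper names and the `"mvp" + version` return format are assumptions for illustration (the source only names `_release_filename_safe` and the `"1_5"` → `"1.5"` normalization).

```python
import re
from typing import Optional

OLD_SUFFIX_RE = re.compile(r"_mvp(\d+)$")            # integer-only (pre-fix)
NEW_SUFFIX_RE = re.compile(r"_mvp(\d+(?:_\d+)?)$")   # integer or half-step


def release_from_folder(folder: str,
                        pattern: "re.Pattern[str]" = NEW_SUFFIX_RE) -> Optional[str]:
    """Return a release key such as 'mvp1.5', or None if no suffix matches."""
    match = pattern.search(folder)
    if match is None:
        return None
    # Normalize the captured "1_5" to "1.5".
    return "mvp" + match.group(1).replace("_", ".")


def release_filename_safe(release: str) -> str:
    """Dot-to-underscore normalization for dashboard filenames."""
    return release.replace(".", "_")


if __name__ == "__main__":
    for name in ("feat_example_mvp1_5",
                 "bug_dashboard_classifier_missing_mvp1_5",
                 "bug_dashboard_classifier_half_step_releases"):
        print(name, "->", release_from_folder(name))
```

The second folder name shows the self-collision: a folder that is only *about* MVP1.5 still ends in `_mvp1_5` and is classified into that release, which is why it was renamed.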
GPT-5.5 final review 1 Low finding (helper-level test doesn't catch a regression at the caller) **accepted** in `9e3b095` — added `TestLoadPlannedReleaseScoping` class with 2 tmp_path fixtures exercising `_load_planned` directly. Tests: 1163 unit (was 1143 + 20 new in `test_dashboard_release_classifier.py` — 14 initial + 4 cycle-1 + 2 cycle-2 cases covering suffix recognition, status-line recognition, body-prose-not-matched, filename normalization, link-site drift sentinel, and caller-level scoping). End-to-end on the live filesystem post-fix: `mvp1: 96 features, mvp1.5: 1 features, mvp2: 5 features` (was `mvp1: 95, mvp2: 5`); new `MVP1_5_DASHBOARD.md` + `mvp1_5_dashboard.html` exist with `feat_ubi_judgments` as the only Idea row; `MVP1_DASHBOARD.md`'s Idea table dropped `feat_ubi_judgments`; the next MVP1.0 backlog item is now correctly surfaced. **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — purely a regen-script + dashboard data-layer change. Tangential capture (in-body note): folder-name convention should add a one-line rule to `feature_templates/README.md` about avoiding `_mvpN_M` in descriptive tails; deferred since this is the first instance. Earlier: after `bug_dashboard_depends_on_column_bloat` merged into `main` as PR #208 squash `8bb7148` — **third MVP1.0-cleanup bug shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. The MVP1 dashboard's "Depends on" column rendered impossibly-large lists for two shipped features (`feat_chat_agent` 46→**10** entries, `chore_tutorial_polish` 42→**11** entries) — including features that shipped weeks later and still-planned ideas. 
Root cause was NOT what the original idea.md claimed (the parser was already correctly scoped to the `- Depends on:` line); the actual bug lived in the `DEPS_ALL_BACKEND` sentinel-expansion block at [`scripts/build_mvp1_dashboard.py:707-714`](scripts/build_mvp1_dashboard.py), which expanded the "ALL prior backend features" / "ALL prior MVP1 features" prose marker against the **current snapshot** of all `infra_*`/`feat_*` folders without any time-ordering filter. Only two features use the marker (verified by `grep -rlE "ALL prior backend|all backend|ALL prior MVP" docs/`), so the fix surface is narrow. Fix extracted a module-level `_expand_transitive_deps(features)` helper + new `_merge_order_key()` tuple sort `(merged_date, pr_number, folder)`. For shipped features the expansion is filtered to peers strictly earlier in merge order; for planned features the full-snapshot expansion is preserved. Conservative-exclusion: anything with missing fields (no `merged_date` or no `pr_number`) sorts to end-of-time, so the helper excludes ambiguously-ordered peers rather than risk including post-shipment ones. Cross-model review: Gemini Code Assist 1 Medium finding **accepted** in `3261b06` — the self-dep guard `set(explicit) | scoped - {f.folder}` was Python-set-operator-precedence-bound to subtract only from `scoped`; moved the parens to `(set(explicit) | scoped) - {f.folder}` so self-refs from BOTH the explicit list and the sentinel expansion get dropped (aligned with the existing comment's stated intent — no real-data effect since no shipped feature lists itself). GPT-5.5 final review 1 Low finding **accepted** in `eecec9b` as doc tightening — bug_fix.md's "all other rows byte-identical" claim was overstated because the protocol-required tangential idea file (`chore_dashboard_pr_extraction_from_idea`) adds one Idea row. 
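The operator-precedence finding above is easy to verify in isolation: in Python, set difference (`-`) binds more tightly than union (`|`). The folder names below are made up for the demonstration and the two functions are hypothetical wrappers around the two expressions quoted in the text.

```python
from typing import Iterable, Set


def deps_buggy(explicit: Iterable[str], scoped: Set[str], folder: str) -> Set[str]:
    # Parsed as set(explicit) | (scoped - {folder}):
    # a self-reference in `explicit` survives.
    return set(explicit) | scoped - {folder}


def deps_fixed(explicit: Iterable[str], scoped: Set[str], folder: str) -> Set[str]:
    # Union first, then drop the feature's own folder from the whole result.
    return (set(explicit) | scoped) - {folder}


if __name__ == "__main__":
    explicit = ["feat_a", "feat_self"]
    scoped = {"infra_b", "feat_self"}
    print(sorted(deps_buggy(explicit, scoped, "feat_self")))
    print(sorted(deps_fixed(explicit, scoped, "feat_self")))
```

With no self-reference in `explicit`, both expressions return the same set, consistent with the note that the fix had no real-data effect.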
Tests: 1138 unit (was 1128 + 10 new in [`backend/tests/unit/scripts/test_dashboard_expand_transitive_deps.py`](backend/tests/unit/scripts/test_dashboard_expand_transitive_deps.py) — 6 expansion cases + 4 merge-order sort key cases); regression test fails on `main` with `ImportError: cannot import name '_expand_transitive_deps'`, passes on the branch. End-to-end regen confirmed bloat fix: feat_chat_agent's 10 deps are exactly the `infra_*`/`feat_*` folders shipped ≤2026-05-12 minus itself. Tangential capture: [`chore_dashboard_pr_extraction_from_idea`](docs/02_product/planned_features/chore_dashboard_pr_extraction_from_idea/idea.md) — `_extract_pr_number` only reads `pipe + plan + spec`, not `idea.md`, so legacy implemented features (e.g., `infra_frontend_stack_refresh`) that only have `idea.md` sort to end-of-day in `_merge_order_key` and get excluded from same-day peers' deps. Minor data gap, not a correctness regression. **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — purely a regen-script change at the developer-tooling layer. Earlier: after `bug_contract_test_stub_missing_target_filter_kwarg` merged into `main` as PR #206 squash `d3fbbce` — **second MVP1.0-cleanup bug shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. Two `_Stub` classes in [`backend/tests/contract/test_error_codes.py:195-206` + `:238-249`](backend/tests/contract/test_error_codes.py) crashed locally with `TypeError: _Stub.list_targets() got an unexpected keyword argument 'target_filter'` whenever Elasticsearch was reachable to the test process — pre-existing drift from `feat_cluster_target_filter` (PR #168, 2026-05-20) which extended the `SearchAdapter` Protocol + the production caller at [`clusters.py:359`](backend/app/api/v1/clusters.py#L359) without updating the contract-test stubs. 
CI didn't catch it because `_body()` hardcodes Compose-network DNS `base_url="http://elasticsearch:9200"` but GHA service containers bind at `localhost:9200`; cluster registration fails the verification step → both tests hit `pytest.skip("Could not register cluster — ES likely unreachable")` and the broken stubs are never exercised. Fix is 6 LOC: add `target_filter: str | None = None` to both stub signatures (the kwarg is unused — stubs raise immediately — it just needs to be accepted to match the Protocol at [`protocol.py:131-136`](backend/app/adapters/protocol.py#L131-L136)). The crash happened BEFORE the test's actual `TARGETS_FORBIDDEN` / `CLUSTER_UNREACHABLE` envelope assertions, so the tests were effectively dead — they no longer verified what they claimed to verify. Cross-model review: Gemini Code Assist 2 Medium findings (both `**_kwargs` suggestions for future-proofing) — both **deferred** with cited rationale (the idea's "Anti-pattern note" explicitly chose explicit named kwargs over `**kwargs` to keep drift detection LOUD; the systemic "shared `_BaseStubAdapter` synced via `typing.Protocol` + `mypy --strict`" fix is held as a future chore contingent on drift recurrence — `**_kwargs` would silence exactly the failure mode this PR catches). GPT-5.5 final review skipped per threshold (14 LOC, 2 files, no flagged subsystem). Tests: backend contract suite **291 passed** (was 282 passed + 2 failed pre-fix); the 2 previously-failing tests now exercise the actual envelope assertions and pass. **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — purely test-only change. Tangential observations sweep: none found. Earlier: after `bug_pr_reconciler_blocked_by_closed_fallback` merged into `main` as PR #204 squash `a0ca5b9` — **first MVP1.0-cleanup bug shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. 
The PR reconciler can now recover proposals stranded in `(pr_opened, closed)` by the webhook's `merged_at=null` eventual-consistency fallback: widened `list_pr_opened_proposals_for_reconcile` to include `pr_state='closed'` candidates, added `mark_proposal_pr_merged_from_closed` doing the atomic `(pr_opened, closed) → (pr_merged, merged)` UPDATE, branched the reconciler on `proposal.pr_state` to route to the right helper, and added a `pr_reconcile_recovered_eventual_consistency` INFO log for operator grep handles. FR-3a pointer-update fires from both paths. Genuinely-closed-unmerged proposals (case b — operator closed without merge) now also enter the candidate set; they become benign no-ops via the existing `mark_proposal_pr_closed` `pr_state='open'` guard but get re-polled every reconciler tick — captured as [`chore_reconciler_terminal_closed_no_poll/idea.md`](docs/02_product/planned_features/chore_reconciler_terminal_closed_no_poll/idea.md) (P2 polish, ~50 LOC Tier A: add `last_polled_at` column + exclude recently-polled closed rows). 
Skill chain: `/idea-preflight` rewrote the Problem section after discovering the prior diagnosis was incomplete — the actual primary blocker is the candidate-query filter, not the WHERE clause in `mark_proposal_pr_merged` (the reconciler never even sees fallback-closed proposals because they're filtered out at candidacy); `/bug-fix` produced [`bug_fix.md`](docs/00_overview/implemented_features/2026_05_23_bug_pr_reconciler_blocked_by_closed_fallback/bug_fix.md) locking Option B (new repo helper) over Option A (two-UPDATE reopen+merge) for single-conditional-UPDATE parity with every other `mark_proposal_pr_*` helper; `/impl-execute --ad-hoc` ran the standard ceremony — pre-push gate green, 7/7 CI checks pass, Gemini Code Assist clean ("I have no feedback to provide"), GPT-5.5 final review 1 Low finding (accepted-partial in `7613aab` — bug_fix.md tangential-observations section flipped from "None" to record the chore link; rejected-partial — "returns BOT" in dashboard was the regen script's standard 200-char truncation, not corruption). Regression test pivoted from negative-documentation to positive recovery + new case-(b) no-op lock: `test_reconciler_recovers_fallback_closed_proposal` + `test_reconciler_noops_on_genuinely_closed_unmerged` in [`test_pr_reconcile_config_repo_pointer.py`](backend/tests/integration/test_pr_reconcile_config_repo_pointer.py). Verified via stash-revert that the recovery test fails on `main` (`candidates=0` — fallback-closed rows invisible to candidate query). Runbook §8 paragraph at [`webhook-debugging.md`](docs/03_runbooks/webhook-debugging.md) flipped from "Known limitation" to "Eventual-consistency recovery" with the new helper named. **Alembic head unchanged at `0016_config_repos_last_merged_proposal_id`** — purely additive at the application layer. Tests: 1128 unit (unchanged), 3 reconciler-pointer integration tests passing, 54 reconciler/webhook integration sweep clean, 235 contract. 
Earlier: after MVP1.5 / v0.1.5 "Real Signals" tier introduced to the canonical release matrix via PR #200 squash `594f7b4` — new interstitial release between MVP1 and MVP2, anchored on **OpenSearch UBI** (User Behavior Insights — the engine-neutral standardized event-capture schema championed by OSC) as a first-class judgment source. New [`feat_ubi_judgments/idea.md`](docs/02_product/planned_features/feat_ubi_judgments/idea.md) (P1, ~3.5 KB) captures the planned-feature scope: `UbiReader` (engine-agnostic; reads `ubi_queries` + `ubi_events` via any `SearchAdapter.search_batch`) + pluggable `SignalsConverter` Protocol (position-bias-corrected CTR, dwell-time, hybrid UBI+LLM) + `POST /api/v1/judgment-lists/generate-from-ubi` + `generate_judgments_from_ubi` agent tool. No schema migration required — rides the existing `judgments.source = 'click'` enum that has shipped since MVP1. Spec patches in [`docs/00_overview/product/relevance-copilot-spec.md`](docs/00_overview/product/relevance-copilot-spec.md): §1 summary (5 releases → 6), §14 (rewritten with UBI as engine-neutral primary path; collapsed the prior per-engine Fusion-specific subsection into a single trailing paragraph per new `feedback_de_emphasize_fusion` memory), §19 (`generate_judgments_from_ubi` tool added; existing `pull_signals` retargeted from v1.5+ to MVP3), §27 (release-timeline table adds MVP1.5; new MVP1.5 subsection inserted between MVP1 and MVP2; post-GA `v1.5` renamed to `v1.5+` with Fusion-Signals bullet removed; one signals-reader bullet added to the MVP3 subsection). Canonical release matrix updated in both [`tech-stack.md`](docs/01_architecture/tech-stack.md) and the CLAUDE.md mirror. Origin: external review on 2026-05-22 (LinkedIn outreach to a senior search engineer at a relevance-tooling company) flagging UBI as a stronger trust anchor than LLM-as-judge for v1. 
Gemini Code Assist 5 Medium findings: 1 accepted in `b2d1a37` (§27 arrow-sequence fix `(MVP1 → MVP1.5 → MVP2 → MVP3 → MVP4 → GA v1)`); 4 deferred — pre-existing dashboard "Depends on" column parser bug verified by `git show main:MVP1_DASHBOARD.md` showing 45 backtick'd entries on `feat_chat_agent` PRE-PR (PR #200 added one more, bringing it to 46); captured as [`bug_dashboard_depends_on_column_bloat/idea.md`](docs/02_product/planned_features/bug_dashboard_depends_on_column_bloat/idea.md) (P2). No CI run on `pr.yml` — docs-only PR caught by `paths-ignore`; `secrets-defense` + `gitleaks` both green. Alembic head unchanged at `0015_trials_per_query_metrics` — planning-only change, no code. Earlier: after `infra_ir_measures_migration` merged into `main` as PR #198 squash `350b2fc` — **31st MVP1-era artifact shipped**, 8 stories across 1 epic. Swaps the IR-evaluation engine in `backend/app/eval/scoring.py` to `ir_measures` (PyTerrier team, actively maintained). Public API of `score()` FROZEN; persisted JSONB key shape FROZEN; aggregate computed via `ir_measures.iter_calc()` + manual mean (NOT `calc_aggregate` — see plan cycle-2 C2-F4); per-query universe filtered to mirror the prior evaluator's qid set on edge cases. **No migration, no schema change** — Alembic head unchanged at `0015_trials_per_query_metrics`.
Cross-model review trajectory: spec 3 GPT-5.5 cycles (11→6→1 findings, all accepted); plan 3 GPT-5.5 cycles (10→4→1 findings, 14 accepted + 1 rejected with cited counter-evidence at scoring.py:74-78); phase-gate cumulative-diff review (10 findings — 5 accepted + applied in `b5dbaa3`, 3 rejected with cited counter-evidence: the mypy override was correctly dropped, the gitignored release-notes file can't appear in diffs, test files enumerating forbidden tokens are semantically allowlisted; 2 deferred to post-impl); Gemini Code Assist (3 findings — 1 already-resolved by 352d60f pre-Gemini-post, 2 accepted + applied in 90884ed: switched `obj_repr_to_user: dict[str, str]` keyed by `repr(obj)` to `obj_to_user: dict[Measure, str]` keyed by Measure object directly — ir_measures Measure objects implement __hash__ + __eq__ correctly); final GPT-5.5 review (4 findings — 2 accepted + applied in a6b954d for CLAUDE.md/optimization.md package-name removal + parity-test docstring reword, 1 rejected with cited counter-evidence: ir_measures METADATA classifier confirms Apache 2.0 license; 1 deferred to finalization: dashboard PR# auto-fixes when folder moves). Tests: 1128 unit (was 1077 pre-migration; +51 from 30 parity cases + per-query shape + 12 regex enumeration + 9 sanity-check), 30/30 (metric, k) parity cases match the prior evaluator to 1e-6, per-query shape parity confirms outer-qid + inner-metric-key + per-(qid, metric) value parity at 1e-6, AC-12 existing-row read regression exercises all three consumers (fetch_study_confidence directly + via API + digest-worker top-trials SELECT). Q5 perf benchmark passes under existing 100ms/query threshold; Q4 resolution: outcome (a) — default `ir_measures` provider routing produces parity, no forcing needed. Operator-visible string change: `INSUFFICIENT_JUDGMENT_OVERLAP` error message at studies.py:313 now names `ir_measures`; no API contract change. 
Bundled inline: `backend/app/services/test_seeding.py` `p@10` → `precision@10` (pre-existing inconsistency; spec §2 C2-F5). New permanent infrastructure: `docs/00_overview/dashboard_overrides/` directory + `scripts/build_mvp1_dashboard.py` override mechanism lets future library swaps update historical-feature dashboard rows without back-editing frozen implemented-feature specs. CI green on every push iteration (4 pushes — 2 CI fixes + 1 Gemini fix + 1 final-review fix on final SHA `a6b954d` landing as squash `350b2fc`); 5/5 jobs incl. smoke + backend full-coverage. **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Earlier: after `chore_guides_glossary_route` + `chore_guides_faq` + `chore_guide_06_screenshot_refresh_confidence_panel` bundled into `main` as PR #195 squash `ea2b242` — **28th, 29th, and 30th MVP1-era artifacts** shipped in one PR. Three siblings under `/guide/*` bundled per "one branch, one PR" memory. New `/guide/glossary` route renders the 109-entry `ui/src/lib/glossary.ts` constant with substring search + 8 prefix-derived category facet chips + deep-link anchors (`#study.metric.ndcg`); 10 walkthrough `script.md` files gain a footer link to it. New `/guide/faq` route renders a fresh 19-entry typed `ui/src/lib/faq.ts` (categories: setup-and-install/studies-and-confidence/judgments/proposals-and-prs/chat-agent) with the same search + facet + anchor contract; entries' questions self-link for sharing. Guide-06 demo Playwright spec waits up to 45s for `[data-testid="confidence-panel"]` before screenshotting → `04-study-detail.png` now captures the ConfidencePanel partial-shape view (headline metric without CI band, Robust plateau runner-up gap, per-query outcomes 0/4/0); script.md narrative gains a Monitoring sub-section describing the three signals with cross-links into glossary + FAQ. 
Five SKILL.md gate edits ship together (impl-execute Step 2.5 FAQ-shaped catch-net + Step 3 terminology/drift/decision-point bullets; spec-gen Step 3 #11 tooltip-cites-glossary-key; impl-plan-gen line 111 per-tooltip checklist gains glossary key + source-of-truth comment target) — all locked by a new `glossary-gate-skill-edits.test.ts` that reads each SKILL.md from disk and grep-asserts the enforcement clauses (same-PR default, escape-hatch gating, Step 8 blocking, no-drift-escape, the literal `// Source-of-truth:` marker). New shared `ui/src/lib/markdown-safety.ts` exports `MARKDOWN_DISALLOWED_ELEMENTS` consumed by 4 surfaces (glossary route + FAQ route + HelpPopover + MarkdownDoc) — extracted in response to a Gemini security-medium finding + earlier GPT-5.5 cycle-2 F10 spec finding. Pivoted away from `/pipeline` mid-flow: glossary went through `/spec-gen` with 3 GPT-5.5 cross-model review cycles (10 findings adjudicated and applied) producing a committed `feature_spec.md` design reference; user observed pipeline ceremony was disproportionate for ~200 LOC per item; FAQ shipped without a formal spec (the `ui/src/lib/faq.ts` JSDoc + skill edits are the design surface). 
Cross-model review: GPT-5.5 spec 3 cycles (10 findings — 6 cycle-1 + 4 cycle-2 + 0 cycle-3 convergence — all accepted: §1 outcome rewrite, scope cross-ref fix, test-name canonicalization, DoD path alignment, FR-8c lead-in fix, FR-8a escape-hatch tightening, FR-8a glossary.ts path-ref, ACs locked enforcement clauses, vitest path-resolution guidance, AC-7 source-grep → behavioral DOM assertion); Gemini Code Assist 5 Medium findings (2 accepted in `` — unused `Card*` imports + shared `MARKDOWN_DISALLOWED_ELEMENTS`; 3 rejected with cited counter-evidence — no "FR-7" in faq.ts; `feat_pr_metric_confidence` slugs are deliberate codebase-grep handles for engineer audience; `ui/src/components/ui/card.tsx:9` Card primitive uses identical hardcoded `border-gray-200 bg-white text-gray-900` — matching established precedent). Tests: UI vitest **706/706** (was 639 — +67 across 6 new test files: `app/guide/glossary/page.test.tsx` 18, `app/guide/glossary/safety-filter.test.tsx` 2 [isolated because vi.doMock leaks across tests], `app/guide/faq/page.test.tsx` 18, `app/guide/page.test.tsx` 4, `skills/glossary-gate-skill-edits.test.ts` 18, `guides/script-footer.test.ts` 12); 2 new real-backend Playwright specs (`glossary.spec.ts` 7 + `faq.spec.ts` 6); demo Playwright regen on guide-06 (4 PNGs updated). CI: 1 fix push required after first push — stale `glossary-section` data-testid in `glossary.spec.ts` after the FAQ commit renamed it to `reference-section` (vitest was updated, Playwright spec missed); fix landed in commit before merge. 5/5 jobs green on final SHA. **Alembic head unchanged at `0015_trials_per_query_metrics`** — frontend-only feature, no backend code. Earlier: after `feat_study_preflight_overlap_probe` merged into `main` as PR #193 squash `ca835e0` — **27th MVP1 feature shipped**, 3 stories across 1 epic. Tier-2 create-time guard sitting between Tier 1 (string-equality target-mismatch, PR #184) and Tier 3 (mid-flight zero-streak abort, PR #191). 
`POST /api/v1/studies` now issues a single bounded `ids`-existence search against the study's target index after `JUDGMENT_TARGET_MISMATCH` and before config-serialize. When fewer than `min(MIN_OVERLAP=3, max(judged_doc_count, 1))` judged doc IDs are present, returns 422 `INSUFFICIENT_JUDGMENT_OVERLAP`. When the cluster is unreachable / probe times out / engine rejects the bare ids body, the probe emits a `studies.preflight.overlap_probe.skipped` WARN log with `reason ∈ {unreachable, timeout, invalid_query_dsl}` and the study creates 201 — consistent with "tolerate transient adapter failures at write time." Locked decisions per spec §19: ids-existence probe (NOT template-rendered — avoids parameter-synthesis brittleness), 2-tier cap-aware threshold (Q1 → B), fall-through on cluster-unreachable (Q2 → A), `strict_errors=True` on adapter call, module-level constants `MIN_OVERLAP=3 / PROBE_TIMEOUT_S=2.0 / MAX_PROBED_DOCS=200` (no `Settings` field), single representative qid K=1, `OverlapProbeResult` frozen dataclass return type, dict-key unpacking via `result.get("overlap_probe", [])`. 
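
The cap-aware threshold above can be sketched in a few lines. This is a minimal illustration, not the contents of `study_preflight.py`: `MIN_OVERLAP` and the `OverlapProbeResult` name come from the text, while the field names, the `required_overlap` helper, and the `sufficient` property are hypothetical.

```python
from dataclasses import dataclass

MIN_OVERLAP = 3  # module-level constant named in the text


def required_overlap(judged_doc_count: int) -> int:
    """Cap-aware threshold: min(MIN_OVERLAP, max(judged_doc_count, 1))."""
    return min(MIN_OVERLAP, max(judged_doc_count, 1))


@dataclass(frozen=True)
class OverlapProbeResult:
    # Field names are assumptions; the text only states that the
    # return type is a frozen dataclass.
    judged_doc_count: int   # judged doc IDs sent to the probe
    present_doc_count: int  # how many of them the target index returned

    @property
    def sufficient(self) -> bool:
        # "Fewer than the threshold present" is the rejection case (422).
        return self.present_doc_count >= required_overlap(self.judged_doc_count)


if __name__ == "__main__":
    print(required_overlap(200))                 # large judgment list
    print(OverlapProbeResult(2, 2).sufficient)   # tiny list, all docs present
```

The `max(..., 1)` floor keeps the threshold at one or more, and the `min` cap means a judgment list with only two judged documents is not held to a three-document bar.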
Cross-model review: spec 3 cycles (14/7/4 findings — 23 accepted + 2 rejected with cited counter-evidence: `Query.id` is `Mapped[str]` String(36) not native UUID; UNIQUE on `(judgment_list_id, query_id, doc_id)` already guarantees DISTINCT so no `DISTINCT` keyword needed); plan 3 cycles (6/4/3 findings — 13 accepted, 0 rejected); phase-gate cumulative-diff GPT-5.5 (5 findings — 2 applied in `396da73` for runbook formula + `_log_helpers.py` convention, 2 deferred as `infra_study_preflight_real_engine_integration` + `chore_studies_post_arq_spy_fixture` idea files, 1 rejected with state.md-finalization-convention counter-evidence); Gemini Code Assist (1 Medium finding rejected with cited counter-evidence — Python 3.13 pin + ruff UP041 enforce bare `TimeoutError`); final GPT-5.5 (2 Medium findings — 1 rejected via `asyncio_mode = "auto"` in pyproject.toml:165 + sibling `test_dispatch_run_query.py` precedent, 1 accepted-as-documented for the unplanned E2E seed change anticipated by plan §3.5). New backend module `backend/app/services/study_preflight.py` (~180 LOC) + 2 new repo functions in `query.py` / `judgment.py` + handler integration in `studies.py` (between JUDGMENT_TARGET_MISMATCH line 283 and config-serialize line 286) + `INSUFFICIENT_JUDGMENT_OVERLAP` row in `api-conventions.md` + recovery paragraph in `study-lifecycle-debugging.md` + source-presence ordering test in `test_studies_api_contract.py` (locks `target_pos < probe_pos < overlap_pos < config_pos`). E2E seed helper at [`ui/tests/e2e/helpers/seed.ts`](ui/tests/e2e/helpers/seed.ts) extended with `bulkIndexDocsToES()` (POSTs NDJSON `_bulk` with `refresh=wait_for` to the host-side ES at `PLAYWRIGHT_ES_BASE_URL` default `http://localhost:9200`) so synthetic `e2e-doc-N` IDs are present in the cluster's target index — without it, the new probe rejects every seeded study. 
Tests: 1044 backend unit (was 1040, +4 in `backend/tests/unit/services/test_study_preflight.py`); backend integration +14 test functions / 18 parametrized cases — AC-1..AC-4b handler-level via `probe_judgment_overlap` monkeypatches, AC-5/AC-6 spy that the probe is NOT invoked on Tier 1 fail paths, AC-7/AC-8/AC-10/AC-11/AC-13 adapter-layer via `_FakeProbeAdapter` + `_install_real_probe_with_fake_adapter` (monkeypatches `study_preflight.acquire_adapter` to bypass CI's missing `CLUSTER_CREDENTIALS_FILE`), AC-9 empty-judgments path, AC-12 read-path negative; backend contract +2 (envelope shape + source-presence ordering lock); 1 autouse fixture (`_default_overlap_probe_passes`) installs a sufficient `OverlapProbeResult` so existing happy-path tests don't 422 on the new probe. CI green on every push iteration (3 pushes — 2 failures + 1 success on final SHA `b11a13d`-equivalent landing as `ca835e0`); 5/5 jobs incl. smoke 70+/70 Playwright. **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential captures during the session: [`infra_study_preflight_real_engine_integration/idea.md`](docs/02_product/planned_features/infra_study_preflight_real_engine_integration/idea.md) (P2: real-engine AC-1..AC-4b coverage), [`chore_studies_post_arq_spy_fixture/idea.md`](docs/02_product/planned_features/chore_studies_post_arq_spy_fixture/idea.md) (P2: Arq spy fixture for "no-enqueue on rejection" — symmetric gap across all studies-POST tests), [`bug_dashboard_banner_dismiss_persistence_flake/idea.md`](docs/02_product/planned_features/bug_dashboard_banner_dismiss_persistence_flake/idea.md) (pre-existing flake in `dashboard.spec.ts:63` introduced by PR #188 — test's `addInitScript` clears localStorage on every reload). Earlier: after `feat_orchestrator_zero_streak_abort` merged into `main` as PR #191 squash `51ae4b3c` — **26th MVP1 feature shipped**, 2 stories across 1 epic. 
Tier-3 mid-flight guard: orchestrator aborts a study as `failed` with `failed_reason="no signal: 20 consecutive trials scored 0.0 — judgment overlap likely lost mid-study"` after 20 consecutive `status='complete' AND primary_metric=0.0` trials. Mirrors the existing `_last_n_all_failed` precedent at [`backend/workers/orchestrator.py`](backend/workers/orchestrator.py) exactly — same block position (after failure-streak, before max_trials/time_budget), same cancel-race handling, same WARNING/INFO structlog levels. No migration, no API surface change, no frontend code change (existing `StudyHeader.failed_reason` renderer carries the new string). Composes with the Tier 1 shipped guard (`feat_study_target_judgment_mismatch_guard` PR #184) and the still-planned Tier 2 preflight overlap probe — together they close create-time + mid-flight + adapter-driven paths to "all trials score 0". Locked decisions per spec §19: threshold=20 (10 TPE random + 10 informed phases), module-level constant (NOT `Settings`), no `STUDY_NO_SIGNAL` error code (no envelope to attach it to; the `failed_reason` string IS the stable contract). Cross-model review: spec 3 cycles (21 findings, all accepted — including SQL-WHERE-on-study_id-only semantics, AC-5 impossible-data-state rewrite, log-level taxonomy reconciliation); plan 3 cycles (7 findings, all accepted — `from backend.workers` import path bug, barrier-stub determinism for AC-2, RecordingLogger setup for AC-3); cumulative-diff GPT-5.5 2 cycles (1 finding accepted in plan-patch `d3e2ac0` re: `_stop()` INFO vs WARNING log level); Gemini Code Assist 2 Medium (both accepted, fixed in `7ebbdda` — `Sequence[NativeQuery]` typing on test stubs); final GPT-5.5 review 2 cycles (3+2 findings, 4 accepted in `6e3d2dd`+`2d0bbc4` — pipeline_status surface update, STUDY_NO_SIGNAL supersession in idea, broken relative links, contract-gate clarification; 1 deferred — blog-post Fusion mention is project-scope-consistent). 
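
The zero-streak rule described above can be approximated in memory as follows. This is a hedged sketch only: the shipped helper is SQL-backed, and the `TrialRow` type, the constant name, and the assumption that the window covers the most recent trials of any status are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ZERO_STREAK_THRESHOLD = 20  # threshold from the text; constant name is hypothetical


@dataclass(frozen=True)
class TrialRow:
    # Hypothetical in-memory stand-in for a trial record.
    status: str
    primary_metric: Optional[float]


def last_n_all_zero(
    trials_newest_first: Sequence[TrialRow],
    n: int = ZERO_STREAK_THRESHOLD,
) -> bool:
    """True when the n most recent trials are all complete with a 0.0 score."""
    window = list(trials_newest_first[:n])
    if len(window) < n:
        return False  # not enough history to call it a streak
    return all(
        t.status == "complete" and t.primary_metric == 0.0 for t in window
    )


if __name__ == "__main__":
    zeros = [TrialRow("complete", 0.0)] * 20
    print(last_n_all_zero(zeros))
    print(last_n_all_zero([TrialRow("failed", None)] + zeros[:19]))
```

Under these assumptions a single failed, NULL-scored, or non-zero trial inside the window resets the streak, which matches the outlier-in-window and alternating zero/failed cases listed in the tests.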
Tests: 1040 backend unit (unchanged); 6 new integration tests in [`backend/tests/integration/test_study_lifecycle.py`](backend/tests/integration/test_study_lifecycle.py) — 5 named (AC-1 zero-streak abort with WARNING log assertion, AC-2 outlier-in-window with barrier-stub determinism, AC-3 alternating zero/failed with INFO max_trials_reached log, AC-4 cancel-race via monkeypatched `fail_study`, AC-5 precedence via mocked helpers + spy) + 1 parameterized 8-subcase boundary matrix for FR-1/FR-5 (`_last_n_all_zero` helper SQL/order/LIMIT/NULL semantics). The existing `test_ac5_five_consecutive_failures_fail_the_study` continues to pass (FR-4 precedent regression). 13 new test cases total. New `build_zero_scoring_hits_response` fixture helper in [`backend/tests/integration/fixtures/handbuilt_qrels.py`](backend/tests/integration/fixtures/handbuilt_qrels.py). CI green on every push (7/7 jobs) incl. final `2d0bbc4`. **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential captures during the session: `bug_contract_test_stub_missing_target_filter_kwarg/idea.md` (pre-existing 2 contract-test failures in `test_error_codes.py` from PR #168's adapter Protocol change; stub signature drift). Bundled into the PR per user direction: planning docs from a pre-pipeline stash — `feat_chat_last_message_preview/idea.md` (MVP2 chat polish), `infra_ranx_migration/idea.md` (P2 — IR-evaluation engine swap; subsequently renamed `infra_ir_measures_migration/` at finalization), `docs/blog/2026-05-22-elevator-pitch-search-platform.md`, plus 3 cross-reference renames where `chore_chat_last_message_preview` → `feat_chat_last_message_preview`. Earlier: after `feat_home_first_run_demo_nudge` merged into `main` as PR #188 squash `21325432` — **25th MVP1 feature shipped**, 12 stories across 4 epics. 
Frontend-only polish layer on PR #182's auto-seed: dismissable demo-data banner on `/` + JSX `` on `/clusters` + ` (Demo)` text suffix in create-study modal cluster picker + proposals fk-select cluster filter + new `verify_demo_slug_parity.sh` CI guard. Cross-model review: spec 3 cycles (13 findings, all accepted); plan 3 cycles (11 findings, all accepted); phase gates 4 findings (all accepted); Gemini Code Assist 3 Medium (2 accepted, 1 rejected with counter-evidence — useMemo would save nothing given React's render model + early-return); final GPT-5.5 2 Low (1 fixed, 1 deferred to finalization). Phase 2 split out to `feat_home_demo_reseed_endpoint/idea.md` so the deferred reseed endpoint surfaces in `/pipeline --status` as its own planned feature. UI vitest **639** across 92 files (+29 across 7 new files + 2 extensions); Playwright E2E +3 on `dashboard.spec.ts`. Alembic head unchanged at `0015_trials_per_query_metrics`. Earlier: after `chore_e2e_test_rows_isolation` merged into `main` as PR #186 squash `a444b94` — **24th MVP1 feature shipped**, 2 stories across 1 epic. Closes the operator-visible-dev-DB pollution: every Playwright E2E run now drains its seeded rows after the suite via a per-worker JSONL cleanup registry, 6 new test-only `DELETE /api/v1/_test/*` endpoints gated by `_require_development_env`, FK-safe drain order (proposals → digests → studies → judgment_lists → query_sets → query_templates → clusters), and a new `cleanup-reporter.ts` Playwright Reporter that asserts `registered_deduped == attempted == deleted + failed + skipped_404 AND failed == 0` after every run. 11 strictly-new error codes (3 `_NOT_FOUND` + 8 `_HAS_DEPENDENT_*`) documented in [`docs/01_architecture/api-conventions.md`](docs/01_architecture/api-conventions.md). Pure `cleanup-core.ts` module extracted from `global-teardown.ts` so the dedupe/order/URL-build logic is unit-testable without fs/network mocks. 
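
The drain order and the reporter invariant quoted above can be illustrated as follows. The shipped logic lives in TypeScript (`cleanup-core.ts`, `cleanup-reporter.ts`); this Python rendering is a sketch, and the `CleanupCounts` type, the function names, and the `(kind, id)` entry shape are assumptions.

```python
from dataclasses import dataclass

# FK-safe drain order as listed in the text: children before parents.
DRAIN_ORDER = (
    "proposals",
    "digests",
    "studies",
    "judgment_lists",
    "query_sets",
    "query_templates",
    "clusters",
)


@dataclass(frozen=True)
class CleanupCounts:
    registered_deduped: int
    attempted: int
    deleted: int
    failed: int
    skipped_404: int


def cleanup_is_clean(c: CleanupCounts) -> bool:
    """registered_deduped == attempted == deleted + failed + skipped_404 AND failed == 0."""
    return (
        c.registered_deduped == c.attempted == c.deleted + c.failed + c.skipped_404
        and c.failed == 0
    )


def drain_sorted(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Dedupe (kind, id) entries, then order them by the FK-safe drain order."""
    deduped = list(dict.fromkeys(entries))  # keeps first occurrence, drops repeats
    return sorted(deduped, key=lambda e: DRAIN_ORDER.index(e[0]))


if __name__ == "__main__":
    registry = [("clusters", "c1"), ("studies", "s1"), ("studies", "s1")]
    print(drain_sorted(registry))
    print(cleanup_is_clean(CleanupCounts(2, 2, 2, 0, 0)))
```

Keeping both functions free of file and network access mirrors the stated reason for extracting a pure module: the dedupe and ordering rules can be unit-tested directly.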
Cross-model review: GPT-5.5 — spec 3 cycles (26 findings, 25 accepted + 1 deferred to PLAYWRIGHT_CLEANUP_STRICT=1 v2), plan 3 cycles (20 findings, all accepted); Gemini Code Assist 3 Medium findings (all rejected with SQLAlchemy AsyncSession-concurrency counter-evidence — `asyncio.gather` on the same session is forbidden); final GPT-5.5 1 High finding (rejected — truncated-diff false positive on `repo/__init__.py:38–42` import block; verified empirically `from backend.app.db.repo import hard_delete_*` works for all 6). Post-merge CI fix on the same branch: `testMatch: ['**/*.spec.ts']` added to `ui/playwright.config.ts` after the smoke job tried to load vitest `.test.ts` files as Playwright specs. Tests: 1040 backend unit (unchanged); backend integration +20 cases (6 happy + 6 parameterized 404 + 8 409 — covers all 11 strictly-new + 3 reused codes); backend contract +6 env-guard cases + 2 source-presence cases + 6 OpenAPI tuples; UI vitest **630** (was 601 — +29: 19 cleanup-core + 10 global-teardown). CI green on `01acc04` (5/5 jobs incl. smoke 70/70 Playwright). **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential capture: `chore_e2e_seed_acme_helper_dead/idea.md` (Backlog) — `seedAcmeProductsChain` has no spec caller. Earlier — after `feat_study_target_judgment_mismatch_guard` merged into `main` as PR #184 squash `ce3fcf4` — **23rd MVP1 feature shipped**, 3 stories across 1 epic. Closes the literal study2 incident: `POST /api/v1/studies` now rejects two mismatch classes at create time with specific 422 codes — `JUDGMENT_CLUSTER_MISMATCH` (judgment list and study point at different physical clusters; doc IDs are cluster-scoped so same target name on two clusters still produces zero overlap) and `JUDGMENT_TARGET_MISMATCH` (same cluster but different target index/collection). Cluster fires before target. 
Both checks fire AFTER FK resolution + the existing `query_set_id` `VALIDATION_ERROR` check. New `?target=` wire filter on `GET /api/v1/judgment-lists` (min_length=1, max_length=255) + `target: str` required field on `JudgmentListSummary` (additive; OpenAPI snapshot + ui/src/lib/types.ts regenerated). Frontend create-study modal Step-2 dropdown now passes `{ query_set_id, cluster_id, target, limit: 200 }` to `useJudgmentLists`; manual-mode `` uses hoisted `targetReg.onChange(e)` (RHF register preserved) then cascade-resets `judgment_list_id`; dropdown-mode target picker mirrors the same reset; new empty-state copy substitutes the target value + CTA href="/judgments". Drive-by fix bundled: E2E seed helpers (`seedJudgmentList`, `seedFullChain`, `seedStudy`) gain optional `target` overrides; 3 specs updated to align target values so the new FR-1 validator doesn't reject chained POSTs. Cross-model review: spec 3 cycles (17 findings, all accepted, 1 rejected with cited counter-evidence at create-study-modal.tsx:508), plan 3 cycles (16 findings, all accepted, 1 rejected); Gemini Code Assist 2 findings (1 accepted in `035af0a` — IIFE → hoisted register; 1 rejected with precedent counter-evidence at `test_judgments_api_contract.py:215-234`); final GPT-5.5 10 findings (2 accepted in `a358a71` — over-bound 422 test; 8 rejected — 5 truncation false positives + 3 plan/precedent rejects). Tests: 1040 backend unit (unchanged — inline conditionals), backend integration +7 cases (target/cluster mismatch + ordering + AND-semantics + summary shape + over-bound + GET-pre-existing-200), backend contract +2 cases (firing-order lock in `test_studies_api_contract.py` + summary `target` shape lock in `test_judgments_api_contract.py`), UI vitest 567 → 572 (+5: hook wire-filter, dropdown cascade, manual cascade, cluster regression-lock, empty-state CTA). CI green on `a358a71` (5/5 jobs incl. 70/70 Playwright). 
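
The firing order of the two create-time checks can be sketched as a pure function. The error codes are the ones named above; the `Binding` type and `mismatch_code` function are hypothetical, and the real handler works on resolved ORM rows rather than this simplified pair.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    # Hypothetical: the (cluster, target) pair a study or judgment list points at.
    cluster_id: str
    target: str


def mismatch_code(study: Binding, judgment_list: Binding) -> Optional[str]:
    """Return the 422 code to raise, or None when the pair is compatible."""
    # Cluster is checked first: doc IDs are cluster-scoped, so an identical
    # target name on a different cluster still yields zero overlap.
    if study.cluster_id != judgment_list.cluster_id:
        return "JUDGMENT_CLUSTER_MISMATCH"
    # Same cluster, different index/collection (string equality).
    if study.target != judgment_list.target:
        return "JUDGMENT_TARGET_MISMATCH"
    return None


if __name__ == "__main__":
    print(mismatch_code(Binding("c1", "products"), Binding("c2", "products")))
    print(mismatch_code(Binding("c1", "products"), Binding("c1", "docs")))
```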
Alembic head unchanged at `0015_trials_per_query_metrics` — feature is purely additive at the application layer. Prior — after `feat_pr_metric_confidence` merged into `main` as PR #180 squash `d0a8358` — **22nd MVP1 feature shipped**, 9 stories across 2 epics. Backend persistence (migration `0015_trials_per_query_metrics` adds nullable JSONB column behind CHECK), analytics (`backend/app/domain/study/confidence.py` — pure-Python orchestrator + bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome helpers under FR-7 graceful-degradation), and three consumer surfaces — `StudyDetail.confidence` API enrichment, `## Confidence` PR body section, and digest narrative `` + `` Jinja blocks. Frontend ships `` on `/studies/[id]` (between StudyHeader and trials Card) + 6 glossary entries (text lifted verbatim from spec §11 tooltip table) + 2 real-backend Playwright E2E cases. Cross-model review: GPT-5.5 cycle 1 (Epic 1 gate) returned 12 findings — 5 rejected with cited counter-evidence (truncated-diff false positives), 2 deferred, 5 accepted + fixed inline; Gemini Code Assist clean pass; final GPT-5.5 review 3 Low findings all accepted + fixed inline. Tests: 1039 backend unit (+5 digest + 29 confidence + 13 studies confidence + extras), 189 contract (+2 OpenAPI shape lock + 4 PR-body section + 1 endpoint guard for the extended _test seed endpoint), 527 in-container integration (+13 StudyDetail.confidence + 5 migration round-trip + 1 open_pr plumbing + 2 Story 1.2 worker), 567 UI vitest (+14 ConfidencePanel — 13 layout + 1 tooltip-trigger inventory), 10/10 Playwright E2E (+2 ConfidencePanel real-backend). Three follow-ups filed: `chore_guides_glossary_route` (render `glossary.ts` as a `/guide/glossary` route), `chore_guides_faq` (curated operator-judgment Q&A), `chore_guide_06_screenshot_refresh_confidence_panel` (regenerate guide-06 screenshots). Alembic head moves to `0015_trials_per_query_metrics`. 
Prior — after `feat_pr_metric_confidence` Epic 1 landed locally on the `feat_pr_metric_confidence` branch — backend persistence + analytics + PR-body + digest-prompt surfaces complete, Epic 2 frontend ConfidencePanel ahead. Migration `0015_trials_per_query_metrics` adds the nullable JSONB column behind a CHECK constraint; new pure-Python `backend/app/domain/study/confidence.py` owns bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome classification under FR-7's graceful-degradation contract; new `backend/app/services/study_confidence.py` glues the 4-query read pattern onto the orchestrator and is consumed from `studies._detail()`, the `open_pr` worker, and the digest worker. GPT-5.5 cycle-1 review found 12 issues — 5 rejected as truncated-diff false positives, 2 deferred (plan/code interface drift; full-worker integration test deferred to feat_github_pr_worker's existing suite), 5 accepted + fixed inline (convergence `total_trials = max_trial_number + 1` instead of count; convergence KeyError guard when winner not in summary; pre-existing-row-stays-NULL migration test; Trial model docstring drift on metric key shape; state + architecture docs). 1039 backend unit tests pass (+5 digest prompt cases, +1 convergence assertion), 189 contract, 527/527 in-container integration. Prior — after `feat_agent_propose_search_space` shipped as PR #175 squash `5d29355`. **21st MVP1 feature merged** — 10 stories across 5 epics, all complete. New read-only agent tool `propose_search_space` (the 20th in the registry) builds a deterministic starter search space from a template's `declared_params` using the same heuristic that powers the create-study wizard's auto-fill — a Python port (`backend/app/domain/study/search_space_defaults.py`) of `ui/src/lib/search-space-defaults.ts` with a TS↔Python parity test driven by a shared JSON fixture (18 rows, byte-identical assertions on both sides).
Cap-aware overflow guard added on both Python AND TS sides (fixes a latent bug where TS silently returned an invalid space when 8+ fall-through floats blew past 10⁶). Optional `prior_study_id` arg narrows numeric bounds via `winner ± |winner| × bracket` for sign-symmetric math (Gemini #1/#2 fix) with `bracket` threaded through the linear paths (Gemini #3 fix); log-uniform stays at √2. Graceful degrade on template mismatch + missing trial row + non-numeric winner — emits WARN logs (`agent.propose_search_space.prior_template_mismatch` / `.missing_winner_trial`). `ToolContext` gained `conversation_id: str` plumbed from `orchestrator.run_turn` for paired adherence telemetry — INFO events `agent.search_space_proposed` (propose-side) + `agent.create_study.invoked` (create-side) correlate offline by conversation_id per spec FR-6 (grep recipe in `docs/03_runbooks/agent-debugging.md` §5). New `repo.get_trial(db, trial_id)` parallels `repo.get_study`. System prompt updated: 19→20 tools, "Studies (4)" with `propose_search_space` first, new chain-guidance bullet. `ProposeSearchSpaceArgs` uses `ConfigDict(extra="forbid")` (GPT-5.5 F6 fix) so hallucinated LLM args fail Pydantic validation loudly. Spec converged at GPT-5.5 cycle 3 (19 findings, all accepted); plan converged at cycle 3 (8 findings, all accepted). Post-merge review: Gemini 3 findings all fixed in `642b5b9`; GPT-5.5 final review 6 findings — 1 fixed in `945e833`, 1 deferred (structlog migration), 4 rejected with cited counter-evidence (truncated-diff false positives). Tests: 1000 backend unit pass (+87 new cases) + 19 Python parity + 19 TS parity; 38 TS lib + 66 modal still green. Alembic head unchanged at `0014_clusters_target_filter` — feature is purely additive at the application layer. Earlier 2026-05-20 (after `feat_cluster_target_filter` shipped as PR #168 squash `57d3ba0` + follow-up `chore_seed_meaningful_demos` shipped as PR #169 squash `c44d774`). 
**20th MVP1 feature merged** + demo-state durability gap closed in the same session. PR #168: 5 stories (B1 migration 0014 + ORM column; B3 Pydantic + service plumb-through + responses; B2 adapter Protocol + ElasticAdapter + StubAdapter + router; F1 register modal Target filter input; F2 create-study modal filter-aware empty-state + EntitySelect accessibility improvement). Plus 4 post-impl fix commits (test_migrations head bump, register modal overflow-y-auto, EntitySelect sr-only Gemini fix, spec drift cleanup + OpenAPI shape-lock contract test from GPT-5.5 final review). PR #169: `scripts/seed_meaningful_demos.py` + `make seed-demo` target (idempotent: TRUNCATE clusters CASCADE + DELETE matching ES/OS indices + reseed with per-cluster `target_filter` values baked in — closes the gap where integration tests kept wiping the dev DB with no durable reseed mechanism). 529/529 vitest across 79 files (was 525/78), 903 backend unit tests (was 899), 50 cluster-API integration tests (was 45) + 3 new migration round-trip tests + 7 contract validator cases + OpenAPI shape-lock test. **Alembic head moved to `0014_clusters_target_filter`.** Cross-model review pre-impl: spec + plan both converged at GPT-5.5 cycle 2 (12 findings total, all accepted). Post-impl: Gemini Code Assist 3 findings (2 accepted: EntitySelect sr-only on #168, http() auth type hint on #169; 1 rejected with cited counter-evidence: out-of-scope test file from #168). GPT-5.5 final review on #168: 2 findings, both accepted (spec drift + OpenAPI shape-lock). **Process feedback captured:** `.claude/projects/.../memory/feedback_one_branch_per_session.md` — should have bundled the seed chore into PR #168 rather than spinning a sibling PR. End-to-end smoke verified live before both merges.
Earlier 2026-05-20 (after `feat_create_study_target_autocomplete` shipped as PR #165 squash commit `bd4516a` — 19th MVP1 feature. Bundled the `get_schema` + `explain` connect-error fix per `bug_get_schema_unhandled_connect_error` in the same PR. 525/525 vitest across 78 files, 33 adapter unit tests + contract suite + integration tests all green twice (initial + post-cycle-2). Gemini Code Assist: 1 finding rejected with cited counter-evidence (pre-existing list-shape assumption matches the wire contract). GPT-5.5 final review: 2 findings — 1 accepted in `19d9d51` (contract-layer TARGETS_FORBIDDEN + CLUSTER_UNREACHABLE envelope assertions), 1 deferred with counter-evidence (dropdown E2E `test.skip`'d; AC coverage satisfied by 8 hook unit + 6 modal unit + integration + contract tests). Two follow-up ideas filed in-PR: `bug_e2e_target_dropdown_flake` + `chore_guide_06_screenshot_refresh_target_picker`.) Earlier — same day (after `feat_create_study_search_space_builder` shipped as PR #163 squash commit `c703953`, bundling the search-space builder feature + the `bug_judgment_lists_listing_ignores_query_set_filter` backend fix surfaced during local verification. 18th MVP1 feature. The builder + bug-fix bundle reflects the single-developer series workflow: rather than spin a sibling backend PR off `main`, the bug fix landed in the same branch since the dev was already in verification mode. PR #163 went through 3 spec cycles (16 findings) + 3 plan cycles (27 findings) + 3 Gemini Code Assist findings + 2 GPT-5.5 final-review passes (1 second-pass Low finding accepted on test coverage) = 47 review findings all accepted with cited fixes. 512 vitest assertions across 77 files, 4 real-backend Playwright e2e cases against the builder, 2 new backend tests for the bundled filter fix. 
Two follow-up idea files captured during local verification: `feat_create_study_target_autocomplete` (Step-1 free-text target field has no autocomplete from cluster indexes — pre-existing UX debt deferred) and the now-closed `bug_judgment_lists_listing_ignores_query_set_filter` (bundled into this PR).) Earlier (also 2026-05-20) — PR #161 `0879df2` `chore_create_study_modal_e2e_stability` (un-skipped the deferred Playwright spec via `dispatchEvent('click')` on the Radix trigger), PR #160 `160ff6b` `bug_err_metric_frontend_backend_drift` (wire-enum trim — `err` removed from frontend + backend Literal), PR #159 `52e106d` `bug_tutorial_template_param_boost_naming` (heuristic extension for `_boost` suffix). Earlier (also 2026-05-20) — PR #157 `chore_create_study_wizard_polish` — squash commit `075c46b` — merged into `main`. Ships the 4-surface chore: backend template-mismatch validation at create time (two new error codes `SEARCH_SPACE_UNKNOWN_PARAM` + `SEARCH_SPACE_MISSING_DECLARED_PARAM`), Step-4 auto-fill via the new `ui/src/lib/search-space-defaults.ts` heuristic + cap-aware fallback + TS↔Python cardinality parity fixture, 4 new `study.search_space.*` glossary entries (one dual + three short-only) and 6 extended per-metric entries with k-tier clauses, Step-5 tri-state metric+k rendering with new `K_IGNORED` predicate, plus client-side validation mirror + zero-declared block + 404/transient template-fetch recovery + `__placeholder__` warning. 16 new test files + 2 modified + 1 shared JSON fixture across backend unit/integration/contract + frontend unit/component + 1 skipped E2E. Three follow-up ideas captured: `bug_tutorial_template_param_boost_naming` (tutorial template uses `_boost` suffix not matched by the locked heuristic), `chore_create_study_modal_e2e_stability` (re-enable the skipped Playwright spec once EntitySelect disabled gating stabilizes), `bug_err_metric_frontend_backend_drift` (`err` selectable in wizard but unsupported by `scoring.py`). 
Gemini Code Assist + GPT-5.5 final-pass both adjudicated on the PR — 2 Gemini findings + 7 GPT-5.5 findings, all addressed or filed.) Earlier 2026-05-19 (after a 4-PR shipping run drained the actionable post-MVP1 chore backlog: PR #152 `chore_ci_prettier_check` (`476db78`) + PR #153 `chore_extract_shadcn_select_test_mock` (`199e225`) + PR #154 `chore_form_dropdown_guide_screenshot_refresh` (`ed4121f`) + PR #155 `chore_detail_page_shell_primitive` (`9a72514`). PR #155 is the third primitive after `DataTable` and `EntitySelect` — 6 detail-page migrations + new lint guard + flattens a latent UX bug where only `proposals/[id]` discriminated 404 from network error. Earlier the same session: PR #150 (`chore_data_table_columnvisibility_tanstack`, `c1e4545`) — closes the residual DataTable follow-ups: item 5 migrates the primitive from `columns.filter(...)` to TanStack's `state.columnVisibility` API (memoized per Gemini feedback), item 3 locked the flat-prop `DataTableProps` API as canonical with a "Shipped contract addendum" on the historical implementation plan's Story 2.6. Folder renamed `chore_data_table_primitive_followups` → `chore_data_table_columnvisibility_tanstack`. Earlier 2026-05-19 PR #148 (`infra_e2e_wire_seed_helper_into_studies_spec`, squash `65f4150`) — restored the 2 digest-panel E2E tests deferred from PR #130, diagnosed and fixed the real root cause of the original smoke-lane failure (`GET /api/v1/proposals` was silently ignoring the `?study_id=` filter, returning the most-recent global pending proposal), added 5-case integration regression coverage at `backend/tests/integration/test_proposals_study_filter.py`. 
Plus: (a) earlier 2026-05-18 PR #146 (`bug_install_skip_ui_rebuild`, squash `7299fca`) made `make up` rebuild every Compose service (`docker compose build` no-args), switched `make down` to `docker compose down`, and added a `verify_install_builds_all_services.sh` CI gate to lock the contract; (b) earlier 2026-05-18 PR #147 captured `chore_detail_page_shell_primitive` idea (squash `8854e47`). Two new follow-ups filed: `chore_ci_prettier_check` (CI's frontend job has no `prettier --check` step — surfaced when PR #136 drift in 2 unrelated files blocked an unrelated commit) and the in-flight `chore_detail_page_shell_primitive` (third primitive after DataTable + EntitySelect).) ---