diff --git a/docs/00_overview/DASHBOARD.md b/docs/00_overview/DASHBOARD.md
index 971a6cb8..15c29466 100644
--- a/docs/00_overview/DASHBOARD.md
+++ b/docs/00_overview/DASHBOARD.md
@@ -6,7 +6,7 @@ _Top-level index across MVP1 → GA v1+ as of **2026-05-22**. Click a release na
 
 | Release | Theme | Progress | Status |
 |---|---|---|---|
-| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 58 / 59 scoped done · 7 remaining | **In progress** |
+| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 59 / 59 scoped done · 6 remaining | **In progress** |
 | [MVP2 / v0.2](MVP2_DASHBOARD.md) | Observable | 1 / 1 scoped done · 1 remaining | **In progress** |
 | MVP3 / v0.3 | Production Stacks | — | **Not yet scoped** |
 | MVP4 / v0.4 | Multi-tenant, Multi-LLM | — | **Not yet scoped** |
diff --git a/docs/00_overview/MVP1_DASHBOARD.md b/docs/00_overview/MVP1_DASHBOARD.md
index 0183d0d5..b93b0819 100644
--- a/docs/00_overview/MVP1_DASHBOARD.md
+++ b/docs/00_overview/MVP1_DASHBOARD.md
@@ -6,34 +6,28 @@ _Reflects feature-folder state as of **2026-05-22** (latest mtime of any planned
 
 ## Next up
 
-**[chore_e2e_test_rows_isolation](../02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md)** — Chore, currently in **Plan**
+All scoped MVP1 features shipped 🎉
 
-> Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the registry in FK-safe order at the end of the
-
-Plan approved; run /impl-execute to ship
-
-```bash
-/impl-execute docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md --all
-```
+Pull from the Idea backlog or capture a new feature spec.
 
 ## MVP1 Progress
 
 | Metric | Value |
 |---|---|
-| Scoped items done | **58 / 59** (98%) — feat_/infra_/chore_/epic_ past idea stage |
-| Pending work | **14** items (every not-done feat/infra/chore/bug across all priorities) |
-| → P0 — do next | **1** unblocking / paying daily cost |
+| Scoped items done | **59 / 59** (100%) — feat_/infra_/chore_/epic_ past idea stage |
+| Pending work | **13** items (every not-done feat/infra/chore/bug across all priorities) |
+| → P0 — do next | **0** unblocking / paying daily cost |
 | → P1 | **6** high-value, ready when P0 clears |
 | → P2 (default) | 6 important to file, not blocking |
 | → Backlog | 1 captured for record, not planned |
 | Open bugs | 0 |
-| Legacy "Path to MVP1" | 7 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) |
+| Legacy "Path to MVP1" | 6 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) |
 | Backlog ideas | 7 idea-only feat/infra (not yet scoped into MVP1) |
 | In flight | 0 feature(s) actively shipping |
 
 ## Pipeline
 
-### Done (70)
+### Done (71)
 
 | Feature | Type | One-liner | Depends on | Status |
 |---|---|---|---|---|
@@ -78,6 +72,7 @@ Plan approved; run /impl-execute to ship
 | [chore_data_table_columnvisibility_tanstack](implemented_features/2026_05_19_chore_data_table_columnvisibility_tanstack/idea.md) | Chore | Complete | — | Complete |
 | [chore_detail_page_shell_primitive](implemented_features/2026_05_19_chore_detail_page_shell_primitive/idea.md) | Chore | Complete | — | Complete |
 | [chore_digest_worker_narrow_except](implemented_features/2026_05_14_chore_digest_worker_narrow_except/idea.md) | Chore | Complete | — | Complete |
+| [chore_e2e_test_rows_isolation](implemented_features/2026_05_21_chore_e2e_test_rows_isolation/feature_spec.md) | Chore | Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the reg | — | [PR #186](https://github.com/SoundMindsAI/relyloop/pull/186) merged 2026-05-21 |
 | [chore_env_guard_extend_deny_pattern](implemented_features/2026_05_13_chore_env_guard_extend_deny_pattern/idea.md) | Chore | Complete | — | Complete |
 | [chore_extract_shadcn_select_test_mock](implemented_features/2026_05_19_chore_extract_shadcn_select_test_mock/idea.md) | Chore | Complete | — | Complete |
 | [chore_form_dropdown_guide_screenshot_refresh](implemented_features/2026_05_19_chore_form_dropdown_guide_screenshot_refresh/idea.md) | Chore | Complete | — | Complete |
@@ -112,11 +107,9 @@ Plan approved; run /impl-execute to ship
 
 _None._
 
-### Plan (1)
+### Plan (0)
 
-| Priority | Feature | Type | One-liner | Depends on | Status |
-|---|---|---|---|---|---|
-| P0 | [chore_e2e_test_rows_isolation](../02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md) | Chore | Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the reg | — | [PR #182](https://github.com/SoundMindsAI/relyloop/pull/182) |
+_None._
 
 ### Spec (0)
 
@@ -151,8 +144,6 @@ graph LR
   classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e;
   classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af;
   classDef idea fill:#f1f5f9,stroke:#334155,color:#334155;
-  chore_e2e_test_rows_isolation["e2e test rows isolation"]
-  class chore_e2e_test_rows_isolation plan;
   infra_foundation["foundation"]
   class infra_foundation done;
   feat_study_lifecycle["study lifecycle"]
@@ -255,6 +246,8 @@ graph LR
   class feat_create_study_search_space_builder done;
   feat_create_study_target_autocomplete["create study target autocomplete"]
   class feat_create_study_target_autocomplete done;
+  chore_e2e_test_rows_isolation["e2e test rows isolation"]
+  class chore_e2e_test_rows_isolation done;
   chore_guide_01_screenshot_refresh_target_filter["guide 01 screenshot refresh target filter"]
   class chore_guide_01_screenshot_refresh_target_filter done;
   chore_guide_06_screenshot_refresh_target_picker["guide 06 screenshot refresh target picker"]
diff --git a/docs/00_overview/dashboard.html b/docs/00_overview/dashboard.html
index a6af878f..858eb316 100644
--- a/docs/00_overview/dashboard.html
+++ b/docs/00_overview/dashboard.html
@@ -384,7 +384,7 @@ <h2>Releases</h2>
 <div class="roadmap-row">
   <div class="release-name"><a href="mvp1_dashboard.html">MVP1 / v0.1</a></div>
   <div class="theme">The Loop</div>
-  <div class="progress">58 / 59 scoped done · 7 remaining</div>
+  <div class="progress">59 / 59 scoped done · 6 remaining</div>
   <span class="state-pill in_progress">In progress</span>
 </div>
 
diff --git a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/feature_spec.md
similarity index 100%
rename from docs/02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md
rename to docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/feature_spec.md
diff --git a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/idea.md b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/idea.md
similarity index 100%
rename from docs/02_product/planned_features/chore_e2e_test_rows_isolation/idea.md
rename to docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/idea.md
diff --git a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/implementation_plan.md
similarity index 99%
rename from docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md
rename to docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/implementation_plan.md
index 3c3ac273..7024ec44 100644
--- a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md
+++ b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/implementation_plan.md
@@ -1,7 +1,7 @@
 # Implementation Plan — chore_e2e_test_rows_isolation
 
 **Date:** 2026-05-21
-**Status:** Draft
+**Status:** Complete (PR #186 squash `a444b94`, merged 2026-05-21)
 **Primary spec:** [feature_spec.md](feature_spec.md)
 **Policy source(s):** [api-conventions.md](../../../01_architecture/api-conventions.md), [CLAUDE.md](../../../../CLAUDE.md), spec §19 Decision log
 
diff --git a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/pipeline_status.md b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/pipeline_status.md
similarity index 61%
rename from docs/02_product/planned_features/chore_e2e_test_rows_isolation/pipeline_status.md
rename to docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/pipeline_status.md
index 069fd4ec..52bf2e13 100644
--- a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/pipeline_status.md
+++ b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/pipeline_status.md
@@ -25,4 +25,16 @@
 - Critical cycle-3 findings: parse failures didn't count toward `failed` invariant (now do); stdout log misstated `entries.length` vs distinct-resource count.
 
 ## Implementation
-- Status: Not started
+- Status: Complete
+- Date: 2026-05-21
+- PR: #186 (squash `a444b94`, merged into `main` 2026-05-21)
+- Branch: `chore/e2e-test-rows-isolation` (deleted post-merge)
+- Stories shipped: 2 of 2 (1.1 backend 6 DELETE endpoints + 20 integration cases + 6 env-guard contract + 11 strictly-new error-code source-presence + 7 OpenAPI tuples; 1.2 frontend per-worker JSONL registry + globalSetup/Teardown + cleanup-reporter + 29 vitest cases)
+- CI: green on final HEAD (5/5 jobs incl. smoke 70/70 Playwright)
+- Reviews: Gemini Code Assist 3 Medium findings (all rejected with SQLAlchemy AsyncSession-concurrency counter-evidence at `backend/app/api/v1/_test.py:269/353/415`); GPT-5.5 final review 1 High finding (rejected — truncated-diff false positive at `backend/app/db/repo/__init__.py:38–42`).
+- Post-merge fix: one follow-up commit on the same branch added `testMatch: ['**/*.spec.ts']` to `ui/playwright.config.ts` after the smoke job tried to load vitest `.test.ts` files as Playwright specs.
+- Tangential capture: `chore_e2e_seed_acme_helper_dead/idea.md` — `seedAcmeProductsChain` is dead code (Backlog).
+
+## Done
+- Status: Merged
+- Date: 2026-05-21
diff --git a/docs/00_overview/mvp1_dashboard.html b/docs/00_overview/mvp1_dashboard.html
index 5b48564f..f5a83b3e 100644
--- a/docs/00_overview/mvp1_dashboard.html
+++ b/docs/00_overview/mvp1_dashboard.html
@@ -382,12 +382,12 @@ <h1>RelyLoop MVP1 Dashboard</h1>
 <main>
 
 <section>
-  <div class="next-up">
-    <div class="eyebrow">Next up — Chore, currently in <strong>Plan</strong></div>
-    <div class="title"><a href="../../docs/02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md">E2E Test Rows Isolation</a></div>
-    <div class="one-liner">Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the registry in FK-safe order at the end of the</div>
-    <div class="stage-hint">Plan approved; run /impl-execute to ship</div>
-    <code class="cmd">/impl-execute docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md --all</code>
+  <div class="next-up done">
+    <div class="eyebrow">Next up</div>
+    <div class="title">All scoped MVP1 features shipped 🎉</div>
+    <div class="one-liner">
+      Pull from the Idea backlog or capture a new feature spec.
+    </div>
   </div>
 </section>
 
@@ -395,15 +395,15 @@ <h1>RelyLoop MVP1 Dashboard</h1>
 <section>
   <h2>MVP1 Progress</h2>
   <div class="kpi-row">
-    <div class="kpi ">
+    <div class="kpi complete">
       <div class="label">Scoped items done</div>
-      <div class="value">58 / 59</div>
-      <div class="sub">98% of feat_/infra_/chore_/epic_ items past idea stage</div>
-      <div class="bar"><span style="width:98%"></span></div>
+      <div class="value">59 / 59</div>
+      <div class="sub">100% of feat_/infra_/chore_/epic_ items past idea stage</div>
+      <div class="bar"><span style="width:100%"></span></div>
     </div>
     <div class="kpi warn">
       <div class="label">Pending work</div>
-      <div class="value">14</div>
+      <div class="value">13</div>
       <div class="sub">every not-done feat/infra/chore/bug across all priorities</div>
     </div>
     <div class="kpi ">
@@ -411,9 +411,9 @@ <h2>MVP1 Progress</h2>
       <div class="value">0</div>
       <div class="sub">tracked bug_* idea files</div>
     </div>
-    <div class="kpi warn">
+    <div class="kpi ">
       <div class="label">P0 — do next</div>
-      <div class="value">1</div>
+      <div class="value">0</div>
       <div class="sub">unblocking / paying daily cost</div>
     </div>
   </div>
@@ -435,7 +435,7 @@ <h2>MVP1 Progress</h2>
     </div>
     <div class="kpi">
       <div class="label">Legacy "Path to MVP1"</div>
-      <div class="value">7</div>
+      <div class="value">6</div>
       <div class="sub">scoped not-done + bugs + chore-ideas only (excludes feat/infra ideas)</div>
     </div>
   </div>
@@ -641,19 +641,7 @@ <h3>Spec <span class="count">0</span></h3>
 </div>
 
 <div class="col plan">
-  <h3>Plan <span class="count">1</span></h3>
-
-<div class="card chore" data-prefix="chore" data-priority="P0">
-  <div class="name"><a href="../../docs/02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md">E2E Test Rows Isolation</a></div>
-  <div class="meta">
-    <span class="badge chore">Chore</span>
-    <span class="badge priority" data-priority="P0">P0</span>
-    <a class="pr" href="https://github.com/SoundMindsAI/relyloop/pull/182">PR #182</a>
-  </div>
-  <div class="one-liner">Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the reg</div>
-
-
-</div>
+  <h3>Plan <span class="count">0</span></h3>
 
 </div>
 
@@ -663,7 +651,7 @@ <h3>Implementing <span class="count">0</span></h3>
 </div>
 
 <div class="col done">
-  <h3>Done <span class="count">70</span></h3>
+  <h3>Done <span class="count">71</span></h3>
 
 <div class="card feat" data-prefix="feat" data-priority="P2">
   <div class="name"><a href="../../docs/00_overview/implemented_features/2026_05_21_feat_agent_propose_search_space/feature_spec.md">Agent Propose Search Space</a></div>
@@ -1198,6 +1186,19 @@ <h3>Done <span class="count">70</span></h3>
 </div>
 
 
+<div class="card chore" data-prefix="chore" data-priority="P2">
+  <div class="name"><a href="../../docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/feature_spec.md">E2E Test Rows Isolation</a></div>
+  <div class="meta">
+    <span class="badge chore">Chore</span>
+
+    <a class="pr" href="https://github.com/SoundMindsAI/relyloop/pull/186">PR #186</a><span>merged 2026-05-21</span>
+  </div>
+  <div class="one-liner">Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the reg</div>
+
+
+</div>
+
+
 <div class="card chore" data-prefix="chore" data-priority="P2">
   <div class="name"><a href="../../docs/00_overview/implemented_features/2026_05_13_chore_env_guard_extend_deny_pattern">Env Guard Extend Deny Pattern</a></div>
   <div class="meta">
@@ -1587,8 +1588,6 @@ <h2>Dependency graph (feat_ + infra_)</h2>
   classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e;
   classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af;
   classDef idea fill:#f1f5f9,stroke:#334155,color:#334155;
-  chore_e2e_test_rows_isolation[&quot;e2e test rows isolation&quot;]
-  class chore_e2e_test_rows_isolation plan;
   infra_foundation[&quot;foundation&quot;]
   class infra_foundation done;
   feat_study_lifecycle[&quot;study lifecycle&quot;]
@@ -1691,6 +1690,8 @@ <h2>Dependency graph (feat_ + infra_)</h2>
   class feat_create_study_search_space_builder done;
   feat_create_study_target_autocomplete[&quot;create study target autocomplete&quot;]
   class feat_create_study_target_autocomplete done;
+  chore_e2e_test_rows_isolation[&quot;e2e test rows isolation&quot;]
+  class chore_e2e_test_rows_isolation done;
   chore_guide_01_screenshot_refresh_target_filter[&quot;guide 01 screenshot refresh target filter&quot;]
   class chore_guide_01_screenshot_refresh_target_filter done;
   chore_guide_06_screenshot_refresh_target_picker[&quot;guide 06 screenshot refresh target picker&quot;]
@@ -1798,8 +1799,6 @@ <h2>Dependency graph (feat_ + infra_)</h2>
   classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e;
   classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af;
   classDef idea fill:#f1f5f9,stroke:#334155,color:#334155;
-  chore_e2e_test_rows_isolation[&quot;e2e test rows isolation&quot;]
-  class chore_e2e_test_rows_isolation plan;
   infra_foundation[&quot;foundation&quot;]
   class infra_foundation done;
   feat_study_lifecycle[&quot;study lifecycle&quot;]
@@ -1902,6 +1901,8 @@ <h2>Dependency graph (feat_ + infra_)</h2>
   class feat_create_study_search_space_builder done;
   feat_create_study_target_autocomplete[&quot;create study target autocomplete&quot;]
   class feat_create_study_target_autocomplete done;
+  chore_e2e_test_rows_isolation[&quot;e2e test rows isolation&quot;]
+  class chore_e2e_test_rows_isolation done;
   chore_guide_01_screenshot_refresh_target_filter[&quot;guide 01 screenshot refresh target filter&quot;]
   class chore_guide_01_screenshot_refresh_target_filter done;
   chore_guide_06_screenshot_refresh_target_picker[&quot;guide 06 screenshot refresh target picker&quot;]
diff --git a/state.md b/state.md
index 97749b7f..40be439b 100644
--- a/state.md
+++ b/state.md
@@ -2,14 +2,14 @@
 
 > Read this first. Snapshots the active branch, what just shipped, what's in flight, what's queued, and where the project currently sits in the MVP1 → GA roadmap. Updated whenever a feature lands or a priority shifts.
 
-**Last updated:** 2026-05-21 (after `feat_study_target_judgment_mismatch_guard` merged into `main` as PR #184 squash `ce3fcf4` — **23rd MVP1 feature shipped**, 3 stories across 1 epic. Closes the literal study2 incident: `POST /api/v1/studies` now rejects two mismatch classes at create time with specific 422 codes — `JUDGMENT_CLUSTER_MISMATCH` (judgment list and study point at different physical clusters; doc IDs are cluster-scoped so same target name on two clusters still produces zero overlap) and `JUDGMENT_TARGET_MISMATCH` (same cluster but different target index/collection). Cluster fires before target. Both checks fire AFTER FK resolution + the existing `query_set_id` `VALIDATION_ERROR` check. New `?target=` wire filter on `GET /api/v1/judgment-lists` (min_length=1, max_length=255) + `target: str` required field on `JudgmentListSummary` (additive; OpenAPI snapshot + ui/src/lib/types.ts regenerated). Frontend create-study modal Step-2 dropdown now passes `{ query_set_id, cluster_id, target, limit: 200 }` to `useJudgmentLists`; manual-mode `<Input>` uses hoisted `targetReg.onChange(e)` (RHF register preserved) then cascade-resets `judgment_list_id`; dropdown-mode target picker mirrors the same reset; new empty-state copy substitutes the target value + CTA href="/judgments". Drive-by fix bundled: E2E seed helpers (`seedJudgmentList`, `seedFullChain`, `seedStudy`) gain optional `target` overrides; 3 specs updated to align target values so the new FR-1 validator doesn't reject chained POSTs. 
Cross-model review: spec 3 cycles (17 findings, all accepted, 1 rejected with cited counter-evidence at create-study-modal.tsx:508), plan 3 cycles (16 findings, all accepted, 1 rejected); Gemini Code Assist 2 findings (1 accepted in `035af0a` — IIFE → hoisted register; 1 rejected with precedent counter-evidence at `test_judgments_api_contract.py:215-234`); final GPT-5.5 10 findings (2 accepted in `a358a71` — over-bound 422 test; 8 rejected — 5 truncation false positives + 3 plan/precedent rejects). Tests: 1040 backend unit (unchanged — inline conditionals), backend integration +7 cases (target/cluster mismatch + ordering + AND-semantics + summary shape + over-bound + GET-pre-existing-200), backend contract +2 cases (firing-order lock in `test_studies_api_contract.py` + summary `target` shape lock in `test_judgments_api_contract.py`), UI vitest 567 → 572 (+5: hook wire-filter, dropdown cascade, manual cascade, cluster regression-lock, empty-state CTA). CI green on `a358a71` (5/5 jobs incl. 70/70 Playwright). Alembic head unchanged at `0015_trials_per_query_metrics` — feature is purely additive at the application layer. Prior — after `feat_pr_metric_confidence` merged into `main` as PR #180 squash `d0a8358` — **22nd MVP1 feature shipped**, 9 stories across 2 epics. Backend persistence (migration `0015_trials_per_query_metrics` adds nullable JSONB column behind CHECK), analytics (`backend/app/domain/study/confidence.py` — pure-Python orchestrator + bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome helpers under FR-7 graceful-degradation), and three consumer surfaces — `StudyDetail.confidence` API enrichment, `## Confidence` PR body section, and digest narrative `<confidence>` + `<per_query_outcomes>` Jinja blocks. Frontend ships `<ConfidencePanel>` on `/studies/[id]` (between StudyHeader and trials Card) + 6 glossary entries (text lifted verbatim from spec §11 tooltip table) + 2 real-backend Playwright E2E cases. 
Cross-model review: GPT-5.5 cycle 1 (Epic 1 gate) returned 12 findings — 5 rejected with cited counter-evidence (truncated-diff false positives), 2 deferred, 5 accepted + fixed inline; Gemini Code Assist clean pass; final GPT-5.5 review 3 Low findings all accepted + fixed inline. Tests: 1039 backend unit (+5 digest + 29 confidence + 13 studies confidence + extras), 189 contract (+2 OpenAPI shape lock + 4 PR-body section + 1 endpoint guard for the extended _test seed endpoint), 527 in-container integration (+13 StudyDetail.confidence + 5 migration round-trip + 1 open_pr plumbing + 2 Story 1.2 worker), 567 UI vitest (+14 ConfidencePanel — 13 layout + 1 tooltip-trigger inventory), 10/10 Playwright E2E (+2 ConfidencePanel real-backend). Three follow-ups filed: `chore_guides_glossary_route` (render `glossary.ts` as a `/guide/glossary` route), `chore_guides_faq` (curated operator-judgment Q&A), `chore_guide_06_screenshot_refresh_confidence_panel` (regenerate guide-06 screenshots). Alembic head moves to `0015_trials_per_query_metrics`. Prior — after `feat_pr_metric_confidence` Epic 1 landed locally on the `feat_pr_metric_confidence` branch — backend persistence + analytics + PR-body + digest-prompt surfaces complete, Epic 2 frontend ConfidencePanel ahead. Migration `0015_trials_per_query_metrics` adds the nullable JSONB column behind a CHECK constraint; new pure-Python `backend/app/domain/study/confidence.py` owns bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome classification under FR-7's graceful-degradation contract; new `backend/app/services/study_confidence.py` glues the 4-query read pattern onto the orchestrator and is consumed from `studies._detail()`, the `open_pr` worker, and the digest worker. 
GPT-5.5 cycle-1 review found 12 issues — 5 rejected as truncated-diff false positives, 2 deferred (plan/code interface drift; full-worker integration test deferred to feat_github_pr_worker's existing suite), 5 accepted + fixed inline (convergence `total_trials = max_trial_number + 1` instead of count; convergence KeyError guard when winner not in summary; pre-existing-row-stays-NULL migration test; Trial model docstring drift on metric key shape; state + architecture docs). 1039 backend unit tests pass (+5 digest prompt cases, +1 convergence assertion), 189 contract, 527/527 in-container integration. Prior — after `feat_agent_propose_search_space` shipped as PR #175 squash `5d29355`). **21st MVP1 feature merged** — 10 stories across 5 epics, all complete. New read-only agent tool `propose_search_space` (the 20th in the registry) builds a deterministic starter search space from a template's `declared_params` using the same heuristic that powers the create-study wizard's auto-fill — a Python port (`backend/app/domain/study/search_space_defaults.py`) of `ui/src/lib/search-space-defaults.ts` with a TS↔Python parity test driven by a shared JSON fixture (18 rows, byte-identical assertions on both sides). Cap-aware overflow guard added on both Python AND TS sides (fixes a latent bug where TS silently returned an invalid space when 8+ fall-through floats blew past 10⁶). Optional `prior_study_id` arg narrows numeric bounds via `winner ± |winner| × bracket` for sign-symmetric math (Gemini #1/#2 fix) with `bracket` threaded through the linear paths (Gemini #3 fix); log-uniform stays at √2. Graceful degrade on template mismatch + missing trial row + non-numeric winner — emits WARN logs (`agent.propose_search_space.prior_template_mismatch` / `.missing_winner_trial`). 
`ToolContext` gained `conversation_id: str` plumbed from `orchestrator.run_turn` for paired adherence telemetry — INFO events `agent.search_space_proposed` (propose-side) + `agent.create_study.invoked` (create-side) correlate offline by conversation_id per spec FR-6 (grep recipe in `docs/03_runbooks/agent-debugging.md` §5). New `repo.get_trial(db, trial_id)` parallels `repo.get_study`. System prompt updated: 19→20 tools, "Studies (4)" with `propose_search_space` first, new chain-guidance bullet. `ProposeSearchSpaceArgs` uses `ConfigDict(extra="forbid")` (GPT-5.5 F6 fix) so hallucinated LLM args fail Pydantic validation loudly. Spec converged at GPT-5.5 cycle 3 (19 findings, all accepted); plan converged at cycle 3 (8 findings, all accepted). Post-merge review: Gemini 3 findings all fixed in `642b5b9`; GPT-5.5 final review 6 findings — 1 fixed in `945e833`, 1 deferred (structlog migration), 4 rejected with cited counter-evidence (truncated-diff false positives). Tests: 1000 backend unit pass (+87 new cases) + 19 Python parity + 19 TS parity; 38 TS lib + 66 modal still green. Alembic head unchanged at `0014_clusters_target_filter` — feature is purely additive at the application layer. Earlier 2026-05-20 (after `feat_cluster_target_filter` shipped as PR #168 squash `57d3ba0` + follow-up `chore_seed_meaningful_demos` shipped as PR #169 squash `c44d774`). **20th MVP1 feature merged** + demo-state durability gap closed in the same session. PR #168: 5 stories (B1 migration 0014 + ORM column; B3 Pydantic + service plumb-through + responses; B2 adapter Protocol + ElasticAdapter + StubAdapter + router; F1 register modal Target filter input; F2 create-study modal filter-aware empty-state + EntitySelect accessibility improvement). Plus 4 post-impl fix commits (test_migrations head bump, register modal overflow-y-auto, EntitySelect sr-only Gemini fix, spec drift cleanup + OpenAPI shape-lock contract test from GPT-5.5 final review). 
PR #169: `scripts/seed_meaningful_demos.py` + `make seed-demo` target (idempotent: TRUNCATE clusters CASCADE + DELETE matching ES/OS indices + reseed with per-cluster `target_filter` values baked in — closes the gap where integration tests kept wiping the dev DB with no durable reseed mechanism). 529/529 vitest across 79 files (was 525/78), 903 backend unit tests (was 899), 50 cluster-API integration tests (was 45) + 3 new migration round-trip tests + 7 contract validator cases + OpenAPI shape-lock test. **Alembic head moved to `0014_clusters_target_filter`.** Cross-model review pre-impl: spec + plan both converged at GPT-5.5 cycle 2 (12 findings total, all accepted). Post-impl: Gemini Code Assist 3 findings (2 accepted: EntitySelect sr-only on #168, http() auth type hint on #169; 1 rejected with cited counter-evidence: out-of-scope test file from #168). GPT-5.5 final review on #168: 2 findings, both accepted (spec drift + OpenAPI shape-lock). **Process feedback captured:** `.claude/projects/.../memory/feedback_one_branch_per_session.md` — should have bundled the seed chore into PR #168 rather than spinning a sibling PR. End-to-end smoke verified live before both merges. Earlier 2026-05-20 (after `feat_create_study_target_autocomplete` shipped as PR #165 squash commit `bd4516a` — 19th MVP1 feature. Earlier 2026-05-20 (after `feat_create_study_target_autocomplete` shipped as PR #165 squash commit `bd4516a` — 19th MVP1 feature. Bundled the `get_schema` + `explain` connect-error fix per `bug_get_schema_unhandled_connect_error` in the same PR. 525/525 vitest across 78 files, 33 adapter unit tests + contract suite + integration tests all green twice (initial + post-cycle-2). Gemini Code Assist: 1 finding rejected with cited counter-evidence (pre-existing list-shape assumption matches the wire contract). 
GPT-5.5 final review: 2 findings — 1 accepted in `19d9d51` (contract-layer TARGETS_FORBIDDEN + CLUSTER_UNREACHABLE envelope assertions), 1 deferred with counter-evidence (dropdown E2E `test.skip`'d; AC coverage satisfied by 8 hook unit + 6 modal unit + integration + contract tests). Two follow-up ideas filed in-PR: `bug_e2e_target_dropdown_flake` + `chore_guide_06_screenshot_refresh_target_picker`.) Earlier — same day (after `feat_create_study_search_space_builder` shipped as PR #163 squash commit `c703953`, bundling the search-space builder feature + the `bug_judgment_lists_listing_ignores_query_set_filter` backend fix surfaced during local verification. 18th MVP1 feature. The builder + bug-fix bundle reflects the single-developer series workflow: rather than spin a sibling backend PR off `main`, the bug fix landed in the same branch since the dev was already in verification mode. PR #163 went through 3 spec cycles (16 findings) + 3 plan cycles (27 findings) + 3 Gemini Code Assist findings + 2 GPT-5.5 final-review passes (1 second-pass Low finding accepted on test coverage) = 47 review findings all accepted with cited fixes. 512 vitest assertions across 77 files, 4 real-backend Playwright e2e cases against the builder, 2 new backend tests for the bundled filter fix. Two follow-up idea files captured during local verification: `feat_create_study_target_autocomplete` (Step-1 free-text target field has no autocomplete from cluster indexes — pre-existing UX debt deferred) and the now-closed `bug_judgment_lists_listing_ignores_query_set_filter` (bundled into this PR).) 
Earlier (also 2026-05-20) — PR #161 `0879df2` `chore_create_study_modal_e2e_stability` (un-skipped the deferred Playwright spec via `dispatchEvent('click')` on the Radix trigger), PR #160 `160ff6b` `bug_err_metric_frontend_backend_drift` (wire-enum trim — `err` removed from frontend + backend Literal), PR #159 `52e106d` `bug_tutorial_template_param_boost_naming` (heuristic extension for `<field>_boost` suffix). Earlier (also 2026-05-20) — PR #157 `chore_create_study_wizard_polish` — squash commit `075c46b` — merged into `main`. Ships the 4-surface chore: backend template-mismatch validation at create time (two new error codes `SEARCH_SPACE_UNKNOWN_PARAM` + `SEARCH_SPACE_MISSING_DECLARED_PARAM`), Step-4 auto-fill via the new `ui/src/lib/search-space-defaults.ts` heuristic + cap-aware fallback + TS↔Python cardinality parity fixture, 4 new `study.search_space.*` glossary entries (one dual + three short-only) and 6 extended per-metric entries with k-tier clauses, Step-5 tri-state metric+k rendering with new `K_IGNORED` predicate, plus client-side validation mirror + zero-declared block + 404/transient template-fetch recovery + `__placeholder__` warning. 16 new test files + 2 modified + 1 shared JSON fixture across backend unit/integration/contract + frontend unit/component + 1 skipped E2E. Three follow-up ideas captured: `bug_tutorial_template_param_boost_naming` (tutorial template uses `<field>_boost` suffix not matched by the locked heuristic), `chore_create_study_modal_e2e_stability` (re-enable the skipped Playwright spec once EntitySelect disabled gating stabilizes), `bug_err_metric_frontend_backend_drift` (`err` selectable in wizard but unsupported by `scoring.py`). Gemini Code Assist + GPT-5.5 final-pass both adjudicated on the PR — 2 Gemini findings + 7 GPT-5.5 findings, all addressed or filed.) 
Earlier 2026-05-19 (after a 4-PR shipping run drained the actionable post-MVP1 chore backlog: PR #152 `chore_ci_prettier_check` (`476db78`) + PR #153 `chore_extract_shadcn_select_test_mock` (`199e225`) + PR #154 `chore_form_dropdown_guide_screenshot_refresh` (`ed4121f`) + PR #155 `chore_detail_page_shell_primitive` (`9a72514`). PR #155 is the third primitive after `<DataTable>` and `<EntitySelect>` — 6 detail-page migrations + new lint guard + flattens a latent UX bug where only `proposals/[id]` discriminated 404 from network error. Earlier the same session: PR #150 (`chore_data_table_columnvisibility_tanstack`, `c1e4545`) — closes the residual DataTable follow-ups: item 5 migrates the primitive from `columns.filter(...)` to TanStack's `state.columnVisibility` API (memoized per Gemini feedback), item 3 locked the flat-prop `DataTableProps` API as canonical with a "Shipped contract addendum" on the historical implementation plan's Story 2.6. Folder renamed `chore_data_table_primitive_followups` → `chore_data_table_columnvisibility_tanstack`. Earlier 2026-05-19 PR #148 (`infra_e2e_wire_seed_helper_into_studies_spec`, squash `65f4150`) — restored the 2 digest-panel E2E tests deferred from PR #130, diagnosed and fixed the real root cause of the original smoke-lane failure (`GET /api/v1/proposals` was silently ignoring the `?study_id=` filter, returning the most-recent global pending proposal), added 5-case integration regression coverage at `backend/tests/integration/test_proposals_study_filter.py`. Plus: (a) earlier 2026-05-18 PR #146 (`bug_install_skip_ui_rebuild`, squash `7299fca`) made `make up` rebuild every Compose service (`docker compose build` no-args), switched `make down` to `docker compose down`, and added a `verify_install_builds_all_services.sh` CI gate to lock the contract; (b) earlier 2026-05-18 PR #147 captured `chore_detail_page_shell_primitive` idea (squash `8854e47`). 
Two new follow-ups filed: `chore_ci_prettier_check` (CI's frontend job has no `prettier --check` step — surfaced when PR #136 drift in 2 unrelated files blocked an unrelated commit) and the in-flight `chore_detail_page_shell_primitive` (third primitive after DataTable + EntitySelect).)
+**Last updated:** 2026-05-21 (after `chore_e2e_test_rows_isolation` merged into `main` as PR #186 squash `a444b94` — **24th MVP1 feature shipped**, 2 stories across 1 epic. Closes the operator-visible-dev-DB pollution: every Playwright E2E run now drains its seeded rows after the suite via a per-worker JSONL cleanup registry, 6 new test-only `DELETE /api/v1/_test/*` endpoints gated by `_require_development_env`, FK-safe drain order (proposals → digests → studies → judgment_lists → query_sets → query_templates → clusters), and a new `cleanup-reporter.ts` Playwright Reporter that asserts `registered_deduped == attempted == deleted + failed + skipped_404 AND failed == 0` after every run. 11 strictly-new error codes (3 `_NOT_FOUND` + 8 `_HAS_DEPENDENT_*`) documented in [`docs/01_architecture/api-conventions.md`](docs/01_architecture/api-conventions.md). Pure `cleanup-core.ts` module extracted from `global-teardown.ts` so the dedupe/order/URL-build logic is unit-testable without fs/network mocks. Cross-model review: GPT-5.5 — spec 3 cycles (26 findings, 25 accepted + 1 deferred to PLAYWRIGHT_CLEANUP_STRICT=1 v2), plan 3 cycles (20 findings, all accepted); Gemini Code Assist 3 Medium findings (all rejected with SQLAlchemy AsyncSession-concurrency counter-evidence — `asyncio.gather` on the same session is forbidden); final GPT-5.5 1 High finding (rejected — truncated-diff false positive on `repo/__init__.py:38–42` import block; verified empirically `from backend.app.db.repo import hard_delete_*` works for all 6). Post-merge CI fix on the same branch: `testMatch: ['**/*.spec.ts']` added to `ui/playwright.config.ts` after the smoke job tried to load vitest `.test.ts` files as Playwright specs. 
Tests: 1040 backend unit (unchanged); backend integration +20 cases (6 happy + 6 parameterized 404 + 8 409 — covers all 11 strictly-new + 3 reused codes); backend contract +6 env-guard cases + 2 source-presence cases + 6 OpenAPI tuples; UI vitest **630** (was 601 — +29: 19 cleanup-core + 10 global-teardown). CI green on `01acc04` (5/5 jobs incl. smoke 70/70 Playwright). **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential capture: `chore_e2e_seed_acme_helper_dead/idea.md` (Backlog) — `seedAcmeProductsChain` has no spec caller. Earlier — after `feat_study_target_judgment_mismatch_guard` merged into `main` as PR #184 squash `ce3fcf4` — **23rd MVP1 feature shipped**, 3 stories across 1 epic. Closes the literal study2 incident: `POST /api/v1/studies` now rejects two mismatch classes at create time with specific 422 codes — `JUDGMENT_CLUSTER_MISMATCH` (judgment list and study point at different physical clusters; doc IDs are cluster-scoped so same target name on two clusters still produces zero overlap) and `JUDGMENT_TARGET_MISMATCH` (same cluster but different target index/collection). Cluster fires before target. Both checks fire AFTER FK resolution + the existing `query_set_id` `VALIDATION_ERROR` check. New `?target=` wire filter on `GET /api/v1/judgment-lists` (min_length=1, max_length=255) + `target: str` required field on `JudgmentListSummary` (additive; OpenAPI snapshot + ui/src/lib/types.ts regenerated). Frontend create-study modal Step-2 dropdown now passes `{ query_set_id, cluster_id, target, limit: 200 }` to `useJudgmentLists`; manual-mode `<Input>` uses hoisted `targetReg.onChange(e)` (RHF register preserved) then cascade-resets `judgment_list_id`; dropdown-mode target picker mirrors the same reset; new empty-state copy substitutes the target value + CTA href="/judgments". 
Drive-by fix bundled: E2E seed helpers (`seedJudgmentList`, `seedFullChain`, `seedStudy`) gain optional `target` overrides; 3 specs updated to align target values so the new FR-1 validator doesn't reject chained POSTs. Cross-model review: spec 3 cycles (17 findings, all accepted, 1 rejected with cited counter-evidence at create-study-modal.tsx:508), plan 3 cycles (16 findings, all accepted, 1 rejected); Gemini Code Assist 2 findings (1 accepted in `035af0a` — IIFE → hoisted register; 1 rejected with precedent counter-evidence at `test_judgments_api_contract.py:215-234`); final GPT-5.5 10 findings (2 accepted in `a358a71` — over-bound 422 test; 8 rejected — 5 truncation false positives + 3 plan/precedent rejects). Tests: 1040 backend unit (unchanged — inline conditionals), backend integration +7 cases (target/cluster mismatch + ordering + AND-semantics + summary shape + over-bound + GET-pre-existing-200), backend contract +2 cases (firing-order lock in `test_studies_api_contract.py` + summary `target` shape lock in `test_judgments_api_contract.py`), UI vitest 567 → 572 (+5: hook wire-filter, dropdown cascade, manual cascade, cluster regression-lock, empty-state CTA). CI green on `a358a71` (5/5 jobs incl. 70/70 Playwright). Alembic head unchanged at `0015_trials_per_query_metrics` — feature is purely additive at the application layer. Prior — after `feat_pr_metric_confidence` merged into `main` as PR #180 squash `d0a8358` — **22nd MVP1 feature shipped**, 9 stories across 2 epics. 
Backend persistence (migration `0015_trials_per_query_metrics` adds nullable JSONB column behind CHECK), analytics (`backend/app/domain/study/confidence.py` — pure-Python orchestrator + bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome helpers under FR-7 graceful-degradation), and three consumer surfaces — `StudyDetail.confidence` API enrichment, `## Confidence` PR body section, and digest narrative `<confidence>` + `<per_query_outcomes>` Jinja blocks. Frontend ships `<ConfidencePanel>` on `/studies/[id]` (between StudyHeader and trials Card) + 6 glossary entries (text lifted verbatim from spec §11 tooltip table) + 2 real-backend Playwright E2E cases. Cross-model review: GPT-5.5 cycle 1 (Epic 1 gate) returned 12 findings — 5 rejected with cited counter-evidence (truncated-diff false positives), 2 deferred, 5 accepted + fixed inline; Gemini Code Assist clean pass; final GPT-5.5 review 3 Low findings all accepted + fixed inline. Tests: 1039 backend unit (+5 digest + 29 confidence + 13 studies confidence + extras), 189 contract (+2 OpenAPI shape lock + 4 PR-body section + 1 endpoint guard for the extended _test seed endpoint), 527 in-container integration (+13 StudyDetail.confidence + 5 migration round-trip + 1 open_pr plumbing + 2 Story 1.2 worker), 567 UI vitest (+14 ConfidencePanel — 13 layout + 1 tooltip-trigger inventory), 10/10 Playwright E2E (+2 ConfidencePanel real-backend). Three follow-ups filed: `chore_guides_glossary_route` (render `glossary.ts` as a `/guide/glossary` route), `chore_guides_faq` (curated operator-judgment Q&A), `chore_guide_06_screenshot_refresh_confidence_panel` (regenerate guide-06 screenshots). Alembic head moves to `0015_trials_per_query_metrics`. Prior — after `feat_pr_metric_confidence` Epic 1 landed locally on the `feat_pr_metric_confidence` branch — backend persistence + analytics + PR-body + digest-prompt surfaces complete, Epic 2 frontend ConfidencePanel ahead. 
Migration `0015_trials_per_query_metrics` adds the nullable JSONB column behind a CHECK constraint; new pure-Python `backend/app/domain/study/confidence.py` owns bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome classification under FR-7's graceful-degradation contract; new `backend/app/services/study_confidence.py` glues the 4-query read pattern onto the orchestrator and is consumed from `studies._detail()`, the `open_pr` worker, and the digest worker. GPT-5.5 cycle-1 review found 12 issues — 5 rejected as truncated-diff false positives, 2 deferred (plan/code interface drift; full-worker integration test deferred to feat_github_pr_worker's existing suite), 5 accepted + fixed inline (convergence `total_trials = max_trial_number + 1` instead of count; convergence KeyError guard when winner not in summary; pre-existing-row-stays-NULL migration test; Trial model docstring drift on metric key shape; state + architecture docs). 1039 backend unit tests pass (+5 digest prompt cases, +1 convergence assertion), 189 contract, 527/527 in-container integration. Prior — after `feat_agent_propose_search_space` shipped as PR #175 squash `5d29355`). **21st MVP1 feature merged** — 10 stories across 5 epics, all complete. New read-only agent tool `propose_search_space` (the 20th in the registry) builds a deterministic starter search space from a template's `declared_params` using the same heuristic that powers the create-study wizard's auto-fill — a Python port (`backend/app/domain/study/search_space_defaults.py`) of `ui/src/lib/search-space-defaults.ts` with a TS↔Python parity test driven by a shared JSON fixture (18 rows, byte-identical assertions on both sides). Cap-aware overflow guard added on both Python AND TS sides (fixes a latent bug where TS silently returned an invalid space when 8+ fall-through floats blew past 10⁶). 
Optional `prior_study_id` arg narrows numeric bounds via `winner ± |winner| × bracket` for sign-symmetric math (Gemini #1/#2 fix) with `bracket` threaded through the linear paths (Gemini #3 fix); log-uniform stays at √2. Graceful degrade on template mismatch + missing trial row + non-numeric winner — emits WARN logs (`agent.propose_search_space.prior_template_mismatch` / `.missing_winner_trial`). `ToolContext` gained `conversation_id: str` plumbed from `orchestrator.run_turn` for paired adherence telemetry — INFO events `agent.search_space_proposed` (propose-side) + `agent.create_study.invoked` (create-side) correlate offline by conversation_id per spec FR-6 (grep recipe in `docs/03_runbooks/agent-debugging.md` §5). New `repo.get_trial(db, trial_id)` parallels `repo.get_study`. System prompt updated: 19→20 tools, "Studies (4)" with `propose_search_space` first, new chain-guidance bullet. `ProposeSearchSpaceArgs` uses `ConfigDict(extra="forbid")` (GPT-5.5 F6 fix) so hallucinated LLM args fail Pydantic validation loudly. Spec converged at GPT-5.5 cycle 3 (19 findings, all accepted); plan converged at cycle 3 (8 findings, all accepted). Post-merge review: Gemini 3 findings all fixed in `642b5b9`; GPT-5.5 final review 6 findings — 1 fixed in `945e833`, 1 deferred (structlog migration), 4 rejected with cited counter-evidence (truncated-diff false positives). Tests: 1000 backend unit pass (+87 new cases) + 19 Python parity + 19 TS parity; 38 TS lib + 66 modal still green. Alembic head unchanged at `0014_clusters_target_filter` — feature is purely additive at the application layer. Earlier 2026-05-20 (after `feat_cluster_target_filter` shipped as PR #168 squash `57d3ba0` + follow-up `chore_seed_meaningful_demos` shipped as PR #169 squash `c44d774`). **20th MVP1 feature merged** + demo-state durability gap closed in the same session. 
PR #168: 5 stories (B1 migration 0014 + ORM column; B3 Pydantic + service plumb-through + responses; B2 adapter Protocol + ElasticAdapter + StubAdapter + router; F1 register modal Target filter input; F2 create-study modal filter-aware empty-state + EntitySelect accessibility improvement). Plus 4 post-impl fix commits (test_migrations head bump, register modal overflow-y-auto, EntitySelect sr-only Gemini fix, spec drift cleanup + OpenAPI shape-lock contract test from GPT-5.5 final review). PR #169: `scripts/seed_meaningful_demos.py` + `make seed-demo` target (idempotent: TRUNCATE clusters CASCADE + DELETE matching ES/OS indices + reseed with per-cluster `target_filter` values baked in — closes the gap where integration tests kept wiping the dev DB with no durable reseed mechanism). 529/529 vitest across 79 files (was 525/78), 903 backend unit tests (was 899), 50 cluster-API integration tests (was 45) + 3 new migration round-trip tests + 7 contract validator cases + OpenAPI shape-lock test. **Alembic head moved to `0014_clusters_target_filter`.** Cross-model review pre-impl: spec + plan both converged at GPT-5.5 cycle 2 (12 findings total, all accepted). Post-impl: Gemini Code Assist 3 findings (2 accepted: EntitySelect sr-only on #168, http() auth type hint on #169; 1 rejected with cited counter-evidence: out-of-scope test file from #168). GPT-5.5 final review on #168: 2 findings, both accepted (spec drift + OpenAPI shape-lock). **Process feedback captured:** `.claude/projects/.../memory/feedback_one_branch_per_session.md` — should have bundled the seed chore into PR #168 rather than spinning a sibling PR. End-to-end smoke verified live before both merges. Earlier 2026-05-20 (after `feat_create_study_target_autocomplete` shipped as PR #165 squash commit `bd4516a` — 19th MVP1 feature. 
Bundled the `get_schema` + `explain` connect-error fix per `bug_get_schema_unhandled_connect_error` in the same PR. 525/525 vitest across 78 files, 33 adapter unit tests + contract suite + integration tests all green twice (initial + post-cycle-2). Gemini Code Assist: 1 finding rejected with cited counter-evidence (pre-existing list-shape assumption matches the wire contract). GPT-5.5 final review: 2 findings — 1 accepted in `19d9d51` (contract-layer TARGETS_FORBIDDEN + CLUSTER_UNREACHABLE envelope assertions), 1 deferred with counter-evidence (dropdown E2E `test.skip`'d; AC coverage satisfied by 8 hook unit + 6 modal unit + integration + contract tests). Two follow-up ideas filed in-PR: `bug_e2e_target_dropdown_flake` + `chore_guide_06_screenshot_refresh_target_picker`.) Earlier — same day (after `feat_create_study_search_space_builder` shipped as PR #163 squash commit `c703953`, bundling the search-space builder feature + the `bug_judgment_lists_listing_ignores_query_set_filter` backend fix surfaced during local verification. 18th MVP1 feature. The builder + bug-fix bundle reflects the single-developer series workflow: rather than spin a sibling backend PR off `main`, the bug fix landed in the same branch since the dev was already in verification mode. PR #163 went through 3 spec cycles (16 findings) + 3 plan cycles (27 findings) + 3 Gemini Code Assist findings + 2 GPT-5.5 final-review passes (1 second-pass Low finding accepted on test coverage) = 47 review findings all accepted with cited fixes. 512 vitest assertions across 77 files, 4 real-backend Playwright e2e cases against the builder, 2 new backend tests for the bundled filter fix. Two follow-up idea files captured during local verification: `feat_create_study_target_autocomplete` (Step-1 free-text target field has no autocomplete from cluster indexes — pre-existing UX debt deferred) and the now-closed `bug_judgment_lists_listing_ignores_query_set_filter` (bundled into this PR).) 
Earlier (also 2026-05-20) — PR #161 `0879df2` `chore_create_study_modal_e2e_stability` (un-skipped the deferred Playwright spec via `dispatchEvent('click')` on the Radix trigger), PR #160 `160ff6b` `bug_err_metric_frontend_backend_drift` (wire-enum trim — `err` removed from frontend + backend Literal), PR #159 `52e106d` `bug_tutorial_template_param_boost_naming` (heuristic extension for `<field>_boost` suffix). Earlier (also 2026-05-20) — PR #157 `chore_create_study_wizard_polish` — squash commit `075c46b` — merged into `main`. Ships the 4-surface chore: backend template-mismatch validation at create time (two new error codes `SEARCH_SPACE_UNKNOWN_PARAM` + `SEARCH_SPACE_MISSING_DECLARED_PARAM`), Step-4 auto-fill via the new `ui/src/lib/search-space-defaults.ts` heuristic + cap-aware fallback + TS↔Python cardinality parity fixture, 4 new `study.search_space.*` glossary entries (one dual + three short-only) and 6 extended per-metric entries with k-tier clauses, Step-5 tri-state metric+k rendering with new `K_IGNORED` predicate, plus client-side validation mirror + zero-declared block + 404/transient template-fetch recovery + `__placeholder__` warning. 16 new test files + 2 modified + 1 shared JSON fixture across backend unit/integration/contract + frontend unit/component + 1 skipped E2E. Three follow-up ideas captured: `bug_tutorial_template_param_boost_naming` (tutorial template uses `<field>_boost` suffix not matched by the locked heuristic), `chore_create_study_modal_e2e_stability` (re-enable the skipped Playwright spec once EntitySelect disabled gating stabilizes), `bug_err_metric_frontend_backend_drift` (`err` selectable in wizard but unsupported by `scoring.py`). Gemini Code Assist + GPT-5.5 final-pass both adjudicated on the PR — 2 Gemini findings + 7 GPT-5.5 findings, all addressed or filed.) 
Earlier 2026-05-19 (after a 4-PR shipping run drained the actionable post-MVP1 chore backlog: PR #152 `chore_ci_prettier_check` (`476db78`) + PR #153 `chore_extract_shadcn_select_test_mock` (`199e225`) + PR #154 `chore_form_dropdown_guide_screenshot_refresh` (`ed4121f`) + PR #155 `chore_detail_page_shell_primitive` (`9a72514`). PR #155 is the third primitive after `<DataTable>` and `<EntitySelect>` — 6 detail-page migrations + new lint guard + flattens a latent UX bug where only `proposals/[id]` discriminated 404 from network error. Earlier the same session: PR #150 (`chore_data_table_columnvisibility_tanstack`, `c1e4545`) — closes the residual DataTable follow-ups: item 5 migrates the primitive from `columns.filter(...)` to TanStack's `state.columnVisibility` API (memoized per Gemini feedback), item 3 locked the flat-prop `DataTableProps` API as canonical with a "Shipped contract addendum" on the historical implementation plan's Story 2.6. Folder renamed `chore_data_table_primitive_followups` → `chore_data_table_columnvisibility_tanstack`. Earlier 2026-05-19 PR #148 (`infra_e2e_wire_seed_helper_into_studies_spec`, squash `65f4150`) — restored the 2 digest-panel E2E tests deferred from PR #130, diagnosed and fixed the real root cause of the original smoke-lane failure (`GET /api/v1/proposals` was silently ignoring the `?study_id=` filter, returning the most-recent global pending proposal), added 5-case integration regression coverage at `backend/tests/integration/test_proposals_study_filter.py`. Plus: (a) earlier 2026-05-18 PR #146 (`bug_install_skip_ui_rebuild`, squash `7299fca`) made `make up` rebuild every Compose service (`docker compose build` no-args), switched `make down` to `docker compose down`, and added a `verify_install_builds_all_services.sh` CI gate to lock the contract; (b) earlier 2026-05-18 PR #147 captured `chore_detail_page_shell_primitive` idea (squash `8854e47`). 
Two new follow-ups filed: `chore_ci_prettier_check` (CI's frontend job has no `prettier --check` step — surfaced when PR #136 drift in 2 unrelated files blocked an unrelated commit) and the in-flight `chore_detail_page_shell_primitive` (third primitive after DataTable + EntitySelect).)
 
 ---
 
 ## Current branch / execution context
 
-- **Branch:** `docs/finalize-study-target-judgment-mismatch-guard` — finalization docs PR after PR #184 (`ce3fcf4`) merged 2026-05-21. `feature/study-target-judgment-mismatch-guard` branch deleted post-merge. Earlier: `docs/finalize-pr-metric-confidence` — finalization docs PR after PR #180 (`d0a8358`) merged 2026-05-21. `feat_pr_metric_confidence` branch deleted post-merge. Earlier: `docs/finalize-agent-propose-search-space` — finalization docs PR after PR #175 (`5d29355`) merged 2026-05-21. `feature/agent-propose-search-space` deleted post-merge. Earlier: `docs/finalize-cluster-target-filter` — finalization docs PR after PR #168 (`57d3ba0`) + PR #169 (`c44d774`) both merged. Prior `main` post-merge of PR #168 squash `57d3ba0` (`feat_cluster_target_filter`) + PR #169 squash `c44d774` (`chore_seed_meaningful_demos`) 2026-05-20. Earlier: PR #165 squash commit `bd4516a` 2026-05-20. Finalization docs branch `docs/finalize-create-study-target-autocomplete`. Prior squash same day: PR #163 `c703953` (`feat_create_study_search_space_builder`). Finalization docs PR off `docs/finalize-create-study-search-space-builder`. Prior squashes (same day): PR #161 `0879df2` (`chore_create_study_modal_e2e_stability`), PR #160 `160ff6b` (`bug_err_metric_frontend_backend_drift`), PR #159 `52e106d` (`bug_tutorial_template_param_boost_naming`), PR #158 `308c315` (finalize chore_create_study_wizard_polish), PR #157 `075c46b` (`chore_create_study_wizard_polish`). Prior squash: PR #155 `9a72514` 2026-05-19. 
Prior squashes: PR #154 `ed4121f` 2026-05-19 (`chore_form_dropdown_guide_screenshot_refresh`), PR #153 `199e225` 2026-05-19 (`chore_extract_shadcn_select_test_mock`), PR #152 `476db78` 2026-05-19 (`chore_ci_prettier_check`), PR #151 `110dc5a` 2026-05-19 (finalize chore_data_table_columnvisibility_tanstack), PR #150 `c1e4545` 2026-05-19 (`chore_data_table_columnvisibility_tanstack`), PR #149 `da9506b` 2026-05-19 (finalize infra_e2e_wire_seed_helper_into_studies_spec), PR #148 `65f4150` 2026-05-19 (`infra_e2e_wire_seed_helper_into_studies_spec` — `?study_id=` filter bug + E2E test restore), PR #147 `8854e47` 2026-05-18 (capture chore_detail_page_shell_primitive idea), PR #146 `7299fca` 2026-05-18 (bug_install_skip_ui_rebuild — `make up`/`make down` lifecycle fix), PR #136 `cb7d9ee` 2026-05-18 (chore_form_dropdown_primitive), PR #132 `ee4c8d4` 2026-05-17 (chore_data_table_primitive_followups items 1+2+4+6), PR #130 `13b3383` 2026-05-17 (infra_e2e_seed_completed_study), PR #128 `73459d2` 2026-05-17 (bug_cursor_decode_value_validation), PR #126 `d6115b3` 2026-05-16 (feat_data_table_primitive). `v0.1.0` annotated tag still on `main` commit `d099536` 2026-05-13; GitHub Release at https://github.com/SoundMindsAI/relyloop/releases/tag/v0.1.0.
-- **Active feature:** none in flight (PR #184 closed `feat_study_target_judgment_mismatch_guard` on 2026-05-21 as the **23rd MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #180 closed `feat_pr_metric_confidence` on 2026-05-21 as the **22nd MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #175 closed `feat_agent_propose_search_space` on 2026-05-21; only finalization docs PR remains for the 21st MVP1 feature). Prior — none in flight (PR #168 closed `feat_cluster_target_filter` + PR #169 closed `chore_seed_meaningful_demos` on 2026-05-20; only finalization docs PR remains for the 20th MVP1 feature). Prior — none in flight (PR #165 closed `feat_create_study_target_autocomplete` + the bundled `bug_get_schema_unhandled_connect_error` fix on 2026-05-20). Prior — none in flight (PR #163 closed `feat_create_study_search_space_builder` + the `bug_judgment_lists_listing_ignores_query_set_filter` bundled fix on 2026-05-20). PR #168 closed `feat_cluster_target_filter` + PR #169 closed `chore_seed_meaningful_demos` (sibling). **Three PRs shipped 2026-05-15:** PR #122 (Phase 1, 16th MVP1 feature — Tooltip primitive + 26 placements on create-study modal + study detail), PR #123 (Phase 1 finalization docs), PR #124 (Phases 2 + 3 — 17th MVP1 feature; 21 additional tooltips on judgments + proposals + cluster registration + 2 new first-run components: chat ExamplePrompts strip + Stripe-style StartHereChecklist on home page). The original "MVP1 Phase 1 only" scope-lock was reversed mid-day: operator decided to ship Phases 2 + 3 together with a Stripe-style design call rather than wait for MVP2. PR #124 took 2 hours from idea-folder reuse to merge. 47 total tooltip placements + 2 new first-run components live in `main`. **PR #122 shipped 2026-05-15 morning** — `feat_contextual_help` Phase 1 (16th MVP1 feature). 
Adds the first Tooltip primitive (`@radix-ui/react-tooltip@~1.2.8` + shadcn-style wrapper at `ui/src/components/ui/tooltip.tsx`), two glossary-backed wrappers (`InfoTooltip` standalone + asChild modes; `HelpPopover` click-to-open with `react-markdown` safety filter), and a 49-key glossary source-of-truth at `ui/src/lib/glossary.ts` (8 enum groups parity-tested against `enums.ts`). 26 tooltip placements across the create-study modal (Step 1 target + Step 3 template + 9 Step 5 inputs), study-header (status badge dynamic key + Best metric + Trials), trials-table (5 column headers + Sort label), and digest panel (5 section labels + Open PR enabled + Open PR disabled). The disabled Open PR button refactored from native `disabled` to `aria-disabled="true"` so it stays focusable and the tooltip reveals on focus (AC-11). Gemini Code Assist: 2 findings (1 accepted + fixed, 1 rejected with cited counter-evidence). Final GPT-5.5 review: 1 Medium accepted-framing-but-deferred. Spec converged at GPT-5.5 cycle 3 (24 findings, 23 accepted + 1 rejected); plan converged at cycle 2 (12 findings, 10 accepted + 1 rejected + 1 spec patch). UI vitest now **279 passing across 48 files** (was 249 across 45 — +3 new test files, +30 cases). Playwright E2E **8 passing** (was 5 — +3 new contextual-help tests). One follow-up filed: `infra_e2e_seed_completed_study/idea.md` tracks the E2E gap for digest-panel triggers + AC-11 (cross-subsystem helper for seeding a completed study with digest + proposal; component-level coverage is in place). Phases 2 + 3 deferred to MVP2 via `feat_contextual_help_mvp2/` (judgments + proposals tooltips; chat + cluster + home onboarding; the home-page "Start here" panel is the only product-design-shaped item).
+- **Branch:** `docs/finalize-e2e-test-rows-isolation` — finalization docs PR after PR #186 (`a444b94`) merged 2026-05-21. `chore/e2e-test-rows-isolation` branch deleted post-merge. Earlier: `docs/finalize-study-target-judgment-mismatch-guard` — finalization docs PR after PR #184 (`ce3fcf4`) merged 2026-05-21. `feature/study-target-judgment-mismatch-guard` branch deleted post-merge. Earlier: `docs/finalize-pr-metric-confidence` — finalization docs PR after PR #180 (`d0a8358`) merged 2026-05-21. `feat_pr_metric_confidence` branch deleted post-merge. Earlier: `docs/finalize-agent-propose-search-space` — finalization docs PR after PR #175 (`5d29355`) merged 2026-05-21. `feature/agent-propose-search-space` deleted post-merge. Earlier: `docs/finalize-cluster-target-filter` — finalization docs PR after PR #168 (`57d3ba0`) + PR #169 (`c44d774`) both merged. Prior `main` post-merge of PR #168 squash `57d3ba0` (`feat_cluster_target_filter`) + PR #169 squash `c44d774` (`chore_seed_meaningful_demos`) 2026-05-20. Earlier: PR #165 squash commit `bd4516a` 2026-05-20. Finalization docs branch `docs/finalize-create-study-target-autocomplete`. Prior squash same day: PR #163 `c703953` (`feat_create_study_search_space_builder`). Finalization docs PR off `docs/finalize-create-study-search-space-builder`. Prior squashes (same day): PR #161 `0879df2` (`chore_create_study_modal_e2e_stability`), PR #160 `160ff6b` (`bug_err_metric_frontend_backend_drift`), PR #159 `52e106d` (`bug_tutorial_template_param_boost_naming`), PR #158 `308c315` (finalize chore_create_study_wizard_polish), PR #157 `075c46b` (`chore_create_study_wizard_polish`). Prior squash: PR #155 `9a72514` 2026-05-19. 
Prior squashes: PR #154 `ed4121f` 2026-05-19 (`chore_form_dropdown_guide_screenshot_refresh`), PR #153 `199e225` 2026-05-19 (`chore_extract_shadcn_select_test_mock`), PR #152 `476db78` 2026-05-19 (`chore_ci_prettier_check`), PR #151 `110dc5a` 2026-05-19 (finalize chore_data_table_columnvisibility_tanstack), PR #150 `c1e4545` 2026-05-19 (`chore_data_table_columnvisibility_tanstack`), PR #149 `da9506b` 2026-05-19 (finalize infra_e2e_wire_seed_helper_into_studies_spec), PR #148 `65f4150` 2026-05-19 (`infra_e2e_wire_seed_helper_into_studies_spec` — `?study_id=` filter bug + E2E test restore), PR #147 `8854e47` 2026-05-18 (capture chore_detail_page_shell_primitive idea), PR #146 `7299fca` 2026-05-18 (bug_install_skip_ui_rebuild — `make up`/`make down` lifecycle fix), PR #136 `cb7d9ee` 2026-05-18 (chore_form_dropdown_primitive), PR #132 `ee4c8d4` 2026-05-17 (chore_data_table_primitive_followups items 1+2+4+6), PR #130 `13b3383` 2026-05-17 (infra_e2e_seed_completed_study), PR #128 `73459d2` 2026-05-17 (bug_cursor_decode_value_validation), PR #126 `d6115b3` 2026-05-16 (feat_data_table_primitive). `v0.1.0` annotated tag still on `main` commit `d099536` 2026-05-13; GitHub Release at https://github.com/SoundMindsAI/relyloop/releases/tag/v0.1.0.
+- **Active feature:** none in flight (PR #186 closed `chore_e2e_test_rows_isolation` on 2026-05-21 as the **24th MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #184 closed `feat_study_target_judgment_mismatch_guard` on 2026-05-21 as the **23rd MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #180 closed `feat_pr_metric_confidence` on 2026-05-21 as the **22nd MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #175 closed `feat_agent_propose_search_space` on 2026-05-21; only finalization docs PR remains for the 21st MVP1 feature). Prior — none in flight (PR #168 closed `feat_cluster_target_filter` + PR #169 closed `chore_seed_meaningful_demos` on 2026-05-20; only finalization docs PR remains for the 20th MVP1 feature). Prior — none in flight (PR #165 closed `feat_create_study_target_autocomplete` + the bundled `bug_get_schema_unhandled_connect_error` fix on 2026-05-20). Prior — none in flight (PR #163 closed `feat_create_study_search_space_builder` + the `bug_judgment_lists_listing_ignores_query_set_filter` bundled fix on 2026-05-20). PR #168 closed `feat_cluster_target_filter` + PR #169 closed `chore_seed_meaningful_demos` (sibling). **Three PRs shipped 2026-05-15:** PR #122 (Phase 1, 16th MVP1 feature — Tooltip primitive + 26 placements on create-study modal + study detail), PR #123 (Phase 1 finalization docs), PR #124 (Phases 2 + 3 — 17th MVP1 feature; 21 additional tooltips on judgments + proposals + cluster registration + 2 new first-run components: chat ExamplePrompts strip + Stripe-style StartHereChecklist on home page). The original "MVP1 Phase 1 only" scope-lock was reversed mid-day: operator decided to ship Phases 2 + 3 together with a Stripe-style design call rather than wait for MVP2. PR #124 took 2 hours from idea-folder reuse to merge. 47 total tooltip placements + 2 new first-run components live in `main`. 
**PR #122 shipped 2026-05-15 morning** — `feat_contextual_help` Phase 1 (16th MVP1 feature). Adds the first Tooltip primitive (`@radix-ui/react-tooltip@~1.2.8` + shadcn-style wrapper at `ui/src/components/ui/tooltip.tsx`), two glossary-backed wrappers (`InfoTooltip` standalone + asChild modes; `HelpPopover` click-to-open with `react-markdown` safety filter), and a 49-key glossary source-of-truth at `ui/src/lib/glossary.ts` (8 enum groups parity-tested against `enums.ts`). 26 tooltip placements across the create-study modal (Step 1 target + Step 3 template + 9 Step 5 inputs), study-header (status badge dynamic key + Best metric + Trials), trials-table (5 column headers + Sort label), and digest panel (5 section labels + Open PR enabled + Open PR disabled). The disabled Open PR button refactored from native `disabled` to `aria-disabled="true"` so it stays focusable and the tooltip reveals on focus (AC-11). Gemini Code Assist: 2 findings (1 accepted + fixed, 1 rejected with cited counter-evidence). Final GPT-5.5 review: 1 Medium accepted-framing-but-deferred. Spec converged at GPT-5.5 cycle 3 (24 findings, 23 accepted + 1 rejected); plan converged at cycle 2 (12 findings, 10 accepted + 1 rejected + 1 spec patch). UI vitest now **279 passing across 48 files** (was 249 across 45 — +3 new test files, +30 cases). Playwright E2E **8 passing** (was 5 — +3 new contextual-help tests). One follow-up filed: `infra_e2e_seed_completed_study/idea.md` tracks the E2E gap for digest-panel triggers + AC-11 (cross-subsystem helper for seeding a completed study with digest + proposal; component-level coverage is in place). Phases 2 + 3 deferred to MVP2 via `feat_contextual_help_mvp2/` (judgments + proposals tooltips; chat + cluster + home onboarding; the home-page "Start here" panel is the only product-design-shaped item).
 
 **Earlier — seven PRs shipped 2026-05-14:** `feat_judgments_periodic_resume_sweep` (PR #104, 14th MVP1 feature), `bug_query_inline_crud_since_filter_uuidv7_ms_collision` (PR #106 — UUIDv7 ms-collision test flake), `infra_dashboard_regen_pre_commit_conflict §2+§4` (PR #108 — dashboard regen idempotency + relative-link rewriting), `infra_make_targets_split_backend_only` (PR #110 — `make backend-fmt/lint/typecheck` + symmetric `ui-fmt` so Node-18 contributors aren't blocked), `chore_digest_worker_narrow_except` (PR #112 — narrowed `except Exception` allowlist to `(ValueError,)` + ERROR-level `digest_importance_failed_unexpected` event), `infra_structlog_test_helpers` (PR #114 — factored the two structlog test-assertion patterns into `backend/tests/_log_helpers.py`), and `chore_chat_last_message_preview` (PR #117 — `last_message_preview` + `last_message_at` on `ConversationSummary` via LATERAL JOIN; frontend shows preview under title + swaps displayed timestamp from `created_at` to `last_message_at`). Plus PR #116 dropped `chore_studies_ui_shadcn_polish` as won't-do (forward-compat audit on NavigationMenu primitive + ClusterFilterSelect precedent on native `<select>` for page-level controls). The 7-PR day drained the operational-friction surfaces and the last `/bug-fix`-shaped item. **MVP1 alpha + 17 features shipped (all 3 phases of feat_contextual_help on 2026-05-15) + 26 backlog items drained.** Manual maintainer steps still pending from [`release-checklist.md`](docs/03_runbooks/release-checklist.md): §4 fresh-VM hosted-OpenAI walkthrough, §5 local-LLM walkthrough, §8 feedback Discussion + design-partner channel shares. **Remaining MVP1 backlog: zero actionable items** — the MVP1 dashboard now reads `Path to MVP1: 0`. 
Two ideas held for MVP2 (visible on the new `MVP2_DASHBOARD.md`): `infra_arq_subprocess_test_mvp2` (folder renamed 2026-05-14 to mirror the `_mvp2` precedent — trigger-locked: arq pin bump, 3rd cron, or MVP3 hardening opt-in) and `bug_chat_long_conversation_truncation_mvp2` (paired with the just-shipped chat preview but kept on MVP2 hold per its preflight — needs `/pipeline` for migration + new LLM round-trip). Plus 4 keep-deferred items by operator decision + `infra_dashboard_regen_pre_commit_conflict §3` follow-up (runbook addendum, MVP1-eligible but low-leverage).
 - **Alembic head:** `0015_trials_per_query_metrics` (added by `feat_pr_metric_confidence` Epic 1 — adds nullable `trials.per_query_metrics JSONB` + `trials_per_query_metrics_object_check` CHECK constraint enforcing `IS NULL OR jsonb_typeof = 'object'`; round-trip + pre-existing-row-NULL behavior verified). Prior head was `0014_clusters_target_filter` (PR #168 — nullable `clusters.target_filter VARCHAR(256)`).
@@ -19,6 +19,8 @@
 
 ## Most recent meaningful changes (newest first)
 
+- **2026-05-21 — `chore_e2e_test_rows_isolation` merged into `main` as PR #186 squash `a444b94`.** **24th MVP1 feature shipped**, 2 stories across 1 epic. Closes the operator-visible dev-DB pollution: every Playwright E2E run now drains its seeded rows after the suite. **Backend (Story 1.1)**: 6 new `DELETE /api/v1/_test/{proposals,digests,studies,judgment-lists,query-sets,query-templates}/{id}` endpoints, gated by `Depends(_require_development_env)` so they 404 outside `ENVIRONMENT=development`. Hard-delete with preflight `SELECT EXISTS` for non-cascade dependents → 409 with resource-specific `<RESOURCE>_HAS_DEPENDENT_<DEPENDENT>` code. 11 strictly-new error codes (3 `_NOT_FOUND` + 8 `_HAS_DEPENDENT_*`) documented in `docs/01_architecture/api-conventions.md`. 6 new `hard_delete_<resource>(db, id) -> bool` repo functions using the fetch-then-delete pattern (matches `soft_delete_cluster` precedent — sidesteps mypy's `Result[Any].rowcount` gap). **Frontend (Story 1.2)**: Per-worker JSONL cleanup registry — every `seedXxx()` helper appends `(resource, id)` to `test-results/.cleanup/worker-${TEST_WORKER_INDEX}.jsonl`. New `globalSetup.ts` clears stale artifacts. New `globalTeardown.ts` reads → dedupes → reorders FK-safe (proposals → digests → studies → judgment_lists → query_sets → query_templates → clusters) → DELETEs against the live backend → writes `test-results/cleanup-summary.json` with shape `{registered, registered_deduped, attempted, deleted, failed, skipped_404, parse_failures, details}`. New `cleanup-reporter.ts` Playwright Reporter implements `onEnd` and asserts the invariant `registered_deduped === attempted`, `attempted === deleted + failed + skipped_404`, and `failed === 0`. Best-effort contract: teardown wrapped in try/catch/finally so unexpected errors never reject the promise; AbortController 5s timeout per fetch; parse failures count toward `failed` so registry corruption surfaces in the reporter. 
Pure `cleanup-core.ts` module extracted (`ResourceType`, `RESOURCE_PATH_MAP`, `DRAIN_ORDER`, `dedupeEntries`, `orderEntries`, `buildDeleteUrl`, `readCleanupEntriesFromDir`) so the dedupe/order/URL-build/read logic is unit-testable without fs/network mocks. `vitest.config.ts` extended to `include: ['tests/e2e/**/*.test.ts']` since the new vitest files live alongside Playwright `.spec.ts` files. `playwright.config.ts` gained `testMatch: ['**/*.spec.ts']` (post-merge fix on the same branch — CI smoke job tried to load vitest `.test.ts` files as Playwright specs and crashed importing `vitest`; the testMatch pin keeps the two runners in their own lanes). **Cross-model review**: spec converged at GPT-5.5 cycle 3 (26 findings: 25 accepted + 1 deferred to `PLAYWRIGHT_CLEANUP_STRICT=1` v2 follow-up); plan converged at cycle 3 (20 findings, all accepted). Gemini Code Assist: 3 Medium findings, all rejected with SQLAlchemy AsyncSession-concurrency counter-evidence — Gemini suggested `asyncio.gather`-ing the preflight `SELECT EXISTS` calls, but per [SQLAlchemy docs](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html#using-asyncsession-with-concurrent-tasks) "AsyncSession is not safe for use in concurrent tasks"; the existing `asyncio.gather` precedent at `backend/app/api/health.py:233` parallelizes separate engine/client probes, not multiple queries on the same DB session. Final GPT-5.5 review: 1 High finding rejected (truncated-diff false positive claiming `hard_delete_digest` was imported in the `conversation` block; actual import block at `backend/app/db/repo/__init__.py:38–42` is correct, verified empirically with `uv run python -c "from backend.app.db.repo import hard_delete_*"`). 
**Tests**: 1040 backend unit (unchanged); backend integration +20 cases (6 happy + 6 parameterized 404 + 8 409 — covers all 11 strictly-new + 3 reused codes); backend contract +6 env-guard cases + 2 source-presence cases asserting all 11 strictly-new error-code literals appear in `_test.py` + 6 new OpenAPI tuples; UI vitest **630** across 85 files (was 601 — +29 across 2 new files: 19 cleanup-core + 10 global-teardown). CI green on final HEAD (5/5 jobs incl. smoke 70/70 Playwright). **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential capture: `chore_e2e_seed_acme_helper_dead/idea.md` (Backlog) — `seedAcmeProductsChain` at `ui/tests/e2e/helpers/seed.ts:378` has zero spec callers; dead helper, either delete or wire a spec. Helper-coverage audit at `ui/tests/e2e/helpers/coverage-audit.md` documents 8 of 9 helpers covered.
+
 - **2026-05-21 — `feat_study_target_judgment_mismatch_guard` merged into `main` as PR #184 squash `ce3fcf4`.** **23rd MVP1 feature shipped**, 3 stories across 1 epic. Closes the literal study2 incident (study UUID `019e4be6-207e-7c32-9889-f6c3003f57c2` — 1000 trials × 4.5 minutes × `best_metric=0.0` because the judgment list was authored against `e2e-target` but the study queried `docs-articles`). `POST /api/v1/studies` now rejects two mismatch classes at create time with specific 422 codes: **`JUDGMENT_CLUSTER_MISMATCH`** (judgment list and study point at different physical clusters; doc IDs are cluster-scoped) and **`JUDGMENT_TARGET_MISMATCH`** (same cluster, different target index/collection). Cluster fires before target; both fire AFTER FK resolution + the existing `VALIDATION_ERROR` query_set check. **Backend (Story 1.1)**: `JudgmentListSummary` gains required `target: str` field; new `?target=` Query param on `GET /api/v1/judgment-lists` (min_length=1, max_length=255 — ES/OpenSearch index-name ceiling); thread to `repo.list_judgment_lists` + `count_judgment_lists` with AND-semantics alongside the existing `query_set_id` / `cluster_id` filters. `ui/src/lib/types.ts` regenerated from live OpenAPI. **Backend (Story 1.2)**: two new validator blocks in `studies.py` between the existing query_set check and the config serialization. `docs/01_architecture/api-conventions.md` gains both new error-code rows (in firing order) in the studies-endpoint table. 
**Frontend (Story 2.1)**: create-study modal Step-2 dropdown call passes `{ query_set_id, cluster_id, target, limit: 200 }` to `useJudgmentLists`; manual-mode `<Input>` uses a hoisted `targetReg = form.register('target')` (RHF register preserved) with `onChange={(e) => { targetReg.onChange(e); form.setValue('judgment_list_id', ''); }}` for cascade reset; dropdown-mode target `onChange` mirrors the same cascade; new `emptyState` on the judgment-list `<EntitySelect>` substitutes the watched target value + CTA href="/judgments". `useJudgmentLists` filter type extended; `target` threaded through both params AND queryKey for cache scoping. **Drive-by fix bundled (per inline-fix rubric)**: E2E seed helpers (`seedJudgmentList`, `seedFullChain`, `seedStudy`) gain optional `target` overrides — `seedJudgmentList` default changed from `'e2e-target'` to `'products'` matching `seedStudy`'s default; 3 specs updated (`studies-create-validation.spec.ts` fill→'products', `studies-create-builder.spec.ts` passes `judgmentListTarget: 'e2e-builder-target'` to align with modal fill, `studies-create-target-dropdown.spec.ts` passes the alpha seeded ES index name). **Cross-model review**: spec converged at GPT-5.5 cycle 3 (17 findings — all 17 accepted, 1 rejected with cited counter-evidence at `create-study-modal.tsx:508`); plan converged at cycle 3 (16 findings, all accepted); Gemini Code Assist 2 findings (1 accepted in `035af0a` — IIFE → hoisted register; 1 rejected with precedent counter-evidence at `test_judgments_api_contract.py:215`); final GPT-5.5 review 10 findings — 2 accepted in `a358a71` (Story 1.1 over-bound 422 test on `?target=<256 chars>`), 8 rejected (5 truncation false positives + 3 plan/precedent rejects). Adjudication summary posted at https://github.com/SoundMindsAI/relyloop/pull/184#issuecomment-4513369521. 
**Tests**: 1040 backend unit (unchanged — validators are inline conditionals); backend integration +7 cases (target mismatch + cluster mismatch + cluster-fires-before-target + qs-fires-first ordering + GET pre-existing 200 negative test + `?target=` AND-semantics across 4 lists × 2 clusters × 2 query-sets + summary shape + over-bound 422); backend contract +2 cases (firing-order source-presence lock at `test_studies_api_contract.py` + summary `target` shape lock at `test_judgments_api_contract.py`); UI vitest **572** across 83 files (was 567/83 — +5: hook wire-filter, manual cascade, dropdown cascade, cluster regression-lock, empty-state CTA). CI green on final `a358a71` (5/5 jobs incl. 70/70 Playwright smoke). **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer.
 
 - **2026-05-21 — `feat_agent_propose_search_space` merged into `main` as PR #175 squash `5d29355`.** **21st MVP1 feature shipped**, 10 stories across 5 epics. New 20th agent tool `propose_search_space` (read-only, NOT in `MUTATING_TOOL_NAMES`) builds a deterministic starter search space from a template's `declared_params` via the same heuristic that powers the create-study wizard's auto-fill — Python port of `ui/src/lib/search-space-defaults.ts` lives at `backend/app/domain/study/search_space_defaults.py` with a TS↔Python parity test driven by a shared JSON fixture (`backend/tests/_fixtures/search_space_defaults_parity.json`, 18 rows). Cap-aware overflow now raises `InvalidSearchSpaceError`/throws on both Python AND TS sides — fixes a latent bug where the TS implementation silently returned an invalid `SearchSpace` when given 8+ fall-through floats (6⁸ > 10⁶ exhausts the cap-aware fallback). Optional `prior_study_id` arg narrows numeric param bounds via `winner ± |winner| × bracket` for sign-symmetric math (a Gemini find — naive `winner * 0.5 / winner * 1.5` inverted bounds for negative winners) with the `bracket` arg actually threaded through both linear paths; log-uniform float keeps fixed √2 factor per spec FR-3. Graceful degrade paths emit WARN logs: `agent.propose_search_space.prior_template_mismatch` (different template_id) + `agent.propose_search_space.missing_winner_trial` (cascade-delete race) + non-numeric winner skip. `ToolContext` gained required `conversation_id: str` field plumbed from `orchestrator.run_turn` so adherence telemetry can correlate offline — paired INFO events `agent.search_space_proposed` + `agent.create_study.invoked` tagged with same conversation_id per spec FR-6 (operator grep recipe in `docs/03_runbooks/agent-debugging.md` §5). New `repo.get_trial(db, trial_id)` parallels `repo.get_study`. 
System prompt updated: 19→20 tools, "Studies (4)" lists `propose_search_space` FIRST, new chain-guidance bullet under Rule #1 directs LLM to call propose before create. `ProposeSearchSpaceArgs` uses `ConfigDict(extra="forbid")` so hallucinated LLM args fail loudly. Spec converged at GPT-5.5 cycle 3 (19 findings, all accepted); plan converged at cycle 3 (8 findings, all accepted). Post-merge review: Gemini Code Assist 3 findings all accepted + fixed in `642b5b9` (negative-winner narrowing bug × 2 + bracket arg threading); GPT-5.5 final review 6 findings — 1 accepted + fixed in `945e833` (`ConfigDict(extra="forbid")`), 1 deferred (structlog migration — broader codebase decision), 4 rejected with cited counter-evidence (truncated-diff false positives — all claimed-missing content was present in the PR). Adjudication summary posted at https://github.com/SoundMindsAI/relyloop/pull/175#issuecomment-4504624481. Tests: +87 new backend unit cases (44 search_space_defaults + 19 parity + 3 ToolContext + 19 propose_search_space + 7 telemetry + 5 prompt snapshot) + 2 repo.get_trial integration + 1 propose→create integration; UI vitest 38 lib + 19 parity + 66 modal all green; 1000 backend unit tests pass locally. Backend lint + mypy --strict clean. **Alembic head unchanged at `0014_clusters_target_filter`** — feature is purely additive at the application layer (no migration).