diff --git a/docs/00_overview/DASHBOARD.md b/docs/00_overview/DASHBOARD.md index 971a6cb8..15c29466 100644 --- a/docs/00_overview/DASHBOARD.md +++ b/docs/00_overview/DASHBOARD.md @@ -6,7 +6,7 @@ _Top-level index across MVP1 → GA v1+ as of **2026-05-22**. Click a release na | Release | Theme | Progress | Status | |---|---|---|---| -| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 58 / 59 scoped done · 7 remaining | **In progress** | +| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 59 / 59 scoped done · 6 remaining | **In progress** | | [MVP2 / v0.2](MVP2_DASHBOARD.md) | Observable | 1 / 1 scoped done · 1 remaining | **In progress** | | MVP3 / v0.3 | Production Stacks | — | **Not yet scoped** | | MVP4 / v0.4 | Multi-tenant, Multi-LLM | — | **Not yet scoped** | diff --git a/docs/00_overview/MVP1_DASHBOARD.md b/docs/00_overview/MVP1_DASHBOARD.md index 0183d0d5..b93b0819 100644 --- a/docs/00_overview/MVP1_DASHBOARD.md +++ b/docs/00_overview/MVP1_DASHBOARD.md @@ -6,34 +6,28 @@ _Reflects feature-folder state as of **2026-05-22** (latest mtime of any planned ## Next up -**[chore_e2e_test_rows_isolation](../02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md)** — Chore, currently in **Plan** +All scoped MVP1 features shipped 🎉 -> Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the registry in FK-safe order at the end of the - -Plan approved; run /impl-execute to ship - -```bash -/impl-execute docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md --all -``` +Pull from the Idea backlog or capture a new feature spec. 
## MVP1 Progress | Metric | Value | |---|---| -| Scoped items done | **58 / 59** (98%) — feat_/infra_/chore_/epic_ past idea stage | -| Pending work | **14** items (every not-done feat/infra/chore/bug across all priorities) | -| → P0 — do next | **1** unblocking / paying daily cost | +| Scoped items done | **59 / 59** (100%) — feat_/infra_/chore_/epic_ past idea stage | +| Pending work | **13** items (every not-done feat/infra/chore/bug across all priorities) | +| → P0 — do next | **0** unblocking / paying daily cost | | → P1 | **6** high-value, ready when P0 clears | | → P2 (default) | 6 important to file, not blocking | | → Backlog | 1 captured for record, not planned | | Open bugs | 0 | -| Legacy "Path to MVP1" | 7 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | +| Legacy "Path to MVP1" | 6 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | | Backlog ideas | 7 idea-only feat/infra (not yet scoped into MVP1) | | In flight | 0 feature(s) actively shipping | ## Pipeline -### Done (70) +### Done (71) | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---| @@ -78,6 +72,7 @@ Plan approved; run /impl-execute to ship | [chore_data_table_columnvisibility_tanstack](implemented_features/2026_05_19_chore_data_table_columnvisibility_tanstack/idea.md) | Chore | Complete | — | Complete | | [chore_detail_page_shell_primitive](implemented_features/2026_05_19_chore_detail_page_shell_primitive/idea.md) | Chore | Complete | — | Complete | | [chore_digest_worker_narrow_except](implemented_features/2026_05_14_chore_digest_worker_narrow_except/idea.md) | Chore | Complete | — | Complete | +| [chore_e2e_test_rows_isolation](implemented_features/2026_05_21_chore_e2e_test_rows_isolation/feature_spec.md) | Chore | Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the reg 
| — | [PR #186](https://github.com/SoundMindsAI/relyloop/pull/186) merged 2026-05-21 | | [chore_env_guard_extend_deny_pattern](implemented_features/2026_05_13_chore_env_guard_extend_deny_pattern/idea.md) | Chore | Complete | — | Complete | | [chore_extract_shadcn_select_test_mock](implemented_features/2026_05_19_chore_extract_shadcn_select_test_mock/idea.md) | Chore | Complete | — | Complete | | [chore_form_dropdown_guide_screenshot_refresh](implemented_features/2026_05_19_chore_form_dropdown_guide_screenshot_refresh/idea.md) | Chore | Complete | — | Complete | @@ -112,11 +107,9 @@ Plan approved; run /impl-execute to ship _None._ -### Plan (1) +### Plan (0) -| Priority | Feature | Type | One-liner | Depends on | Status | -|---|---|---|---|---|---| -| P0 | [chore_e2e_test_rows_isolation](../02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md) | Chore | Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the reg | — | [PR #182](https://github.com/SoundMindsAI/relyloop/pull/182) | +_None._ ### Spec (0) @@ -151,8 +144,6 @@ graph LR classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e; classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af; classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; - chore_e2e_test_rows_isolation["e2e test rows isolation"] - class chore_e2e_test_rows_isolation plan; infra_foundation["foundation"] class infra_foundation done; feat_study_lifecycle["study lifecycle"] @@ -255,6 +246,8 @@ graph LR class feat_create_study_search_space_builder done; feat_create_study_target_autocomplete["create study target autocomplete"] class feat_create_study_target_autocomplete done; + chore_e2e_test_rows_isolation["e2e test rows isolation"] + class chore_e2e_test_rows_isolation done; chore_guide_01_screenshot_refresh_target_filter["guide 01 screenshot refresh target filter"] class 
chore_guide_01_screenshot_refresh_target_filter done; chore_guide_06_screenshot_refresh_target_picker["guide 06 screenshot refresh target picker"] diff --git a/docs/00_overview/dashboard.html b/docs/00_overview/dashboard.html index a6af878f..858eb316 100644 --- a/docs/00_overview/dashboard.html +++ b/docs/00_overview/dashboard.html @@ -384,7 +384,7 @@

Releases

MVP1 / v0.1
The Loop
-
58 / 59 scoped done · 7 remaining
+
59 / 59 scoped done · 6 remaining
In progress
diff --git a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/feature_spec.md similarity index 100% rename from docs/02_product/planned_features/chore_e2e_test_rows_isolation/feature_spec.md rename to docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/feature_spec.md diff --git a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/idea.md b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/idea.md similarity index 100% rename from docs/02_product/planned_features/chore_e2e_test_rows_isolation/idea.md rename to docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/idea.md diff --git a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/implementation_plan.md similarity index 99% rename from docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md rename to docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/implementation_plan.md index 3c3ac273..7024ec44 100644 --- a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md +++ b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/implementation_plan.md @@ -1,7 +1,7 @@ # Implementation Plan — chore_e2e_test_rows_isolation **Date:** 2026-05-21 -**Status:** Draft +**Status:** Complete (PR #186 squash `a444b94`, merged 2026-05-21) **Primary spec:** [feature_spec.md](feature_spec.md) **Policy source(s):** [api-conventions.md](../../../01_architecture/api-conventions.md), [CLAUDE.md](../../../../CLAUDE.md), spec §19 Decision log diff --git a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/pipeline_status.md 
b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/pipeline_status.md similarity index 61% rename from docs/02_product/planned_features/chore_e2e_test_rows_isolation/pipeline_status.md rename to docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/pipeline_status.md index 069fd4ec..52bf2e13 100644 --- a/docs/02_product/planned_features/chore_e2e_test_rows_isolation/pipeline_status.md +++ b/docs/00_overview/implemented_features/2026_05_21_chore_e2e_test_rows_isolation/pipeline_status.md @@ -25,4 +25,16 @@ - Critical cycle-3 findings: parse failures didn't count toward `failed` invariant (now do); stdout log misstated `entries.length` vs distinct-resource count. ## Implementation -- Status: Not started +- Status: Complete +- Date: 2026-05-21 +- PR: #186 (squash `a444b94`, merged into `main` 2026-05-21) +- Branch: `chore/e2e-test-rows-isolation` (deleted post-merge) +- Stories shipped: 2 of 2 (1.1 backend 6 DELETE endpoints + 20 integration cases + 6 env-guard contract + 11 strictly-new error-code source-presence + 7 OpenAPI tuples; 1.2 frontend per-worker JSONL registry + globalSetup/Teardown + cleanup-reporter + 29 vitest cases) +- CI: green on final HEAD (5/5 jobs incl. smoke 70/70 Playwright) +- Reviews: Gemini Code Assist 3 Medium findings (all rejected with SQLAlchemy AsyncSession-concurrency counter-evidence at `backend/app/api/v1/_test.py:269/353/415`); GPT-5.5 final review 1 High finding (rejected — truncated-diff false positive at `backend/app/db/repo/__init__.py:38–42`). +- Post-merge fix: one follow-up commit on the same branch added `testMatch: ['**/*.spec.ts']` to `ui/playwright.config.ts` after the smoke job tried to load vitest `.test.ts` files as Playwright specs. +- Tangential capture: `chore_e2e_seed_acme_helper_dead/idea.md` — `seedAcmeProductsChain` is dead code (Backlog). 
+ +## Done +- Status: Merged +- Date: 2026-05-21 diff --git a/docs/00_overview/mvp1_dashboard.html b/docs/00_overview/mvp1_dashboard.html index 5b48564f..f5a83b3e 100644 --- a/docs/00_overview/mvp1_dashboard.html +++ b/docs/00_overview/mvp1_dashboard.html @@ -382,12 +382,12 @@

RelyLoop MVP1 Dashboard

-
-
Next up — Chore, currently in Plan
- -
Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the registry in FK-safe order at the end of the
-
Plan approved; run /impl-execute to ship
- /impl-execute docs/02_product/planned_features/chore_e2e_test_rows_isolation/implementation_plan.md --all +
+
Next up
+
All scoped MVP1 features shipped 🎉
+
+ Pull from the Idea backlog or capture a new feature spec. +
@@ -395,15 +395,15 @@

RelyLoop MVP1 Dashboard

MVP1 Progress

-
+
Scoped items done
-
58 / 59
-
98% of feat_/infra_/chore_/epic_ items past idea stage
-
+
59 / 59
+
100% of feat_/infra_/chore_/epic_ items past idea stage
+
Pending work
-
14
+
13
every not-done feat/infra/chore/bug across all priorities
@@ -411,9 +411,9 @@

MVP1 Progress

0
tracked bug_* idea files
-
+
P0 — do next
-
1
+
0
unblocking / paying daily cost
@@ -435,7 +435,7 @@

MVP1 Progress

Legacy "Path to MVP1"
-
7
+
6
scoped not-done + bugs + chore-ideas only (excludes feat/infra ideas)
@@ -641,19 +641,7 @@

Spec 0

-

Plan 1

- -
- -
- Chore - P0 - PR #182 -
-
Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the reg
- - -
+

Plan 0

@@ -663,7 +651,7 @@

Implementing 0

-

Done 70

+

Done 71

@@ -1198,6 +1186,19 @@

Done 70

+
+ +
+ Chore + + PR #186 merged 2026-05-21 +
+
Every Playwright spec that creates rows registers them against a file-based cleanup registry (per-worker JSONL files); a `globalTeardown` hook in `playwright.config.ts` reads + merges + drains the reg
+ + +
+ +
@@ -1587,8 +1588,6 @@

Dependency graph (feat_ + infra_)

classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e; classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af; classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; - chore_e2e_test_rows_isolation["e2e test rows isolation"] - class chore_e2e_test_rows_isolation plan; infra_foundation["foundation"] class infra_foundation done; feat_study_lifecycle["study lifecycle"] @@ -1691,6 +1690,8 @@

Dependency graph (feat_ + infra_)

class feat_create_study_search_space_builder done; feat_create_study_target_autocomplete["create study target autocomplete"] class feat_create_study_target_autocomplete done; + chore_e2e_test_rows_isolation["e2e test rows isolation"] + class chore_e2e_test_rows_isolation done; chore_guide_01_screenshot_refresh_target_filter["guide 01 screenshot refresh target filter"] class chore_guide_01_screenshot_refresh_target_filter done; chore_guide_06_screenshot_refresh_target_picker["guide 06 screenshot refresh target picker"] @@ -1798,8 +1799,6 @@

Dependency graph (feat_ + infra_)

classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e; classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af; classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; - chore_e2e_test_rows_isolation["e2e test rows isolation"] - class chore_e2e_test_rows_isolation plan; infra_foundation["foundation"] class infra_foundation done; feat_study_lifecycle["study lifecycle"] @@ -1902,6 +1901,8 @@

Dependency graph (feat_ + infra_)

class feat_create_study_search_space_builder done; feat_create_study_target_autocomplete["create study target autocomplete"] class feat_create_study_target_autocomplete done; + chore_e2e_test_rows_isolation["e2e test rows isolation"] + class chore_e2e_test_rows_isolation done; chore_guide_01_screenshot_refresh_target_filter["guide 01 screenshot refresh target filter"] class chore_guide_01_screenshot_refresh_target_filter done; chore_guide_06_screenshot_refresh_target_picker["guide 06 screenshot refresh target picker"] diff --git a/state.md b/state.md index 97749b7f..40be439b 100644 --- a/state.md +++ b/state.md @@ -2,14 +2,14 @@ > Read this first. Snapshots the active branch, what just shipped, what's in flight, what's queued, and where the project currently sits in the MVP1 → GA roadmap. Updated whenever a feature lands or a priority shifts. -**Last updated:** 2026-05-21 (after `feat_study_target_judgment_mismatch_guard` merged into `main` as PR #184 squash `ce3fcf4` — **23rd MVP1 feature shipped**, 3 stories across 1 epic. Closes the literal study2 incident: `POST /api/v1/studies` now rejects two mismatch classes at create time with specific 422 codes — `JUDGMENT_CLUSTER_MISMATCH` (judgment list and study point at different physical clusters; doc IDs are cluster-scoped so same target name on two clusters still produces zero overlap) and `JUDGMENT_TARGET_MISMATCH` (same cluster but different target index/collection). Cluster fires before target. Both checks fire AFTER FK resolution + the existing `query_set_id` `VALIDATION_ERROR` check. New `?target=` wire filter on `GET /api/v1/judgment-lists` (min_length=1, max_length=255) + `target: str` required field on `JudgmentListSummary` (additive; OpenAPI snapshot + ui/src/lib/types.ts regenerated). 
Frontend create-study modal Step-2 dropdown now passes `{ query_set_id, cluster_id, target, limit: 200 }` to `useJudgmentLists`; manual-mode `` uses hoisted `targetReg.onChange(e)` (RHF register preserved) then cascade-resets `judgment_list_id`; dropdown-mode target picker mirrors the same reset; new empty-state copy substitutes the target value + CTA href="/judgments". Drive-by fix bundled: E2E seed helpers (`seedJudgmentList`, `seedFullChain`, `seedStudy`) gain optional `target` overrides; 3 specs updated to align target values so the new FR-1 validator doesn't reject chained POSTs. Cross-model review: spec 3 cycles (17 findings, all accepted, 1 rejected with cited counter-evidence at create-study-modal.tsx:508), plan 3 cycles (16 findings, all accepted, 1 rejected); Gemini Code Assist 2 findings (1 accepted in `035af0a` — IIFE → hoisted register; 1 rejected with precedent counter-evidence at `test_judgments_api_contract.py:215-234`); final GPT-5.5 10 findings (2 accepted in `a358a71` — over-bound 422 test; 8 rejected — 5 truncation false positives + 3 plan/precedent rejects). Tests: 1040 backend unit (unchanged — inline conditionals), backend integration +7 cases (target/cluster mismatch + ordering + AND-semantics + summary shape + over-bound + GET-pre-existing-200), backend contract +2 cases (firing-order lock in `test_studies_api_contract.py` + summary `target` shape lock in `test_judgments_api_contract.py`), UI vitest 567 → 572 (+5: hook wire-filter, dropdown cascade, manual cascade, cluster regression-lock, empty-state CTA). CI green on `a358a71` (5/5 jobs incl. 70/70 Playwright). Alembic head unchanged at `0015_trials_per_query_metrics` — feature is purely additive at the application layer. Prior — after `feat_pr_metric_confidence` merged into `main` as PR #180 squash `d0a8358` — **22nd MVP1 feature shipped**, 9 stories across 2 epics. 
Backend persistence (migration `0015_trials_per_query_metrics` adds nullable JSONB column behind CHECK), analytics (`backend/app/domain/study/confidence.py` — pure-Python orchestrator + bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome helpers under FR-7 graceful-degradation), and three consumer surfaces — `StudyDetail.confidence` API enrichment, `## Confidence` PR body section, and digest narrative `` + `` Jinja blocks. Frontend ships `` on `/studies/[id]` (between StudyHeader and trials Card) + 6 glossary entries (text lifted verbatim from spec §11 tooltip table) + 2 real-backend Playwright E2E cases. Cross-model review: GPT-5.5 cycle 1 (Epic 1 gate) returned 12 findings — 5 rejected with cited counter-evidence (truncated-diff false positives), 2 deferred, 5 accepted + fixed inline; Gemini Code Assist clean pass; final GPT-5.5 review 3 Low findings all accepted + fixed inline. Tests: 1039 backend unit (+5 digest + 29 confidence + 13 studies confidence + extras), 189 contract (+2 OpenAPI shape lock + 4 PR-body section + 1 endpoint guard for the extended _test seed endpoint), 527 in-container integration (+13 StudyDetail.confidence + 5 migration round-trip + 1 open_pr plumbing + 2 Story 1.2 worker), 567 UI vitest (+14 ConfidencePanel — 13 layout + 1 tooltip-trigger inventory), 10/10 Playwright E2E (+2 ConfidencePanel real-backend). Three follow-ups filed: `chore_guides_glossary_route` (render `glossary.ts` as a `/guide/glossary` route), `chore_guides_faq` (curated operator-judgment Q&A), `chore_guide_06_screenshot_refresh_confidence_panel` (regenerate guide-06 screenshots). Alembic head moves to `0015_trials_per_query_metrics`. Prior — after `feat_pr_metric_confidence` Epic 1 landed locally on the `feat_pr_metric_confidence` branch — backend persistence + analytics + PR-body + digest-prompt surfaces complete, Epic 2 frontend ConfidencePanel ahead. 
Migration `0015_trials_per_query_metrics` adds the nullable JSONB column behind a CHECK constraint; new pure-Python `backend/app/domain/study/confidence.py` owns bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome classification under FR-7's graceful-degradation contract; new `backend/app/services/study_confidence.py` glues the 4-query read pattern onto the orchestrator and is consumed from `studies._detail()`, the `open_pr` worker, and the digest worker. GPT-5.5 cycle-1 review found 12 issues — 5 rejected as truncated-diff false positives, 2 deferred (plan/code interface drift; full-worker integration test deferred to feat_github_pr_worker's existing suite), 5 accepted + fixed inline (convergence `total_trials = max_trial_number + 1` instead of count; convergence KeyError guard when winner not in summary; pre-existing-row-stays-NULL migration test; Trial model docstring drift on metric key shape; state + architecture docs). 1039 backend unit tests pass (+5 digest prompt cases, +1 convergence assertion), 189 contract, 527/527 in-container integration. Prior — after `feat_agent_propose_search_space` shipped as PR #175 squash `5d29355`). **21st MVP1 feature merged** — 10 stories across 5 epics, all complete. New read-only agent tool `propose_search_space` (the 20th in the registry) builds a deterministic starter search space from a template's `declared_params` using the same heuristic that powers the create-study wizard's auto-fill — a Python port (`backend/app/domain/study/search_space_defaults.py`) of `ui/src/lib/search-space-defaults.ts` with a TS↔Python parity test driven by a shared JSON fixture (18 rows, byte-identical assertions on both sides). Cap-aware overflow guard added on both Python AND TS sides (fixes a latent bug where TS silently returned an invalid space when 8+ fall-through floats blew past 10⁶). 
Optional `prior_study_id` arg narrows numeric bounds via `winner ± |winner| × bracket` for sign-symmetric math (Gemini #1/#2 fix) with `bracket` threaded through the linear paths (Gemini #3 fix); log-uniform stays at √2. Graceful degrade on template mismatch + missing trial row + non-numeric winner — emits WARN logs (`agent.propose_search_space.prior_template_mismatch` / `.missing_winner_trial`). `ToolContext` gained `conversation_id: str` plumbed from `orchestrator.run_turn` for paired adherence telemetry — INFO events `agent.search_space_proposed` (propose-side) + `agent.create_study.invoked` (create-side) correlate offline by conversation_id per spec FR-6 (grep recipe in `docs/03_runbooks/agent-debugging.md` §5). New `repo.get_trial(db, trial_id)` parallels `repo.get_study`. System prompt updated: 19→20 tools, "Studies (4)" with `propose_search_space` first, new chain-guidance bullet. `ProposeSearchSpaceArgs` uses `ConfigDict(extra="forbid")` (GPT-5.5 F6 fix) so hallucinated LLM args fail Pydantic validation loudly. Spec converged at GPT-5.5 cycle 3 (19 findings, all accepted); plan converged at cycle 3 (8 findings, all accepted). Post-merge review: Gemini 3 findings all fixed in `642b5b9`; GPT-5.5 final review 6 findings — 1 fixed in `945e833`, 1 deferred (structlog migration), 4 rejected with cited counter-evidence (truncated-diff false positives). Tests: 1000 backend unit pass (+87 new cases) + 19 Python parity + 19 TS parity; 38 TS lib + 66 modal still green. Alembic head unchanged at `0014_clusters_target_filter` — feature is purely additive at the application layer. Earlier 2026-05-20 (after `feat_cluster_target_filter` shipped as PR #168 squash `57d3ba0` + follow-up `chore_seed_meaningful_demos` shipped as PR #169 squash `c44d774`). **20th MVP1 feature merged** + demo-state durability gap closed in the same session. 
PR #168: 5 stories (B1 migration 0014 + ORM column; B3 Pydantic + service plumb-through + responses; B2 adapter Protocol + ElasticAdapter + StubAdapter + router; F1 register modal Target filter input; F2 create-study modal filter-aware empty-state + EntitySelect accessibility improvement). Plus 4 post-impl fix commits (test_migrations head bump, register modal overflow-y-auto, EntitySelect sr-only Gemini fix, spec drift cleanup + OpenAPI shape-lock contract test from GPT-5.5 final review). PR #169: `scripts/seed_meaningful_demos.py` + `make seed-demo` target (idempotent: TRUNCATE clusters CASCADE + DELETE matching ES/OS indices + reseed with per-cluster `target_filter` values baked in — closes the gap where integration tests kept wiping the dev DB with no durable reseed mechanism). 529/529 vitest across 79 files (was 525/78), 903 backend unit tests (was 899), 50 cluster-API integration tests (was 45) + 3 new migration round-trip tests + 7 contract validator cases + OpenAPI shape-lock test. **Alembic head moved to `0014_clusters_target_filter`.** Cross-model review pre-impl: spec + plan both converged at GPT-5.5 cycle 2 (12 findings total, all accepted). Post-impl: Gemini Code Assist 3 findings (2 accepted: EntitySelect sr-only on #168, http() auth type hint on #169; 1 rejected with cited counter-evidence: out-of-scope test file from #168). GPT-5.5 final review on #168: 2 findings, both accepted (spec drift + OpenAPI shape-lock). **Process feedback captured:** `.claude/projects/.../memory/feedback_one_branch_per_session.md` — should have bundled the seed chore into PR #168 rather than spinning a sibling PR. End-to-end smoke verified live before both merges. Earlier 2026-05-20 (after `feat_create_study_target_autocomplete` shipped as PR #165 squash commit `bd4516a` — 19th MVP1 feature. Earlier 2026-05-20 (after `feat_create_study_target_autocomplete` shipped as PR #165 squash commit `bd4516a` — 19th MVP1 feature. 
Bundled the `get_schema` + `explain` connect-error fix per `bug_get_schema_unhandled_connect_error` in the same PR. 525/525 vitest across 78 files, 33 adapter unit tests + contract suite + integration tests all green twice (initial + post-cycle-2). Gemini Code Assist: 1 finding rejected with cited counter-evidence (pre-existing list-shape assumption matches the wire contract). GPT-5.5 final review: 2 findings — 1 accepted in `19d9d51` (contract-layer TARGETS_FORBIDDEN + CLUSTER_UNREACHABLE envelope assertions), 1 deferred with counter-evidence (dropdown E2E `test.skip`'d; AC coverage satisfied by 8 hook unit + 6 modal unit + integration + contract tests). Two follow-up ideas filed in-PR: `bug_e2e_target_dropdown_flake` + `chore_guide_06_screenshot_refresh_target_picker`.) Earlier — same day (after `feat_create_study_search_space_builder` shipped as PR #163 squash commit `c703953`, bundling the search-space builder feature + the `bug_judgment_lists_listing_ignores_query_set_filter` backend fix surfaced during local verification. 18th MVP1 feature. The builder + bug-fix bundle reflects the single-developer series workflow: rather than spin a sibling backend PR off `main`, the bug fix landed in the same branch since the dev was already in verification mode. PR #163 went through 3 spec cycles (16 findings) + 3 plan cycles (27 findings) + 3 Gemini Code Assist findings + 2 GPT-5.5 final-review passes (1 second-pass Low finding accepted on test coverage) = 47 review findings all accepted with cited fixes. 512 vitest assertions across 77 files, 4 real-backend Playwright e2e cases against the builder, 2 new backend tests for the bundled filter fix. Two follow-up idea files captured during local verification: `feat_create_study_target_autocomplete` (Step-1 free-text target field has no autocomplete from cluster indexes — pre-existing UX debt deferred) and the now-closed `bug_judgment_lists_listing_ignores_query_set_filter` (bundled into this PR).) 
Earlier (also 2026-05-20) — PR #161 `0879df2` `chore_create_study_modal_e2e_stability` (un-skipped the deferred Playwright spec via `dispatchEvent('click')` on the Radix trigger), PR #160 `160ff6b` `bug_err_metric_frontend_backend_drift` (wire-enum trim — `err` removed from frontend + backend Literal), PR #159 `52e106d` `bug_tutorial_template_param_boost_naming` (heuristic extension for `_boost` suffix). Earlier (also 2026-05-20) — PR #157 `chore_create_study_wizard_polish` — squash commit `075c46b` — merged into `main`. Ships the 4-surface chore: backend template-mismatch validation at create time (two new error codes `SEARCH_SPACE_UNKNOWN_PARAM` + `SEARCH_SPACE_MISSING_DECLARED_PARAM`), Step-4 auto-fill via the new `ui/src/lib/search-space-defaults.ts` heuristic + cap-aware fallback + TS↔Python cardinality parity fixture, 4 new `study.search_space.*` glossary entries (one dual + three short-only) and 6 extended per-metric entries with k-tier clauses, Step-5 tri-state metric+k rendering with new `K_IGNORED` predicate, plus client-side validation mirror + zero-declared block + 404/transient template-fetch recovery + `__placeholder__` warning. 16 new test files + 2 modified + 1 shared JSON fixture across backend unit/integration/contract + frontend unit/component + 1 skipped E2E. Three follow-up ideas captured: `bug_tutorial_template_param_boost_naming` (tutorial template uses `_boost` suffix not matched by the locked heuristic), `chore_create_study_modal_e2e_stability` (re-enable the skipped Playwright spec once EntitySelect disabled gating stabilizes), `bug_err_metric_frontend_backend_drift` (`err` selectable in wizard but unsupported by `scoring.py`). Gemini Code Assist + GPT-5.5 final-pass both adjudicated on the PR — 2 Gemini findings + 7 GPT-5.5 findings, all addressed or filed.) 
Earlier 2026-05-19 (after a 4-PR shipping run drained the actionable post-MVP1 chore backlog: PR #152 `chore_ci_prettier_check` (`476db78`) + PR #153 `chore_extract_shadcn_select_test_mock` (`199e225`) + PR #154 `chore_form_dropdown_guide_screenshot_refresh` (`ed4121f`) + PR #155 `chore_detail_page_shell_primitive` (`9a72514`). PR #155 is the third primitive after `` and `` — 6 detail-page migrations + new lint guard + flattens a latent UX bug where only `proposals/[id]` discriminated 404 from network error. Earlier the same session: PR #150 (`chore_data_table_columnvisibility_tanstack`, `c1e4545`) — closes the residual DataTable follow-ups: item 5 migrates the primitive from `columns.filter(...)` to TanStack's `state.columnVisibility` API (memoized per Gemini feedback), item 3 locked the flat-prop `DataTableProps` API as canonical with a "Shipped contract addendum" on the historical implementation plan's Story 2.6. Folder renamed `chore_data_table_primitive_followups` → `chore_data_table_columnvisibility_tanstack`. Earlier 2026-05-19 PR #148 (`infra_e2e_wire_seed_helper_into_studies_spec`, squash `65f4150`) — restored the 2 digest-panel E2E tests deferred from PR #130, diagnosed and fixed the real root cause of the original smoke-lane failure (`GET /api/v1/proposals` was silently ignoring the `?study_id=` filter, returning the most-recent global pending proposal), added 5-case integration regression coverage at `backend/tests/integration/test_proposals_study_filter.py`. Plus: (a) earlier 2026-05-18 PR #146 (`bug_install_skip_ui_rebuild`, squash `7299fca`) made `make up` rebuild every Compose service (`docker compose build` no-args), switched `make down` to `docker compose down`, and added a `verify_install_builds_all_services.sh` CI gate to lock the contract; (b) earlier 2026-05-18 PR #147 captured `chore_detail_page_shell_primitive` idea (squash `8854e47`). 
Two new follow-ups filed: `chore_ci_prettier_check` (CI's frontend job has no `prettier --check` step — surfaced when PR #136 drift in 2 unrelated files blocked an unrelated commit) and the in-flight `chore_detail_page_shell_primitive` (third primitive after DataTable + EntitySelect).) +**Last updated:** 2026-05-21 (after `chore_e2e_test_rows_isolation` merged into `main` as PR #186 squash `a444b94` — **24th MVP1 feature shipped**, 2 stories across 1 epic. Closes the operator-visible-dev-DB pollution: every Playwright E2E run now drains its seeded rows after the suite via a per-worker JSONL cleanup registry, 6 new test-only `DELETE /api/v1/_test/*` endpoints gated by `_require_development_env`, FK-safe drain order (proposals → digests → studies → judgment_lists → query_sets → query_templates → clusters), and a new `cleanup-reporter.ts` Playwright Reporter that asserts `registered_deduped == attempted == deleted + failed + skipped_404 AND failed == 0` after every run. 11 strictly-new error codes (3 `_NOT_FOUND` + 8 `_HAS_DEPENDENT_*`) documented in [`docs/01_architecture/api-conventions.md`](docs/01_architecture/api-conventions.md). Pure `cleanup-core.ts` module extracted from `global-teardown.ts` so the dedupe/order/URL-build logic is unit-testable without fs/network mocks. Cross-model review: GPT-5.5 — spec 3 cycles (26 findings, 25 accepted + 1 deferred to PLAYWRIGHT_CLEANUP_STRICT=1 v2), plan 3 cycles (20 findings, all accepted); Gemini Code Assist 3 Medium findings (all rejected with SQLAlchemy AsyncSession-concurrency counter-evidence — `asyncio.gather` on the same session is forbidden); final GPT-5.5 1 High finding (rejected — truncated-diff false positive on `repo/__init__.py:38–42` import block; verified empirically `from backend.app.db.repo import hard_delete_*` works for all 6). 
Post-merge CI fix on the same branch: `testMatch: ['**/*.spec.ts']` added to `ui/playwright.config.ts` after the smoke job tried to load vitest `.test.ts` files as Playwright specs. Tests: 1040 backend unit (unchanged); backend integration +20 cases (6 happy + 6 parameterized 404 + 8 409 — covers all 11 strictly-new + 3 reused codes); backend contract +6 env-guard cases + 2 source-presence cases + 6 OpenAPI tuples; UI vitest **630** (was 601 — +29: 19 cleanup-core + 10 global-teardown). CI green on `01acc04` (5/5 jobs incl. smoke 70/70 Playwright). **Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer. Tangential capture: `chore_e2e_seed_acme_helper_dead/idea.md` (Backlog) — `seedAcmeProductsChain` has no spec caller. Earlier — after `feat_study_target_judgment_mismatch_guard` merged into `main` as PR #184 squash `ce3fcf4` — **23rd MVP1 feature shipped**, 3 stories across 1 epic. Closes the literal study2 incident: `POST /api/v1/studies` now rejects two mismatch classes at create time with specific 422 codes — `JUDGMENT_CLUSTER_MISMATCH` (judgment list and study point at different physical clusters; doc IDs are cluster-scoped so same target name on two clusters still produces zero overlap) and `JUDGMENT_TARGET_MISMATCH` (same cluster but different target index/collection). Cluster fires before target. Both checks fire AFTER FK resolution + the existing `query_set_id` `VALIDATION_ERROR` check. New `?target=` wire filter on `GET /api/v1/judgment-lists` (min_length=1, max_length=255) + `target: str` required field on `JudgmentListSummary` (additive; OpenAPI snapshot + ui/src/lib/types.ts regenerated). 
Frontend create-study modal Step-2 dropdown now passes `{ query_set_id, cluster_id, target, limit: 200 }` to `useJudgmentLists`; manual-mode `` uses hoisted `targetReg.onChange(e)` (RHF register preserved) then cascade-resets `judgment_list_id`; dropdown-mode target picker mirrors the same reset; new empty-state copy substitutes the target value + CTA href="/judgments". Drive-by fix bundled: E2E seed helpers (`seedJudgmentList`, `seedFullChain`, `seedStudy`) gain optional `target` overrides; 3 specs updated to align target values so the new FR-1 validator doesn't reject chained POSTs. Cross-model review: spec 3 cycles (17 findings, all accepted, 1 rejected with cited counter-evidence at create-study-modal.tsx:508), plan 3 cycles (16 findings, all accepted, 1 rejected); Gemini Code Assist 2 findings (1 accepted in `035af0a` — IIFE → hoisted register; 1 rejected with precedent counter-evidence at `test_judgments_api_contract.py:215-234`); final GPT-5.5 10 findings (2 accepted in `a358a71` — over-bound 422 test; 8 rejected — 5 truncation false positives + 3 plan/precedent rejects). Tests: 1040 backend unit (unchanged — inline conditionals), backend integration +7 cases (target/cluster mismatch + ordering + AND-semantics + summary shape + over-bound + GET-pre-existing-200), backend contract +2 cases (firing-order lock in `test_studies_api_contract.py` + summary `target` shape lock in `test_judgments_api_contract.py`), UI vitest 567 → 572 (+5: hook wire-filter, dropdown cascade, manual cascade, cluster regression-lock, empty-state CTA). CI green on `a358a71` (5/5 jobs incl. 70/70 Playwright). Alembic head unchanged at `0015_trials_per_query_metrics` — feature is purely additive at the application layer. Prior — after `feat_pr_metric_confidence` merged into `main` as PR #180 squash `d0a8358` — **22nd MVP1 feature shipped**, 9 stories across 2 epics. 
Backend persistence (migration `0015_trials_per_query_metrics` adds nullable JSONB column behind CHECK), analytics (`backend/app/domain/study/confidence.py` — pure-Python orchestrator + bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome helpers under FR-7 graceful-degradation), and three consumer surfaces — `StudyDetail.confidence` API enrichment, `## Confidence` PR body section, and digest narrative `` + `` Jinja blocks. Frontend ships `<ConfidencePanel>` on `/studies/[id]` (between StudyHeader and trials Card) + 6 glossary entries (text lifted verbatim from spec §11 tooltip table) + 2 real-backend Playwright E2E cases. Cross-model review: GPT-5.5 cycle 1 (Epic 1 gate) returned 12 findings — 5 rejected with cited counter-evidence (truncated-diff false positives), 2 deferred, 5 accepted + fixed inline; Gemini Code Assist clean pass; final GPT-5.5 review 3 Low findings all accepted + fixed inline. Tests: 1039 backend unit (+5 digest + 29 confidence + 13 studies confidence + extras), 189 contract (+2 OpenAPI shape lock + 4 PR-body section + 1 endpoint guard for the extended _test seed endpoint), 527 in-container integration (+13 StudyDetail.confidence + 5 migration round-trip + 1 open_pr plumbing + 2 Story 1.2 worker), 567 UI vitest (+14 ConfidencePanel — 13 layout + 1 tooltip-trigger inventory), 10/10 Playwright E2E (+2 ConfidencePanel real-backend). Three follow-ups filed: `chore_guides_glossary_route` (render `glossary.ts` as a `/guide/glossary` route), `chore_guides_faq` (curated operator-judgment Q&A), `chore_guide_06_screenshot_refresh_confidence_panel` (regenerate guide-06 screenshots). Alembic head moves to `0015_trials_per_query_metrics`. Prior — after `feat_pr_metric_confidence` Epic 1 landed locally on the `feat_pr_metric_confidence` branch — backend persistence + analytics + PR-body + digest-prompt surfaces complete, Epic 2 frontend ConfidencePanel ahead. 
Migration `0015_trials_per_query_metrics` adds the nullable JSONB column behind a CHECK constraint; new pure-Python `backend/app/domain/study/confidence.py` owns bootstrap CI + runner-up gap + late-trial noise floor + convergence regime + per-query outcome classification under FR-7's graceful-degradation contract; new `backend/app/services/study_confidence.py` glues the 4-query read pattern onto the orchestrator and is consumed from `studies._detail()`, the `open_pr` worker, and the digest worker. GPT-5.5 cycle-1 review found 12 issues — 5 rejected as truncated-diff false positives, 2 deferred (plan/code interface drift; full-worker integration test deferred to feat_github_pr_worker's existing suite), 5 accepted + fixed inline (convergence `total_trials = max_trial_number + 1` instead of count; convergence KeyError guard when winner not in summary; pre-existing-row-stays-NULL migration test; Trial model docstring drift on metric key shape; state + architecture docs). 1039 backend unit tests pass (+5 digest prompt cases, +1 convergence assertion), 189 contract, 527/527 in-container integration. Prior — after `feat_agent_propose_search_space` shipped as PR #175 squash `5d29355`). **21st MVP1 feature merged** — 10 stories across 5 epics, all complete. New read-only agent tool `propose_search_space` (the 20th in the registry) builds a deterministic starter search space from a template's `declared_params` using the same heuristic that powers the create-study wizard's auto-fill — a Python port (`backend/app/domain/study/search_space_defaults.py`) of `ui/src/lib/search-space-defaults.ts` with a TS↔Python parity test driven by a shared JSON fixture (18 rows, byte-identical assertions on both sides). Cap-aware overflow guard added on both Python AND TS sides (fixes a latent bug where TS silently returned an invalid space when 8+ fall-through floats blew past 10⁶). 
Optional `prior_study_id` arg narrows numeric bounds via `winner ± |winner| × bracket` for sign-symmetric math (Gemini #1/#2 fix) with `bracket` threaded through the linear paths (Gemini #3 fix); log-uniform stays at √2. Graceful degrade on template mismatch + missing trial row + non-numeric winner — emits WARN logs (`agent.propose_search_space.prior_template_mismatch` / `.missing_winner_trial`). `ToolContext` gained `conversation_id: str` plumbed from `orchestrator.run_turn` for paired adherence telemetry — INFO events `agent.search_space_proposed` (propose-side) + `agent.create_study.invoked` (create-side) correlate offline by conversation_id per spec FR-6 (grep recipe in `docs/03_runbooks/agent-debugging.md` §5). New `repo.get_trial(db, trial_id)` parallels `repo.get_study`. System prompt updated: 19→20 tools, "Studies (4)" with `propose_search_space` first, new chain-guidance bullet. `ProposeSearchSpaceArgs` uses `ConfigDict(extra="forbid")` (GPT-5.5 F6 fix) so hallucinated LLM args fail Pydantic validation loudly. Spec converged at GPT-5.5 cycle 3 (19 findings, all accepted); plan converged at cycle 3 (8 findings, all accepted). Post-merge review: Gemini 3 findings all fixed in `642b5b9`; GPT-5.5 final review 6 findings — 1 fixed in `945e833`, 1 deferred (structlog migration), 4 rejected with cited counter-evidence (truncated-diff false positives). Tests: 1000 backend unit pass (+87 new cases) + 19 Python parity + 19 TS parity; 38 TS lib + 66 modal still green. Alembic head unchanged at `0014_clusters_target_filter` — feature is purely additive at the application layer. Earlier 2026-05-20 (after `feat_cluster_target_filter` shipped as PR #168 squash `57d3ba0` + follow-up `chore_seed_meaningful_demos` shipped as PR #169 squash `c44d774`). **20th MVP1 feature merged** + demo-state durability gap closed in the same session. 
PR #168: 5 stories (B1 migration 0014 + ORM column; B3 Pydantic + service plumb-through + responses; B2 adapter Protocol + ElasticAdapter + StubAdapter + router; F1 register modal Target filter input; F2 create-study modal filter-aware empty-state + EntitySelect accessibility improvement). Plus 4 post-impl fix commits (test_migrations head bump, register modal overflow-y-auto, EntitySelect sr-only Gemini fix, spec drift cleanup + OpenAPI shape-lock contract test from GPT-5.5 final review). PR #169: `scripts/seed_meaningful_demos.py` + `make seed-demo` target (idempotent: TRUNCATE clusters CASCADE + DELETE matching ES/OS indices + reseed with per-cluster `target_filter` values baked in — closes the gap where integration tests kept wiping the dev DB with no durable reseed mechanism). 529/529 vitest across 79 files (was 525/78), 903 backend unit tests (was 899), 50 cluster-API integration tests (was 45) + 3 new migration round-trip tests + 7 contract validator cases + OpenAPI shape-lock test. **Alembic head moved to `0014_clusters_target_filter`.** Cross-model review pre-impl: spec + plan both converged at GPT-5.5 cycle 2 (12 findings total, all accepted). Post-impl: Gemini Code Assist 3 findings (2 accepted: EntitySelect sr-only on #168, http() auth type hint on #169; 1 rejected with cited counter-evidence: out-of-scope test file from #168). GPT-5.5 final review on #168: 2 findings, both accepted (spec drift + OpenAPI shape-lock). **Process feedback captured:** `.claude/projects/.../memory/feedback_one_branch_per_session.md` — should have bundled the seed chore into PR #168 rather than spinning a sibling PR. End-to-end smoke verified live before both merges. Earlier 2026-05-20 (after `feat_create_study_target_autocomplete` shipped as PR #165 squash commit `bd4516a` — 19th MVP1 feature. 
Bundled the `get_schema` + `explain` connect-error fix per `bug_get_schema_unhandled_connect_error` in the same PR. 525/525 vitest across 78 files, 33 adapter unit tests + contract suite + integration tests all green twice (initial + post-cycle-2). Gemini Code Assist: 1 finding rejected with cited counter-evidence (pre-existing list-shape assumption matches the wire contract). GPT-5.5 final review: 2 findings — 1 accepted in `19d9d51` (contract-layer TARGETS_FORBIDDEN + CLUSTER_UNREACHABLE envelope assertions), 1 deferred with counter-evidence (dropdown E2E `test.skip`'d; AC coverage satisfied by 8 hook unit + 6 modal unit + integration + contract tests). Two follow-up ideas filed in-PR: `bug_e2e_target_dropdown_flake` + `chore_guide_06_screenshot_refresh_target_picker`.) Earlier — same day (after `feat_create_study_search_space_builder` shipped as PR #163 squash commit `c703953`, bundling the search-space builder feature + the `bug_judgment_lists_listing_ignores_query_set_filter` backend fix surfaced during local verification. 18th MVP1 feature. The builder + bug-fix bundle reflects the single-developer series workflow: rather than spin a sibling backend PR off `main`, the bug fix landed in the same branch since the dev was already in verification mode. PR #163 went through 3 spec cycles (16 findings) + 3 plan cycles (27 findings) + 3 Gemini Code Assist findings + 2 GPT-5.5 final-review passes (1 second-pass Low finding accepted on test coverage) = 47 review findings all accepted with cited fixes. 512 vitest assertions across 77 files, 4 real-backend Playwright e2e cases against the builder, 2 new backend tests for the bundled filter fix. Two follow-up idea files captured during local verification: `feat_create_study_target_autocomplete` (Step-1 free-text target field has no autocomplete from cluster indexes — pre-existing UX debt deferred) and the now-closed `bug_judgment_lists_listing_ignores_query_set_filter` (bundled into this PR).) 
Earlier (also 2026-05-20) — PR #161 `0879df2` `chore_create_study_modal_e2e_stability` (un-skipped the deferred Playwright spec via `dispatchEvent('click')` on the Radix trigger), PR #160 `160ff6b` `bug_err_metric_frontend_backend_drift` (wire-enum trim — `err` removed from frontend + backend Literal), PR #159 `52e106d` `bug_tutorial_template_param_boost_naming` (heuristic extension for `_boost` suffix). Earlier (also 2026-05-20) — PR #157 `chore_create_study_wizard_polish` — squash commit `075c46b` — merged into `main`. Ships the 4-surface chore: backend template-mismatch validation at create time (two new error codes `SEARCH_SPACE_UNKNOWN_PARAM` + `SEARCH_SPACE_MISSING_DECLARED_PARAM`), Step-4 auto-fill via the new `ui/src/lib/search-space-defaults.ts` heuristic + cap-aware fallback + TS↔Python cardinality parity fixture, 4 new `study.search_space.*` glossary entries (one dual + three short-only) and 6 extended per-metric entries with k-tier clauses, Step-5 tri-state metric+k rendering with new `K_IGNORED` predicate, plus client-side validation mirror + zero-declared block + 404/transient template-fetch recovery + `__placeholder__` warning. 16 new test files + 2 modified + 1 shared JSON fixture across backend unit/integration/contract + frontend unit/component + 1 skipped E2E. Three follow-up ideas captured: `bug_tutorial_template_param_boost_naming` (tutorial template uses `_boost` suffix not matched by the locked heuristic), `chore_create_study_modal_e2e_stability` (re-enable the skipped Playwright spec once EntitySelect disabled gating stabilizes), `bug_err_metric_frontend_backend_drift` (`err` selectable in wizard but unsupported by `scoring.py`). Gemini Code Assist + GPT-5.5 final-pass both adjudicated on the PR — 2 Gemini findings + 7 GPT-5.5 findings, all addressed or filed.) 
Earlier 2026-05-19 (after a 4-PR shipping run drained the actionable post-MVP1 chore backlog: PR #152 `chore_ci_prettier_check` (`476db78`) + PR #153 `chore_extract_shadcn_select_test_mock` (`199e225`) + PR #154 `chore_form_dropdown_guide_screenshot_refresh` (`ed4121f`) + PR #155 `chore_detail_page_shell_primitive` (`9a72514`). PR #155 is the third primitive after `<DataTable>` and `<EntitySelect>` — 6 detail-page migrations + new lint guard + flattens a latent UX bug where only `proposals/[id]` discriminated 404 from network error. Earlier the same session: PR #150 (`chore_data_table_columnvisibility_tanstack`, `c1e4545`) — closes the residual DataTable follow-ups: item 5 migrates the primitive from `columns.filter(...)` to TanStack's `state.columnVisibility` API (memoized per Gemini feedback), item 3 locked the flat-prop `DataTableProps` API as canonical with a "Shipped contract addendum" on the historical implementation plan's Story 2.6. Folder renamed `chore_data_table_primitive_followups` → `chore_data_table_columnvisibility_tanstack`. Earlier 2026-05-19 PR #148 (`infra_e2e_wire_seed_helper_into_studies_spec`, squash `65f4150`) — restored the 2 digest-panel E2E tests deferred from PR #130, diagnosed and fixed the real root cause of the original smoke-lane failure (`GET /api/v1/proposals` was silently ignoring the `?study_id=` filter, returning the most-recent global pending proposal), added 5-case integration regression coverage at `backend/tests/integration/test_proposals_study_filter.py`. Plus: (a) earlier 2026-05-18 PR #146 (`bug_install_skip_ui_rebuild`, squash `7299fca`) made `make up` rebuild every Compose service (`docker compose build` no-args), switched `make down` to `docker compose down`, and added a `verify_install_builds_all_services.sh` CI gate to lock the contract; (b) earlier 2026-05-18 PR #147 captured `chore_detail_page_shell_primitive` idea (squash `8854e47`). 
Two new follow-ups filed: `chore_ci_prettier_check` (CI's frontend job has no `prettier --check` step — surfaced when PR #136 drift in 2 unrelated files blocked an unrelated commit) and the in-flight `chore_detail_page_shell_primitive` (third primitive after DataTable + EntitySelect).) --- ## Current branch / execution context -- **Branch:** `docs/finalize-study-target-judgment-mismatch-guard` — finalization docs PR after PR #184 (`ce3fcf4`) merged 2026-05-21. `feature/study-target-judgment-mismatch-guard` branch deleted post-merge. Earlier: `docs/finalize-pr-metric-confidence` — finalization docs PR after PR #180 (`d0a8358`) merged 2026-05-21. `feat_pr_metric_confidence` branch deleted post-merge. Earlier: `docs/finalize-agent-propose-search-space` — finalization docs PR after PR #175 (`5d29355`) merged 2026-05-21. `feature/agent-propose-search-space` deleted post-merge. Earlier: `docs/finalize-cluster-target-filter` — finalization docs PR after PR #168 (`57d3ba0`) + PR #169 (`c44d774`) both merged. Prior `main` post-merge of PR #168 squash `57d3ba0` (`feat_cluster_target_filter`) + PR #169 squash `c44d774` (`chore_seed_meaningful_demos`) 2026-05-20. Earlier: PR #165 squash commit `bd4516a` 2026-05-20. Finalization docs branch `docs/finalize-create-study-target-autocomplete`. Prior squash same day: PR #163 `c703953` (`feat_create_study_search_space_builder`). Finalization docs PR off `docs/finalize-create-study-search-space-builder`. Prior squashes (same day): PR #161 `0879df2` (`chore_create_study_modal_e2e_stability`), PR #160 `160ff6b` (`bug_err_metric_frontend_backend_drift`), PR #159 `52e106d` (`bug_tutorial_template_param_boost_naming`), PR #158 `308c315` (finalize chore_create_study_wizard_polish), PR #157 `075c46b` (`chore_create_study_wizard_polish`). Prior squash: PR #155 `9a72514` 2026-05-19. 
Prior squashes: PR #154 `ed4121f` 2026-05-19 (`chore_form_dropdown_guide_screenshot_refresh`), PR #153 `199e225` 2026-05-19 (`chore_extract_shadcn_select_test_mock`), PR #152 `476db78` 2026-05-19 (`chore_ci_prettier_check`), PR #151 `110dc5a` 2026-05-19 (finalize chore_data_table_columnvisibility_tanstack), PR #150 `c1e4545` 2026-05-19 (`chore_data_table_columnvisibility_tanstack`), PR #149 `da9506b` 2026-05-19 (finalize infra_e2e_wire_seed_helper_into_studies_spec), PR #148 `65f4150` 2026-05-19 (`infra_e2e_wire_seed_helper_into_studies_spec` — `?study_id=` filter bug + E2E test restore), PR #147 `8854e47` 2026-05-18 (capture chore_detail_page_shell_primitive idea), PR #146 `7299fca` 2026-05-18 (bug_install_skip_ui_rebuild — `make up`/`make down` lifecycle fix), PR #136 `cb7d9ee` 2026-05-18 (chore_form_dropdown_primitive), PR #132 `ee4c8d4` 2026-05-17 (chore_data_table_primitive_followups items 1+2+4+6), PR #130 `13b3383` 2026-05-17 (infra_e2e_seed_completed_study), PR #128 `73459d2` 2026-05-17 (bug_cursor_decode_value_validation), PR #126 `d6115b3` 2026-05-16 (feat_data_table_primitive). `v0.1.0` annotated tag still on `main` commit `d099536` 2026-05-13; GitHub Release at https://github.com/SoundMindsAI/relyloop/releases/tag/v0.1.0. -- **Active feature:** none in flight (PR #184 closed `feat_study_target_judgment_mismatch_guard` on 2026-05-21 as the **23rd MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #180 closed `feat_pr_metric_confidence` on 2026-05-21 as the **22nd MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #175 closed `feat_agent_propose_search_space` on 2026-05-21; only finalization docs PR remains for the 21st MVP1 feature). Prior — none in flight (PR #168 closed `feat_cluster_target_filter` + PR #169 closed `chore_seed_meaningful_demos` on 2026-05-20; only finalization docs PR remains for the 20th MVP1 feature). 
Prior — none in flight (PR #165 closed `feat_create_study_target_autocomplete` + the bundled `bug_get_schema_unhandled_connect_error` fix on 2026-05-20). Prior — none in flight (PR #163 closed `feat_create_study_search_space_builder` + the `bug_judgment_lists_listing_ignores_query_set_filter` bundled fix on 2026-05-20). PR #168 closed `feat_cluster_target_filter` + PR #169 closed `chore_seed_meaningful_demos` (sibling). **Three PRs shipped 2026-05-15:** PR #122 (Phase 1, 16th MVP1 feature — Tooltip primitive + 26 placements on create-study modal + study detail), PR #123 (Phase 1 finalization docs), PR #124 (Phases 2 + 3 — 17th MVP1 feature; 21 additional tooltips on judgments + proposals + cluster registration + 2 new first-run components: chat ExamplePrompts strip + Stripe-style StartHereChecklist on home page). The original "MVP1 Phase 1 only" scope-lock was reversed mid-day: operator decided to ship Phases 2 + 3 together with a Stripe-style design call rather than wait for MVP2. PR #124 took 2 hours from idea-folder reuse to merge. 47 total tooltip placements + 2 new first-run components live in `main`. **PR #122 shipped 2026-05-15 morning** — `feat_contextual_help` Phase 1 (16th MVP1 feature). Adds the first Tooltip primitive (`@radix-ui/react-tooltip@~1.2.8` + shadcn-style wrapper at `ui/src/components/ui/tooltip.tsx`), two glossary-backed wrappers (`InfoTooltip` standalone + asChild modes; `HelpPopover` click-to-open with `react-markdown` safety filter), and a 49-key glossary source-of-truth at `ui/src/lib/glossary.ts` (8 enum groups parity-tested against `enums.ts`). 26 tooltip placements across the create-study modal (Step 1 target + Step 3 template + 9 Step 5 inputs), study-header (status badge dynamic key + Best metric + Trials), trials-table (5 column headers + Sort label), and digest panel (5 section labels + Open PR enabled + Open PR disabled). 
The disabled Open PR button refactored from native `disabled` to `aria-disabled="true"` so it stays focusable and the tooltip reveals on focus (AC-11). Gemini Code Assist: 2 findings (1 accepted + fixed, 1 rejected with cited counter-evidence). Final GPT-5.5 review: 1 Medium accepted-framing-but-deferred. Spec converged at GPT-5.5 cycle 3 (24 findings, 23 accepted + 1 rejected); plan converged at cycle 2 (12 findings, 10 accepted + 1 rejected + 1 spec patch). UI vitest now **279 passing across 48 files** (was 249 across 45 — +3 new test files, +30 cases). Playwright E2E **8 passing** (was 5 — +3 new contextual-help tests). One follow-up filed: `infra_e2e_seed_completed_study/idea.md` tracks the E2E gap for digest-panel triggers + AC-11 (cross-subsystem helper for seeding a completed study with digest + proposal; component-level coverage is in place). Phases 2 + 3 deferred to MVP2 via `feat_contextual_help_mvp2/` (judgments + proposals tooltips; chat + cluster + home onboarding; the home-page "Start here" panel is the only product-design-shaped item). +- **Branch:** `docs/finalize-e2e-test-rows-isolation` — finalization docs PR after PR #186 (`a444b94`) merged 2026-05-21. `chore/e2e-test-rows-isolation` branch deleted post-merge. Earlier: `docs/finalize-study-target-judgment-mismatch-guard` — finalization docs PR after PR #184 (`ce3fcf4`) merged 2026-05-21. `feature/study-target-judgment-mismatch-guard` branch deleted post-merge. Earlier: `docs/finalize-pr-metric-confidence` — finalization docs PR after PR #180 (`d0a8358`) merged 2026-05-21. `feat_pr_metric_confidence` branch deleted post-merge. Earlier: `docs/finalize-agent-propose-search-space` — finalization docs PR after PR #175 (`5d29355`) merged 2026-05-21. `feature/agent-propose-search-space` deleted post-merge. Earlier: `docs/finalize-cluster-target-filter` — finalization docs PR after PR #168 (`57d3ba0`) + PR #169 (`c44d774`) both merged. 
Prior `main` post-merge of PR #168 squash `57d3ba0` (`feat_cluster_target_filter`) + PR #169 squash `c44d774` (`chore_seed_meaningful_demos`) 2026-05-20. Earlier: PR #165 squash commit `bd4516a` 2026-05-20. Finalization docs branch `docs/finalize-create-study-target-autocomplete`. Prior squash same day: PR #163 `c703953` (`feat_create_study_search_space_builder`). Finalization docs PR off `docs/finalize-create-study-search-space-builder`. Prior squashes (same day): PR #161 `0879df2` (`chore_create_study_modal_e2e_stability`), PR #160 `160ff6b` (`bug_err_metric_frontend_backend_drift`), PR #159 `52e106d` (`bug_tutorial_template_param_boost_naming`), PR #158 `308c315` (finalize chore_create_study_wizard_polish), PR #157 `075c46b` (`chore_create_study_wizard_polish`). Prior squash: PR #155 `9a72514` 2026-05-19. Prior squashes: PR #154 `ed4121f` 2026-05-19 (`chore_form_dropdown_guide_screenshot_refresh`), PR #153 `199e225` 2026-05-19 (`chore_extract_shadcn_select_test_mock`), PR #152 `476db78` 2026-05-19 (`chore_ci_prettier_check`), PR #151 `110dc5a` 2026-05-19 (finalize chore_data_table_columnvisibility_tanstack), PR #150 `c1e4545` 2026-05-19 (`chore_data_table_columnvisibility_tanstack`), PR #149 `da9506b` 2026-05-19 (finalize infra_e2e_wire_seed_helper_into_studies_spec), PR #148 `65f4150` 2026-05-19 (`infra_e2e_wire_seed_helper_into_studies_spec` — `?study_id=` filter bug + E2E test restore), PR #147 `8854e47` 2026-05-18 (capture chore_detail_page_shell_primitive idea), PR #146 `7299fca` 2026-05-18 (bug_install_skip_ui_rebuild — `make up`/`make down` lifecycle fix), PR #136 `cb7d9ee` 2026-05-18 (chore_form_dropdown_primitive), PR #132 `ee4c8d4` 2026-05-17 (chore_data_table_primitive_followups items 1+2+4+6), PR #130 `13b3383` 2026-05-17 (infra_e2e_seed_completed_study), PR #128 `73459d2` 2026-05-17 (bug_cursor_decode_value_validation), PR #126 `d6115b3` 2026-05-16 (feat_data_table_primitive). 
`v0.1.0` annotated tag still on `main` commit `d099536` 2026-05-13; GitHub Release at https://github.com/SoundMindsAI/relyloop/releases/tag/v0.1.0. +- **Active feature:** none in flight (PR #186 closed `chore_e2e_test_rows_isolation` on 2026-05-21 as the **24th MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #184 closed `feat_study_target_judgment_mismatch_guard` on 2026-05-21 as the **23rd MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #180 closed `feat_pr_metric_confidence` on 2026-05-21 as the **22nd MVP1 feature** merged; only finalization docs PR remains). Prior: none in flight (PR #175 closed `feat_agent_propose_search_space` on 2026-05-21; only finalization docs PR remains for the 21st MVP1 feature). Prior — none in flight (PR #168 closed `feat_cluster_target_filter` + PR #169 closed `chore_seed_meaningful_demos` on 2026-05-20; only finalization docs PR remains for the 20th MVP1 feature). Prior — none in flight (PR #165 closed `feat_create_study_target_autocomplete` + the bundled `bug_get_schema_unhandled_connect_error` fix on 2026-05-20). Prior — none in flight (PR #163 closed `feat_create_study_search_space_builder` + the `bug_judgment_lists_listing_ignores_query_set_filter` bundled fix on 2026-05-20). PR #168 closed `feat_cluster_target_filter` + PR #169 closed `chore_seed_meaningful_demos` (sibling). **Three PRs shipped 2026-05-15:** PR #122 (Phase 1, 16th MVP1 feature — Tooltip primitive + 26 placements on create-study modal + study detail), PR #123 (Phase 1 finalization docs), PR #124 (Phases 2 + 3 — 17th MVP1 feature; 21 additional tooltips on judgments + proposals + cluster registration + 2 new first-run components: chat ExamplePrompts strip + Stripe-style StartHereChecklist on home page). The original "MVP1 Phase 1 only" scope-lock was reversed mid-day: operator decided to ship Phases 2 + 3 together with a Stripe-style design call rather than wait for MVP2. 
PR #124 took 2 hours from idea-folder reuse to merge. 47 total tooltip placements + 2 new first-run components live in `main`. **PR #122 shipped 2026-05-15 morning** — `feat_contextual_help` Phase 1 (16th MVP1 feature). Adds the first Tooltip primitive (`@radix-ui/react-tooltip@~1.2.8` + shadcn-style wrapper at `ui/src/components/ui/tooltip.tsx`), two glossary-backed wrappers (`InfoTooltip` standalone + asChild modes; `HelpPopover` click-to-open with `react-markdown` safety filter), and a 49-key glossary source-of-truth at `ui/src/lib/glossary.ts` (8 enum groups parity-tested against `enums.ts`). 26 tooltip placements across the create-study modal (Step 1 target + Step 3 template + 9 Step 5 inputs), study-header (status badge dynamic key + Best metric + Trials), trials-table (5 column headers + Sort label), and digest panel (5 section labels + Open PR enabled + Open PR disabled). The disabled Open PR button refactored from native `disabled` to `aria-disabled="true"` so it stays focusable and the tooltip reveals on focus (AC-11). Gemini Code Assist: 2 findings (1 accepted + fixed, 1 rejected with cited counter-evidence). Final GPT-5.5 review: 1 Medium accepted-framing-but-deferred. Spec converged at GPT-5.5 cycle 3 (24 findings, 23 accepted + 1 rejected); plan converged at cycle 2 (12 findings, 10 accepted + 1 rejected + 1 spec patch). UI vitest now **279 passing across 48 files** (was 249 across 45 — +3 new test files, +30 cases). Playwright E2E **8 passing** (was 5 — +3 new contextual-help tests). One follow-up filed: `infra_e2e_seed_completed_study/idea.md` tracks the E2E gap for digest-panel triggers + AC-11 (cross-subsystem helper for seeding a completed study with digest + proposal; component-level coverage is in place). Phases 2 + 3 deferred to MVP2 via `feat_contextual_help_mvp2/` (judgments + proposals tooltips; chat + cluster + home onboarding; the home-page "Start here" panel is the only product-design-shaped item). 
**Earlier — seven PRs shipped 2026-05-14:** `feat_judgments_periodic_resume_sweep` (PR #104, 14th MVP1 feature), `bug_query_inline_crud_since_filter_uuidv7_ms_collision` (PR #106 — UUIDv7 ms-collision test flake), `infra_dashboard_regen_pre_commit_conflict §2+§4` (PR #108 — dashboard regen idempotency + relative-link rewriting), `infra_make_targets_split_backend_only` (PR #110 — `make backend-fmt/lint/typecheck` + symmetric `ui-fmt` so Node-18 contributors aren't blocked), `chore_digest_worker_narrow_except` (PR #112 — narrowed `except Exception` allowlist to `(ValueError,)` + ERROR-level `digest_importance_failed_unexpected` event), `infra_structlog_test_helpers` (PR #114 — factored the two structlog test-assertion patterns into `backend/tests/_log_helpers.py`), and `chore_chat_last_message_preview` (PR #117 — `last_message_preview` + `last_message_at` on `ConversationSummary` via LATERAL JOIN; frontend shows preview under title + swaps displayed timestamp from `created_at` to `last_message_at`). Plus PR #116 dropped `chore_studies_ui_shadcn_polish` as won't-do (forward-compat audit on NavigationMenu primitive + ClusterFilterSelect precedent on native `` uses a hoisted `targetReg = form.register('target')` (RHF register preserved) with `onChange={(e) => { targetReg.onChange(e); form.setValue('judgment_list_id', ''); }}` for cascade reset; dropdown-mode target `onChange` mirrors the same cascade; new `emptyState` on the judgment-list `` substitutes the watched target value + CTA href="/judgments". `useJudgmentLists` filter type extended; `target` threaded through both params AND queryKey for cache scoping. 
**Drive-by fix bundled (per inline-fix rubric)**: E2E seed helpers (`seedJudgmentList`, `seedFullChain`, `seedStudy`) gain optional `target` overrides — `seedJudgmentList` default changed from `'e2e-target'` to `'products'` matching `seedStudy`'s default; 3 specs updated (`studies-create-validation.spec.ts` fill→'products', `studies-create-builder.spec.ts` passes `judgmentListTarget: 'e2e-builder-target'` to align with modal fill, `studies-create-target-dropdown.spec.ts` passes the alpha seeded ES index name). **Cross-model review**: spec converged at GPT-5.5 cycle 3 (17 findings — all 17 accepted, 1 rejected with cited counter-evidence at `create-study-modal.tsx:508`); plan converged at cycle 3 (16 findings, all accepted); Gemini Code Assist 2 findings (1 accepted in `035af0a` — IIFE → hoisted register; 1 rejected with precedent counter-evidence at `test_judgments_api_contract.py:215`); final GPT-5.5 review 10 findings — 2 accepted in `a358a71` (Story 1.1 over-bound 422 test on `?target=<256 chars>`), 8 rejected (5 truncation false positives + 3 plan/precedent rejects). Adjudication summary posted at https://github.com/SoundMindsAI/relyloop/pull/184#issuecomment-4513369521. **Tests**: 1040 backend unit (unchanged — validators are inline conditionals); backend integration +7 cases (target mismatch + cluster mismatch + cluster-fires-before-target + qs-fires-first ordering + GET pre-existing 200 negative test + `?target=` AND-semantics across 4 lists × 2 clusters × 2 query-sets + summary shape + over-bound 422); backend contract +2 cases (firing-order source-presence lock at `test_studies_api_contract.py` + summary `target` shape lock at `test_judgments_api_contract.py`); UI vitest **572** across 83 files (was 567/83 — +5: hook wire-filter, manual cascade, dropdown cascade, cluster regression-lock, empty-state CTA). CI green on final `a358a71` (5/5 jobs incl. 70/70 Playwright smoke). 
**Alembic head unchanged at `0015_trials_per_query_metrics`** — feature is purely additive at the application layer.
- **2026-05-21 — `feat_agent_propose_search_space` merged into `main` as PR #175 squash `5d29355`.** **21st MVP1 feature shipped**, 10 stories across 5 epics. New 20th agent tool `propose_search_space` (read-only, NOT in `MUTATING_TOOL_NAMES`) builds a deterministic starter search space from a template's `declared_params` via the same heuristic that powers the create-study wizard's auto-fill — Python port of `ui/src/lib/search-space-defaults.ts` lives at `backend/app/domain/study/search_space_defaults.py` with a TS↔Python parity test driven by a shared JSON fixture (`backend/tests/_fixtures/search_space_defaults_parity.json`, 18 rows). Cap-aware overflow now raises `InvalidSearchSpaceError`/throws on both Python AND TS sides — fixes a latent bug where the TS implementation silently returned an invalid `SearchSpace` when given 8+ fall-through floats (6⁸ > 10⁶ exhausts the cap-aware fallback). Optional `prior_study_id` arg narrows numeric param bounds via `winner ± |winner| × bracket` for sign-symmetric math (a Gemini find — the naive bounds `winner * 0.5` and `winner * 1.5` come out inverted for negative winners) with the `bracket` arg actually threaded through both linear paths; log-uniform float keeps a fixed √2 factor per spec FR-3. Graceful degrade paths emit WARN logs: `agent.propose_search_space.prior_template_mismatch` (different template_id) + `agent.propose_search_space.missing_winner_trial` (cascade-delete race) + non-numeric winner skip. `ToolContext` gained a required `conversation_id: str` field plumbed from `orchestrator.run_turn` so adherence telemetry can be correlated offline — paired INFO events `agent.search_space_proposed` + `agent.create_study.invoked` tagged with the same `conversation_id` per spec FR-6 (operator grep recipe in `docs/03_runbooks/agent-debugging.md` §5). New `repo.get_trial(db, trial_id)` parallels `repo.get_study`.
System prompt updated: 19→20 tools, "Studies (4)" lists `propose_search_space` FIRST, new chain-guidance bullet under Rule #1 directs LLM to call propose before create. `ProposeSearchSpaceArgs` uses `ConfigDict(extra="forbid")` so hallucinated LLM args fail loudly. Spec converged at GPT-5.5 cycle 3 (19 findings, all accepted); plan converged at cycle 3 (8 findings, all accepted). Post-merge review: Gemini Code Assist 3 findings all accepted + fixed in `642b5b9` (negative-winner narrowing bug × 2 + bracket arg threading); GPT-5.5 final review 6 findings — 1 accepted + fixed in `945e833` (`ConfigDict(extra="forbid")`), 1 deferred (structlog migration — broader codebase decision), 4 rejected with cited counter-evidence (truncated-diff false positives — all claimed-missing content was present in the PR). Adjudication summary posted at https://github.com/SoundMindsAI/relyloop/pull/175#issuecomment-4504624481. Tests: +87 new backend unit cases (44 search_space_defaults + 19 parity + 3 ToolContext + 19 propose_search_space + 7 telemetry + 5 prompt snapshot) + 2 repo.get_trial integration + 1 propose→create integration; UI vitest 38 lib + 19 parity + 66 modal all green; 1000 backend unit tests pass locally. Backend lint + mypy --strict clean. **Alembic head unchanged at `0014_clusters_target_filter`** — feature is purely additive at the application layer (no migration).
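
The sign-symmetric narrowing in the entry above can be illustrated with a minimal sketch. It is not the code in `search_space_defaults.py`: the function names `narrow_bounds` and `naive_bounds` are hypothetical, and only the formula `winner ± |winner| × bracket` and the naive `winner * 0.5` / `winner * 1.5` pair come from the entry.

```python
def narrow_bounds(winner: float, bracket: float) -> tuple[float, float]:
    """Hypothetical helper: linear narrowing as winner ± |winner| × bracket.

    Using |winner| for the half-width keeps low <= high for any sign of winner.
    """
    if bracket < 0:
        raise ValueError("bracket must be non-negative")
    half_width = abs(winner) * bracket
    return winner - half_width, winner + half_width


def naive_bounds(winner: float) -> tuple[float, float]:
    """Hypothetical helper: the naive pair (winner * 0.5, winner * 1.5).

    For a negative winner the first element is larger than the second,
    which is the inverted-bounds bug the entry describes.
    """
    return winner * 0.5, winner * 1.5


# Positive winner: both forms agree when bracket is 0.5.
print(narrow_bounds(4.0, 0.5))   # (2.0, 6.0)
print(naive_bounds(4.0))         # (2.0, 6.0)

# Negative winner: the naive form inverts, the symmetric form does not.
print(naive_bounds(-2.0))        # (-1.0, -3.0)
print(narrow_bounds(-2.0, 0.5))  # (-3.0, -1.0)
```

In this sketch a winner of exactly `0.0` collapses to the degenerate range `(0.0, 0.0)`; how the real implementation treats that case is not stated in the entry.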