@@ -2002,8 +2003,6 @@
Dependency graph (feat_ + infra_)
classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e;
classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af;
classDef idea fill:#f1f5f9,stroke:#334155,color:#334155;
- chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"]
- class chore_dashboard_regen_quoted_pr_false_positive plan;
infra_agent_sibling_worktree_isolation["agent sibling worktree isolation"]
class infra_agent_sibling_worktree_isolation implement;
infra_foundation["foundation"]
@@ -2156,6 +2155,8 @@
Dependency graph (feat_ + infra_)
class feat_digest_executable_followups_swap_template done;
feat_home_demo_reseed_endpoint["home demo reseed endpoint"]
class feat_home_demo_reseed_endpoint done;
+ chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"]
+ class chore_dashboard_regen_quoted_pr_false_positive done;
chore_e2e_seed_acme_idea_obsolete["e2e seed acme idea obsolete"]
class chore_e2e_seed_acme_idea_obsolete done;
feat_study_baseline_trial["study baseline trial"]
@@ -2217,8 +2218,6 @@
Dependency graph (feat_ + infra_)
classDef plan fill:#fef9c3,stroke:#854d0e,color:#854d0e;
classDef spec fill:#dbeafe,stroke:#1e40af,color:#1e40af;
classDef idea fill:#f1f5f9,stroke:#334155,color:#334155;
- chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"]
- class chore_dashboard_regen_quoted_pr_false_positive plan;
infra_agent_sibling_worktree_isolation["agent sibling worktree isolation"]
class infra_agent_sibling_worktree_isolation implement;
infra_foundation["foundation"]
@@ -2371,6 +2370,8 @@
Dependency graph (feat_ + infra_)
class feat_digest_executable_followups_swap_template done;
feat_home_demo_reseed_endpoint["home demo reseed endpoint"]
class feat_home_demo_reseed_endpoint done;
+ chore_dashboard_regen_quoted_pr_false_positive["dashboard regen quoted pr false positive"]
+ class chore_dashboard_regen_quoted_pr_false_positive done;
chore_e2e_seed_acme_idea_obsolete["e2e seed acme idea obsolete"]
class chore_e2e_seed_acme_idea_obsolete done;
feat_study_baseline_trial["study baseline trial"]
diff --git a/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md b/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md
deleted file mode 100644
index 1d60ca91..00000000
--- a/docs/02_product/planned_features/chore_dashboard_regen_quoted_pr_false_positive/pipeline_status.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Pipeline Status — chore_dashboard_regen_quoted_pr_false_positive
-
-## Idea
-- Status: Complete
-- File: idea.md
-- /idea-preflight verdict (2026-05-25): Ready after 3-edit patch (line 572→581 drift, PR-TBD→PR #221 for sibling chore, "Why deferred" status clarification)
-
-## Spec
-- Status: Approved
-- Date: 2026-05-25
-- File: feature_spec.md
-- Cross-model review: GPT-5.5 converged after 3 cycles
- - Cycle 1: 1 Low finding (AC-7 missed single-line triple-backtick fences) — accepted, added AC-12
- - Cycle 2: 2 Low findings (regex hint `+` would skip empty spans; 4 stale "6 tests" residuals) — both accepted, patched
- - Cycle 3: 0 findings → stop rule satisfied
-- Phases: 1 (single phase, two-PR rollout — see §3 Phase boundaries)
-- FRs: 5 (FR-1 helper, FR-2 wire-in, FR-3 test class, FR-4 docstring, FR-5 post-merge finalization)
-- ACs: 7 (AC-6 through AC-12)
-
-## Plan
-- Status: Approved
-- Date: 2026-05-25
-- File: implementation_plan.md
-- Cross-model review: GPT-5.5 cycle 1 produced 2 findings (1 Low, 1 Medium); both accepted and patched (gate arithmetic 6→7; regex `\`{3,}` for spec-compliance with "3-or-more" fence delimiter). Cycle 2 = 0 findings → stop rule satisfied.
-- Stories: 5 total across 2 epics (Epic 1 = Stories 1.1–1.4 in PR A; Epic 2 = Story 2.1 in PR B)
-- Phases covered: single phase (two-PR rollout per spec §3)
-
-## Implementation
-- Status: Not started
diff --git a/state.md b/state.md
index 8eae0a99..0352e375 100644
--- a/state.md
+++ b/state.md
@@ -2,7 +2,7 @@
> Read this first. Snapshots the active branch, what just shipped, what's in flight, what's queued, and where the project currently sits in the MVP1 → GA roadmap. Updated whenever a feature lands or a priority shifts.
-**Last updated:** 2026-05-25 (after `chore_e2e_seed_acme_idea_obsolete` admin-merged into `main` as PR #250 squash `05f3d486` — 39th MVP1-era artifact. Doc-only chore that closes the OBE'd `chore_e2e_seed_acme_helper_dead` idea (Option A — in-place `**Status:**` line edit at line 4, dashboard regen picks up the closure via `_extract_status_line` at `scripts/build_mvp1_dashboard.py:213`) and refreshes `ui/tests/e2e/helpers/coverage-audit.md` to 9-of-9 helper coverage (the `seedAcmeProductsChain` "0 specs — currently uncalled" framing was OBE'd by commit `2cbcb93b` which wired the helper into `ui/tests/e2e/guides/06_create_and_monitor_study.spec.ts`). **5-cycle / 13-finding cross-model review**: spec-gen 3 cycles (`**Status (...):**` regex mismatch, state.md scope drift, residual "prepend" contradictions, two-PR rollout shape); impl-plan-gen 2 cycles (dashboard-regen anti-pattern unenforceable given pre-commit hook, story numbering, file-count, AC-2 substring checks); Epic 1 phase-gate 0 findings; final review cycle 1 caught stale-base after sibling-worktree Phase 2 merged mid-flight → rebased onto `bfa8799f` with `-X ours` for dashboard conflicts; final review cycle 2 = 0 findings. Gemini Code Assist posted 3 line-level findings on `feature_spec.md` claiming 3-up paths should be 4-up — all 3 rejected with empirical `ls -d` counter-evidence (hunk-isolated path-counting false positives). CI smoke failed (pre-existing 5+ main pushes failing the same way; captured in `bug_smoke_dashboard_demo_state_locator_missing`); other 6 checks green. Two-PR rollout: PR #250 = content (FRs 1–4), PR B = this finalization commit (FR-5 folder move). **Earlier today** `infra_agent_sibling_worktree_isolation` Phase 1 + Phase 2 admin-merged into `main` as PR #249 squash `22f878f` — 38th MVP1-era artifact. 
Adds the `## Working in sibling worktrees` section to CLAUDE.md + 5-test regression suite locking its invariants + `scripts/run-tests-in-worktree.sh` automation + `make test-worktree` + 8-test smoke + runbook. Two tangentials captured: `chore_state_md_size_compression` and `bug_dockerfile_venv_root_owned_after_user_switch`. Phase 3 deferred per `phase3_idea.md`. 11-cycle / 39-finding cross-model review; all accepted. **Earlier today** `feat_study_baseline_trial` admin-merged as PR #245 squash `53be6c63` — 36th MVP1-era artifact. Implements the deferred Phase 2 of `feat_pr_metric_confidence` (PR #180): orchestrator runs a single non-Optuna baseline trial before Optuna via the 4-tier params resolver (parent_proposal → parent_study → operator-supplied → template-defaults), persists it as a real `Trial` row with `is_baseline=TRUE` + `optuna_trial_number=-1` sentinel, stamps `studies.baseline_trial_id` + `baseline_metric` via the new `services.study_state.stamp_baseline_trial` chokepoint (FR-12). Existing data-driven consumers flip automatically from "vs runner-up" to "vs baseline" — confidence per-query outcomes (FR-4), auto-followup chain gate (FR-5, now direction-aware — closes a latent minimize-direction bug as a side effect), digest narrative framing, PR body, ConfidencePanel label. New trials-table "Show baseline trial" UI toggle (FR-9). Migration 0020 adds `studies.baseline_trial_id` VARCHAR(36) NULL + `trials.is_baseline` BOOLEAN NOT NULL DEFAULT FALSE + partial unique index `uq_trials_study_baseline_complete` (defense layer 2 of the 3-layer resume-race guard per D-16). 
**3-cycle spec + 3-cycle plan + 1 CI-fix round.** CI-fix root cause: `test_study_cancel` hung 8min in backend-full because `_wait_for_baseline_trial_by_*` didn't check `study.status` for cancel (production bug — without this, operator-initiated cancel mid-baseline would wait the full `_BASELINE_WAIT_FLOOR_S` 60s before noticing); fixed by adding `_study_cancelled()` helper to bail on every poll tick + extending `_InProcessPool.enqueue_job` to dispatch `run_baseline_trial` inline + monkeypatching `_BASELINE_WAIT_FLOOR_S` = 2.0 in `_running_orchestrator`. Gemini Code Assist: 4 findings, all rejected with cited counter-evidence (3 High duplicates were false positives on `create_trial`'s `**fields: object` signature; 1 Medium was a redundant check already enforced at `FloatParam.model_validator`). **Admin-merged** because smoke is still red on the orthogonal pre-existing dashboard-banner E2E (same `dashboard.spec.ts` + `dashboard-reseed.spec.ts` failures from PR #232/#234/#236), and `ui/tests/e2e/` is untouched by this branch. **Alembic head advanced 0019 → 0020.** Only this finalization docs PR remains. Earlier: `feat_study_clone_from_previous` merged into `main` as PR #243 squash `34118ade` — admin-merged because the pre-existing dashboard demo-state-locator smoke regression (`bug_smoke_dashboard_demo_state_locator_missing`) is still blocking the smoke gate. The feature ships the full clone flow end-to-end: backend `parent_study_id` field + early-placement validation (`PARENT_STUDY_NOT_FOUND` 404, `PARENT_STUDY_WRONG_CLUSTER` 422) + persistence (no migration — column was already present from `0003_study_lifecycle_schema.py`); frontend `PrefillValues` widening + `buildPrefillFromStudy` helper + "Clone study" button on `StudyActionBar` (with running-source confirmation `AlertDialog` per FR-11) + cloned-from banner in `CreateStudyModal` (UI-only `cloneSource` never reaches the wire per D-12); deep-link `?clone_from=` reader on `/studies` with one-shot `useRef` guard + automatic re-arm on `cloneFromId` change (Gemini PR #243 #1 fix); 22 new frontend vitest cases + 7 backend integration cases + 1 Playwright real-backend E2E spec. **15 FRs / 17 ACs** all covered. **3-cycle spec + 3-cycle plan + 1-cycle Epic-1 phase-gate + 1-cycle Epic-2 phase-gate + 1-cycle final-pass + 1-cycle Gemini** = all reviews adjudicated (15 findings total, 2 accepted+fixed, 1 deferred-as-non-regression, 12 rejected with cited counter-evidence or resolved-by-merge). **Two tangential bug ideas surfaced:** [`bug_datatable_col_vis_density_localstorage_undefined_jsdom`](docs/02_product/planned_features/bug_datatable_col_vis_density_localstorage_undefined_jsdom/idea.md) (pre-existing vitest localStorage failures) + [`bug_smoke_dashboard_demo_state_locator_missing`](docs/02_product/planned_features/bug_smoke_dashboard_demo_state_locator_missing/idea.md) (pre-existing smoke regression on dashboard demo-state locators — same failure reproduces on main run #26397500888). **Follow-up:** [`feat_study_clone_narrow_bounds`](docs/02_product/planned_features/feat_study_clone_narrow_bounds/idea.md) (smart-rewrite of search-space bounds around the source's winner trial) remains in `planned_features/` for future scoping. Earlier: after `bug_demo_clusters_unreachable_in_healthz` merged into `main` as PR #236 squash `70b2ae46` — admin-merged because the pre-existing dashboard banner E2E failure still blocks smoke. Closes BOTH smoke-cascade `/healthz` observability bugs (PR #234 + PR #236) surfaced during the PR #232 unblock. **The fix:** new `run_cluster_health_warmup_background` service module spawned from the FastAPI lifespan hook + FR-7 fix to `get_or_probe_health`'s `CredentialsMissing` branch (now writes synthetic unreachable to cache instead of returning without caching). Within ~5s of API startup, `/healthz` reports truthful `elasticsearch_clusters` aggregate counts. 
**3-cycle spec + 3-cycle plan + 1-cycle phase-gate + 1-cycle final cross-model review** = 4 cycles total of GPT-5.5 review (32 findings, all accepted). **3 CI fix rounds after PR open:** (1) per-page session lifecycle refactor to release asyncpg connections before HTTP probes; (2) env-var gate `RELYLOOP_DISABLE_STARTUP_WARMUP=1` for integration tests to avoid asyncio interleaving with the latent webhook merge-handler row-lock race; (3) `monkeypatch.delenv` for unit test isolation. **Notable tangential bug captured:** [`bug_webhook_concurrent_merge_race_timing_sensitive/idea.md`](docs/02_product/planned_features/bug_webhook_concurrent_merge_race_timing_sensitive/idea.md) — real production-correctness bug in the webhook merge handler's row-lock, deterministically reproducible by adding ANY second lifespan task, masked on main today by pure asyncio-scheduling luck; the next feature that adds a lifespan task will trip it. P2 next-ticket. **Architecture doc** updated with three-path cache-population subsection (registration / lazy on-demand / startup warmup) + race-window caveat. **No new migration; no /healthz response shape change.** Earlier: after `bug_openai_capability_check_incapable_on_valid_key` merged into `main` as PR #234 squash `d69189db` — admin-merged because the pre-existing `bug_demo_clusters_unreachable_in_healthz` failure still blocks the smoke gate at the dashboard-banner E2E layer; this PR fixes the OTHER smoke-cascade bug (the openai capability observability gap). `/healthz` `openai_capabilities` block now carries 5 required fields: the existing `chat / function_calling / structured_output` plus new `models_endpoint: Literal["ok","fail","untested"]` and required-but-nullable `models_endpoint_status_code: int | None`. 
`_probe_models_endpoint` return contract widened to `tuple[bool, int | None]`; status code captured only on `>= 400` HTTP failure (never on success or network errors — and the response body is NEVER captured, only the integer status, per CLAUDE.md Absolute Rule #10). Cached `CapabilityResult.models_endpoint` schema stays 2-valued — `"untested"` only widens on the response model. Backwards compat verified: pre-fix Redis cache rows deserialize cleanly via Pydantic optional-field defaulting. Spec converged at GPT-5.5 cycle 3 (13 findings, 12 accepted + 1 rejected with counter-evidence); plan at cycle 3 (17 findings all accepted); phase-gate at cycle 1 (3 findings, 2 accepted + 1 rejected — dashboard regen is the auto-run pre-commit hook); final review at cycle 1 (1 Low finding deferred as non-regression — test-helper type hints matching existing file convention). Gemini Code Assist: clean review, zero findings. Tests: +15 cases (7 in test_capability_check.py incl `TestSecurityRedaction` for AC-10 + 5 in test_health.py incl AC-10 end-to-end through `check_capabilities` → Redis JSON round-trip → /healthz + 1 defensive in test_probes.py + 2 in test_health_contract.py). Architecture doc updated with success/failure response examples + repo-secret-vs-`.env` divergence note + cascade explanation. Remaining smoke-cascade item: [`bug_demo_clusters_unreachable_in_healthz`](docs/02_product/planned_features/bug_demo_clusters_unreachable_in_healthz/idea.md) (P2). Earlier: after `feat_home_demo_reseed_endpoint` merged into `main` as PR #228 squash `ad6ff826`. Dev-only `POST /api/v1/_test/demo/reseed` endpoint that wipes the 10 demo Postgres tables + 4 ES/OS indices and re-seeds the 4 demo scenarios from `scripts/seed_meaningful_demos.py`. Dashboard now renders a "Reset to demo state" disclosure inside `StartHereChecklist` whenever all three first-run signals are false. 
Architecture: dual httpx clients (api + engine) + session-level Postgres advisory lock on a dedicated pinned `AsyncConnection` + NO outer wall-clock timeout (per-call HTTP ceiling only) + TRUNCATE-commits-before-self-call invariant (AC-13) + cleanup-on-failure pass via a fresh DB connection. 14 GPT-5.5 spec cycles + 14 plan cycles to convergence; 2 Gemini Medium findings + 1 GPT-5.5 High + 2 Medium accepted, 1 Medium rejected as stale, 1 Low deferred. The High-severity fix: the in-container OpenSearch port resolver was mapping `localhost:9201` → `opensearch:9201` but the OS container actually listens on `:9200` inside the Compose network (the host `:9201` is just the port-mapping to avoid colliding with ES on the host). Tests: backend unit+contract 1560 pass (+45 vs the prior baseline); 10 integration tests at `backend/tests/integration/test_demo_seeding{,_timeout}.py` covering AC-1..AC-5 + AC-12..AC-16 (skip outside CI service containers); 21 dashboard vitest cases; 1 Playwright spec at `ui/tests/e2e/dashboard-reseed.spec.ts`. New runbook at [`docs/03_runbooks/demo-reseed-debugging.md`](docs/03_runbooks/demo-reseed-debugging.md). Tangential capture: [`bug_vitest_jsdom_localstorage_failures/idea.md`](docs/02_product/planned_features/bug_vitest_jsdom_localstorage_failures/idea.md) — 31 pre-existing vitest failures in 4 files all touching `window.localStorage`, confirmed unrelated to this PR by stashing the feature branch and reproducing on baseline. **Alembic head unchanged at `0019_digests_suggested_followups_jsonb`** — no schema change.) Prior update: 2026-05-23 (after `chore_study_default_stop_conditions` merged into `main` as PR #215 squash `370c87d9` — **first MVP1.0-cleanup chore shipped** in the operator's stated "finish MVP1.0 before MVP1.5" sweep. 
Frontend-only chore (~175 LOC production code + 12 vitest cases): pre-fills `max_trials = 200` in the create-study modal's Step-5 form (FR-1, baked into `useForm` `defaultValues`), adds a 4-button Stop-condition preset selector (Focused 50 / Standard 200 / Deep 1000 / Custom) above the numeric inputs (FR-2..FR-4 + FR-7), refreshes `study.max_trials` + `study.time_budget_min` glossary copy with dimensionality-keyed framing and adds a new `study.preset` glossary entry (FR-5 + FR-8), updates the chat orchestrator system prompt to recommend `max_trials=200` by default with the dimensionality scaling guidance (FR-6), and ships a 12-case vitest suite in [`ui/src/__tests__/components/studies/create-study-modal.stop-conditions.test.tsx`](ui/src/__tests__/components/studies/create-study-modal.stop-conditions.test.tsx) covering AC-1..AC-6 + 2 bug-guards + AC-8 + AC-10 + the type="button" check (FR-9). `activePreset` is derived purely from form values via `useMemo` (no `useState` + watcher `useEffect`); Custom click is a no-op (Custom == "values don't match any preset"). Cross-model review: spec converged at GPT-5.5 cycle 3 (12 findings — 11 accepted across cycles 1-3, 1 rejected with cited counter-evidence at `backend/app/api/errors.py:62, 118` re: VALIDATION_ERROR envelope claim); plan converged at cycle 2 (5 cycle-1 findings, all accepted; 0 cycle-2 findings); impl-diff GPT-5.5 cycle 1 raised 1 Medium finding (modal-open form-field reset gap, Radix Dialog mount-persistence bug), addressed in subsequent fix iterations; Gemini Code Assist 1 Medium finding (Defer — pre-existing form-state persistence across modal toggles, out of scope for this chore). 
**Late-stage E2E regression + root-cause fix**: `studies-create-builder.spec.ts:130` + `studies-create-target-dropdown.spec.ts:48` started failing against the production UI image — Playwright's `.fill('10')` on a non-empty Max trials input (200 default) was triggering a stray form-submit event before the test's explicit submit-button click, leaving the button stuck in `Submitting…` while Playwright retried the click against a vanishing button. Seven mechanical fix attempts (drop `form` from useEffect deps, modal-overflow scrolling, prev-open `useRef` gating, in-effect setValue removal, RHF subscription watcher, useMemo-derived activePreset, Enter-key suppression) didn't move the needle. The actual fix decouples submission from the form's `onSubmit` event entirely: `