diff --git a/docs/00_overview/DASHBOARD.md b/docs/00_overview/DASHBOARD.md index fca45e4e..b429677a 100644 --- a/docs/00_overview/DASHBOARD.md +++ b/docs/00_overview/DASHBOARD.md @@ -7,7 +7,7 @@ _Top-level index across MVP1 → GA v1+ as of **2026-06-05**. Click a release na | Release | Theme | Progress | Status | |---|---|---|---| | [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 98 / 98 scoped done | **Complete** | -| [MVP2 / v0.2](MVP2_DASHBOARD.md) | Three-Engine + Real Signals | 22 / 26 scoped done · 18 remaining | **In progress** | +| [MVP2 / v0.2](MVP2_DASHBOARD.md) | Three-Engine + Real Signals | 22 / 25 scoped done · 17 remaining | **In progress** | | MVP3 / v0.3 | Observable | — | **Not yet scoped** | | GA v1 / v1.0 | Production-ready | — | **Not yet scoped** | diff --git a/docs/00_overview/MVP2_DASHBOARD.md b/docs/00_overview/MVP2_DASHBOARD.md index 5ac90677..08d8ff5f 100644 --- a/docs/00_overview/MVP2_DASHBOARD.md +++ b/docs/00_overview/MVP2_DASHBOARD.md @@ -20,15 +20,15 @@ Plan approved; run /impl-execute to ship | Metric | Value | |---|---| -| Filed under MVP2 | **44** folders total (done + specced not-done + idea backlog + bugs) | -| Specced features done | **22 / 26** (85%) — of features *past the idea stage* (those with a spec); the idea backlog below is NOT in this denominator, so 100% ≠ release complete | -| Pending work | **20** items (every not-done feat/infra/chore/bug across all priorities) | +| Filed under MVP2 | **43** folders total (done + specced not-done + idea backlog + bugs) | +| Specced features done | **22 / 25** (88%) — of features *past the idea stage* (those with a spec); the idea backlog below is NOT in this denominator, so 100% ≠ release complete | +| Pending work | **19** items (every not-done feat/infra/chore/bug across all priorities) | | → P0 — do next | **0** unblocking / paying daily cost | | → P1 | **0** high-value, ready when P0 clears | -| → P2 (default) | 17 important to file, not blocking | +| → P2 (default) | 16 
important to file, not blocking | | → Backlog | 3 captured for record, not planned | | Open bugs | 7 | -| Legacy "Path to MVP2" | 18 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | +| Legacy "Path to MVP2" | 17 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | | Backlog ideas | 2 idea-only feat/infra (not yet scoped into MVP2) | | In flight | 0 feature(s) actively shipping | @@ -67,14 +67,13 @@ Plan approved; run /impl-execute to ship _None._ -### Plan (4) +### Plan (3) | # | Priority | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---|---|---| | 1 | P2 | [feat_apply_path_normalizer_declaration](planned_features/02_mvp2/feat_apply_path_normalizer_declaration/feature_spec.md) | Feature | The winning normalizer ships as a **structured, language-agnostic manifest** in the config-repo PR — not just prose. | — | — | | 2 | P2 | [feat_query_normalizer_typed_pipeline](planned_features/02_mvp2/feat_query_normalizer_typed_pipeline/feature_spec.md) | Feature | A new typed search-space member `NormalizerPipelineParam` lets a template declare an **ordered list of normalization steps**; the Optuna loop samples over the powerset of declared steps and proposes t | — | — | | 3 | P2 | [chore_demo_seeding_integration_tests_rewrite](planned_features/02_mvp2/chore_demo_seeding_integration_tests_rewrite/feature_spec.md) | Chore | The 9 skipped cases are rewritten to the async "POST + poll-until-terminal" shape, the timeout case is re-homed to the worker layer, a new `AC-Async` case asserts the `running → complete` polling tran | — | [PR #286](https://github.com/SoundMindsAI/relyloop/pull/286) | -| 4 | P2 | [chore_studies_post_arq_spy_fixture](planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/feature_spec.md) | Chore | A reusable `arq_pool_spy` integration fixture that records every `enqueue_job(name, *args)` call, letting studies-POST tests positively assert `spy.calls == []` on rejection and 
`spy.calls == [("start | — | — | ### Spec (0) @@ -114,8 +113,6 @@ graph LR classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; chore_demo_seeding_integration_tests_rewrite["demo seeding integration tests rewrite"] class chore_demo_seeding_integration_tests_rewrite plan; - chore_studies_post_arq_spy_fixture["studies post arq spy fixture"] - class chore_studies_post_arq_spy_fixture plan; feat_apply_path_normalizer_declaration["apply path normalizer declaration"] class feat_apply_path_normalizer_declaration plan; feat_query_normalizer_typed_pipeline["query normalizer typed pipeline"] diff --git a/docs/00_overview/dashboard.html b/docs/00_overview/dashboard.html index f518855b..f3999d14 100644 --- a/docs/00_overview/dashboard.html +++ b/docs/00_overview/dashboard.html @@ -392,7 +392,7 @@

Releases

MVP2 / v0.2
Three-Engine + Real Signals
- 22 / 26 scoped done · 18 remaining
+ 22 / 25 scoped done · 17 remaining
In progress
diff --git a/docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/feature_spec.md b/docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/feature_spec.md similarity index 100% rename from docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/feature_spec.md rename to docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/feature_spec.md diff --git a/docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/idea.md b/docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/idea.md similarity index 100% rename from docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/idea.md rename to docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/idea.md diff --git a/docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/implementation_plan.md b/docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/implementation_plan.md similarity index 99% rename from docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/implementation_plan.md rename to docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/implementation_plan.md index e3ebbef7..52f33fff 100644 --- a/docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/implementation_plan.md +++ b/docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/implementation_plan.md @@ -1,7 +1,7 @@ # Implementation Plan — `arq_pool_spy` fixture for POST /api/v1/studies integration tests **Date:** 2026-06-02 -**Status:** Ready for Execution +**Status:** Complete (PR #476, squash-merged `ed85d84` 2026-06-05) **Primary spec:** [`feature_spec.md`](feature_spec.md) **Policy source(s):** [`CLAUDE.md`](../../../../../CLAUDE.md) (Integration Test Mocking Policy, test layers); [`docs/05_quality/testing.md`](../../../../05_quality/testing.md) 
diff --git a/docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/pipeline_status.md b/docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/pipeline_status.md similarity index 55% rename from docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/pipeline_status.md rename to docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/pipeline_status.md index 8ed0c601..2c9e7a3c 100644 --- a/docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture/pipeline_status.md +++ b/docs/00_overview/implemented_features/2026_06_05_studies_post_arq_spy_fixture/pipeline_status.md @@ -1,5 +1,7 @@ # Pipeline Status — `arq_pool_spy` fixture for POST /api/v1/studies tests +**Release:** mvp2 + ## Idea - Status: Complete (preflighted 2026-06-02) - File: idea.md @@ -20,4 +22,10 @@ - Phases covered: 1 of 1 (single-phase) ## Implementation -- Status: Not started +- Status: Complete +- Date: 2026-06-05 +- PR: #476 (squash-merged `ed85d84`) +- CI: all 19 `pr.yml` checks green (smoke skipped — opt-in/off) +- Stories completed: 2 of 2 (1.1, 1.2) +- Cross-model review: GPT-5.5 unreachable in this env → Opus self-review substitution (test-only, zero production diff) — clean +- Gemini Code Assist: 1 theme / 2 line comments — 1 accepted (drop redundant `@pytest.mark.asyncio`), 1 rejected with cited evidence (module `pytestmark` unnecessary under `asyncio_mode = "auto"`) diff --git a/docs/00_overview/mvp2_dashboard.html b/docs/00_overview/mvp2_dashboard.html index a763bf80..808f4291 100644 --- a/docs/00_overview/mvp2_dashboard.html +++ b/docs/00_overview/mvp2_dashboard.html @@ -397,13 +397,13 @@

MVP2 Progress

Specced features done
- 22 / 26
- 85% specced · 44 filed under MVP2
-
+ 22 / 25
+ 88% specced · 43 filed under MVP2
+
Pending work
- 20
+ 19
every not-done feat/infra/chore/bug across all priorities
@@ -425,7 +425,7 @@

MVP2 Progress

P2 (default)
- 17
+ 16
important to file, not blocking
@@ -435,7 +435,7 @@

MVP2 Progress

Legacy "Path to MVP2"
- 18
+ 17
scoped not-done + bugs + chore-ideas only (excludes feat/infra ideas)
@@ -680,7 +680,7 @@

Spec 0

- Plan 4
+ Plan 3
@@ -718,19 +718,6 @@

Plan 4

The 9 skipped cases are rewritten to the async "POST + poll-until-terminal" shape, the timeout case is re-homed to the worker layer, a new `AC-Async` case asserts the `running → complete` polling tran
-
- - -
- -
- Chore - P2 - -
-
A reusable `arq_pool_spy` integration fixture that records every `enqueue_job(name, *args)` call, letting studies-POST tests positively assert `spy.calls == []` on rejection and `spy.calls == [("start
- -
@@ -1069,8 +1056,6 @@

Dependency graph (feat_ + infra_)

classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; chore_demo_seeding_integration_tests_rewrite["demo seeding integration tests rewrite"] class chore_demo_seeding_integration_tests_rewrite plan; - chore_studies_post_arq_spy_fixture["studies post arq spy fixture"] - class chore_studies_post_arq_spy_fixture plan; feat_apply_path_normalizer_declaration["apply path normalizer declaration"] class feat_apply_path_normalizer_declaration plan; feat_query_normalizer_typed_pipeline["query normalizer typed pipeline"] @@ -1128,8 +1113,6 @@

Dependency graph (feat_ + infra_)

classDef idea fill:#f1f5f9,stroke:#334155,color:#334155; chore_demo_seeding_integration_tests_rewrite["demo seeding integration tests rewrite"] class chore_demo_seeding_integration_tests_rewrite plan; - chore_studies_post_arq_spy_fixture["studies post arq spy fixture"] - class chore_studies_post_arq_spy_fixture plan; feat_apply_path_normalizer_declaration["apply path normalizer declaration"] class feat_apply_path_normalizer_declaration plan; feat_query_normalizer_typed_pipeline["query normalizer typed pipeline"] diff --git a/state.md b/state.md index 09ec81cd..b8ef40ec 100644 --- a/state.md +++ b/state.md @@ -16,8 +16,8 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer ## Current branch / execution context -- **Branch:** `main` (PR #474 `chore_ubi_reader_search_after_pagination` just merged `d9afbce`, 2026-06-05). All 19 `pr.yml` checks green (smoke skipped — opt-in/off). -- **Active feature:** None in flight — the 3-feature queue is fully shipped: #1 `bug_judgment_header_omits_click_bucket` (PR #470), #2 `feat_fts_rank_ordering` (PR #472), #3 `chore_ubi_reader_search_after_pagination` (PR #474). Pull the next item from the MVP2 backlog (run `/pipeline status`). +- **Branch:** `main` (PR #476 `chore_studies_post_arq_spy_fixture` just merged `ed85d84`, 2026-06-05). All 19 `pr.yml` checks green (smoke skipped — opt-in/off). +- **Active feature:** Mid a 3-chore queue (2026-06-05 afternoon): #1 `chore_studies_post_arq_spy_fixture` **shipped** (PR #476); **next #2 `chore_demo_seeding_integration_tests_rewrite`** (plan-stage, GPT-5.5-reviewed plan exists → executes via impl-execute); #3 `chore_pr_yml_parallelize_backend_job` (idea-stage → full pipeline). GPT-5.5 unreachable in this env → Opus self-review substitution. - **Alembic head:** `0023_proposals_superseded_status` (unchanged — `feat_fts_rank_ordering` is no-migration; head last moved by `feat_overnight_final_solution_phase3` PR #457). - **Python:** 3.13. 
**Frontend stack:** Next 16 (App Router + Turbopack), React 19, Tailwind 4 (CSS-first), Vitest 4, ESLint 9 (flat), TypeScript 6, Playwright (chromium, single worker) for E2E. - **Coverage gates:** backend 80% (`fail_under` in pyproject), UI vitest + tsc + ESLint + Next build, plus a full-stack smoke E2E job. Live pass counts: see the latest `pr.yml` run (the historical per-feature counts moved to `state_history.md`). @@ -26,17 +26,17 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer Detail + reasoning for each is in [`state_history.md`](state_history.md). +- **2026-06-05** — `chore_studies_post_arq_spy_fixture` (PR #476, squash-merged `ed85d84`). **`arq_pool_spy` integration fixture for studies-POST enqueue assertions.** Test-infra only — **zero production diff, no migration** (head stays `0023`). 2 stories / 1 epic: (1.1) `SpyArqPool` recording double (flattened `(name, *args)`, truthy sentinel return) + `install_arq_pool_spy(app)` contextmanager (captures prior `app.state.arq_pool` via `_UNSET`, restores exactly — `delattr` when originally unset, reassign otherwise) + `arq_pool_spy` pytest-asyncio fixture (depends on `async_client` for install-after-lifespan ordering, NOT autouse) in `integration/conftest.py` + 5 unit tests; (1.2) wired the fixture into 13 studies-POST tests — 10 rejection paths assert `spy.calls == []`, 3 success paths assert `spy.calls == [("start_study", )]` (proves the spy not the real pool received the enqueue → confirms install ordering) — plus 2 AC-3 restore tests each forcing their own precondition so both restore branches run deterministically. Cross-model: **GPT-5.5 unreachable → Opus self-review substituted** (test-only, converged clean). Gemini 1 theme / 2 line comments: 1 accepted (drop redundant `@pytest.mark.asyncio` — redundant under `asyncio_mode = "auto"`), 1 rejected w/ evidence (module `pytestmark` unnecessary; `test_ubi_reader.py` has 17 async tests + 0 markers). 
5 unit + 13 wired + 2 restore integration; backend unit 2476. All 19 `pr.yml` checks green. `docs/05_quality/testing.md` note added. Finalization bundled the dashboard + public-roadmap regen (no extra PR). - **2026-06-05** — `chore_ubi_reader_search_after_pagination` (PR #474, squash-merged `d9afbce`). **Exact full-traffic UBI aggregation via cursor pagination (`scan_all`).** Replaces `UbiReader`'s single-page `search_batch` (a silent 10k-row sample) with a `scan_all` loop walking the full `ubi_events`/`ubi_queries` stream behind the `SearchAdapter` Protocol, so dense (>10k-event) clusters get exact judgment aggregation. **Backend + tests only, no migration** (head stays `0023`). 5 stories / 3 epics: (1.1) Protocol surface — `ScanPage(hits, cursor)` + `scan_all(target, body, *, page_size, cursor, fl, request_id)` + `close_scan(cursor, …)`, opaque round-tripped cursor; (2.1) `ElasticAdapter` ES+OpenSearch via PIT + `search_after` over `[{timestamp:asc},{_shard_doc:asc}]` — PIT id rotation into the cursor, engine-branched open/close endpoints + bodies (ES `/_pit` `{id}` / OpenSearch `/_search/point_in_time` `{pit_id:[…]}`, both unindexed-DELETE per the read-only invariant), narrow 405/501/400-unsupported fallback (configured `ubi_no_pit_tiebreaker_field` else sampled+WARN, never `_id` sort), adapter-owned pagination keys stripped before BOTH PIT and no-PIT construction (incl. 
`pit` per P5-A1), best-effort cleanup on exception (P3-A2) + terminal (P4-A3), cursor-before-fold (P1-B2); (2.2) `SolrAdapter` via `cursorMark` over POST `/select` form body (large `{!terms f=query_id}` fq → body not URL, P1-B1), uniqueKey-terminated sort, `start`/`rows`/`cursorMark`/`sort` stripped (P4-A2), `close_scan` no-op; (3.1) `UbiReader` loops `scan_all` per scan, folds incrementally, enforces an **exact** ceiling (`min(ES_MAX_RESULT_WINDOW, remaining)` per page + final-page slice), chunks `query_id` by count AND encoded byte length (`_chunk_query_ids` engine-aware fragment size), closes the cursor in `finally`, emits `ubi_reader_scan_truncated` only on genuine truncation; (3.2) 5 non-secret `Settings` ceiling fields injected by worker + dispatcher (readiness stays at defaults — probe-only, documented invariant). **Cross-model: GPT-5.5 unreachable in this env → Opus self-review substituted per `feat_fts_rank_ordering` precedent** — 0 High/1 Med/2 Low; F2 (truncation-WARN false-positive on terminal exact-fill) + F3 (`scanned` = inspected not kept rows) fixed; F1 (readiness ceiling injection) adjudicated as documented-invariant (provably dead — readiness only probes — + plan §3.2-resolved; injecting broke 6 tests). Gemini 3 Med: 2 accepted (`isinstance(error, dict)` guard on 4 sites + 2 regression tests), 1 rejected with cited counter-evidence (chunker "O(N²)" is O(N×max_count) bounded by `max_count`; the suggested byte approximation is numerically wrong — 27≠29-byte wrapper, missed the `, ` separator — so it would breach the hard byte ceiling). 
CI surfaced 2 unrelated failures, both fixed inline: the integration UBI stub mocked the old `search_batch` (added `scan_all`/`close_scan` mocks serving the same data); **also fixed a pre-existing ~50% flake in `feat_fts_rank_ordering`'s `test_no_q_does_not_rank`** — `_seed_clusters_rank` created both rows in one transaction so `created_at` (`func.now()` = transaction time) tied, falling to the random-UUID `id DESC` tiebreak; stamped explicit distinct timestamps. 18 ES + 15 Solr + 4 ScanPage + 33 reader (AC-5/6/9/12/13/14 + P1-B2 + truncation accounting) + 5 no-writes (PIT allowlist) unit; backend unit 2469. All 19 `pr.yml` checks green (smoke skipped — opt-in/off). Finalization bundled the dashboard + public-roadmap regen (no extra PR). - **2026-06-05** — `feat_fts_rank_ordering` (PR #472, squash-merged `f970c05`). **Rank-order FTS results by `ts_rank` when `?q=` is active.** When a list request carries `?q=` and no explicit `?sort=`, the 6 searchable endpoints (clusters, studies, query_sets, query_templates, judgment_lists, conversations) now order by relevance — `floor(ts_rank*1e6) DESC, id DESC` — instead of `created_at`; `?sort=` overrides (rank is the implicit default, not a column sort-key). **Backend + small frontend, no migration** (head stays `0023`). Keyset stays exact: the cursor encodes the **same** int `rank_bucket` the ORDER BY uses, so the existing `parsed=None` 2-tuple keyset predicate is exact (rows share a bucket → tiebreak `id DESC`, UUIDv7 ≈ newest-first); the computed bucket is stashed as a transient `_fts_rank_bucket` for the router's next cursor (read via `rank_bucket_of`). Helpers `rank_bucket_expr`/`rank_active`/`rank_bucket_of`/`rows_with_rank` in `_fts.py` + an additive rank branch in 6 repos + 6 routers (only fires when `q` present & no sort → non-search listing byte-identical) + a "Sorted by relevance" `` pill + `fts.relevance_sort` glossary. 
**Cross-model: GPT-5.5 unreachable in this env (no key, egress 403) → Opus self-review substituted per operator decision** (spec/plan §13/§8). Gemini 6 findings (one per router) ALL accepted (`a48a6d3`): a stale datetime cursor reused on the rank path now 422s instead of 500ing on the int rank_bucket comparison. CI run-1 tripped ruff F841 (unused oracle local, added after the last local lint) — fixed inline. 10 unit (in-memory keyset oracle proving no-skip/no-dupe locally) + 11-case DB integration matrix (relevance order, cursor pagination exactness, explicit-sort override, no-q regression, tampered + stale-datetime cursor 422) + 4 vitest pill; full vitest 1243. All 19 `pr.yml` checks green. Finalization bundled the dashboard + public-roadmap regen (no extra PR). - **2026-06-05** — `bug_judgment_header_omits_click_bucket` (PR #470, squash-merged `66d1873`). **Judgment-list header renders the `click` (UBI) source bucket.** The detail header showed only `LLM / Human`, omitting the `click` bucket, so the displayed source terms didn't sum to the total (and the component's "renders all three buckets" doc-claim was false). **Frontend-only — no backend, no migration** (head stays `0023`). 2 stories / 1 epic: (1.1) render a third slash-joined term `source_breakdown.click`, relabel `LLM / Human / Clicks`, add a source-of-truth comment pointing at backend `_SourceBreakdown`, and add an `InfoTooltip` on the label reusing the existing `judgment.source.click` glossary key (FR-4 implemented, no new key); presentational-only, no fetch/types:gen; (1.2) extend the real-backend `ubi-source-filter.spec.ts` to assert `header-breakdown` shows the three-term breakdown on a pure-CTR list. 9 vitest cases (3 existing chip + 6 new) + the E2E; 1239 vitest green. 
Gemini 2 medium: 1 accepted (`570861d`: locale-robust E2E digit-parse over `toLocaleString()` comparison), 1 rejected-with-evidence (optional-chain the `click` field — it's non-optional in the TS `_SourceBreakdown` type, the sibling `llm`/`human` don't guard, and the Compose deploy is atomic, so the guard would mask a contract violation). Final GPT-5.5 skipped (frontend-only, ≤3 files). **Finalization ran the in-line dashboard + public-roadmap regen (per `f4b5ed0`) so the dashboards + relyloop.com roadmap stay fresh — bundled into the finalization PR, no 3rd PR.** All 19 `pr.yml` checks green. - **2026-06-05** — `bug_baseline_phase_test_isolation` (PR #466, squash-merged `6298e77`). **Hermetic baseline-wait unit tests via lazy settings read.** `_compute_baseline_wait_s` (`backend/workers/orchestrator.py`) called `get_settings()` unconditionally, so the three explicit-timeout `TestComputeBaselineWaitS` cases only passed when an earlier test module had already seeded `DATABASE_URL_FILE`/`POSTGRES_PASSWORD_FILE` — they failed standalone (`3 failed, 1 passed`). **Backend test-only — no migration** (head stays `0023`). 2 stories / 1 epic: (1.1) defer the `get_settings()` read into the missing/falsy-`trial_timeout_s` branch so explicit-timeout callers never construct `Settings` (return values unchanged, falsy-fallback semantics preserved); (1.2) add an autouse `_settings_env_and_restore` fixture (seeds the secret env vars at `/dev/null` + clears the canonical `get_settings` lru_cache — not `orch.get_settings`, which `test_missing_trial_timeout_uses_settings_default` monkeypatches) so the module is hermetic regardless of collection order, plus a `test_explicit_timeout_does_not_read_settings` regression that fails on pre-fix code and passes post-fix. Standalone with secrets unset: 14 passed; full unit suite 2400 passed. No Gemini findings; final GPT-5.5 skipped (≤40 LOC, test-only). All 19 `pr.yml` checks green. 
-- **2026-06-05** — `chore_cluster_detail_rung_badge` (PR #464, squash-merged `3e03ce7`). **Cluster-detail UBI readiness card with rung badge.** A new `ClusterDetailUbiReadinessCard` on `/clusters/[id]` (between the action bar and indices card) lets the operator see a cluster's UBI readiness rung without opening the generate-judgments dialog. **Frontend-only — no backend, no migration** (head stays `0023`). 8 stories / 1 epic, executed 8→1→…→7: (S8) shared `useUbiReadiness` gains `placeholderData: keepPreviousData` so the rung persists across `(query_set_id, target)` edits without a skeleton flash (no-op for the dialog consumer); (S1) card scaffold + `` on the title (reachable in every state); (S2) query-set picker (`limit=50`, `has_more` "Browse all" footer, empty-state Link, explicit **Clear** button since Radix `` + 200ms `useDebouncedValue`; (S4) separate `limit=2` auto-seed probe + once-locked `useEffect` (seeds when `rows.length===1 && !has_more && target_filter` set; locks on first success **or** error so a later refetch can't overwrite operator input — same `set-state-in-effect` disable precedent as `studies/page.tsx:56`); (S5) `useUbiReadiness` gated on `pickerReady` + dual leak gate (`targetRaw.trim()` for instant hide + debounced `target`) → `` + relocated `` (deleted from `ClusterDetailSummary`); (S6) first-fetch skeleton (gated on `data==null` so it never replaces a placeholderData-preserved badge), unified 404/503 fallback caption, inline error + `invalidateQueries` retry; (S7) 13-case vitest (MSW network mocking + real `QueryClientProvider` so AC-8 genuinely exercises `placeholderData`; AC-9 reads the card source and asserts zero inline `rung_[0-3]` literals) + summary regression (chip absent) + one gated real-backend Playwright spec + demo-ubi surface #3 re-anchored to the new placement. CI: prettier flagged the test file on run 1 (local `pnpm lint` doesn't run `prettier --check`) — fixed inline (`6b88e72`). 
Gemini 2 medium findings, both accepted+fixed (`b0063dd`): instant-clear via `targetRaw.trim().length>0` in `pickerStateValid` (spec "hides immediately"); `useUbiReadiness` queryFn null-param guard against a manual-refetch `?query_set_id=null`. Cross-model final GPT-5.5 skipped (frontend-only, ≤8 files, no studies/judgments/adapter/migration surface). 1233 vitest green. All 19 `pr.yml` checks green. -_(older entries — full narrative in [`state_history.md`](state_history.md): `feat_ubi_llm_study_comparison` PR #461, `feat_query_normalization_tuning` PR #459, `feat_overnight_final_solution_phase3` PR #457, `feat_study_wizard_inline_judgment_generation` PR #453, `feat_walkthrough_video_cursor_captions` PR #451, `feat_website_walkthrough_guides` PR #448, `feat_proposal_full_param_space_view` PR #446, `feat_overnight_studies_summary_card` PR #444, `feat_overnight_final_solution_phase2` PR #442, `feat_overnight_final_solution` PR #440, `feat_studies_list_trial_convergence_columns` PR #438, `feat_list_count_columns` PR #436, `infra_generated_artifact_freshness_gate` PR #433, `chore_scorecard_pin_deps_postcss` PR #430, `bug_llm_capability_cache_no_refresh` PR #426, `infra_smoke_reseed_runtime_budget` PR #424, `feat_studies_convergence_visibility` PR #421/#422, `bug/cli-seed-ubi-missing-engine-type` PR #419, `chore_template_library_expansion` PR #416, `infra_solr_smoke_stability` PR #383, `infra_solr_ci_readiness` Phase 1 PR #367, MVP2 backlog batch PR #364, `feat_study_convergence_indicator` PR #352, `feat_overnight_autopilot` PR #343, `infra_adapter_solr` PR #336, …)_ +_(older entries — full narrative in [`state_history.md`](state_history.md): `chore_cluster_detail_rung_badge` PR #464, `feat_ubi_llm_study_comparison` PR #461, `feat_query_normalization_tuning` PR #459, `feat_overnight_final_solution_phase3` PR #457, `feat_study_wizard_inline_judgment_generation` PR #453, `feat_walkthrough_video_cursor_captions` PR #451, `feat_website_walkthrough_guides` PR #448, 
`feat_proposal_full_param_space_view` PR #446, `feat_overnight_studies_summary_card` PR #444, `feat_overnight_final_solution_phase2` PR #442, `feat_overnight_final_solution` PR #440, `feat_studies_list_trial_convergence_columns` PR #438, `feat_list_count_columns` PR #436, `infra_generated_artifact_freshness_gate` PR #433, `chore_scorecard_pin_deps_postcss` PR #430, `bug_llm_capability_cache_no_refresh` PR #426, `infra_smoke_reseed_runtime_budget` PR #424, `feat_studies_convergence_visibility` PR #421/#422, `bug/cli-seed-ubi-missing-engine-type` PR #419, `chore_template_library_expansion` PR #416, `infra_solr_smoke_stability` PR #383, `infra_solr_ci_readiness` Phase 1 PR #367, MVP2 backlog batch PR #364, `feat_study_convergence_indicator` PR #352, `feat_overnight_autopilot` PR #343, `infra_adapter_solr` PR #336, …)_ ## In flight -- **Nothing in flight** — the 2026-06-05 3-feature queue is fully shipped (`bug_judgment_header_omits_click_bucket` PR #470, `feat_fts_rank_ordering` PR #472, `chore_ubi_reader_search_after_pagination` PR #474). Pull the next item from the MVP2 backlog. -- **Plan-stage, `/impl-execute`-ready (no gates):** `chore_studies_post_arq_spy_fixture` (PR #413 pair) + `chore_demo_seeding_integration_tests_rewrite` (PR #286, DB-only integration — CI-only verification). **Note (drift correction 2026-06-05):** the two PR #364 normalizer siblings `feat_apply_path_normalizer_declaration` + `feat_query_normalizer_typed_pipeline` are **NOT** auto-executable — `feat_query_normalization_tuning` Phase 1 (PR #459) only cleared the G-1 *dependency* gate; both still carry **product gates** (apply-path: G-2 operator-friction evidence; typed-pipeline: open Q-1 Product decision + a "Do NOT `/impl-execute`" design-ahead banner). They stay product-gated/design-ahead until those clear. 
(`chore_arq_pool_aclose_deprecation`, `chore_cluster_detail_rung_badge`, `bug_baseline_phase_test_isolation`, `bug_judgment_header_omits_click_bucket`, `chore_ubi_reader_search_after_pagination` shipped — PRs #463/#464/#466/#470/#474.) +- **3-chore queue (2026-06-05 afternoon):** #1 `chore_studies_post_arq_spy_fixture` **shipped** (PR #476); **#2 `chore_demo_seeding_integration_tests_rewrite`** (plan-stage → `/impl-execute`) next; #3 `chore_pr_yml_parallelize_backend_job` (idea-stage → full pipeline) queued. (Earlier 3-feature queue fully shipped: PR #470/#472/#474.) +- **Plan-stage, `/impl-execute`-ready (no gates):** `chore_demo_seeding_integration_tests_rewrite` (PR #286, DB-only integration — CI-only verification). **Note (drift correction 2026-06-05):** the two PR #364 normalizer siblings `feat_apply_path_normalizer_declaration` + `feat_query_normalizer_typed_pipeline` are **NOT** auto-executable — `feat_query_normalization_tuning` Phase 1 (PR #459) only cleared the G-1 *dependency* gate; both still carry **product gates** (apply-path: G-2 operator-friction evidence; typed-pipeline: open Q-1 Product decision + a "Do NOT `/impl-execute`" design-ahead banner). They stay product-gated/design-ahead until those clear. (`chore_arq_pool_aclose_deprecation`, `chore_cluster_detail_rung_badge`, `bug_baseline_phase_test_isolation`, `bug_judgment_header_omits_click_bucket`, `chore_ubi_reader_search_after_pagination`, `chore_studies_post_arq_spy_fixture` shipped — PRs #463/#464/#466/#470/#474/#476.) 
## Queued (priority-ordered by dashboard / dep graph) @@ -46,7 +46,7 @@ _(older entries — full narrative in [`state_history.md`](state_history.md): `f - **Headliners (idea-stage):** `feat_fts_rank_ordering`, `feat_query_normalization_tuning`, `feat_study_convergence_indicator`, `feat_ubi_llm_study_comparison` (side-by-side UBI-vs-LLM study comparison view + deferred cluster-detail rung badge, P2 — Phase 2 split out of `feat_demo_ubi_study_comparison`), plus the Phase-2/3 split-outs from the 2026-05-31 planning batch still pending (`feat_query_normalizer_typed_pipeline`, `feat_apply_path_normalizer_declaration`). `feat_overnight_autopilot` shipped (PR #343, 2026-05-31); `feat_overnight_studies_summary_card` shipped (PR #444, 2026-06-04). - **Bugs held for MVP2:** `bug_chat_long_conversation_truncation` (investigation `bug_fix.md` exists; pullable forward but deferred for scope discipline — latency-of-impact is zero today), `bug_webhook_concurrent_merge_race_timing_sensitive`, `bug_seed_meaningful_demos_silent_bulk_errors`. -- **Chores/infra:** `chore_auto_followup_parent_advisory_lock`, `chore_demo_seeding_integration_tests_rewrite`, `chore_studies_post_arq_spy_fixture`, `chore_template_library_expansion`, `chore_ubi_hybrid_template_render` (P3, vestigial-template contract cleanup — spun out of `feat_ubi_judgments`), `infra_arq_subprocess_test`. (`chore_ubi_reader_search_after_pagination` shipped — PR #474.) +- **Chores/infra:** `chore_auto_followup_parent_advisory_lock`, `chore_demo_seeding_integration_tests_rewrite`, `chore_template_library_expansion`, `chore_ubi_hybrid_template_render` (P3, vestigial-template contract cleanup — spun out of `feat_ubi_judgments`), `infra_arq_subprocess_test`. (`chore_ubi_reader_search_after_pagination` PR #474, `chore_studies_post_arq_spy_fixture` PR #476 shipped.) 
**Other buckets:** `03_mvp3/` (Observable — includes `infra_optuna_orphan_reaper`, deferred from MVP1 per spec §11 operational tolerance), `04_ga/`, `99_backlog/` (4 defer-until-incident items), `00_unsure/` (`bug_seed_meaningful_demos_silent_bulk_errors`). diff --git a/state_history.md b/state_history.md index fc773439..3ad11f55 100644 --- a/state_history.md +++ b/state_history.md @@ -4,6 +4,20 @@ --- +### `chore_studies_post_arq_spy_fixture` — `arq_pool_spy` integration fixture for studies-POST enqueue assertions (PR #476, 2026-06-05) + +**What shipped.** The studies-POST integration tests asserted "no studies row inserted" on rejection but couldn't prove the handler also didn't *enqueue* `start_study`, and the success tests couldn't prove the spy (not the real Redis pool) received the enqueue. This adds an opt-in `arq_pool_spy` fixture that closes both gaps. **Test-infra only — zero production diff, no migration** (Alembic head stays `0023`). 2 stories / 1 epic. First of a 3-chore afternoon queue. + +**The fixture (Story 1.1, `integration/conftest.py`).** `SpyArqPool.enqueue_job` records each call as a flattened `(name, *args)` tuple (so the handler's `enqueue_job("start_study", study_id)` records `("start_study", study_id)`) and returns a truthy sentinel mirroring `ArqRedis.enqueue_job`'s "returns a Job on accept" contract. `install_arq_pool_spy(app)` is a contextmanager that captures the prior `app.state.arq_pool` via a `_UNSET` sentinel and restores it exactly on exit — `delattr` when the attr was originally unset (the Redis-down boot case), reassign otherwise. The `arq_pool_spy` pytest-asyncio fixture depends on `async_client` so it installs **after** the lifespan built (or skipped) the real pool — the install-after-lifespan ordering is the load-bearing detail (a success-path recorded call proves the spy, not the real pool, got the enqueue). NOT autouse, so tests that don't request it are unaffected. 
5 unit tests (`test_arq_pool_spy.py`) lock the recording shape + truthy return + coroutine contract with no DB. + +**The assertions (Story 1.2, `test_studies_api.py`).** 10 rejection-path tests (400 `INVALID_SEARCH_SPACE`, 422 mismatch/overlap ×4, 404 not-found ×4) gain the param + `assert spy.calls == []`. 3 success-path tests gain it + `assert spy.calls == [("start_study", )]`. Plus 2 AC-3 restore tests (`_restores_existing_pool` / `_restores_unset_attr`) that each FORCE their own precondition (inject a sentinel pool / delete the attr) so both restore branches run deterministically every CI run regardless of whether Redis was up at boot, and restore the original boot state in `finally` so the real pool isn't stranded from lifespan shutdown's `aclose()`. Counts verified: 13 params, 10 `== []`, 3 `== [("start_study", …)]`. + +**Cross-model + Gemini.** GPT-5.5 unreachable in this env → Opus self-review substituted (test-only, zero production diff, converged clean). Gemini posted one theme across two line comments on the unit test's asyncio markers. **Accepted** the line-53 comment (remove the lone `@pytest.mark.asyncio` — genuinely redundant under the project's `asyncio_mode = "auto"`, pyproject.toml:216). **Rejected** the line-18 comment (add a module-level `pytestmark = pytest.mark.asyncio`) with cited counter-evidence: Gemini's premise *"especially if asyncio_mode = strict"* is false — it's `auto`, so async tests are auto-collected with no marker and none are silently skipped; the dominant convention (`test_ubi_reader.py`: 17 async tests, 0 asyncio markers) relies on auto-collection, so a module marker would be redundant and divergent. Net fix removed the decorator + the now-unused `import pytest` and added a comment documenting the auto-mode convention — resolving the inconsistency toward the codebase norm. 
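For readers skimming this entry, the Story 1.1 fixture described above can be sketched roughly as follows. This is a minimal illustration reconstructed from the prose, not the code that shipped in `integration/conftest.py`. The names `SpyArqPool`, `install_arq_pool_spy`, and `_UNSET` come from the entry, while the `SimpleNamespace` stand-in for the app, the `demo` helper, and the literal study id `42` are hypothetical. Keyword arguments that the real `ArqRedis.enqueue_job` accepts are omitted here.

```python
import asyncio
from contextlib import contextmanager
from types import SimpleNamespace

# Sentinel meaning "app.state had no arq_pool attribute at all"
# (the Redis-down boot case), as opposed to an attribute set to None.
_UNSET = object()


class SpyArqPool:
    """Records enqueue_job calls instead of talking to Redis."""

    def __init__(self):
        self.calls = []

    async def enqueue_job(self, name, *args):
        # Flatten to (name, *args) so tests can compare plain tuples.
        self.calls.append((name, *args))
        # Truthy stand-in for the Job object returned on accept.
        return object()


@contextmanager
def install_arq_pool_spy(app):
    """Swap app.state.arq_pool for a spy and restore the prior state on exit."""
    prior = getattr(app.state, "arq_pool", _UNSET)
    spy = SpyArqPool()
    app.state.arq_pool = spy
    try:
        yield spy
    finally:
        if prior is _UNSET:
            # The attribute did not exist before, so remove it again.
            delattr(app.state, "arq_pool")
        else:
            app.state.arq_pool = prior


def demo():
    # Hypothetical stand-in for an app whose lifespan skipped the real pool.
    app = SimpleNamespace(state=SimpleNamespace())
    with install_arq_pool_spy(app) as spy:
        asyncio.run(app.state.arq_pool.enqueue_job("start_study", 42))
        recorded = list(spy.calls)
    return recorded, hasattr(app.state, "arq_pool")
```

In this sketch a rejection-path test would assert `spy.calls == []`, and a success-path test would compare against the single recorded tuple. The `_UNSET` sentinel is what lets the restore step distinguish "attribute was absent" from "attribute was set", which is the distinction the two AC-3 restore tests exercise.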
+ +**Verification.** 5 unit + 13 wired + 2 restore integration tests; backend unit 2476 passing; `ruff` + `mypy --strict` clean across 600 files; all 19 `pr.yml` checks green (smoke skipped). The integration lane (Postgres + ES service containers) is the real validator and ran green. `docs/05_quality/testing.md` gained a one-paragraph note documenting `arq_pool_spy` under the integration mocking policy. + +**Context.** Feature #1 of the operator's second 3-chore queue (arq-spy-fixture → demo-seeding-rewrite → pr.yml-parallelize), executed plan-stage onward via `/impl-execute` (spec + plan were GPT-5.5-converged 2026-06-02, so the unreachable GPT-5.5 wasn't a blocker). The "other enqueueing endpoints" generalization is a spec §3 out-of-scope follow-up, not a deferred phase. + ### `chore_ubi_reader_search_after_pagination` — exact full-traffic UBI aggregation via cursor pagination (`scan_all`) (PR #474, 2026-06-05) **What shipped.** `UbiReader` read UBI signals with a single `search_batch` call clamped to the engine's 10k `max_result_window` — on dense clusters (>10k events in a window) that silently sampled the first 10k rows, biasing CTR/dwell ratings. This replaces the single-page read with a generic **cursor-scan** looped behind the `SearchAdapter` Protocol, so judgment generation aggregates the *full* event/query stream. **Backend + tests only, no migration** (Alembic head stays `0023`). Spun out of `feat_ubi_judgments` (P2). 5 stories / 3 epics, executed 1.1 → 2.1 → 2.2 → 3.2(folded) → 3.1. 
diff --git a/website/docs/roadmap.md b/website/docs/roadmap.md index ed86b4c8..241dbc6e 100644 --- a/website/docs/roadmap.md +++ b/website/docs/roadmap.md @@ -202,7 +202,7 @@ - 🟡 [Arq Subprocess Test](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/infra_arq_subprocess_test) - 🟡 [Smoke Fork PR Secret Skip](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/infra_smoke_fork_pr_secret_skip) -??? note "Maintenance & fixes (20)" +??? note "Maintenance & fixes (19)" - ✅ [Backend Suite Nondeterministic Caplog Isolation](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/implemented_features/2026_06_01_bug_backend_suite_nondeterministic_caplog_isolation) · [#364](https://github.com/SoundMindsAI/relyloop/pull/364) - ✅ [Contract Allowlists Outdated After Mvp2 Features](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/implemented_features/2026_06_01_bug_contract_allowlists_outdated_after_mvp2_features) · [#364](https://github.com/SoundMindsAI/relyloop/pull/364) @@ -221,7 +221,6 @@ - 🟡 [Seed Meaningful Demos Silent Bulk Errors](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/bug_seed_meaningful_demos_silent_bulk_errors) - 🟡 [Solr Post Pipeline Followups](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/chore_solr_post_pipeline_followups) - 🟡 [Studies Detail Vitest Intermittent Timeout](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/bug_studies_detail_vitest_intermittent_timeout) - - 🟡 [Studies Post Arq Spy Fixture](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/chore_studies_post_arq_spy_fixture) - 🟡 [UBI Hybrid Template Render](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/chore_ubi_hybrid_template_render) - 🟡 [Webhook Concurrent 
Merge Race Timing Sensitive](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/bug_webhook_concurrent_merge_race_timing_sensitive)