diff --git a/docs/00_overview/DASHBOARD.md b/docs/00_overview/DASHBOARD.md index b78ad09b..9db8f2c0 100644 --- a/docs/00_overview/DASHBOARD.md +++ b/docs/00_overview/DASHBOARD.md @@ -7,7 +7,7 @@ _Top-level index across MVP1 → GA v1+ as of **2026-06-01**. Click a release na | Release | Theme | Progress | Status | |---|---|---|---| | [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 94 / 94 scoped done | **Complete** | -| [MVP2 / v0.2](MVP2_DASHBOARD.md) | Three-Engine + Real Signals | 6 / 10 scoped done · 22 remaining | **In progress** | +| [MVP2 / v0.2](MVP2_DASHBOARD.md) | Three-Engine + Real Signals | 7 / 10 scoped done · 21 remaining | **In progress** | | MVP3 / v0.3 | Observable | — | **Not yet scoped** | | GA v1 / v1.0 | Production-ready | — | **Not yet scoped** | diff --git a/docs/00_overview/MVP2_DASHBOARD.md b/docs/00_overview/MVP2_DASHBOARD.md index 7d48e7f7..2a25088c 100644 --- a/docs/00_overview/MVP2_DASHBOARD.md +++ b/docs/00_overview/MVP2_DASHBOARD.md @@ -21,26 +21,27 @@ Plan approved; run /impl-execute to ship | Metric | Value | |---|---| | Filed under MVP2 | **34** folders total (done + specced not-done + idea backlog + bugs) | -| Specced features done | **6 / 10** (60%) — of features *past the idea stage* (those with a spec); the idea backlog below is NOT in this denominator, so 100% ≠ release complete | -| Pending work | **28** items (every not-done feat/infra/chore/bug across all priorities) | +| Specced features done | **7 / 10** (70%) — of features *past the idea stage* (those with a spec); the idea backlog below is NOT in this denominator, so 100% ≠ release complete | +| Pending work | **27** items (every not-done feat/infra/chore/bug across all priorities) | | → P0 — do next | **0** unblocking / paying daily cost | | → P1 | **0** high-value, ready when P0 clears | -| → P2 (default) | 24 important to file, not blocking | +| → P2 (default) | 23 important to file, not blocking | | → Backlog | 4 captured for record, not planned | | Open bugs | 9 | -| 
Legacy "Path to MVP2" | 22 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | +| Legacy "Path to MVP2" | 21 items — scoped-not-done + bugs + chore-ideas only (excludes feat/infra ideas) | | Backlog ideas | 6 idea-only feat/infra (not yet scoped into MVP2) | | In flight | 0 feature(s) actively shipping | ## Pipeline -### Done (6) +### Done (7) | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---| | [feat_contextual_help_mvp2](implemented_features/2026_05_15_feat_contextual_help_mvp2/idea.md) | Feature | Phase 1 covered the create-study modal + study-detail surface — the steepest onboarding cliff. Two clusters of surfaces remain that a relevance engineer encounters after running their first study: | — | [PR #124](https://github.com/SoundMindsAI/relyloop/pull/124) merged 2026-05-15 | | [feat_demo_ubi_study_comparison](implemented_features/2026_05_30_feat_demo_ubi_study_comparison/feature_spec.md) | Feature | After this feature, the home-button reseed (and the | — | [PR #320](https://github.com/SoundMindsAI/relyloop/pull/320) merged 2026-05-30 | | [feat_overnight_autopilot](implemented_features/2026_05_31_feat_overnight_autopilot/feature_spec.md) | Feature | an operator can (a) discover the overnight path while creating a study because the wizard control is reframed as a labeled "🌙 Run overnight (compound automatically)" toggle with explicit copy about th | — | [PR #343](https://github.com/SoundMindsAI/relyloop/pull/343) merged 2026-05-31 | +| [feat_study_convergence_indicator](implemented_features/2026_06_01_feat_study_convergence_indicator/feature_spec.md) | Feature | Every completed study carries a plain-language **convergence verdict** — `converged` / `still_improving` / `too_few_trials` — backed by a best-metric-so-far curve. 
| — | [PR #352](https://github.com/SoundMindsAI/relyloop/pull/352) merged 2026-06-01 | | [feat_study_sub_warmup_guard](implemented_features/2026_05_29_feat_study_sub_warmup_guard/feature_spec.md) | Feature | A non-blocking inline warning appears under the `max_trials` input whenever the derived preset is `custom` AND `max_trials < STUDIES_TPE_WARMUP_FLOOR (= 50)`, naming Focused/Standard as one-click reme | — | [PR #316](https://github.com/SoundMindsAI/relyloop/pull/316) merged 2026-05-29 | | [feat_ubi_judgments](implemented_features/2026_05_29_feat_ubi_judgments/feature_spec.md) | Feature | Operators with the OpenSearch / ES UBI plugin installed (today; Solr's first-party `solr.UBIComponent` lights up with the sibling `infra_adapter_solr` MVP2 release) can derive judgments from real clic | — | [PR #317](https://github.com/SoundMindsAI/relyloop/pull/317) merged 2026-05-29 | | [infra_adapter_solr](implemented_features/2026_05_31_infra_adapter_solr/feature_spec.md) | Infra | A single `SolrAdapter` implements the `SearchAdapter` Protocol against Apache Solr 9.x and 10.x (both SolrCloud and standalone), pivoting on a capability probe at construction time. 
| `feat_ubi_judgments` | [PR #336](https://github.com/SoundMindsAI/relyloop/pull/336) merged 2026-05-31 | @@ -49,14 +50,13 @@ Plan approved; run /impl-execute to ship _None._ -### Plan (4) +### Plan (3) | # | Priority | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---|---|---| | 1 | P2 | [feat_query_normalization_tuning](planned_features/02_mvp2/feat_query_normalization_tuning/feature_spec.md) | Feature | A template that opts in by declaring `query_normalizer` as a Categorical param gets the Optuna loop deciding empirically — on the operator's judgment set — whether lowercasing, trimming, or contractio | — | — | -| 2 | P2 | [feat_study_convergence_indicator](planned_features/02_mvp2/feat_study_convergence_indicator/feature_spec.md) | Feature | Every completed study carries a plain-language **convergence verdict** — `converged` / `still_improving` / `too_few_trials` — backed by a best-metric-so-far curve. | — | [PR #316](https://github.com/SoundMindsAI/relyloop/pull/316) | -| 3 | P2 | [feat_ubi_llm_study_comparison](planned_features/02_mvp2/feat_ubi_llm_study_comparison/feature_spec.md) | Feature | A single dedicated route `/studies/compare?a={id}&b={id}` renders the two studies side-by-side with a per-panel diff column: a sentence-level digest-narrative diff, a best-trial parameter table with s | — | [PR #320](https://github.com/SoundMindsAI/relyloop/pull/320) | -| 4 | P2 | [chore_demo_seeding_integration_tests_rewrite](planned_features/02_mvp2/chore_demo_seeding_integration_tests_rewrite/feature_spec.md) | Chore | The 9 skipped cases are rewritten to the async "POST + poll-until-terminal" shape, the timeout case is re-homed to the worker layer, a new `AC-Async` case asserts the `running → complete` polling tran | — | [PR #286](https://github.com/SoundMindsAI/relyloop/pull/286) | +| 2 | P2 | [feat_ubi_llm_study_comparison](planned_features/02_mvp2/feat_ubi_llm_study_comparison/feature_spec.md) | Feature | A single dedicated route 
`/studies/compare?a={id}&b={id}` renders the two studies side-by-side with a per-panel diff column: a sentence-level digest-narrative diff, a best-trial parameter table with s | — | [PR #320](https://github.com/SoundMindsAI/relyloop/pull/320) | +| 3 | P2 | [chore_demo_seeding_integration_tests_rewrite](planned_features/02_mvp2/chore_demo_seeding_integration_tests_rewrite/feature_spec.md) | Chore | The 9 skipped cases are rewritten to the async "POST + poll-until-terminal" shape, the timeout case is re-homed to the worker layer, a new `AC-Async` case asserts the `running → complete` polling tran | — | [PR #286](https://github.com/SoundMindsAI/relyloop/pull/286) | ### Spec (0) @@ -106,8 +106,6 @@ graph LR class chore_demo_seeding_integration_tests_rewrite plan; feat_query_normalization_tuning["query normalization tuning"] class feat_query_normalization_tuning plan; - feat_study_convergence_indicator["study convergence indicator"] - class feat_study_convergence_indicator plan; feat_ubi_llm_study_comparison["ubi llm study comparison"] class feat_ubi_llm_study_comparison plan; feat_contextual_help_mvp2["contextual help mvp2"] @@ -122,6 +120,8 @@ graph LR class feat_overnight_autopilot done; infra_adapter_solr["adapter solr"] class infra_adapter_solr done; + feat_study_convergence_indicator["study convergence indicator"] + class feat_study_convergence_indicator done; feat_ubi_judgments --> infra_adapter_solr ``` diff --git a/docs/00_overview/dashboard.html b/docs/00_overview/dashboard.html index 872ac496..7e1da557 100644 --- a/docs/00_overview/dashboard.html +++ b/docs/00_overview/dashboard.html @@ -392,7 +392,7 @@

Releases

MVP2 / v0.2
Three-Engine + Real Signals
- 6 / 10 scoped done · 22 remaining
+ 7 / 10 scoped done · 21 remaining
In progress
diff --git a/docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/feature_spec.md b/docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/feature_spec.md similarity index 100% rename from docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/feature_spec.md rename to docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/feature_spec.md diff --git a/docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/idea.md b/docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/idea.md similarity index 100% rename from docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/idea.md rename to docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/idea.md diff --git a/docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/implementation_plan.md b/docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/implementation_plan.md similarity index 99% rename from docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/implementation_plan.md rename to docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/implementation_plan.md index 8f46e267..2d9514c5 100644 --- a/docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/implementation_plan.md +++ b/docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/implementation_plan.md @@ -1,7 +1,7 @@ # Implementation Plan — Study convergence indicator **Date:** 2026-05-31 -**Status:** Ready for Execution +**Status:** Complete (PR #352, merged 2026-06-01) **Primary spec:** [`feature_spec.md`](feature_spec.md) **Policy source(s):** [`CLAUDE.md`](../../../../../CLAUDE.md), [`docs/01_architecture/api-conventions.md`](../../../../01_architecture/api-conventions.md), 
[`docs/01_architecture/ui-architecture.md`](../../../../01_architecture/ui-architecture.md) diff --git a/docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/pipeline_status.md b/docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/pipeline_status.md similarity index 77% rename from docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/pipeline_status.md rename to docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/pipeline_status.md index a3e1caf7..5538aa35 100644 --- a/docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator/pipeline_status.md +++ b/docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator/pipeline_status.md @@ -1,5 +1,7 @@ # Pipeline Status — feat_study_convergence_indicator +**Release:** mvp2 + ## Idea - Status: Complete - File: idea.md @@ -21,4 +23,9 @@ - Phases covered: single-phase delivery (FR-1 through FR-9 all in this plan; no deferred phases) ## Implementation -- Status: Not started +- Status: Complete +- Date: 2026-06-01 +- PR: #352 (squash-merged `0eee17a9`) +- CI: green (pr + DCO + secrets-defense; SKIP_HEAVY_CI fast lane) +- Stories: 11/11 complete +- Cross-model review: Gemini (1 Medium accepted+fixed `644feeed`) + GPT-5.5 final (4 findings: 1 accepted+fixed `ad72e297`, 3 rejected as review-window truncation artifacts) diff --git a/docs/00_overview/mvp2_dashboard.html b/docs/00_overview/mvp2_dashboard.html index d9b183f6..8c525b1c 100644 --- a/docs/00_overview/mvp2_dashboard.html +++ b/docs/00_overview/mvp2_dashboard.html @@ -397,13 +397,13 @@

MVP2 Progress

Specced features done
- 6 / 10
- 60% specced · 34 filed under MVP2
-
+ 7 / 10
+ 70% specced · 34 filed under MVP2
+
Pending work
- 28
+ 27
every not-done feat/infra/chore/bug across all priorities
@@ -425,7 +425,7 @@

MVP2 Progress

P2 (default)
- 24
+ 23
important to file, not blocking
@@ -435,7 +435,7 @@

MVP2 Progress

Legacy "Path to MVP2"
- 22
+ 21
scoped not-done + bugs + chore-ideas only (excludes feat/infra ideas)
@@ -784,7 +784,7 @@

Spec 0

- Plan 4
+ Plan 3

@@ -799,19 +799,6 @@

Plan 4

-
-
-
- Feature
- P2
- PR #316
-
- Every completed study carries a plain-language **convergence verdict** — `converged` / `still_improving` / `too_few_trials` — backed by a best-metric-so-far curve.
-
-
-
-
-
@@ -845,7 +832,7 @@

Implementing 0

- Done 6
+ Done 7

@@ -886,6 +873,19 @@

Done 6

+
+
+
+ Feature
+
+ PR #352 merged 2026-06-01
+
+ Every completed study carries a plain-language **convergence verdict** — `converged` / `still_improving` / `too_few_trials` — backed by a best-metric-so-far curve.
+
+
+
+
+
@@ -941,8 +941,6 @@

Dependency graph (feat_ + infra_)

class chore_demo_seeding_integration_tests_rewrite plan; feat_query_normalization_tuning["query normalization tuning"] class feat_query_normalization_tuning plan; - feat_study_convergence_indicator["study convergence indicator"] - class feat_study_convergence_indicator plan; feat_ubi_llm_study_comparison["ubi llm study comparison"] class feat_ubi_llm_study_comparison plan; feat_contextual_help_mvp2["contextual help mvp2"] @@ -957,6 +955,8 @@

Dependency graph (feat_ + infra_)

class feat_overnight_autopilot done; infra_adapter_solr["adapter solr"] class infra_adapter_solr done; + feat_study_convergence_indicator["study convergence indicator"] + class feat_study_convergence_indicator done; feat_ubi_judgments --> infra_adapter_solr
diff --git a/state.md b/state.md index b874285c..2d62d5b3 100644 --- a/state.md +++ b/state.md @@ -14,8 +14,8 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer ## Current branch / execution context -- **Branch:** `feat/study-convergence-indicator` (pre-PR). 9 commits on top of `main`; full plan executed (11/11 stories) in this session. -- **Active feature:** `feat_study_convergence_indicator` (MVP2 — Phase 2 backlog drain). All 11 stories shipped this session: epsilon hoist (`AUTO_FOLLOWUP_LIFT_EPSILON`) + AST/grep guard against re-inlining 0.005 anywhere in `backend/app/` (also collapsed a pre-existing duplicate in `chain_summary.py`); pure-domain `classify_convergence(...)` returning `StudyConvergenceShape | None` (3 verdicts: converged / still_improving / too_few_trials); direction-aware service `fetch_study_convergence` with in-flight short-circuit + invalid-direction WARN + classifier-exception shielding; additive `StudyDetail.convergence` field; `ConvergencePanel` (Recharts best-so-far curve + 3 null-state badges + 3 new glossary entries with deep-link to the runbook); digest worker + Jinja `` block + system-prompt framing rule ("re-run with a larger trial budget" leads on `still_improving` / `too_few_trials`); operator runbook + CLAUDE.md "Key Runbooks" row + arch patches in `data-model.md` + `ui-architecture.md`. Cross-PR contract: exports `ConvergenceVerdict` Literal for the autopilot PR's `StudyChainLink.convergence_verdict` (FR-7); no autopilot files touched here. No migration (Alembic head stays `0022`). One tangential idea captured (`bug_contract_allowlists_outdated_after_mvp2_features` — three pre-existing contract failures from prior MVP2 PRs not updating hand-maintained allowlists). Next: push branch + open PR + monitor CI + Gemini/GPT-5.5 review + finalize. +- **Branch:** `main` (clean). 
`feat_study_convergence_indicator` merged via PR #352 (squash, merge commit `0eee17a9`); finalization on `docs/finalize-study-convergence-indicator` (this branch). +- **Active feature:** _None in flight._ `feat_study_convergence_indicator` shipped — completed studies now carry a plain-language convergence verdict (converged / still_improving / too_few_trials) backed by a best-so-far curve, surfaced on `/studies/[id]`, threaded into the digest narrative's lead recommendation, and exported as a soft contract for the autopilot chain panel. Next: pull from the MVP2 Idea backlog (run `/pipeline status`). - **Alembic head:** `0022_solr_engine_auth_check` (added by `infra_adapter_solr` Story A6 — extends `clusters.engine_type` + `clusters.auth_kind` CHECK constraints for Solr). - **Python:** 3.13. **Frontend stack:** Next 16 (App Router + Turbopack), React 19, Tailwind 4 (CSS-first), Vitest 4, ESLint 9 (flat), TypeScript 6, Playwright (chromium, single worker) for E2E. - **Coverage gates:** backend 80% (`fail_under` in pyproject), UI vitest + tsc + ESLint + Next build, plus a full-stack smoke E2E job. Live pass counts: see the latest `pr.yml` run (the historical per-feature counts moved to `state_history.md`). @@ -24,14 +24,15 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer Detail + reasoning for each is in [`state_history.md`](state_history.md). +- **2026-06-01** — `feat_study_convergence_indicator` (PR #352, squash-merged `0eee17a9`). MVP2 ergonomics feature: every completed study carries a plain-language **convergence verdict** — `converged` / `still_improving` / `too_few_trials` — backed by a best-so-far metric curve, answering "did the optimizer finish learning, or did I stop it too early?" **No migration** (reads existing `trials` columns; Alembic head stays `0022`). 
11 stories / 7 epics: (1) hoist the lift epsilon to `AUTO_FOLLOWUP_LIFT_EPSILON` + an **AST/grep guard** that fails CI on any bare `0.005` in a lift/epsilon-shaped context under `backend/app/` — surfaced + collapsed a pre-existing duplicate in `chain_summary.py`; (2) pure-domain `classify_convergence(...)` → `StudyConvergenceShape | None` (trailing-window-flat, direction-aware running max/min, window clamp `min(20, max(5, total//5))`, warmup floor 50); (3) repo helper `list_complete_optuna_trials_for_study` (`status='complete' AND is_baseline IS NOT TRUE AND primary_metric IS NOT NULL`); (4) service `fetch_study_convergence` with in-flight short-circuit + invalid-direction WARN + classifier-exception shielding (GET never 500s); (5) additive `StudyDetail.convergence` (renamed class `StudyConvergenceShape` to coexist with `confidence.py`'s `ConvergenceShape` — winner-trial timing vs metric plateau — without OpenAPI name collision); (6) `ConvergencePanel` (Recharts curve + `ReferenceArea` trailing-window shade + 3 null-state badges + 3 glossary entries deep-linking the runbook) mounted on `/studies/[id]` with the `CONVERGENCE_VERDICT_VALUES` enum-discipline pair value-locked both sides; (7) digest worker + Jinja `` block + system-prompt rule that leads with "re-run with a larger trial budget" on `still_improving` / `too_few_trials`; (8) `ConvergenceVerdict` exported as the FR-7 soft contract for the autopilot chain panel (AC-16 lives in that PR's lane — no autopilot files touched here); (9) operator runbook + CLAUDE.md Key Runbooks row + `data-model.md` / `ui-architecture.md` patches. Tests: 40 domain unit + 11 prompt unit + 5 docs unit + 18 integration + 4 contract + 13 frontend vitest + 1 Playwright smoke; mypy clean. Cross-model: Gemini (1 Medium accepted — `is False`→`not getattr` None-safety, `644feeed`) + GPT-5.5 final (4 findings: SQL/domain filter-mirror `is_not(True)` accepted `ad72e297`; 3 rejected as review-window-truncation artifacts). 
Captured `bug_contract_allowlists_outdated_after_mvp2_features` (3 pre-existing contract failures from prior MVP2 PRs not updating hand-maintained allowlists). - **2026-05-31** — `feat_demo_reseed_solr_and_steplog` (PR #348, squash-merged `66323aba`). Completes the deferred `infra_adapter_solr` Story A13 — the async home-button demo reseed (`make seed-demo`) now seeds the **Solr** scenario (`acme-kb-docs-solr`) end-to-end, where before it crashed. Fixed layer by layer (each verified by the live reseed getting further): Solr host-URL→Compose-DNS mapping (`8983`); the dispatcher now creates the Solr collection from its configset + bulk-indexes (reusing `seed_solr_products`) instead of the ES `index_mapping` PUT; engine-aware **synthetic-UBI write** (`ubi_queries`/`ubi_events` Solr collections + Solr `/update`, per-engine ensure gate); study **search-space cardinality** (Solr scenario tunes 2 boosts, `estimate_cardinality` floats=100 → ≤3-float cap). **Surfaced + fixed a real product bug:** the UBI **read** path (`ubi_readiness`, `UbiReader`) built Elasticsearch query DSL → `UBI judgment generation was broken on Solr for every operator` (contradicting the "works everywhere from day one" claim); now builds Solr `q`/`fq`/`{!terms}`/`rows`/`fl` when `adapter.engine_type=="solr"`. Plus a **reseed step-history log** (worker accumulates `steps[]` → status endpoint → scrolling UI panel). **Verified by a live `make seed-demo` completing 6/6** (10 studies, 10 proposals). 8 commits; backend 2012 unit + frontend 983 vitest. Gemini: 3 findings accepted+fixed (`{!terms}` query_id vs maxBooleanClauses — verified live; collision-safe synthetic ids; scroll-on-reopen dep). Final GPT-5.5 review clean. Captured `bug_reseed_failure_blocks_retry_arq_singleton_dedup` (a failed reseed deduped retries for ~1h via the Arq singleton result). - **2026-05-31** — `feat_overnight_autopilot` (PR #343, squash-merged `fe146950`). 
MVP2 ergonomics feature surfacing the shipped auto-followup chaining engine as a first-class "set it and wake up to results" path — **read-side + UI only, the engine stayed untouched**. New read-only `GET /api/v1/studies/{id}/chain` (pure-domain `chain_summary.py`: stop-reason matrix, universal cumulative-lift, best-link selection — reuses `compute_first_decile_max`/`_direction_normalized_lift` from `auto_followup.py`; + a bounded `parent_study_id` traversal repo helper with a 10-hop upward cap + cycle guard, `LIMIT 1` downward walk, `DISTINCT ON` newest-non-rejected proposal lookup). Frontend: create-study wizard relabel to "🌙 Run overnight (compound automatically)" + Deep-preset discoverability hint + `overnight_autopilot` glossary key; `AutoFollowupChainPanel` rolled-up summary (ordered links + deltas, cumulative lift, 3-branch best-config, stop-reason phrases) via a new `useStudyChain` TanStack hook (focus/cancel/transition refetch + 120s grace poll). Tutorial Step 12 + arch-doc notes. No migration (Alembic head stays `0022`). 7 stories / 4 epics. Tests: 30 unit + 31 chain integration/contract + frontend vitest (panel 13/wizard/glossary) + 1 real-backend E2E. Cross-model: GPT-5.5 Epic 1 (1 Low accepted) + Epics 2+3 clean + final (1 Medium rejected — SQLite-portability moot, Postgres-only); Gemini (1 High accepted — zero-row hydration guard `9b1d894f`). Phase 2 ("ran while away" card) deferred to `feat_overnight_studies_summary_card`. 4 tangential idea files captured (`infra_generated_artifact_freshness_gate`, `bug_judgment_header_omits_click_bucket`, `chore_arq_pool_aclose_deprecation`, `bug_e2e_teardown_chain_node_delete_500`). - **2026-05-31** — `infra_adapter_solr` (PR #336 squash-merged `60aec9af` + demo-seed fix #337). MVP2 three-engine headliner: a single `SolrAdapter` implements the full `SearchAdapter` Protocol against Apache Solr 9.x/10.x (SolrCloud + standalone), pivoting on a construction-time capability probe. 
All 13 stories (A1–A13): adapter skeleton + probe, `edismax`/`dismax`/`lucene` render with unified-param pivots, parallel `/select` search_batch, get_schema/list_targets, explain (Lucene-escaped), get_document/list_documents (RealTime Get + cursorMark), LTR rescore (`rq={!ltr}`) + `LTR_MODEL_NOT_FOUND`, migration 0022 (engine_type + auth_kind CHECK extension) + `registry.py` allowlist relocation, `POST /clusters/{id}/reprobe` + `POST /clusters/test-connection`, Compose `solr:10.0` service + configsets + `/healthz` subsystems.solr, frontend wire literals + per-engine auth filtering + 3-engine `` + test-connection button, Guide 01 + runbook + tutorial Path C, and a 5th `acme-kb-docs-solr` demo scenario. **Live-Solr rework correction** (folded into the squash): local Solr runs security-DISABLED (parity with ES/OpenSearch), `bootstrap-security.sh` deleted, LTR via `SOLR_MODULES=ltr`; stock Solr ships **no** `solr.UBIComponent`, so UBI on Solr is read-path-only (demo synthesizes events, probe reports `ubi_component_present=false`). Cross-model review: GPT-5.5 F1/F2/F3 + 6 Gemini findings adjudicated. Post-pipeline followups tracked in `00_unsure/chore_solr_post_pipeline_followups` + `chore_solr_cred_backfill_needs_api_restart`. - **2026-05-30** — `chore_oss_public_launch_punchlist` (PRs #322, #330, + history-audit PR). Closed the 3-capability OSS public-launch gate. (1) SPDX headers via FSFE REUSE on every source file + `REUSE.toml` + `reuse-lint` pre-commit/CI gates (1477/1477 compliant). (2) Dependency license inventory (`scripts/gen_license_inventory.py` → `docs/04_security/license-inventory.md`, 786 deps / 0 violations, deterministic from locked closure) + `license-inventory` CI gate; 9 non-permissive licenses all adjudicated Accept (no GPL/AGPL ships). (3) Full-history `gitleaks` + manual sweep — 1 gitleaks finding + a few pickaxe hits, all confirmed false positives — captured in a repeatable runbook (`docs/03_runbooks/oss-history-audit.md`). 
Repo cleared for the visibility flip (operator action). NOTE: `SKIP_HEAVY_CI=true` during this work, so `license-headers`/`license-inventory` jobs were verified locally, not in CI. -- **2026-05-29** — `feat_ubi_judgments` (PR #317, squash-merged). MVP2 second feature: engine-neutral User Behavior Insights judgment generation, shipped end-to-end (all 13 stories incl. E2E + DB-backed integration tests + operator docs — none deferred). Migration 0021 (judgment_lists.generation_params JSONB) → domain/ubi/ pure-domain library (FeatureVec, async SignalsConverter Protocol + 3 impls, position-bias prior) → UbiReader (engine-neutral two-index scan + client-side join, no new adapter method) → ubi_readiness classifier (rung_0..rung_3, 60s Redis cache) → start_ubi_judgment_generation dispatcher (refactor extracts 5 shared helpers; LLM dispatcher parity preserved by all 12 existing tests) → 5 new wire Literals + _SourceBreakdown three-term evolution (FR-10) → GET /clusters/{id}/ubi-readiness + POST /judgments/generate-from-ubi endpoints → generate_judgments_from_ubi Arq worker with mapping_strategy + hybrid LLM-fill callback → 21st agent tool + orchestrator prompt update → frontend method picker dialog + on-ramp nudge + sparse-data card + value-delta + ambiguous-skip recovery cards → operator runbook + 3 FAQ entries + data-model patches → 4 Playwright E2E specs (rung_0/rung_3/hybrid/source-filter) green against the live ES-backed stack. **Real-engine E2E caught a production bug**: UbiReader requested `size=50000 > ES index.max_result_window (10000)` → "all shards failed" swallowed by the adapter → spurious `UBI_INSUFFICIENT_DATA` on dense clusters; fixed by clamping both index scans to `ES_MAX_RESULT_WINDOW=10000` + regression guard. Cross-model review: 6 Gemini findings + 6 GPT-5.5 findings all adjudicated (fixed or documented as working-as-designed). 
Remaining follow-ups are pure deferrals, not gaps: `chore_ubi_reader_search_after_pagination` (P2, >10k-event clusters), `chore_ubi_hybrid_template_render` (P3, vestigial-template contract cleanup — current behavior correct per FR-2), `feat_demo_ubi_study_comparison` (P1, side-by-side UBI-vs-LLM demo study). + ## In flight -- **`feat_study_convergence_indicator`** on `feat/study-convergence-indicator` (9 commits, pre-PR). 11/11 stories complete; PR forthcoming. +- _None._ `feat_study_convergence_indicator` (PR #352) merged 2026-06-01; finalization on `docs/finalize-study-convergence-indicator` (this branch). ## Queued (priority-ordered by dashboard / dep graph) diff --git a/state_history.md b/state_history.md index 9f3408af..ec820396 100644 --- a/state_history.md +++ b/state_history.md @@ -4,6 +4,118 @@ --- +### feat_study_convergence_indicator — "did the optimizer finish learning?" verdict (PR #352, 2026-06-01) + +An MVP2 ergonomics feature for the overnight-study workflow: every completed +study now carries a plain-language **convergence verdict** — +`converged` / `still_improving` / `too_few_trials` — backed by a best-so-far +metric curve. It answers the question the operator asks the morning after a +study (or an overnight chain) finishes: *did the optimizer actually finish +learning, or did I stop it too early?* Squash-merged via +[PR #352](https://github.com/SoundMindsAI/relyloop/pull/352) (merge commit +`0eee17a9`); the planned-feature folder moved to +`implemented_features/2026_06_01_feat_study_convergence_indicator/`. +**No migration** — the classifier reads existing `trials` columns only; +Alembic head stays `0022_solr_engine_auth_check`. + +**Eleven stories across seven epics:** + +- **1.1 — Epsilon hoist.** The lift epsilon `0.005` was inlined at two sites in + `auto_followup.py`. 
Hoisted to a single module-level + `AUTO_FOLLOWUP_LIFT_EPSILON`, re-exported as `CONVERGENCE_FLAT_EPSILON` by the + new convergence module so all three modules (`auto_followup`, `chain_summary`, + `convergence`) share one source of truth. A value-lock test asserts + `== 0.005` via value-equality (never `is`). +- **1.2 — Pure-domain classifier.** `backend/app/domain/study/convergence.py`'s + `classify_convergence(...)` → `StudyConvergenceShape | None`. Trailing-window- + flat algorithm: filter to (complete, non-baseline, non-null-metric) → sort by + `optuna_trial_number` → direction-aware running max/min best-so-far curve → + `window_size = min(20, max(5, total // 5))` → improvement over the window → + verdict via the §9 decision matrix (warmup-floor-50 first → flat-vs-epsilon → + still_improving). 39 unit tests (later 40) covering every branch, direction- + aware minimize, all filters, 9 window-clamp boundary cases, slow-drift / + single-late-jump / noisy-tail shapes, monotonicity invariant, determinism. + The story also shipped an **AST/grep guard** that walks every `*.py` under + `backend/app/` and fails CI on any bare `0.005` in a lift/epsilon-shaped + context outside the canonical declaration — which immediately surfaced and let + us collapse a pre-existing duplicate `CHAIN_LIFT_EPSILON = 0.005` in + `chain_summary.py`. +- **2.1 — Repo helper.** `list_complete_optuna_trials_for_study` pushes the + `status='complete' AND is_baseline IS NOT TRUE AND primary_metric IS NOT NULL` + filter into SQL, ordered by `optuna_trial_number ASC`. (`IS NOT TRUE` rather + than `IS FALSE` was a GPT-5.5-review fix — see below.) 
+- **2.2 — Async service.** `fetch_study_convergence` owns the three
+  orchestration concerns the pure layer can't: in-flight short-circuit
+  (`queued`/`running` → `None`, classifier never invoked); direction resolution
+  honoring the `studies.py:173` `objective.get("direction", "maximize")`
+  precedent with a `convergence_invalid_direction` WARN on a malformed value;
+  and `try/except Exception` shielding so a classifier bug can never 500 the
+  GET (emits `convergence_classifier_exception` WARN, returns `None`). 12
+  integration tests using `structlog.testing.capture_logs` (RelyLoop's structlog
+  config writes through its own ConsoleRenderer, so `caplog` can't see the WARN
+  events).
+- **3.1 — API.** Additive `StudyDetail.convergence: StudyConvergenceShape | None`
+  on the GET + cancel responses. The Pydantic class is named
+  **`StudyConvergenceShape`** (not bare `ConvergenceShape`) to coexist with
+  `confidence.py`'s existing `ConvergenceShape` — a *different* concept
+  (winner-trial timing vs metric plateau) that also rides on `StudyDetail`.
+  Tried `ConfigDict(title=...)` first; it only renames the inner JSON Schema
+  title, not the OpenAPI components key, so a clean class rename was the fix.
+  Inline-fixed a pre-existing contract test that asserted `engine_type="solr"`
+  was invalid (Solr shipped as first-class in `infra_adapter_solr`) — moved the
+  sentinel to `"vespa"`.
+- **4.1 / 4.2 — Frontend.** `` (shadcn Card + verdict badge +
+  improvement-summary line + collapsible Recharts curve with a `ReferenceArea`
+  shading the trailing window + AC-20 aria-label + three null-state badges:
+  still_running / not_enough_trials / unavailable). 3 new glossary entries
+  (`convergence_verdict` with a long-form + deep link to the runbook,
+  `convergence_curve`, `convergence_window`). Mounted on `/studies/[id]` between
+  `` and the trials table. `CONVERGENCE_VERDICT_VALUES`
+  enum-discipline pair value-locked on both sides (frontend vitest +
+  backend Literal-membership test). 13 vitest + 1 real-backend Playwright smoke
+  (covers the AC-13b null-state path, since `seed-completed` only inserts 2
+  trials — below the MIN-5 floor).
+- **5.1 / 5.2 — Digest.** Worker fetches the shape + `model_dump()`s it into
+  `render_digest_user_prompt(convergence=...)`; the user template gains a
+  `{% if convergence %}...{% endif %}` block (verdict +
+  small numerics only — the curve would inflate tokens). The system prompt
+  gains a "Convergence-aware lead recommendation" section: `still_improving` /
+  `too_few_trials` lead the suggested follow-ups with "re-run with a larger
+  trial budget" and demote `narrow`/`widen` to secondary. 11 prompt unit tests
+  (8 user-block + 3 system-prompt AC-15 substring assertions).
+- **6.1 — Autopilot soft contract.** `ConvergenceVerdict` Literal is exported
+  for the autopilot PR's `StudyChainLink.convergence_verdict` field; AC-16 lives
+  in the autopilot CI lane, NOT this PR. Verified by the import + Literal-
+  membership test; `git diff --stat` confirms zero autopilot/`StudyChainLink`
+  mutations.
+- **7.1 — Docs.** Operator runbook
+  [`docs/03_runbooks/convergence-verdict.md`](docs/03_runbooks/convergence-verdict.md)
+  (verdict meanings, plain-language algorithm, re-run framing with wizard preset
+  names, null-state interpretation, minimize example, noisy-tail
+  troubleshooting), CLAUDE.md Key Runbooks row, `data-model.md` +
+  `ui-architecture.md` patches, and 5 docs-assertion tests.
+
+**Cross-model review.** Gemini posted one Medium: `getattr(..., "is_baseline",
+False) is False` excludes a `None` flag (`None is False` → `False`),
+contradicting the "include non-baseline trials" intent — accepted, switched to
+`not getattr(...)` + regression test (`644feeed`). GPT-5.5's final review
+returned four findings: one genuine (after the Gemini fix, the SQL helper's
+`is_(False)` no longer mirrored the domain `not is_baseline` semantics on NULL
+rows — changed to `is_not(True)`, `ad72e297`), and three review-window
+truncation artifacts (the diff was capped at 90 KB and git path-ordering pushed
+`ui/`, `prompts/`, and `docs/` past the cut, so the reviewer reported them
+"absent" when they exist and pass CI — rejected with commit citations).
+
+**Tangential discovery.** `bug_contract_allowlists_outdated_after_mvp2_features`
+(P2): three pre-existing contract failures surfaced on a clean tree during the
+pre-push gate (`test_resolve_engine_type_wire`, the `/healthz` subsystems
+schema, and `test_openapi_has_no_orphan_endpoints`) — all from prior MVP2
+features (`infra_adapter_solr`, `feat_ubi_judgments`, `feat_overnight_autopilot`)
+not updating their hand-maintained allowlists. Confirmed pre-existing via
+`git stash`; out of scope for this PR.
+
+---
+
 ### infra_adapter_solr — Apache Solr adapter, MVP2 three-engine reach (PR #336 + #337, 2026-05-31)
 
 MVP2's headliner and the feature that completes the project's three-engine
diff --git a/website/docs/roadmap.md b/website/docs/roadmap.md
index 27edc5d5..fd5a9b55 100644
--- a/website/docs/roadmap.md
+++ b/website/docs/roadmap.md
@@ -170,6 +170,7 @@
 - ✅ [Contextual Help](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/implemented_features/2026_05_15_feat_contextual_help_mvp2) · [#124](https://github.com/SoundMindsAI/relyloop/pull/124)
 - ✅ [Demo UBI Study Comparison](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/implemented_features/2026_05_30_feat_demo_ubi_study_comparison) · [#320](https://github.com/SoundMindsAI/relyloop/pull/320)
 - ✅ [Overnight Autopilot](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/implemented_features/2026_05_31_feat_overnight_autopilot) · [#343](https://github.com/SoundMindsAI/relyloop/pull/343)
+- ✅ [Study Convergence Indicator](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/implemented_features/2026_06_01_feat_study_convergence_indicator) · [#352](https://github.com/SoundMindsAI/relyloop/pull/352)
 - ✅ [Study Sub Warmup Guard](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/implemented_features/2026_05_29_feat_study_sub_warmup_guard) · [#316](https://github.com/SoundMindsAI/relyloop/pull/316)
 - ✅ [UBI Judgments](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/implemented_features/2026_05_29_feat_ubi_judgments) · [#317](https://github.com/SoundMindsAI/relyloop/pull/317)
 - 🟡 [Apply Path Normalizer Declaration](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/feat_apply_path_normalizer_declaration)
@@ -177,7 +178,6 @@
 - 🟡 [Overnight Studies Summary Card](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/feat_overnight_studies_summary_card)
 - 🟡 [Query Normalization Tuning](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/feat_query_normalization_tuning)
 - 🟡 [Query Normalizer Typed Pipeline](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/feat_query_normalizer_typed_pipeline)
-- 🟡 [Study Convergence Indicator](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/feat_study_convergence_indicator)
 - 🟡 [UBI LLM Study Comparison](https://github.com/SoundMindsAI/relyloop/tree/main/docs/00_overview/planned_features/02_mvp2/feat_ubi_llm_study_comparison)
 
 ??? note "Infrastructure & tooling (3)"