2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -314,7 +314,7 @@ See [`docs/01_architecture/data-model.md`](docs/01_architecture/data-model.md) f

### Stack

- Next.js 14 App Router (TypeScript) — pages in `ui/src/app/`
- Next.js 16 App Router (TypeScript, Turbopack) — pages in `ui/src/app/`
- **shadcn/ui** for UI primitives (components copied into the repo, not an npm dependency — fully customizable)
- **Tailwind CSS** for styling
- **TanStack Query** for server state (caching, retries, optimistic updates, mutations)
35 changes: 13 additions & 22 deletions state.md
@@ -2,7 +2,7 @@

> Read this first. A one-page snapshot: current focus, the last few merges, what's in flight, what's queued, and where the project sits in the MVP1 → MVP2 → MVP3 → GA roadmap. **Historical feature-merge narrative + chained execution context lives in [`state_history.md`](state_history.md)** — new merge entries land there, not here (per `chore_state_md_size_compression`, 2026-05-29). Keep this file loadable in a single `Read` call.

**Last updated:** 2026-05-29 (after `chore_e2e_api_base_url_construction` PR #301 + finalization #302 merged).
**Last updated:** 2026-05-29 (after PR #310 — `01_mvp1/` planned-features bucket fully drained; the two remaining deferred-by-design folders reclassified to `99_backlog/` + `03_mvp3/`).

## Where the roadmap sits

@@ -24,45 +24,36 @@ MVP1 (v0.1) **shipped** — all six differentiators live (Bayesian/TPE optimizer

Detail + reasoning for each is in [`state_history.md`](state_history.md).

- **2026-05-29** — `docs: reclassify 2 deferred MVP1 items → 99_backlog/03_mvp3` (PR #310, docs-only). Empties `01_mvp1/` — MVP1 actionable backlog fully drained. `chore_demo_reseed_stale_recovery_atomic_cas` → `99_backlog/` (already Priority: Backlog); `infra_agent_sibling_worktree_isolation` → `99_backlog/` (phases 1+2 shipped, only phase3 remains, defer-until-incident). Dashboards regenerated.
- **2026-05-29** — `bug_smoke_studies_data_table_search_flake` (PR #308 + finalization #309). Hardened the flaky `studies-data-table.spec.ts:20` search-visibility assertion: scoped it to the `studies-table` element + 15s web-first timeout to ride out the debounce→refetch→render race on slow CI runners. e2e-only; no product change.
- **2026-05-29** — `ci(pr): SKIP_HEAVY_CI kill-switch` (PR #307, infra). Added an `if:` guard on the 5 `pr.yml` jobs over 1 min so a repo variable can skip them (temporary GitHub Actions budget measure). See the Active CI note above — variable currently set, auto-restores ~2026-06-01.
- **2026-05-29** — `bug_ceiling_badge_assumes_maximize_direction` (PR #305 + finalization #306). Studies-list CEILING badge (best_metric ≥ 0.99) mislabeled minimize studies (0.99 is a bad score there). Preflight found it had gone latent→live (feat_study_baseline_trial made `direction=minimize` creatable). Added `direction` to StudySummary (defaults maximize) + gated the badge on `direction !== 'minimize'` (rolling-deploy-safe per Gemini). 7 tests.
- **2026-05-29** — `chore_state_md_size_compression` (PR #303 + finalization #304). Split `state.md` (360 KB → 9.3 KB snapshot) from new `state_history.md` (append-only narrative, root); added `state-md-size-guard` pre-commit hook (60 KB cap) + CLAUDE.md snapshot-vs-history convention. **First merge under the new convention.**
- **2026-05-29** — `chore_e2e_api_base_url_construction` (PR #301 + finalization #302). Swept 28 `${API_BASE}<path>` concats across 10 e2e specs to `new URL(...)`; aligned dashboard-reseed's API_BASE env var; URLSearchParams for a query. Mechanical, zero behavior change.
- **2026-05-29** — `bug_demo_reseed_button_silent_enqueue_failure` (PR #299 + finalization #300). Top-level `except Exception` barrier in `run_demo_reseed` + `reseed_status_is_stale()` POST auto-recovery so a worker init crash flips Redis to `failed` instead of stuck-`running`. 14 unit tests.
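The direction gate described in the `bug_ceiling_badge_assumes_maximize_direction` entry above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `StudySummary` shape, the `showCeilingBadge` helper, and the `CEILING_THRESHOLD` constant are hypothetical names, and only `direction`, `best_metric`, the 0.99 cutoff, and the `direction !== 'minimize'` check come from the entry itself.

```typescript
// Hypothetical types for illustration; the real StudySummary may carry more fields.
type Direction = "maximize" | "minimize";

interface StudySummary {
  best_metric: number | null;
  // Optional so that payloads from a server that predates the field still type-check.
  direction?: Direction;
}

// Assumed constant name; the 0.99 cutoff is taken from the changelog entry.
const CEILING_THRESHOLD = 0.99;

// Show the CEILING badge only when a high score is actually a good score.
// Comparing against 'minimize' (rather than requiring 'maximize') means a
// missing `direction` falls through to the maximize behavior.
function showCeilingBadge(study: StudySummary): boolean {
  if (study.best_metric === null) {
    return false;
  }
  return (
    study.direction !== "minimize" &&
    study.best_metric >= CEILING_THRESHOLD
  );
}

// Usage: a minimize study at 0.995 gets no badge; a maximize study does.
const minimized: StudySummary = { best_metric: 0.995, direction: "minimize" };
const maximized: StudySummary = { best_metric: 0.995, direction: "maximize" };
const legacy: StudySummary = { best_metric: 0.995 };
console.log(showCeilingBadge(minimized), showCeilingBadge(maximized), showCeilingBadge(legacy));
```

Writing the check as a negative comparison is one plausible reading of the "rolling-deploy-safe" note: a client that receives a summary without `direction` during a rollout keeps the previous maximize behavior instead of hiding the badge.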

## In flight

- None. MVP1 alpha shipped; pre-MVP2 sweep drained 18 backlog items (Waves 1-3).
- None. MVP1 alpha shipped; the pre-MVP2 sweep drained the entire `01_mvp1/` backlog — that bucket is now empty (PR #310). Next work is the `02_mvp2/` queue below.

## Queued (priority-ordered by dashboard / dep graph)

**Source of truth:** [`docs/00_overview/MVP1_DASHBOARD.md`](docs/00_overview/MVP1_DASHBOARD.md) (regenerated by the `mvp1-dashboard-regen` pre-commit hook). Run `/pipeline status` for the live view.
**Source of truth:** [`docs/00_overview/DASHBOARD.md`](docs/00_overview/DASHBOARD.md) + [`docs/00_overview/MVP1_DASHBOARD.md`](docs/00_overview/MVP1_DASHBOARD.md) (regenerated by the `mvp1-dashboard-regen` pre-commit hook). Run `/pipeline status` for the live view.

Remaining items split by sized work-flow per the inline-fix vs idea-file rubric:
**MVP1 backlog is fully drained** (`01_mvp1/` empty as of PR #310). The next stop is **MVP2 / v0.2 — "Three-Engine + Real Signals"**. The `02_mvp2/` bucket currently holds 11 folders (run `ls docs/00_overview/planned_features/02_mvp2/` for the live list):

**`/bug-fix` candidates** (medium-sized bugs with design surface — run via `/bug-fix`):
- `bug_chat_long_conversation_truncation_mvp2` — Investigation `bug_fix.md` exists at `docs/00_overview/planned_features/02_mvp2/bug_chat_long_conversation_truncation/`; **held for MVP2** (decided 2026-05-13 — folder renamed with `_mvp2` suffix for ls visibility; pullable forward technically but deferred for scope discipline + latency-of-impact is zero today). Resume `/bug-fix` Default mode when MVP2 starts.
- **Headliners:** `infra_adapter_solr` (Apache Solr adapter), `feat_ubi_judgments` (UBI judgment source), `feat_chat_last_message_preview`, `feat_fts_rank_ordering`.
- **Bugs held for MVP2:** `bug_chat_long_conversation_truncation` (investigation `bug_fix.md` exists; pullable forward but deferred for scope discipline — latency-of-impact is zero today), `bug_webhook_concurrent_merge_race_timing_sensitive`.
- **Chores:** `chore_auto_followup_parent_advisory_lock`, `chore_demo_seeding_integration_tests_rewrite`, `chore_studies_post_arq_spy_fixture`, `chore_template_library_expansion`, `infra_arq_subprocess_test`.

**Polish chores** (`/bug-fix`-shaped — medium scope, design surface):
- `chore_chat_last_message_preview` — add `last_message_preview` + `last_message_at` on ConversationSummary (deferred from feat_chat_agent cycle-2 F15).

**`/pipeline` candidates** (feature-scale — new cron + settings + observability):
- `feat_judgments_periodic_resume_sweep` — strategic in-worker periodic resume cron for stuck `judgment_lists.status='generating'` rows. Preflighted 2026-05-14; folder renamed from `chore_*` to `feat_*` on the same date after work-type re-evaluation against the `feat_github_webhook` cron precedent (new background behavior + new operator settings + new observability events = feat-shaped, not chore-shaped). Design is locked against the existing `reconcile_pr_state` cron pattern; 4 open questions for spec-time decision are captured inline in the idea.

**Operator-deferred:** `infra_optuna_orphan_reaper` (operationally tolerated for MVP1), and the in-progress dogfood items pending design-partner feedback. Two ideas dropped 2026-05-14 per `/idea-preflight` ship-vs-drop calls: `chore_studies_ui_shadcn_polish` (the `feat_proposals_ui` PR #58 `ClusterFilterSelect` precedent established native `<select>` as the project's standard for page-level filter/control surfaces, retiring the F1 finding's inconsistency claim) and `chore_demo_recording_mvp3` (single-maintainer alpha project base rates make the 4-6 hour record-edit-upload-embed task unlikely to execute; `tutorial-first-study.md` serves the discovery role and any MVP3 recording would need MVP4 re-recording when auth UI lands — net carry value did not justify keeping the folder in `planned_features/`).
**Other buckets:** `03_mvp3/` (Observable — includes `infra_optuna_orphan_reaper`, deferred from MVP1 per spec §11 operational tolerance), `04_ga/`, `99_backlog/` (4 defer-until-incident items), `00_unsure/` (`bug_seed_meaningful_demos_silent_bulk_errors`).

## Known debt / fragility

- ~~**`backend/app/eval/qrels_loader.py` is an MVP1 stub.**~~ — **Resolved.** PR #35 replaced the stub with a real `SELECT query_id, doc_id, rating FROM judgments WHERE judgment_list_id = :id`. The legacy `JudgmentsTableMissing` symbol is retained as a no-op compat shim for any imported reference in older tests. Integration tests now seed real `judgments` rows; `run_trial` consumes the loader directly.
- **`infra_optuna_orphan_reaper`** — Phase 2 orchestrator can die between `study.ask()` and the enqueue commit, leaving orphan Optuna RUNNING trials. Operationally tolerated for MVP1 per spec §11 "Operational tolerance"; periodic reaper deferred.
- **CI lacks a `make up` smoke job.** All 5 first-run bugs in the
`infra_foundation` PR surfaced after CI was green. Captured at
[`infra_ci_smoke_makeup`](docs/00_overview/planned_features/infra_ci_smoke_makeup/idea.md)
with a ready-to-paste workflow YAML — should land before MVP1 ships
to prevent recurrence.
- **`infra_optuna_orphan_reaper`** — Phase 2 orchestrator can die between `study.ask()` and the enqueue commit, leaving orphan Optuna RUNNING trials. Operationally tolerated for MVP1 per spec §11 "Operational tolerance"; periodic reaper deferred to MVP3 ([`03_mvp3/infra_optuna_orphan_reaper`](docs/00_overview/planned_features/03_mvp3/infra_optuna_orphan_reaper/idea.md)).
- ~~**CI lacks a `make up` smoke job.**~~ — **Resolved.** `infra_ci_smoke_makeup` shipped 2026-05-13; `pr.yml` now has a full-stack `smoke:` E2E job (see the coverage-gates line above).
- **Tangential bugs captured during the bootstrap:**
- ~~`bug_env_file_corrupted_during_session`~~ — **Resolved.** Defense-in-depth `.env*` filename CI guard shipped in PR #94 + folder finalized to [`implemented_features/2026_05_13_bug_env_file_corrupted_during_session/`](docs/00_overview/implemented_features/2026_05_13_bug_env_file_corrupted_during_session/). Original local-tooling rename event remains undetermined (user-side investigation open).
- [`chore_starlette_422_deprecation`](docs/00_overview/planned_features/chore_starlette_422_deprecation/idea.md)`HTTP_422_UNPROCESSABLE_ENTITY` rename surfaces a `DeprecationWarning` on every test run; mechanical fix.
- ~~[`chore_starlette_422_deprecation`]~~**Resolved.** Shipped 2026-05-13 ([`implemented_features/2026_05_13_chore_starlette_422_deprecation`](docs/00_overview/implemented_features/2026_05_13_chore_starlette_422_deprecation/)).


medium

The formatting ``~~[`chore_starlette_422_deprecation`]~~`` puts square brackets inside the strikethrough, which creates a broken markdown link reference because there is no corresponding link definition in the document. To keep it consistent with the resolved item on line 55 (``~~`bug_env_file_corrupted_during_session`~~``), it should be formatted as inline code without the outer square brackets: ``~~`chore_starlette_422_deprecation`~~``.

- **Manual operator handoffs (per `infra_foundation` §7.5):** `.env` is
not auto-created (operator opts in via `cp .env.example .env`); OpenAI
key file is empty by default; GitHub branch protection requires repo-admin