Merged
2 changes: 1 addition & 1 deletion docs/00_overview/DASHBOARD.md
@@ -6,7 +6,7 @@ _Top-level index across MVP1 → GA v1+ as of **2026-05-24**. Click a release na

| Release | Theme | Progress | Status |
|---|---|---|---|
| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 74 / 75 scoped done · 11 remaining | **In progress** |
| [MVP1 / v0.1](MVP1_DASHBOARD.md) | The Loop | 75 / 75 scoped done · 12 remaining | **In progress** |
| [MVP1.5 / v0.1.5](MVP1_5_DASHBOARD.md) | Real Signals | 1 item(s) queued | **Held / queued** |
| [MVP2 / v0.2](MVP2_DASHBOARD.md) | Observable | 1 / 1 scoped done · 1 remaining | **In progress** |
| MVP3 / v0.3 | Production Stacks | — | **Not yet scoped** |
65 changes: 30 additions & 35 deletions docs/00_overview/MVP1_DASHBOARD.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/00_overview/dashboard.html
@@ -384,7 +384,7 @@ <h2>Releases</h2>
<div class="roadmap-row">
<div class="release-name"><a href="mvp1_dashboard.html">MVP1 / v0.1</a></div>
<div class="theme">The Loop</div>
<div class="progress">74 / 75 scoped done · 11 remaining</div>
<div class="progress">75 / 75 scoped done · 12 remaining</div>
<span class="state-pill in_progress">In progress</span>
</div>

@@ -8,15 +8,15 @@
- [`docs/00_overview/implemented_features/2026_05_24_feat_digest_executable_followups/feature_spec.md`](../../../00_overview/implemented_features/2026_05_24_feat_digest_executable_followups/feature_spec.md) — Tier-A substrate this spec extends
- [`docs/01_architecture/llm-orchestration.md`](../../../01_architecture/llm-orchestration.md)
- [`docs/01_architecture/data-model.md`](../../../01_architecture/data-model.md)
- Sibling (in-flight backlog): [`backlog_feat_digest_template_edit_followups`](../backlog_feat_digest_template_edit_followups/idea.md) — Tier C `edit_template`
- Sibling (in-flight backlog): [`backlog_feat_digest_template_edit_followups`](../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md) — Tier C `edit_template`

---

## 1) Purpose

- **Problem:** Tier A (shipped 2026-05-24 as PR #225) lets the LLM suggest `narrow` / `widen` / `text` followups within the **same query template**. But the LLM sometimes recognizes that a **different template entirely** is the better fit — e.g., parameter-importance is highly skewed (some declared params are dead weight), or winning trials cluster around a sub-set of params that map cleanly onto a different template's `declared_params`. Today the operator has to notice this themselves; the LLM has no structured way to say "try template X instead." The "Run this followup" substrate (`backend/app/domain/study/followups.py`, `ui/src/components/proposals/suggested-followups-panel.tsx`, the `?action=run_followup` modal prefill at `ui/src/app/proposals/[id]/page.tsx:120-184`) is in place — only the `swap_template` variant + its UI surface is missing.
- **Outcome:** The LLM emits a fourth `kind: "swap_template"` variant carrying `{rationale, template_id, search_space}` where `template_id` references a different `query_templates.id` than the parent study used. The proposal-detail UI renders the variant as an actionable card with a side-by-side `declared_params` comparison (parent template vs proposed swap target) before the operator commits. The "Run this followup" button pre-fills `template_id = <swap_target>` (not the parent's template) plus the LLM-proposed `search_space`, with disjoint params filled from the existing heuristic at `backend/app/domain/study/search_space_defaults.py`. Lineage (`studies.parent_proposal_id` + `parent_proposal_followup_index`) is reused unchanged — the cross-template hop is explicit in the data because the child study's `template_id` differs from the parent's.
- **Non-goal:** Auto-running swap-template followups without operator click (already covered for the deterministic narrow-around-winner case by `feat_auto_followup_studies`; cross-template swaps are a much larger trust surface and explicitly stay operator-mediated). LLM-driven template **edits** (Tier C — different surface, tracked at sibling [`backlog_feat_digest_template_edit_followups`](../backlog_feat_digest_template_edit_followups/idea.md)). Side-by-side rendering of the **query body** itself (Jinja2 source) — out, only `declared_params` are compared. Auto-discovery of the swap-target template by the worker (the LLM picks; we don't fall back to a similarity search).
- **Non-goal:** Auto-running swap-template followups without operator click (already covered for the deterministic narrow-around-winner case by `feat_auto_followup_studies`; cross-template swaps are a much larger trust surface and explicitly stay operator-mediated). LLM-driven template **edits** (Tier C — different surface, tracked at sibling [`backlog_feat_digest_template_edit_followups`](../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md)). Side-by-side rendering of the **query body** itself (Jinja2 source) — out, only `declared_params` are compared. Auto-discovery of the swap-target template by the worker (the LLM picks; we don't fall back to a similarity search).
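The prefill described in the Outcome bullet can be sketched as follows. This is a minimal illustration, not the shipped code: `SwapTemplateFollowup`, `heuristic_default`, and `build_prefill` are hypothetical names, the `(low, high)` tuple is a stand-in for whatever shape the real `search_space` entries use, and the placeholder bounds do not reflect the actual heuristic in `search_space_defaults.py`.

```python
from dataclasses import dataclass

Bounds = tuple[float, float]  # hypothetical stand-in for one search-space entry


@dataclass(frozen=True)
class SwapTemplateFollowup:
    """Carries the three fields the spec names for the swap_template variant."""
    rationale: str
    template_id: str
    search_space: dict[str, Bounds]
    kind: str = "swap_template"


def heuristic_default(param: str) -> Bounds:
    # Placeholder bounds only; the real heuristic is assumed to live elsewhere.
    return (0.0, 1.0)


def build_prefill(followup, parent_template_id, target_declared_params):
    """Build the modal prefill: swap-target template plus a full search space."""
    if followup.template_id == parent_template_id:
        raise ValueError("swap_template must reference a different template")
    space = {}
    for param in target_declared_params:
        if param in followup.search_space:
            # The LLM proposed bounds for this param, so keep them.
            space[param] = followup.search_space[param]
        else:
            # The LLM did not cover this param, so fall back to the heuristic.
            space[param] = heuristic_default(param)
    return {"template_id": followup.template_id, "search_space": space}


followup = SwapTemplateFollowup(
    rationale="winning trials cluster on params declared by another template",
    template_id="tmpl_b",
    search_space={"boost": (1.0, 3.0)},
)
prefill = build_prefill(followup, "tmpl_a", ["boost", "slop"])
```

Here `prefill["template_id"]` is the swap target rather than the parent's template, which is what makes the cross-template hop visible in the child study's lineage.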

## 2) Current state audit

@@ -93,7 +93,7 @@

### Out of scope

- **Tier C — `kind: "edit_template"` followups.** Operator-only today; LLM-suggested template edits are a much larger trust/validation surface and unrelated to this spec's lane. Tracked at sibling backlog folder [`backlog_feat_digest_template_edit_followups`](../backlog_feat_digest_template_edit_followups/idea.md).
- **Tier C — `kind: "edit_template"` followups.** Operator-only today; LLM-suggested template edits are a much larger trust/validation surface and unrelated to this spec's lane. Tracked at sibling backlog folder [`backlog_feat_digest_template_edit_followups`](../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md).
- **Auto-running swap-template followups without operator click.** Out — operator review is the entire trust mechanism for cross-template hops.
- **Side-by-side rendering of the template's Jinja2 body.** Out — only `declared_params` are compared. The Jinja source is large, hard to diff usefully without a syntax-aware viewer, and most operators making the call don't need it; if they do, the existing template detail page at `/templates/[id]` is one click away.
- **Auto-discovery of the swap-target template.** The LLM picks; we don't fall back to a similarity search or compute the swap target server-side. (Reason: the LLM has the full study-outcome context including parameter-importance distribution + winning-trial cluster; a deterministic similarity search would have to re-derive a much weaker proxy for "which template fits these winning params better.")
@@ -713,7 +713,7 @@ Tooltip placement uses the existing `<InfoTooltip glossaryKey="...">` primitive
- [ ] Documentation updates across docs/01–05 are merged (§15).
- [ ] Rollout gates from §16 are satisfied.
- [ ] Cross-model review (GPT-5.5) on this spec and the forthcoming implementation plan completed and adjudicated.
- [x] Deferred-phase tracking: N/A (single-phase delivery). Tier C `edit_template` is tracked at sibling [`backlog_feat_digest_template_edit_followups`](../backlog_feat_digest_template_edit_followups/idea.md).
- [x] Deferred-phase tracking: N/A (single-phase delivery). Tier C `edit_template` is tracked at sibling [`backlog_feat_digest_template_edit_followups`](../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md).
- [ ] No open questions remain in §19.

## 19) Open questions and decision log
@@ -48,4 +48,4 @@ Phase 1 of `feat_digest_executable_followups` handles `narrow` / `widen` / `text
- **Builds on [`feat_digest_executable_followups`](../../../00_overview/implemented_features/2026_05_24_feat_digest_executable_followups/idea.md) Phase 1 substrate** — discriminated-union schema, JSONB column, lineage columns, and "Run this followup" UI scaffolding all already landed.
- **Reuses [`backend/app/domain/study/search_space_defaults.py`](../../../../backend/app/domain/study/search_space_defaults.py)** from `feat_agent_propose_search_space` (shipped 2026-05-21) for the disjoint-set heuristic bounds.
- **Reuses `feat_create_study_search_space_builder` row primitives** (shipped 2026-05-20) for the cross-template comparison (when feasible).
- **Adjacent backlog item:** [`../backlog_feat_digest_template_edit_followups/idea.md`](../backlog_feat_digest_template_edit_followups/idea.md) — the Tier C `edit_template` extension, prefixed `backlog_` because its template-editor UI prerequisite doesn't exist. Promotes out of `backlog_` once this feature ships AND the editor lands.
- **Adjacent backlog item:** [`../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md`](../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md) — the Tier C `edit_template` extension, prefixed `backlog_` because its template-editor UI prerequisite doesn't exist. Promotes out of `backlog_` once this feature ships AND the editor lands.
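The reason the link target changed can be checked mechanically: a relative Markdown link resolves against the directory of the file that contains it. The sketch below assumes the linking file sits three directories below `docs/` (which the neighbouring `../../../00_overview/...` links imply); the path `docs/00_overview/implemented_features/some_feature/idea.md` is a hypothetical placeholder, not a path taken from this diff.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative link against the directory of the file containing it."""
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


# Hypothetical location of the linking file (an assumption for illustration).
source = "docs/00_overview/implemented_features/some_feature/idea.md"

old_target = resolve_link(
    source, "../backlog_feat_digest_template_edit_followups/idea.md"
)
new_target = resolve_link(
    source,
    "../../../02_product/planned_features/"
    "backlog_feat_digest_template_edit_followups/idea.md",
)

print(old_target)
print(new_target)
```

Under that assumption the old link lands inside `implemented_features/`, where no backlog folder lives, while the new link reaches `docs/02_product/planned_features/`.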
@@ -19,7 +19,7 @@
- Tier-A patterns are the structural template — story shapes (Domain → Worker/Prompts → API → Frontend → E2E), test-layer choice, and DoD style mirror the shipped Tier-A plan one-to-one.
- Fail-loud tests: assert explicit status, shape, error codes, and structlog reason codes.
- Keep increments narrow enough to verify independently — domain helper → discriminated-union widening → LLM schema/prompts → worker remap → API response widening → frontend card + prefill → E2E.
- **Single-phase delivery.** No deferred phases — Tier C (`edit_template`) lives at sibling [`backlog_feat_digest_template_edit_followups`](../backlog_feat_digest_template_edit_followups/idea.md) and is not gated by this work.
- **Single-phase delivery.** No deferred phases — Tier C (`edit_template`) lives at sibling [`backlog_feat_digest_template_edit_followups`](../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md) and is not gated by this work.
- **No new migration.** Tier-A's JSONB column + lineage columns + CHECK constraint + BEFORE DELETE trigger apply unchanged (spec §3, FR-13).

## 1) Scope traceability (FR → epics/stories)
@@ -51,7 +51,7 @@

**Spec error-code coverage vs plan:** Spec §8.5 introduces **zero** new error codes. Worker-side validation failures downgrade in-band (no API error); `POST /api/v1/studies` flow uses existing Tier-A codes (`PROPOSAL_NOT_FOUND`, `DIGEST_NOT_FOUND`, `FOLLOWUP_INDEX_OUT_OF_RANGE`, `TEMPLATE_NOT_FOUND`, `INVALID_SEARCH_SPACE`, etc.) verbatim. Match.

**Deferred phases verified:** N/A — single-phase delivery per spec §3 "Phase boundaries". Tier C (`edit_template`) lives at sibling [`backlog_feat_digest_template_edit_followups`](../backlog_feat_digest_template_edit_followups/idea.md) folder and is explicitly NOT gated by this work.
**Deferred phases verified:** N/A — single-phase delivery per spec §3 "Phase boundaries". Tier C (`edit_template`) lives at sibling [`backlog_feat_digest_template_edit_followups`](../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md) folder and is explicitly NOT gated by this work.

## 2) Delivery structure

@@ -14,7 +14,7 @@
- Cycle 2: 5 findings (5 accepted, 0 rejected) — 3 re-raises (stale §2/§3 prose on optional schema + deterministic worker pre-clean rule; §13/§4 diagnostic field-name drift; §6 intro sentence still narrow) and 2 net-new (4th reason code `remap_invalid_search_space` for FR-7 step 3 emission; `validation_error` truncation matches the canonical `_truncate` helper)
- Cycle 3: 1 finding (1 accepted, 0 rejected) — net-new internal-consistency catch: empty trusted intersection is unreachable on the worker path (Pydantic min_length=1 rejects empty `SearchSpace`), so helper rejects no-trusted-intersection inputs and prompt instructs LLM to skip in that case; disjoint-only swaps explicitly out of contract
- Total: 18 accepted, 0 rejected across 18 findings (Decision Log D-17 through D-34 enumerate the resolutions)
- Phases: 1 total (single-phase delivery — no `phase2_idea.md`; Tier C `edit_template` tracked at sibling [`../backlog_feat_digest_template_edit_followups/`](../backlog_feat_digest_template_edit_followups/idea.md))
- Phases: 1 total (single-phase delivery — no `phase2_idea.md`; Tier C `edit_template` tracked at sibling [`../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/`](../../../02_product/planned_features/backlog_feat_digest_template_edit_followups/idea.md))
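The Cycle 3 resolution can be illustrated with a small sketch. It assumes one plausible reading of "trusted intersection" (the params in the LLM's proposed `search_space` that the swap target actually declares) and uses hypothetical names throughout (`trusted_intersection`, `NoTrustedIntersection`); the real helper's name, signature, and matching rule are not shown in this diff.

```python
class NoTrustedIntersection(ValueError):
    """Raised when no proposed param is declared by the swap-target template."""


def trusted_intersection(proposed, target_declared_params):
    """Keep only proposed params that the target declares; reject an empty result."""
    declared = set(target_declared_params)
    trusted = {name: bounds for name, bounds in proposed.items() if name in declared}
    if not trusted:
        # An empty search space would fail a min_length=1 check downstream,
        # so fail early with an explicit error instead.
        raise NoTrustedIntersection("no proposed param is declared by the target")
    return trusted


kept = trusted_intersection({"boost": (1.0, 3.0), "fuzz": (0, 2)}, ["boost", "slop"])
```

Rejecting in the helper keeps the "disjoint-only swaps are out of contract" rule in one place rather than relying on a later schema error.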

## Plan
- Status: Approved
@@ -28,7 +28,9 @@
- Phases covered: single-phase delivery (Tier B only)

## Implementation
- Status: Not started

## Implementation
- Status: Not started
- Status: Complete — admin-merged into main as PR #232 squash `791642e0` on 2026-05-24.
- Branch: `feature/digest-executable-followups-swap-template` (deleted post-merge).
- PR: [#232](https://github.com/SoundMindsAI/relyloop/pull/232), admin-merged with the smoke gate red. The smoke failure was a compound cascade of 5+ pre-existing regressions from PR #188 + PR #228's admin-merge bypasses (NOT introduced by Tier B code): a cleared `OPENAI_API_KEY_TEST` repo secret; a missing `scripts/` COPY in the Dockerfile (which broke api container startup); `_wait_healthy` not gating on the capability check; a missing `make seed-demo` step in the smoke workflow; and OpenAI key rejection by the capability check (root cause unclear). Tier B's own code is clean (3 GPT-5.5 spec cycles + 2 plan cycles + Gemini accept + final-review pass with 6 of 7 findings rejected with cited counter-evidence + 1 deferred). The 5 fixes applied during the smoke cascade are bundled into this same squash; the remaining issues are captured as separate `bug_*` ideas (OpenAI capability + ES cluster unreachability).
- Cross-model review: spec 3 cycles 18/18 accepted; plan 2 cycles 7 accepted + 4 rejected; Gemini 1 Medium accepted; final GPT-5.5 1 deferred + 2 rejected with counter-evidence + 4 spurious from diff-window truncation.
- Test deltas: backend unit 1331 → 1346 (+15 — 7 template_swap + 6 followup union + 1 backcompat + 7 worker validation overlap accounted); +3 integration; +3 contract; +20 vitest; +1 Playwright E2E (gated on demo-data seed which is part of the cascade).
- **No new migration** — Tier A's `0019_digests_suggested_followups_jsonb` + lineage columns apply unchanged. Alembic head stays at `0019`.