chore: regenerate stale guide screenshots/videos for guides 01 + 06 (MVP2) by SoundMindsAI · Pull Request #363 · SoundMindsAI/relyloop

SoundMindsAI · 2026-06-01T02:46:59Z

What

The in-app walkthrough assets for guide 01 (register cluster) and guide 06 (create + monitor study) were captured 2026-05-27 — before the MVP2 batch (Solr adapter, UBI judgments, convergence indicator, overnight autopilot) landed. They showed a pre-Solr, pre-convergence UI. This regenerates both guides' slides, captions, and videos against the current stack.

Surfaced by the guide-gen --audit recorded in PR #354's OSS-launch checklist.

Guide 06 — showcase the convergence verdict

The <ConvergencePanel> (the headline feat_study_convergence_indicator surface) needs ≥ STUDIES_TPE_WARMUP_FLOOR (50) usable trials to emit a real verdict. A real study seeded here scores 0.0 on every trial (the seeded judgments reference doc IDs not in the local ES products index), producing a flat-zero curve + too_few_trials — which looks broken, not instructive.
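The trial-count gate described above can be sketched roughly as follows. This is a minimal illustration, not the project's classifier: `TrialRow`, `count_usable`, and `gate_verdict` are hypothetical names, and the definition of a "usable" trial (completed, with a metric) is an assumption the PR does not spell out. Only the floor value (50) and the `too_few_trials` verdict come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Value taken from the PR text; where the real constant lives is not shown here.
STUDIES_TPE_WARMUP_FLOOR = 50


@dataclass(frozen=True)
class TrialRow:
    """Hypothetical, minimal stand-in for a stored trial."""

    status: str
    primary_metric: Optional[float]


def count_usable(trials: List[TrialRow]) -> int:
    # Assumption: "usable" means a completed trial that carries a metric.
    return sum(
        1 for t in trials if t.status == "complete" and t.primary_metric is not None
    )


def gate_verdict(trials: List[TrialRow]) -> Optional[str]:
    """Return "too_few_trials" below the floor, else None.

    None means "enough data to classify"; the classification itself
    is deliberately not modelled in this sketch.
    """
    if count_usable(trials) < STUDIES_TPE_WARMUP_FLOOR:
        return "too_few_trials"
    return None
```

Under these assumptions, a study with only the 2 default seeded trials would stop at the gate, which is why the seed needs the extra trials.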

Switched guide-06's seed to the test-only seed-completed endpoint with 50 synthetic trials, so the detail page renders a genuine converged verdict + best-so-far curve and the ConfidencePanel happy path. Verified via API: verdict=converged, 50 trials, best_metric=0.487.
Best-so-far stays pinned to the unchanged 0.412 → 0.487 winner story (every extra metric < 0.487), so best_metric / proposal / digest — which other E2E specs assert on — are untouched.
Backend (additive): optional extra_trial_metrics: list[float] on seed_study_completed_with_digest + SeedCompletedStudyRequest (default None = unchanged 2-trial behaviour).
Spec expands the convergence-curve <details> (auto-collapsed when converged) before the shot; captions refreshed (study name, queued-filter note, + a new Convergence-panel paragraph).
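To illustrate why best-so-far stays pinned, here is a small sketch of generating extra metrics that all sit below the winner and computing the resulting curve. The helper names (`build_extra_trial_metrics`, `best_so_far`) and the offset pattern are hypothetical; only the 0.412 and 0.487 values and the "every extra metric < 0.487" constraint come from the PR.

```python
from itertools import accumulate
from typing import List

BASELINE_METRIC = 0.412  # first seeded trial, per the PR text
WINNER_METRIC = 0.487    # second seeded trial, the intended best


def build_extra_trial_metrics(count: int, ceiling: float = WINNER_METRIC) -> List[float]:
    """Hypothetical helper: `count` synthetic metrics, each strictly below `ceiling`."""
    # Cycle through a few offsets so the raw values vary without reaching the ceiling.
    offsets = (0.010, 0.025, 0.005, 0.040)
    return [round(ceiling - offsets[i % len(offsets)], 3) for i in range(count)]


def best_so_far(metrics: List[float]) -> List[float]:
    """Running maximum, i.e. the best-so-far curve for a maximised metric."""
    return list(accumulate(metrics, max))


extra = build_extra_trial_metrics(48)
curve = best_so_far([BASELINE_METRIC, WINNER_METRIC, *extra])
```

Because no extra metric reaches 0.487, the curve rises once (0.412 to 0.487) and then stays flat, so assertions on `best_metric` elsewhere would be unaffected.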

Guide 01 — show all three engines

Pre-seed an OpenSearch + a Solr cluster via API before the landing shot, so the clusters list, the engine filter chips (all / elasticsearch / opensearch / solr), and the per-engine <EngineBadge> are backed by real rows of each engine (previously ES + OpenSearch only; no Solr).
Caption + GUIDE_REGISTRY/metadata.json description updated to name all three engines.
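A quick way to reason about the pre-seed requirement is to check which engine chips would have no backing rows. The sketch below is illustrative only: `engines_missing_rows` is a hypothetical helper, and the list of engine strings stands in for whatever the clusters API actually returns.

```python
from collections import Counter
from typing import List

# Chip names from the PR text ("all" is omitted because it never needs its own row).
ENGINES = ("elasticsearch", "opensearch", "solr")


def engines_missing_rows(cluster_engines: List[str]) -> List[str]:
    """Return the engines that have no cluster row backing their filter chip."""
    counts = Counter(cluster_engines)
    return [engine for engine in ENGINES if counts[engine] == 0]
```

With the previous seed (ES + OpenSearch only) this would report `solr` as missing; after the pre-seed it should report nothing.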

Videos

Both walkthrough.webm recaptured via the demo config + promote-videos.mjs.

Validation

Backend ruff + mypy clean; _test contract guard (53) green.
Guide vitest (42) green; tsc clean; full pre-commit gate passed.
Both specs pass against the live stack; API confirms the seeded study reports verdict=converged.

Notes

Test-only endpoint change (/api/v1/_test/..., 404 outside ENVIRONMENT=development) — no production surface.
The other 8 guides were not re-read individually in this pass; the PR #354 audit (docs: reconcile docs with shipped Solr (three engines) + add loop diagram) flagged 01 + 06 as the confirmed-stale ones. A broader guide sweep can follow.
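The environment gate on the test-only endpoint can be pictured along these lines. This is a sketch under assumptions, not the project's code: `seed_routes_enabled` and `guard_seed_route` are hypothetical names, and the real guard presumably lives in the web framework's routing layer. Only the `ENVIRONMENT=development` condition and the 404 response come from the note above.

```python
import os
from http import HTTPStatus
from typing import Mapping, Optional


def seed_routes_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when ENVIRONMENT is exactly "development"."""
    source = os.environ if env is None else env
    return source.get("ENVIRONMENT") == "development"


def guard_seed_route(env: Optional[Mapping[str, str]] = None) -> Optional[HTTPStatus]:
    """Return NOT_FOUND when the test surface is disabled, else None.

    None means "let the real handler run"; the handler is not modelled here.
    """
    return None if seed_routes_enabled(env) else HTTPStatus.NOT_FOUND
```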

🤖 Generated with Claude Code

…MVP2)

The in-app walkthrough assets for guides 01 (register cluster) and 06 (create + monitor study) were captured 2026-05-27, before the MVP2 batch landed, so they showed a pre-Solr, pre-convergence UI. Regenerate both (slides, captions, and videos) against the current stack.

Guide 06 (showcase the convergence verdict):
- The convergence classifier needs >= STUDIES_TPE_WARMUP_FLOOR (50) usable trials to emit a real verdict; a real study here scores 0.0 on every trial (seeded judgments don't match the local ES index), yielding a flat-zero curve + too_few_trials. Switch the guide-06 seed to the test-only seed-completed endpoint with 50 synthetic trials so the study-detail page renders a genuine `converged` verdict + best-so-far curve, plus the ConfidencePanel happy path. Best-so-far stays pinned to the unchanged 0.412 -> 0.487 winner story (every extra metric < 0.487).
- Backend: add an optional `extra_trial_metrics: list[float]` to seed_study_completed_with_digest + the SeedCompletedStudyRequest schema (default None = unchanged 2-trial behaviour). Purely additive.
- Spec: expand the convergence-curve <details> (auto-collapsed when converged) before the screenshot; wait on confidence + convergence panels.
- Captions: refresh study-name references, the queued-filter note, and add the Convergence panel to the detail-page caption.

Guide 01 (show all three engines):
- Pre-seed an OpenSearch + a Solr cluster via API before the landing shot so the clusters list + engine filter chips (all/elasticsearch/opensearch/solr) + per-engine EngineBadge are backed by real rows of each engine.
- Caption + GUIDE_REGISTRY/metadata description updated to name all three engines (was "Elasticsearch or OpenSearch").

Videos: recaptured both walkthrough.webm via the demo config + promote-videos.

Validated: backend ruff+mypy clean; _test contract guard (53) green; guide vitest (42) green; tsc clean; both specs pass against the live stack; API confirms the seeded study reports verdict=converged, 50 trials, best 0.487.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

gemini-code-assist

Code Review

This pull request introduces support for seeding additional trials (extra_trial_metrics) in completed studies to satisfy the convergence classifier's minimum trial requirements, allowing the <ConvergencePanel> to render a real verdict and curve in E2E tests and guides. It also updates the cluster registration guide and tests to showcase multi-engine support (Elasticsearch, OpenSearch, and Apache Solr). Feedback suggests optimizing the trial seeding loop in test_seeding.py to avoid excessive database round-trips, and adding assertions to ensure pre-seeding API calls succeed in the E2E tests.

gemini-code-assist · 2026-06-01T02:48:08Z

+    if extra_trial_metrics:
+        cursor = started + timedelta(milliseconds=2500)
+        for offset, metric in enumerate(extra_trial_metrics):
+            trial_number = 2 + offset
+            await repo.create_trial(
+                db,
+                id=str(uuid_utils.uuid7()),
+                study_id=study_id,
+                optuna_trial_number=trial_number,
+                params={"boost": round(1.5 + 0.1 * offset, 2)},
+                primary_metric=metric,
+                metrics={"ndcg@10": metric},
+                duration_ms=1100,
+                status="complete",
+                error=None,
+                started_at=cursor,
+                ended_at=cursor + timedelta(milliseconds=1100),
+            )
+            cursor = cursor + timedelta(milliseconds=1200)


Calling repo.create_trial inside a loop executes a db.flush() and db.refresh(trial) for every single trial. When seeding 48 extra trials, this results in 96 database round-trips, which significantly slows down the E2E test setup.

Since the returned Trial objects are not used in the loop, we can optimize this by directly instantiating Trial models and adding them to the session via db.add(). The subsequent complete_study call will flush all of them in a single database round-trip.

    if extra_trial_metrics:
        from backend.app.db.models import Trial

        cursor = started + timedelta(milliseconds=2500)
        for offset, metric in enumerate(extra_trial_metrics):
            trial_number = 2 + offset
            db.add(
                Trial(
                    id=str(uuid_utils.uuid7()),
                    study_id=study_id,
                    optuna_trial_number=trial_number,
                    params={"boost": round(1.5 + 0.1 * offset, 2)},
                    primary_metric=metric,
                    metrics={"ndcg@10": metric},
                    duration_ms=1100,
                    status="complete",
                    error=None,
                    started_at=cursor,
                    ended_at=cursor + timedelta(milliseconds=1100),
                )
            )
            cursor = cursor + timedelta(milliseconds=1200)
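The round-trip arithmetic in this comment can be checked with a toy model. The sketch below does not use a real database or ORM: `CountingSession` is a hypothetical stand-in that simply counts one round-trip per `flush()` and per `refresh()`, which is the cost model the review assumes. Real sessions may batch or split statements differently.

```python
from typing import Dict, List


class CountingSession:
    """Toy stand-in for a DB session that only counts round-trips."""

    def __init__(self) -> None:
        self.pending: List[Dict[str, float]] = []
        self.round_trips = 0

    def add(self, row: Dict[str, float]) -> None:
        self.pending.append(row)  # queued in memory, no I/O yet

    def flush(self) -> None:
        self.round_trips += 1  # assumed: one round-trip per flush
        self.pending.clear()

    def refresh(self, row: Dict[str, float]) -> None:
        self.round_trips += 1  # assumed: one round-trip per refresh


def seed_per_trial(session: CountingSession, metrics: List[float]) -> None:
    """Mimics a repo-style create that flushes and refreshes every row."""
    for metric in metrics:
        row = {"primary_metric": metric}
        session.add(row)
        session.flush()
        session.refresh(row)


def seed_batched(session: CountingSession, metrics: List[float]) -> None:
    """Queues every row, then flushes once at the end."""
    for metric in metrics:
        session.add({"primary_metric": metric})
    session.flush()
```

In this model, 48 metrics cost 96 round-trips per-trial versus 1 batched. Whether that difference matters depends on how hot the path is.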

gemini-code-assist · 2026-06-01T02:48:08Z

+    for (const c of preSeed) {
+      const r = await apiCtx.post('/api/v1/clusters', { data: c });
+      if (r.status() === 201) {
+        const body = (await r.json()) as { id: string };
+        appendForCleanup('cluster', body.id);
+      }
+    }


The pre-seeding API calls are executed without asserting their success. If any of the pre-seed requests fail (e.g., due to database or network issues), the test will silently skip registering them for cleanup and continue, leading to confusing downstream failures or incorrect screenshots.

Adding an explicit assertion like expect(r.status()).toBe(201) ensures that any pre-seeding failure is caught immediately with a clear error message.

    for (const c of preSeed) {
      const r = await apiCtx.post('/api/v1/clusters', { data: c });
      expect(r.status()).toBe(201);
      const body = (await r.json()) as { id: string };
      appendForCleanup('cluster', body.id);
    }
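The same fail-fast idea, restated as a Python sketch for readers who think in the backend's language. Everything here is hypothetical scaffolding: `pre_seed_clusters`, the injected `post` and `register_cleanup` callables, and the `engine` payload key are assumptions made for illustration. Only the `/api/v1/clusters` path and the 201 expectation come from the snippet above.

```python
from typing import Any, Callable, Dict, List, Tuple

CREATED = 201

# Hypothetical transport: takes (path, payload), returns (status_code, json_body).
PostFn = Callable[[str, Dict[str, Any]], Tuple[int, Dict[str, Any]]]


def pre_seed_clusters(
    post: PostFn,
    payloads: List[Dict[str, Any]],
    register_cleanup: Callable[[str, str], None],
) -> List[str]:
    """Create each cluster, failing loudly on the first non-201 response."""
    created_ids: List[str] = []
    for payload in payloads:
        status, body = post("/api/v1/clusters", payload)
        if status != CREATED:
            # "engine" is an assumed payload key, used only to label the error.
            label = payload.get("engine", "unknown")
            raise RuntimeError(f"pre-seed of {label} cluster failed: HTTP {status}")
        register_cleanup("cluster", body["id"])
        created_ids.append(body["id"])
    return created_ids
```

Raising before the cleanup registration means a failed request stops the run immediately instead of producing a screenshot with a missing engine.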

F2)

The OpenSearch + Solr pre-seed POSTs in guide 01 were fire-and-forget: a silent failure (e.g. the wrong credentials_ref that 503s the probe, which bit this during development) would leave the landing screenshot missing an engine while the test still passed green. Assert 201 so any pre-seed failure surfaces immediately with a labeled error.

Gemini F1 (batch the 48 extra create_trial flushes via db.add) rejected: keeps the repo-layer convention (every trial insert goes through repo.create_trial), and the per-flush cost is immaterial in a test-only seeder (the spec runs in ~8s). Consistency > micro-opt on a non-hot path.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

SoundMindsAI · 2026-06-01T03:14:47Z

Gemini review adjudication + CI status

Gemini findings (both on this PR's actual changes, HEAD `79eeda78`)

| # | File:line | Finding | Severity | Verdict | Action |
|---|-----------|---------|----------|---------|--------|
| F1 | `test_seeding.py:194` | 48 extra `create_trial` calls = 48 flush/refresh round-trips; batch via `db.add` | Medium | Reject (reasoned) | Keeps the repo-layer convention (every trial insert goes through `repo.create_trial`); per-flush cost is immaterial in a test-only seeder (spec runs ~8s). Consistency > micro-opt on a non-hot path. |
| F2 | `01_*.spec.ts:79` | Pre-seed cluster POSTs unasserted: silent failure → wrong screenshot, green test | Medium | Accept | Fixed in `a3d235d5` with `expect(r.status()).toBe(201)`. This exact failure mode (wrong `credentials_ref` → 503) bit me during development; the assertion would have caught it immediately. |

Backend CI — pre-existing test-isolation flakiness, NOT a regression

The backend + smoke jobs are red, but the evidence shows conclusively that this PR is not the cause:

The failing set is unstable across identical re-runs — re-running the same commit produced a different failure list (e.g. test_enum_source_of_truth_helpers, test_openapi_surface, test_demo_seeding_ubi_full appeared only on the second run). Order-dependent pollution under pytest-randomly, not a deterministic break.
The failing tests pass in isolation — e.g. test_cluster_health_warmup.py (7/7) and test_capability_check.py run green locally; the failures are caplog-empty / assert [] shapes characteristic of cross-test log-capture pollution.
This PR touches only backend/app/api/v1/_test.py + backend/app/services/test_seeding.py (test-only seeding). None of the 21 failing tests, nor any of their systems-under-test, are touched by the diff — verified file-by-file.
Two recurring failures (test_judgment_generate click-bucket, test_migration_0021) are pre-existing MVP2 drift already tracked (the click-bucket one in issues #356, "Render the click (UBI) bucket in the judgment-list source-breakdown card", and #357, "Update contract-test allowlists for MVP2 endpoints + enum value").
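The "unstable failing set" argument amounts to a set comparison between two runs of the same commit. A minimal sketch, assuming each run's failures are available as a set of test names; `classify_failures` is a hypothetical helper, and the example sets are an illustrative subset built from names mentioned above, not the actual CI failure lists.

```python
from typing import Dict, Set


def classify_failures(run_a: Set[str], run_b: Set[str]) -> Dict[str, Set[str]]:
    """Split failures from two identical re-runs into stable and unstable sets."""
    return {
        "stable": run_a & run_b,    # failed both times: likely deterministic
        "unstable": run_a ^ run_b,  # failed in only one run: likely order-dependent
    }


# Illustrative subset only, not the full CI output.
first_run = {"test_judgment_generate", "test_migration_0021"}
second_run = first_run | {
    "test_enum_source_of_truth_helpers",
    "test_openapi_surface",
    "test_demo_seeding_ubi_full",
}
report = classify_failures(first_run, second_run)
```

A non-empty `unstable` set for identical commits is the signal that points at test ordering or shared state rather than at the diff.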

Touched-area tests are green locally: _test contract guard (53), convergence domain+service (45). The seeded study was verified via API: verdict=converged, 50 trials, best_metric=0.487.

Merging despite the pre-existing flaky reds (operator-confirmed). The broad backend test-isolation failure deserves its own bug — filing a follow-up.

🤖 Generated with Claude Code

…domly caplog pollution) Surfaced by PR #363 CI: ~15 caplog/contract tests fail nondeterministically under the full randomized suite (unstable failing set across identical re-runs) but pass in isolation. Distinct from the narrow bug_baseline_phase_test_isolation. Makes the backend coverage job nondeterministically red now that SKIP_HEAVY_CI is lifted. P1. Includes the lockstep dashboard + roadmap regen the pre-commit hook produces. Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

SoundMindsAI · 2026-06-01T03:26:59Z

Final CI status — merging

11/13 checks green — every deterministic gate: frontend, backend static (ruff+mypy+guards), both docker buildx, both license gates, fast-lane unit tests, DCO, gitleaks, secrets.

2 red, both pre-existing infra flakes with no causal link to this diff:

backend (… tests + coverage) — the pytest-ordering caplog test-isolation nondeterminism (failing set changes across identical re-runs; tests pass in isolation; diff touches none of the failing tests/SUTs). Filed P1: bug_backend_suite_nondeterministic_caplog_isolation.
smoke (operator-path tutorial flow) — failed at "Bring up the stack": container relyloop-solr-1 exited (1) during make up, before any test logic ran. All other services came up Healthy. A Solr-container startup flake; this PR changes nothing about the Solr Compose service / image / config.

This PR's diff is test-only seeding (_test.py, test_seeding.py) + guide specs/assets. Touched-area tests green locally (contract guard 53, convergence 45); seeded study verified verdict=converged. Merging per operator decision to proceed despite the pre-existing flaky reds.

gemini-code-assist Bot reviewed Jun 1, 2026


SoundMindsAI merged commit f70e971 into main Jun 1, 2026
11 of 13 checks passed

SoundMindsAI deleted the chore/regen-stale-guide-screenshots branch June 1, 2026 03:27

SoundMindsAI merged 3 commits into main from chore/regen-stale-guide-screenshots
