Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 23 additions & 6 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@

.DEFAULT_GOAL := help
.PHONY: help fmt lint typecheck test test-unit test-integration test-contract \
ui-lint ui-typecheck ui-test ui-build ui-dev \
backend-fmt backend-lint backend-typecheck \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When adding the ui-fmt target for symmetry (as suggested below), ensure it is included in the .PHONY list to prevent conflicts with any potential directory named ui-fmt.

        backend-fmt backend-lint backend-typecheck ui-fmt \

ui-fmt ui-lint ui-typecheck ui-test ui-build ui-dev \
pre-commit pre-commit-install \
up down restart logs reset migrate migrate-create seed-clusters seed-es \
dev dashboard
Expand All @@ -21,18 +22,31 @@ help: ## Show this help message
@echo " invoke manually only to apply a freshly-authored revision without bouncing.)"
@echo ""

# ---------- Code quality ----------
# ---------- Backend code quality (per infra_make_targets_split_backend_only) ----------
#
# `backend-*` sub-targets mirror the existing `ui-*` siblings below.
# Backend-only contributors (humans or agents on backend/*.py changes)
# can run these without tripping the UI tooling's Node ≥20.18 engine
# guard. The bundled `fmt`/`lint`/`typecheck` targets compose
# `backend-*` + `ui-*` so CI behavior is unchanged.

fmt: ## Format Python (ruff format) and frontend (prettier)
backend-fmt: ## Format Python (ruff format) — backend only, skips UI
uv run ruff format .
pnpm --dir ui format

lint: ui-lint ## Lint Python (ruff check) and frontend (eslint)
backend-lint: ## Lint Python (ruff check) — backend only, skips UI
uv run ruff check .

typecheck: ui-typecheck ## Type-check Python (mypy --strict) and frontend (tsc --noEmit)
backend-typecheck: ## Type-check Python (mypy --strict) — backend only, skips UI
uv run mypy backend/

# ---------- Code quality (composed: backend + UI) ----------

fmt: backend-fmt ui-fmt ## Format Python (ruff format) and frontend (prettier)

lint: backend-lint ui-lint ## Lint Python (ruff check) and frontend (eslint)

typecheck: backend-typecheck ui-typecheck ## Type-check Python (mypy --strict) and frontend (tsc --noEmit)

# ---------- Tests ----------

test: test-unit test-integration test-contract ui-test ## Run all backend + UI test layers
Expand All @@ -59,6 +73,9 @@ pre-commit-install: ## Install pre-commit hooks (commit-msg + pre-commit stages

# ---------- Frontend (UI) ----------

ui-fmt: ## Format frontend (prettier)
pnpm --dir ui format

ui-lint: ## Lint frontend (next lint / eslint)
pnpm --dir ui lint

Expand Down
4 changes: 2 additions & 2 deletions docs/00_overview/MVP1_DASHBOARD.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,8 +82,8 @@ _None._

| Feature | Type | One-liner | Depends on | Status |
|---|---|---|---|---|
| [infra_arq_subprocess_test](../02_product/planned_features/infra_arq_subprocess_test/idea.md) | Infra | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review) | — | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review) |
| [infra_make_targets_split_backend_only](../02_product/planned_features/infra_make_targets_split_backend_only/idea.md) | Infra | The current `Makefile` bundles backend (ruff, mypy) and frontend (prettier, eslint, tsc) tooling under each top-level target: | — | Idea — captured during feat_judgments_periodic_resume_sweep impl-execute tangential sweep |
| [infra_arq_subprocess_test](../02_product/planned_features/infra_arq_subprocess_test/idea.md) | Infra | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; | — | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; a subprocess test would add a narrow Arq-version-regression guard. |
| [infra_make_targets_split_backend_only](../02_product/planned_features/infra_make_targets_split_backend_only/idea.md) | Infra | The current `Makefile` bundles backend (ruff, mypy) and frontend (prettier, eslint, tsc) tooling under each top-level target. Pre-fix shape was ([`Makefile:26-34` in the pre-PR-110 main](../../Makefil | — | Idea — ready for `/impl-execute --ad-hoc`. |
| [chore_chat_last_message_preview](../02_product/planned_features/chore_chat_last_message_preview/idea.md) | Chore | The `/chat` list page (`ui/src/app/chat/page.tsx`) shows each conversation row as `title + relative timestamp + "{N} messages"`. There is no preview of the last message — operators with several simila | — | — |
| [chore_demo_recording_mvp3](../02_product/planned_features/chore_demo_recording_mvp3/idea.md) | Chore | | — | — |
| [chore_digest_worker_narrow_except](../02_product/planned_features/chore_digest_worker_narrow_except/idea.md) | Chore | … | — | Idea (deferred from Gemini Code Assist Finding #2 on [PR #92](https://github.com/SoundMindsAI/relyloop/pull/92)) |
Expand Down
4 changes: 2 additions & 2 deletions docs/00_overview/mvp1_dashboard.html
Original file line number Diff line number Diff line change
Expand Up @@ -380,7 +380,7 @@ <h3>Idea <span class="count">7</span></h3>
<span class="badge infra">Infra</span>

</div>
<div class="one-liner">Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review)</div>
<div class="one-liner">Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; </div>


</div>
Expand All @@ -392,7 +392,7 @@ <h3>Idea <span class="count">7</span></h3>
<span class="badge infra">Infra</span>

</div>
<div class="one-liner">The current `Makefile` bundles backend (ruff, mypy) and frontend (prettier, eslint, tsc) tooling under each top-level target:</div>
<div class="one-liner">The current `Makefile` bundles backend (ruff, mypy) and frontend (prettier, eslint, tsc) tooling under each top-level target. Pre-fix shape was ([`Makefile:26-34` in the pre-PR-110 main](../../Makefil</div>


</div>
Expand Down
Original file line number Diff line number Diff line change
@@ -1,32 +1,21 @@
# infra — subprocess-driven Arq worker test for FR-5 / AC-4

**Date:** 2026-05-10
**Status:** Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review)
**Origin:** GPT-5.5 final-review finding #8 — Story 2.3 task 4 called
for a subprocess fixture that spawns
`arq backend.workers.all.WorkerSettings`, runs N seconds, SIGTERMs,
then restarts, and asserts trials continue. PR #25 shipped the in-
process equivalent (`test_study_resume.py::test_on_startup_resumes_*`)
because spawning a real Arq worker requires Redis + DB connectivity
from the test process and stable lifecycle hooks.
**Preflighted:** 2026-05-14 — confirmed test functions still exist at cited paths; `_subprocess_helpers/` directory already in place from PR #20 precedent; arq still at `>=0.26` (`0.28.0` in `uv.lock`); two crons now registered in `WorkerSettings.cron_jobs` (added context note).
**Status:** Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; a subprocess test would add a narrow Arq-version-regression guard.
**Origin:** GPT-5.5 final-review finding #8 on PR #25 — Story 2.3 task 4 called for a subprocess fixture that spawns `arq backend.workers.all.WorkerSettings`, runs N seconds, SIGTERMs, then restarts, and asserts trials continue. PR #25 shipped the in-process equivalent because spawning a real Arq worker requires Redis + DB connectivity from the test process and stable lifecycle hooks.

## Why deferred

* The three test cases shipped at the unit/integration boundary
(`test_on_startup_resumes_every_running_study`,
`test_resume_study_idempotent_on_already_running`,
`test_resume_study_wrapper_delegates_to_start_study`) verify the
resume contract surfaces — sweep correctness + idempotent resume +
wrapper dispatch.
* A subprocess test would additionally catch Arq-version-specific
wiring regressions (Arq's `WorkerSettings` schema changes, function
registration semantics, on_startup hook signature drift).
* `docs/03_runbooks/study-lifecycle-debugging.md` documents the
manual SIGTERM dance for operators.
* The three test cases shipped at the unit/integration boundary verify the resume contract surfaces — sweep correctness + idempotent resume + wrapper dispatch. They live in [`backend/tests/integration/test_study_resume.py`](../../../../backend/tests/integration/test_study_resume.py): `test_on_startup_resumes_every_running_study`, `test_resume_study_idempotent_on_already_running`, `test_resume_study_wrapper_delegates_to_start_study`.
* A subprocess test would additionally catch Arq-version-specific wiring regressions (Arq's `WorkerSettings` schema changes, function registration semantics, `on_startup` hook signature drift, **cron-jobs registry drift now that `WorkerSettings.cron_jobs` carries two crons** — `reconcile_pr_state` from `feat_github_webhook` and `resume_stuck_judgment_lists` from `feat_judgments_periodic_resume_sweep`).
* [`docs/03_runbooks/study-lifecycle-debugging.md`](../../../03_runbooks/study-lifecycle-debugging.md) documents the manual SIGTERM dance for operators — verified present, still authoritative.

## Proposed fix

Add `backend/tests/integration/_subprocess_helpers/orchestrator_restart.py`:
The `_subprocess_helpers/` directory **already exists** ([`backend/tests/integration/_subprocess_helpers/`](../../../../backend/tests/integration/_subprocess_helpers/)) with one entrypoint: `run_trial_with_test_stubs.py` (shipped with `infra_optuna_eval` Story 3.1, used by [`test_run_trial_partial_failure.py`](../../../../backend/tests/integration/test_run_trial_partial_failure.py)). Add a **second** entrypoint there:

`backend/tests/integration/_subprocess_helpers/orchestrator_restart.py`:

```python
@asynccontextmanager
Expand All @@ -40,20 +29,48 @@ async def arq_worker_subprocess() -> AsyncIterator[Process]:
Then in `backend/tests/integration/test_study_resume_subprocess.py`:

1. Seed a running study with `max_trials=100, parallelism=4`.
2. `async with arq_worker_subprocess() as p:` — wait until 20 trials
commit; SIGTERM p; restart `arq_worker_subprocess()`.
3. Within 30s, new trials accumulate from `optuna_trial_number 21+`;
study eventually completes at trial 100.
2. `async with arq_worker_subprocess() as p:` — wait until 20 trials commit; SIGTERM p; restart `arq_worker_subprocess()`.
3. Within 30s, new trials accumulate from `optuna_trial_number 21+`; study eventually completes at trial 100.

### Why an `@asynccontextmanager` (different from the existing precedent)

The shipped `run_trial_with_test_stubs.py` is invoked via synchronous `subprocess.run(...)` in `test_run_trial_partial_failure.py:_run_subprocess_with_fault` — it blocks until the helper exits, returning the exit code. That shape works for "spawn helper, wait for it to fault on a specific seam, assert exit code."

This idea's test fundamentally needs **mid-test control over the subprocess** (kill while it's mid-loop, observe DB state, restart). A sync `subprocess.run()` can't do that — it's await-or-block. The `@asynccontextmanager + Process` shape lets the test:

* Spawn via `asyncio.create_subprocess_exec(...)`.
* Poll the DB on the test's event loop while the worker runs.
* `p.terminate()` (or `os.kill(p.pid, signal.SIGTERM)`) on a deterministic condition.
* `await p.wait()` for clean shutdown.

These are different test shapes solving different problems; they coexist cleanly.

### Stub-survival across the subprocess boundary

The existing precedent at `run_trial_with_test_stubs.py:33-71` shows how to thread test doubles into a fresh Python interpreter (env-var-passed JSON blobs reinstalled inside `_main`). The new helper will need a similar pattern for any stubs the resume test needs — but the resume test arguably needs **fewer** stubs than the partial-failure test, since the goal is to exercise the REAL `on_startup` sweep + REAL `run_trial` against a REAL Optuna RDBStorage. Open question: does the test stub the engine adapter (`build_adapter`) at all, or accept that trials will fail against a non-running Elasticsearch and verify resume of `status='failed'` rows instead of `complete` ones? Locked recommendation: stub the engine adapter the same way `run_trial_with_test_stubs.py` does so trials genuinely complete; the test isn't trying to exercise the engine path.

## Scope signals

* Backend: yes (test infrastructure).
* Frontend: no.
* Migration: no.
* Config: no.
* **Backend:** yes (test infrastructure only — 2 new files: the helper at `_subprocess_helpers/orchestrator_restart.py` and the test at `test_study_resume_subprocess.py`).
* **Frontend:** no.
* **Migration:** no.
* **Config:** no.
* **Audit events:** N/A (test-only).
* **CLAUDE.md absolute-rules walked:** none implicated. No schema, no API, no LLM call, no secret, no engine-adapter dispatch, no audit_log emission, no `<select>` option list.

## Why this isn't a blocker today

The resume contract is functionally tested. The subprocess test catches
a narrow class of Arq-version-specific regressions; the next time we
bump Arq's version we should consider adding it.
The resume contract is functionally tested. The subprocess test catches a narrow class of Arq-version-specific regressions; the next time we bump Arq's version we should consider adding it.

**Concrete trigger lock:** ship this when one of the following happens, whichever comes first:

1. The `arq>=0.26` pin in `pyproject.toml` is bumped to a new minor version (e.g., `>=0.29`). Arq's `WorkerSettings` schema changes have historically landed in minor releases; the subprocess test is the cheapest way to detect schema breakage.
2. A third cron job lands in `WorkerSettings.cron_jobs` (current count: 2). At three or more crons the registry shape becomes worth a smoke test that exercises the real `arq` CLI invocation, not just an in-process import.
3. MVP3 hardening explicitly opts in.

## Relationship to other work

* [`feat_study_lifecycle` Phase 2 (PR #25)](../../../00_overview/implemented_features/2026_05_10_feat_study_lifecycle/) — the origin. The three in-process tests at `test_study_resume.py` are the deferred-from work.
* [`infra_optuna_eval` Story 3.1 (PR #20)](../../../00_overview/implemented_features/2026_05_10_infra_optuna_eval/) — established the `_subprocess_helpers/` pattern with `run_trial_with_test_stubs.py`. This idea adds a sibling entrypoint to the same directory.
* [`feat_github_webhook`](../../../00_overview/implemented_features/2026_05_12_feat_github_webhook/) + [`feat_judgments_periodic_resume_sweep`](../../../00_overview/implemented_features/2026_05_14_feat_judgments_periodic_resume_sweep/) — both shipped after this idea was written; both register cron jobs in `WorkerSettings.cron_jobs`. The subprocess test would now incidentally smoke-test cron-jobs-registry correctness too (any cron-registration regression would surface as worker-boot failure in the spawned subprocess).
* Not blocking and not blocked by anything currently in the backlog.
Loading
Loading