diff --git a/Makefile b/Makefile index 754cef5c..fea52e17 100644 --- a/Makefile +++ b/Makefile @@ -4,7 +4,8 @@ .DEFAULT_GOAL := help .PHONY: help fmt lint typecheck test test-unit test-integration test-contract \ - ui-lint ui-typecheck ui-test ui-build ui-dev \ + backend-fmt backend-lint backend-typecheck \ + ui-fmt ui-lint ui-typecheck ui-test ui-build ui-dev \ pre-commit pre-commit-install \ up down restart logs reset migrate migrate-create seed-clusters seed-es \ dev dashboard @@ -21,18 +22,31 @@ help: ## Show this help message @echo " invoke manually only to apply a freshly-authored revision without bouncing.)" @echo "" -# ---------- Code quality ---------- +# ---------- Backend code quality (per infra_make_targets_split_backend_only) ---------- +# +# `backend-*` sub-targets mirror the existing `ui-*` siblings below. +# Backend-only contributors (humans or agents on backend/*.py changes) +# can run these without tripping the UI tooling's Node ≥20.18 engine +# guard. The bundled `fmt`/`lint`/`typecheck` targets compose +# `backend-*` + `ui-*` so CI behavior is unchanged. -fmt: ## Format Python (ruff format) and frontend (prettier) +backend-fmt: ## Format Python (ruff format) — backend only, skips UI uv run ruff format . - pnpm --dir ui format -lint: ui-lint ## Lint Python (ruff check) and frontend (eslint) +backend-lint: ## Lint Python (ruff check) — backend only, skips UI uv run ruff check . -typecheck: ui-typecheck ## Type-check Python (mypy --strict) and frontend (tsc --noEmit) +backend-typecheck: ## Type-check Python (mypy --strict) — backend only, skips UI uv run mypy backend/ +# ---------- Code quality (composed: backend + UI) ---------- + +fmt: backend-fmt ui-fmt ## Format Python (ruff format) and frontend (prettier) + +lint: backend-lint ui-lint ## Lint Python (ruff check) and frontend (eslint) + +typecheck: backend-typecheck ui-typecheck ## Type-check Python (mypy --strict) and frontend (tsc --noEmit) + # ---------- Tests ---------- test: test-unit test-integration test-contract ui-test ## Run all backend + UI test layers @@ -59,6 +73,9 @@ pre-commit-install: ## Install pre-commit hooks (commit-msg + pre-commit stages # ---------- Frontend (UI) ---------- +ui-fmt: ## Format frontend (prettier) + pnpm --dir ui format + ui-lint: ## Lint frontend (next lint / eslint) pnpm --dir ui lint diff --git a/docs/00_overview/MVP1_DASHBOARD.md b/docs/00_overview/MVP1_DASHBOARD.md index 5d239b35..d3bfa4ea 100644 --- a/docs/00_overview/MVP1_DASHBOARD.md +++ b/docs/00_overview/MVP1_DASHBOARD.md @@ -82,8 +82,8 @@ _None._ | Feature | Type | One-liner | Depends on | Status | |---|---|---|---|---| -| [infra_arq_subprocess_test](../02_product/planned_features/infra_arq_subprocess_test/idea.md) | Infra | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review) | — | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review) | -| [infra_make_targets_split_backend_only](../02_product/planned_features/infra_make_targets_split_backend_only/idea.md) | Infra | The current `Makefile` bundles backend (ruff, mypy) and frontend (prettier, eslint, tsc) tooling under each top-level target: | — | Idea — captured during feat_judgments_periodic_resume_sweep impl-execute tangential sweep | +| [infra_arq_subprocess_test](../02_product/planned_features/infra_arq_subprocess_test/idea.md) | Infra | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; | — | Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; a subprocess test would add a narrow Arq-version-regression guard. | +| [infra_make_targets_split_backend_only](../02_product/planned_features/infra_make_targets_split_backend_only/idea.md) | Infra | The current `Makefile` bundles backend (ruff, mypy) and frontend (prettier, eslint, tsc) tooling under each top-level target. Pre-fix shape was ([`Makefile:26-34` in the pre-PR-110 main](../../Makefil | — | Idea — ready for `/impl-execute --ad-hoc`. | | [chore_chat_last_message_preview](../02_product/planned_features/chore_chat_last_message_preview/idea.md) | Chore | The `/chat` list page (`ui/src/app/chat/page.tsx`) shows each conversation row as `title + relative timestamp + "{N} messages"`. There is no preview of the last message — operators with several simila | — | — | | [chore_demo_recording_mvp3](../02_product/planned_features/chore_demo_recording_mvp3/idea.md) | Chore | | — | — | | [chore_digest_worker_narrow_except](../02_product/planned_features/chore_digest_worker_narrow_except/idea.md) | Chore | … | — | Idea (deferred from Gemini Code Assist Finding #2 on [PR #92](https://github.com/SoundMindsAI/relyloop/pull/92)) | diff --git a/docs/00_overview/mvp1_dashboard.html b/docs/00_overview/mvp1_dashboard.html index 9da37473..339bf782 100644 --- a/docs/00_overview/mvp1_dashboard.html +++ b/docs/00_overview/mvp1_dashboard.html @@ -380,7 +380,7 @@

Idea 7

Infra -
Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review)
+
Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly;
@@ -392,7 +392,7 @@

Idea 7

Infra -
The current `Makefile` bundles backend (ruff, mypy) and frontend (prettier, eslint, tsc) tooling under each top-level target:
+
The current `Makefile` bundles backend (ruff, mypy) and frontend (prettier, eslint, tsc) tooling under each top-level target. Pre-fix shape was ([`Makefile:26-34` in the pre-PR-110 main](../../Makefil
diff --git a/docs/02_product/planned_features/infra_arq_subprocess_test/idea.md b/docs/02_product/planned_features/infra_arq_subprocess_test/idea.md index ac9855f9..be7f1dcc 100644 --- a/docs/02_product/planned_features/infra_arq_subprocess_test/idea.md +++ b/docs/02_product/planned_features/infra_arq_subprocess_test/idea.md @@ -1,32 +1,21 @@ # infra — subprocess-driven Arq worker test for FR-5 / AC-4 **Date:** 2026-05-10 -**Status:** Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review) -**Origin:** GPT-5.5 final-review finding #8 — Story 2.3 task 4 called -for a subprocess fixture that spawns -`arq backend.workers.all.WorkerSettings`, runs N seconds, SIGTERMs, -then restarts, and asserts trials continue. PR #25 shipped the in- -process equivalent (`test_study_resume.py::test_on_startup_resumes_*`) -because spawning a real Arq worker requires Redis + DB connectivity -from the test process and stable lifecycle hooks. +**Preflighted:** 2026-05-14 — confirmed test functions still exist at cited paths; `_subprocess_helpers/` directory already in place from PR #20 precedent; arq still at `>=0.26` (`0.28.0` in `uv.lock`); two crons now registered in `WorkerSettings.cron_jobs` (added context note). +**Status:** Idea (deferred from `feat_study_lifecycle` Phase 2 / PR #25 final GPT-5.5 review). Still applicable as of 2026-05-14: the three in-process tests cited below still cover the resume contract correctly; a subprocess test would add a narrow Arq-version-regression guard. +**Origin:** GPT-5.5 final-review finding #8 on PR #25 — Story 2.3 task 4 called for a subprocess fixture that spawns `arq backend.workers.all.WorkerSettings`, runs N seconds, SIGTERMs, then restarts, and asserts trials continue. PR #25 shipped the in-process equivalent because spawning a real Arq worker requires Redis + DB connectivity from the test process and stable lifecycle hooks. ## Why deferred -* The three test cases shipped at the unit/integration boundary - (`test_on_startup_resumes_every_running_study`, - `test_resume_study_idempotent_on_already_running`, - `test_resume_study_wrapper_delegates_to_start_study`) verify the - resume contract surfaces — sweep correctness + idempotent resume + - wrapper dispatch. -* A subprocess test would additionally catch Arq-version-specific - wiring regressions (Arq's `WorkerSettings` schema changes, function - registration semantics, on_startup hook signature drift). -* `docs/03_runbooks/study-lifecycle-debugging.md` documents the - manual SIGTERM dance for operators. +* The three test cases shipped at the unit/integration boundary verify the resume contract surfaces — sweep correctness + idempotent resume + wrapper dispatch. They live in [`backend/tests/integration/test_study_resume.py`](../../../../backend/tests/integration/test_study_resume.py): `test_on_startup_resumes_every_running_study`, `test_resume_study_idempotent_on_already_running`, `test_resume_study_wrapper_delegates_to_start_study`. +* A subprocess test would additionally catch Arq-version-specific wiring regressions (Arq's `WorkerSettings` schema changes, function registration semantics, `on_startup` hook signature drift, **cron-jobs registry drift now that `WorkerSettings.cron_jobs` carries two crons** — `reconcile_pr_state` from `feat_github_webhook` and `resume_stuck_judgment_lists` from `feat_judgments_periodic_resume_sweep`). +* [`docs/03_runbooks/study-lifecycle-debugging.md`](../../../03_runbooks/study-lifecycle-debugging.md) documents the manual SIGTERM dance for operators — verified present, still authoritative. ## Proposed fix -Add `backend/tests/integration/_subprocess_helpers/orchestrator_restart.py`: +The `_subprocess_helpers/` directory **already exists** ([`backend/tests/integration/_subprocess_helpers/`](../../../../backend/tests/integration/_subprocess_helpers/)) with one entrypoint: `run_trial_with_test_stubs.py` (shipped with `infra_optuna_eval` Story 3.1, used by [`test_run_trial_partial_failure.py`](../../../../backend/tests/integration/test_run_trial_partial_failure.py)). Add a **second** entrypoint there: + +`backend/tests/integration/_subprocess_helpers/orchestrator_restart.py`: ```python @asynccontextmanager @@ -40,20 +29,48 @@ async def arq_worker_subprocess() -> AsyncIterator[Process]: Then in `backend/tests/integration/test_study_resume_subprocess.py`: 1. Seed a running study with `max_trials=100, parallelism=4`. -2. `async with arq_worker_subprocess() as p:` — wait until 20 trials - commit; SIGTERM p; restart `arq_worker_subprocess()`. -3. Within 30s, new trials accumulate from `optuna_trial_number 21+`; - study eventually completes at trial 100. +2. `async with arq_worker_subprocess() as p:` — wait until 20 trials commit; SIGTERM p; restart `arq_worker_subprocess()`. +3. Within 30s, new trials accumulate from `optuna_trial_number 21+`; study eventually completes at trial 100. + +### Why an `@asynccontextmanager` (different from the existing precedent) + +The shipped `run_trial_with_test_stubs.py` is invoked via synchronous `subprocess.run(...)` in `test_run_trial_partial_failure.py:_run_subprocess_with_fault` — it blocks until the helper exits, returning the exit code. That shape works for "spawn helper, wait for it to fault on a specific seam, assert exit code." + +This idea's test fundamentally needs **mid-test control over the subprocess** (kill while it's mid-loop, observe DB state, restart). A sync `subprocess.run()` can't do that — it's await-or-block. The `@asynccontextmanager + Process` shape lets the test: + +* Spawn via `asyncio.create_subprocess_exec(...)`. +* Poll the DB on the test's event loop while the worker runs. +* `p.terminate()` (or `os.kill(p.pid, signal.SIGTERM)`) on a deterministic condition. +* `await p.wait()` for clean shutdown. + +These are different test shapes solving different problems; they coexist cleanly. + +### Stub-survival across the subprocess boundary + +The existing precedent at `run_trial_with_test_stubs.py:33-71` shows how to thread test doubles into a fresh Python interpreter (env-var-passed JSON blobs reinstalled inside `_main`). The new helper will need a similar pattern for any stubs the resume test needs — but the resume test arguably needs **fewer** stubs than the partial-failure test, since the goal is to exercise the REAL `on_startup` sweep + REAL `run_trial` against a REAL Optuna RDBStorage. Open question: does the test stub the engine adapter (`build_adapter`) at all, or accept that trials will fail against a non-running Elasticsearch and verify resume of `status='failed'` rows instead of `complete` ones? Locked recommendation: stub the engine adapter the same way `run_trial_with_test_stubs.py` does so trials genuinely complete; the test isn't trying to exercise the engine path. ## Scope signals -* Backend: yes (test infrastructure). -* Frontend: no. -* Migration: no. -* Config: no. +* **Backend:** yes (test infrastructure only — 2 new files: the helper at `_subprocess_helpers/orchestrator_restart.py` and the test at `test_study_resume_subprocess.py`). +* **Frontend:** no. +* **Migration:** no. +* **Config:** no. +* **Audit events:** N/A (test-only). +* **CLAUDE.md absolute-rules walked:** none implicated. No schema, no API, no LLM call, no secret, no engine-adapter dispatch, no audit_log emission, no ``, no `/healthz`. + +## Acceptance test + +After this lands, the following sequence should work without `ERR_PNPM_UNSUPPORTED_ENGINE` on a host with Node 18: + +```bash +make backend-fmt && make backend-lint && make backend-typecheck +``` + +And the existing CI-visible behavior remains: + +```bash +make fmt # still runs ruff + prettier +make lint # still runs ruff + eslint +make typecheck # still runs mypy + tsc +``` + +CI (`.github/workflows/pr.yml`) continues to call `make lint`/`make typecheck` and sees no behavior change because both target compositions still run. -## Why deferred +## Why deferred (originally) -The fix is ~10 lines of Makefile but it's out of scope for `feat_judgments_periodic_resume_sweep` (a worker-runtime feature, not a build-tooling feature). Bundling it into that PR would have muddied the diff. Capturing here for a future `/impl-execute --ad-hoc` infra-sweep PR. +The fix is ~12 lines of Makefile but it was out of scope for `feat_judgments_periodic_resume_sweep` (a worker-runtime feature, not build-tooling). Bundling it into that PR would have muddied the diff. Now ripe for `/impl-execute --ad-hoc` as a standalone infra commit — no spec/plan ceremony needed. ## Relationship to other work -- Sibling chore: `infra_dashboard_regen_pre_commit_conflict` (also captured during this same impl-execute session). -- No interference with planned MVP2 work. +- Sibling chore: [`infra_dashboard_regen_pre_commit_conflict`](../../../00_overview/implemented_features/2026_05_14_infra_dashboard_regen_pre_commit_conflict/idea.md) (shipped 2026-05-14 as PR #108 — `_maybe_write` idempotency + path-rewriter helpers). Both originated from the same `feat_judgments_periodic_resume_sweep` tangential sweep; same operational-friction class (build-tooling and pre-commit-hook drift). +- Builds on the existing `ui-*` sub-targets in [`Makefile`](../../../../Makefile) (precedent for the scope-first naming). +- Doesn't interfere with any active or backlogged feature.