infra(ci): split backend test job into parallel lanes (Win 2′) #531
Ships Win 2′ of the infra_pr_yml_split_backend_test_lanes idea (Phase 1).
The deferral condition stated in the idea — "Pick up only when the
integration layer becomes the binding CI constraint" — is met: today's
12-PR session showed the heavy backend (tests + coverage) job at
~9-9.5 min on every run, dominating wall-clock (every other job
completes in ≤4 min). The Gemini-fix re-push pattern doubled the
operator's wait time to ~20 min per PR. This PR recovers ~1.5 min
per run.
Three-job restructure of the backend test pipeline:
1. backend-unit (renamed from backend-unit-fast)
- pytest backend/tests/unit/ -n auto --cov=backend --cov-report=
- No service containers (unit tests are pure)
- Uploads partial .coverage.* data as coverage-data-unit
- Replaces the prior --no-cov fast lane — does double duty as fast
unit-test signal AND coverage data source
2. backend-heavy (renamed from backend)
- pytest backend/tests/contract backend/tests/integration --cov=backend --cov-report=
- Service containers (Postgres + Redis + ES + OS) — same as before
- SERIAL (no -n auto) — integration FK-teardown collision is
non-negotiable per the idea's D-1
- Contract tests bundle here because several boot the FastAPI app
via LifespanManager and need the service containers anyway
- Uploads partial .coverage.* data as coverage-data-heavy
- Keeps the three verify_*.sh shell scripts (they need the project
venv that uv-sync materializes here)
3. backend-cov-gate (NEW)
- needs: [backend-unit, backend-heavy]
- Downloads both partial coverage artifacts
- 'coverage combine' merges them using the new [tool.coverage.paths]
mapping in pyproject.toml
- 'coverage report --show-missing' honors fail_under=80 from
pyproject.toml
- Generates coverage.xml + uploads as coverage-xml (replaces the XML
upload that used to live in the heavy job)
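The ordering implied by the three-job restructure can be sketched as a small dependency graph. This is only a model of the `needs:` edge described above, not the workflow file itself; the job names come from this commit, and the scheduling loop is an illustration rather than how GitHub Actions is implemented.

```python
from graphlib import TopologicalSorter

# Job -> set of jobs it waits on (the `needs:` relationship from the list above).
needs = {
    "backend-unit": set(),
    "backend-heavy": set(),
    "backend-cov-gate": {"backend-unit", "backend-heavy"},
}

sorter = TopologicalSorter(needs)
sorter.prepare()

# Group jobs into "waves": everything in one wave can run at the same time.
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # jobs whose prerequisites have all finished
    waves.append(ready)
    sorter.done(*ready)

print(waves)  # [['backend-heavy', 'backend-unit'], ['backend-cov-gate']]
```

The two test lanes land in the first wave, and the gate only becomes ready once both have finished.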
Coverage plumbing in pyproject.toml:
- [tool.coverage.run] gains 'parallel = true' so each pytest invocation
writes a uniquely-named .coverage.<host>.<pid>.<random> data file —
required to keep the unit + heavy lanes' data from colliding when
combined.
- [tool.coverage.paths] is new — maps per-runner absolute paths back to
the canonical backend/ source so 'coverage combine' deduplicates
correctly.
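To illustrate why the path mapping matters for deduplication, here is a minimal model of alias-based remapping. It is not coverage.py's implementation; the `/tmp/checkout/` prefix and the `app/main.py` file name are hypothetical, and only the `/home/runner/work/relyloop/relyloop/backend/` prefix appears in this PR.

```python
# Canonical prefix -> absolute prefixes that should be rewritten to it.
# The second alias is a made-up example of a differently located checkout.
ALIASES = {
    "backend/": [
        "/home/runner/work/relyloop/relyloop/backend/",
        "/tmp/checkout/backend/",  # hypothetical
    ],
}


def canonicalize(path: str) -> str:
    """Rewrite a recorded absolute path to its canonical form, if an alias matches."""
    for canonical, prefixes in ALIASES.items():
        for prefix in prefixes:
            if path.startswith(prefix):
                return canonical + path[len(prefix):]
    return path


def combine(*lanes: dict) -> dict:
    """Merge per-lane {path: covered line numbers} maps under canonical paths."""
    merged: dict = {}
    for lane in lanes:
        for path, lines in lane.items():
            merged.setdefault(canonicalize(path), set()).update(lines)
    return merged


unit = {"/home/runner/work/relyloop/relyloop/backend/app/main.py": {1, 2, 3}}
heavy = {"/tmp/checkout/backend/app/main.py": {3, 4}}
combined = combine(unit, heavy)
print(combined)  # {'backend/app/main.py': {1, 2, 3, 4}}
```

Without the remapping step the same source file would appear under two keys, and its covered lines would not be merged.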
Expected wall-clock recovery:
- Before: backend (tests + coverage) ~9.5 min = critical path
- After: max(backend-unit ~1 min, backend-heavy ~7-8 min) + cov-gate ~30s = ~8 min
- Savings: ~1.5 min per run, doubled (~3 min) when Gemini findings trigger
a re-push cycle
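The estimate above is a critical-path calculation: the two lanes run side by side, so only the slower one counts, and the gate runs afterwards. A small sketch of that arithmetic, assuming 7.5 min as the midpoint of the ~7-8 min heavy-lane prediction:

```python
def critical_path(parallel_lanes: dict, gate: float) -> float:
    """Wall-clock minutes: slowest parallel lane, then the serial gate job."""
    return max(parallel_lanes.values()) + gate


before = 9.5  # single backend (tests + coverage) job, minutes
after = critical_path({"backend-unit": 1.0, "backend-heavy": 7.5}, gate=0.5)
savings = before - after

print(after)        # 8.0
print(savings)      # 1.5
print(2 * savings)  # 3.0 when a re-push doubles the number of runs
```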
Defers (intentionally NOT in this PR):
- Win 3 — split integration tests by required service container.
Higher-value win but larger blast radius (pytest markers + per-shard
service-container topologies). idea.md flipped to "Win 2′ shipped; Win 3 remains deferred." Folder stays in planned_features/02_mvp2/.
Dashboard regeneration (10 files in docs/00_overview/*_DASHBOARD.md +
*_dashboard.html) lands in this same commit per the
mvp1-dashboard-regen pre-commit hook (the idea.md status flip
triggered it).
Validated locally:
- pyproject.toml parse: clean
- pr.yml YAML parse: clean (12 jobs total, new triplet in order)
- coverage parallel + combine pattern: smoke-tested against
coverage.py 7.x — two parallel runs produce uniquely-named
.coverage.<host>.<pid>.<random> data files; combine merges cleanly
- All three verify_*.sh CI guards preserved in backend-heavy unchanged
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Code Review
This pull request updates the RelyLoop release roadmap and backlog dashboards to reflect the shipping of the infra_pr_yml_split_backend_test_lanes feature. It also configures coverage settings in pyproject.toml to support parallel coverage combining across split backend test lanes. Feedback points out that enabling parallel = true globally in pyproject.toml degrades the local developer experience by cluttering the repository root and breaking standard local coverage commands, and suggests handling coverage file renaming within the GitHub Actions workflow instead.
    # Parallel mode: each pytest invocation writes a uniquely-named
    # `.coverage.<host>.<pid>.<random>` data file so `coverage combine` can
    # merge them across the split backend lanes (backend-unit, backend-heavy).
    # See infra_pr_yml_split_backend_test_lanes idea Win 2′. Required when
    # multiple lanes contribute to the 80% gate via a combine step.
    parallel = true
Enabling parallel = true globally in pyproject.toml degrades the local developer experience in two ways:
- Running tests locally will generate unique `.coverage.<host>.<pid>.<random>` files on every run, which will accumulate and clutter the repository root unless manually cleaned up.
- Standard local commands like `coverage report` or `coverage html` will fail or show no data unless the developer explicitly runs `coverage combine` first.
Since the parallel lanes (backend-unit and backend-heavy) run on completely separate GitHub Actions runners, they do not need to generate parallel coverage files on the same machine. Instead, you can keep parallel = false (the default) for a clean local developer experience, and handle the naming in CI by renaming the .coverage file before uploading it or after downloading it in the gate job.
For example, in your GHA workflow:
- In `backend-unit`: rename `.coverage` to `.coverage.unit` before uploading.
- In `backend-heavy`: rename `.coverage` to `.coverage.heavy` before uploading.
- In `backend-cov-gate`: download both as `.coverage.unit` and `.coverage.heavy` into the same directory, then run `coverage combine`.
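The rename suggestion can be illustrated with a small filesystem simulation. This is a sketch under assumptions: the directory layout and the copy step stand in for artifact upload and download, and the file contents are placeholders rather than real coverage data.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_gate(rename: bool) -> list:
    """Copy each lane's data file into one gate directory and list what survives."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        gate = root / "gate"
        gate.mkdir()
        for lane in ("unit", "heavy"):
            lane_dir = root / lane
            lane_dir.mkdir()
            src = lane_dir / ".coverage"
            src.write_text(f"data from {lane}")  # placeholder for a real data file
            target = f".coverage.{lane}" if rename else ".coverage"
            shutil.copy(src, gate / target)  # stand-in for the artifact download
        return sorted(p.name for p in gate.iterdir())


print(simulate_gate(rename=False))  # ['.coverage']
print(simulate_gate(rename=True))   # ['.coverage.heavy', '.coverage.unit']
```

With identical names the second download overwrites the first, so only one lane's data would reach `coverage combine`; distinct suffixes keep both.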
The 1st CI run on this branch (#27645540296) revealed that pytest-cov's
fail_under=80 setting (from pyproject.toml's [tool.coverage.report]) is
honored by EVERY `pytest --cov` invocation by default — so both lanes
failed individually with "Coverage failure: total of N is less than
fail-under=80":
- backend-unit (unit only): ~64% coverage → FAIL
- backend-heavy (contract+integration): ~61% coverage → FAIL
Both are EXPECTED on their own; neither lane covers the full app. The
80% threshold should only fire against the COMBINED coverage in
backend-cov-gate, not against either lane individually.
Fix: pass `--cov-fail-under=0` on each lane's pytest invocation to
override the pyproject setting just for that run. The cov-gate uses
`coverage report` directly, which still honors fail_under=80 from
pyproject.toml — so the combined gate behavior is unchanged.
The cov-gate then runs against the combine of:
- ~64% unit-lane coverage + ~61% heavy-lane coverage
→ unique-line union → should hit ≥80%, matching today's
pre-split combined run
(Today's pre-split run also achieved ≥80%; the split moves WHERE the
threshold check fires, not WHAT the threshold is.)
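A toy model of why each lane fails an 80% gate on its own while the combined data passes. The 100-line codebase and the specific line ranges are invented for illustration; only the ~64%, ~61% and 80% figures come from this commit, and the 82% union is an artifact of the chosen ranges.

```python
TOTAL_LINES = 100  # hypothetical codebase size
FAIL_UNDER = 80

unit_lines = set(range(0, 64))    # 64 lines covered by the unit lane
heavy_lines = set(range(21, 82))  # 61 lines covered by the heavy lane


def percent(covered: set) -> float:
    return 100 * len(covered) / TOTAL_LINES


def passes(covered: set, fail_under: int) -> bool:
    return percent(covered) >= fail_under


# Each lane alone misses the real threshold...
print(passes(unit_lines, FAIL_UNDER), passes(heavy_lines, FAIL_UNDER))  # False False
# ...so the lanes run with the threshold overridden to 0...
print(passes(unit_lines, 0), passes(heavy_lines, 0))                    # True True
# ...and the real gate is applied once, to the union of covered lines.
combined_lines = unit_lines | heavy_lines
print(percent(combined_lines), passes(combined_lines, FAIL_UNDER))      # 82.0 True
```

The union is what matters: lines covered by both lanes count once, so the combined figure is well below the sum of the two percentages.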
Timings from the 1st run already confirm the design works as intended:
- backend-unit: 1m00s (predicted ~1 min ✓)
- backend-heavy: 7m55s (predicted ~7-8 min ✓)
- backend-cov-gate would be ~30s
- Wall-clock: ~8 min vs 9.5 min before = ~1.5 min savings ✓
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Two bugs in the 2nd CI run on this branch (#27646454839):
Bug 1: duplicate `env:` block in backend-heavy. The previous commit added a new env block with COVERAGE_FILE inside the pytest step, but the existing env block (DATABASE_URL_FILE etc.) was already there. YAML rejects two `env:` keys in the same step. Merged COVERAGE_FILE into the existing env block.
Bug 2: artifact upload "No files were found". The artifact upload path was `.coverage.*` (a dotfile glob), but pytest-cov's session-end auto-combine merges parallel intermediates into a single `.coverage` file (no suffix). So the glob matched nothing → upload errored under `if-no-files-found: error`.
Fix: set `COVERAGE_FILE=.coverage.unit` and `.coverage.heavy` in the respective lanes. With `parallel = true` in pyproject.toml, the intermediate xdist-worker files land at `.coverage.unit.<host>.<pid>.<rand>` and pytest-cov's auto-combine merges them into the lane's BASE file (`.coverage.unit` or `.coverage.heavy`). Both lanes now produce uniquely-named files that coexist when downloaded to the same cwd in `backend-cov-gate`. `coverage combine` matches files starting with `.coverage.` so both are picked up.
Validated:
- bash -n + python yaml.safe_load: clean (12 jobs)
- COVERAGE_FILE appears once per lane at the right scope
- Upload globs (`.coverage.unit*` / `.coverage.heavy*`) match both the intermediate parallel files AND the final per-lane file
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
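Bug 2 comes down to which names a glob matches. The sketch below models it with Python's `fnmatch`; the artifact upload step uses its own glob implementation, so treat this as an approximation of the matching rules, and the host/pid/suffix in the intermediate file name is a made-up example.

```python
from fnmatch import fnmatchcase

# Before the fix: auto-combine leaves one suffix-less file, and the glob needs a suffix.
before_files = [".coverage"]
before_matches = [n for n in before_files if fnmatchcase(n, ".coverage.*")]
print(before_matches)  # []

# After the fix: COVERAGE_FILE gives each lane its own base name.
unit_files = [".coverage.unit", ".coverage.unit.runner1.1234.abcdef"]  # second name is hypothetical
unit_matches = [n for n in unit_files if fnmatchcase(n, ".coverage.unit*")]
print(unit_matches)  # both names match

# The lanes' patterns do not overlap, so each upload picks up only its own data.
cross_match = fnmatchcase(".coverage.heavy", ".coverage.unit*")
print(cross_match)  # False
```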
…toml
Gemini MEDIUM on pyproject.toml:249 — ACCEPT.
`parallel = true` in [tool.coverage.run] globally degrades the local
dev experience in two ways:
1. Every local `pytest --cov` run litters the repo root with
`.coverage.<host>.<pid>.<random>` files that accumulate (until the
dev manually `rm`s them or runs `coverage combine`).
2. Standard local commands like `coverage report` or `coverage html`
show "No data to report" unless the dev runs `coverage combine`
first.
And it's UNNECESSARY in CI. The split backend lanes (backend-unit,
backend-heavy) already produce uniquely-named files via per-job
`COVERAGE_FILE=.coverage.{unit,heavy}` env vars. The two artifacts
download into the same cwd in `backend-cov-gate` without colliding
because they have distinct base names — no `parallel = true` mode
needed.
Removed `parallel = true`. Kept [tool.coverage.paths] (still required
for cross-runner combine to map per-runner absolute paths back to
the canonical `backend/` source).
Updated three pr.yml comments that referenced the now-defunct parallel
mode + intermediate files:
- backend-unit's "Coverage feed" comment — now describes the COVERAGE_FILE
approach instead of parallel mode.
- backend-unit's pytest step env comment — drops the description of
the (no-longer-relevant) intermediate-files mechanism.
- backend-cov-gate's "Combine coverage data files" comment — now
explicitly names the two input files (`.coverage.unit` and
`.coverage.heavy`) instead of describing the parallel-mode glob.
Validated:
- Coverage report from the previous successful run (#27647170477) was
82% (`Combined 2 files`, `TOTAL ... 82%`) — the design works without
parallel mode being set globally
- All previous pr.yml jobs passed, so removing parallel = true should
produce identical behavior
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Adjudication of 1 Gemini finding on
| # | File:Line | Severity | Verdict | Resolution |
|---|---|---|---|---|
| 1 | pyproject.toml:249 (parallel=true degrades local dev) | MEDIUM | Accept | Removed `parallel = true` from `[tool.coverage.run]`. Per-lane unique filenames are already achieved via `COVERAGE_FILE=.coverage.{unit,heavy}` env vars in pr.yml, so parallel mode globally was unnecessary AND a real DX hazard (`.coverage.<host>.<pid>.<rand>` clutter on every local `pytest --cov` run, local `coverage report` showing no data without a manual `coverage combine`). Kept `[tool.coverage.paths]` mapping (still required for cross-runner combine). Updated 3 pr.yml comments that referenced the now-defunct parallel-mode intermediates. Validated against the previous green run (#27647170477), which combined to 82%; the design works without `parallel = true` set globally. |
Wall-clock data from the previous green run
| Metric | Before (today's earlier runs) | After (this PR) | Δ |
|---|---|---|---|
| backend (tests + coverage) → split lanes | 9m20s avg | 8m14s heavy + 56s unit + 24s gate | — |
| Total wall-clock critical path | 9m20s avg | 8m49s | −31s |
| Combined coverage | (whole-suite) | 82% (2 files combined) | passes 80% gate |
The savings are modest (~30s) — less than the idea file's ~1-1.5 min upper-bound estimate — because the heavy lane (contract + integration alone) took 8m14s vs my prediction of ~7-8m. The unit-tests fraction of the original suite was smaller than I'd modeled.
The CI re-run on 81983973 should preserve this 8m49s wall-clock with the cleaner pyproject configuration.
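The figures in the table can be checked with a few lines of arithmetic. The `seconds` helper is a hypothetical convenience for this sketch, and the durations are the ones reported above.

```python
import re


def seconds(text: str) -> int:
    """Convert a duration such as '8m14s' or '56s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    minutes, secs = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + secs


before = seconds("9m20s")
after = seconds("8m49s")
delta = before - after
print(delta)  # 31

# Slowest parallel lane plus the gate job, using the per-job timings from the table.
job_time = max(seconds("8m14s"), seconds("56s")) + seconds("24s")
overhead = after - job_time
print(job_time, overhead)  # 518 11
```

The job timings alone add up to 8m38s, so about 11s of the measured 8m49s is not broken out in the table. Time between jobs (runner pickup, artifact transfer) is one plausible explanation, but that is an assumption rather than something the run data here confirms.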
…#531) (#532)
Update state.md current-branch + execution context (notes the heavy CI check rename: `backend (tests + coverage)` → `backend (contract + integration + cov)` + the new `backend (coverage gate)`), prepend the new one-line entry to "Last 5 merges", drop the now-6th row (chore_dockerfile_remove_syntax_directive PR #521) into the state_history.md older-entries reference, and add the full reasoning entry to state_history.md.
The narrative covers the three-way split, the coverage plumbing (path mapping + COVERAGE_FILE-not-parallel rationale), the honest wall-clock accounting (~31s actual vs 1-1.5 min predicted; less because heavy lane took longer than modeled), the CI iteration chronology (4 runs with self-caught + Gemini-driven fixes), the Gemini adjudication, and the explicit Win 3 deferral with acknowledgment that the binding constraint has now shifted to backend-heavy.
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…rvice (#533)
Today PR #531 shipped Win 2′ of the parent idea infra_pr_yml_split_backend_test_lanes: splitting the heavy backend test job into 3 parallel lanes (backend-unit, backend-heavy, backend-cov-gate). Win 3 (split integration by required service container) remained in the parent idea as the "higher-value win if integration dominates."
Integration DOES now dominate the new critical path (backend-heavy at 8m14s of 8m49s total wall-clock), so the operator asked for Win 3 to be promoted to its own idea file, making the implementable surface easier to find on the next CI-perf cycle.
Spun out as planned_features/02_mvp2/infra_pr_yml_split_integration_by_service/idea.md:
- Real per-engine test-count distribution (87 ES, 47 postgres-only, 14 OS, 6 Solr out of 139 integration test files) drives the proposed 4-shard topology (postgres / elastic / opensearch / solr).
- Locked decisions carried from parent (D-1 integration stays serial, D-2 coverage combine; both proven in production by Win 2′) PLUS two new locked decisions (D-3 per-shard own Postgres container; D-4 LifespanManager contract tests go in the postgres shard).
- Open questions for /spec-gen: biggest is the total-CI-minutes vs wall-clock tradeoff (4 shards × Postgres+Redis each = ~3× container count vs current 1×).
- Honest defer-rationale: the marker pass is the load-bearing work (mismarks would silently skip tests in CI), and the local-dev ergonomics need per-shard `make` targets to avoid friction.
Parent idea unchanged in substance: the Win 3 section now opens with a "spun out" pointer at the top, but the historical framing stays for reference.
Dashboard regen (MVP2_DASHBOARD.md + mvp2_dashboard.html + the public website/docs/roadmap.md) lands in this same commit per the mvp1-dashboard-regen pre-commit hook + the project memory's two-shot pattern.
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…#563)
Move the bug folder → implemented_features/2026_06_18_bug_reset_demo_no_instant_feedback_poll_race/ (bug_fix.md; Release: mvp2 marker; status stamped PR #562 / bb247a5).
state.md: prepend PR #562 to "Last 5 merges", drop #531 into the older-entries rollup, refresh branch-context + Last-updated lines (note the queued scenario-clarity feature).
state_history.md: full merge narrative prepended.
Regenerated dashboards + public roadmap for the folder move.
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Ships Win 2′ of `infra_pr_yml_split_backend_test_lanes`. Splits the heavy `backend (tests + coverage)` job into three parallel jobs:
- `backend-unit-fast` → `backend-unit` (unit with `-n auto` + cov)
- `backend (tests + coverage)` → `backend-heavy` (contract + integration serial + cov)
- `backend-cov-gate` (combine + 80% gate + XML; `needs:` both above)
Expected wall-clock recovery: ~1.5 min per run, doubled (~3 min) on the Gemini-fix re-push cycle.
Why now
The idea file's explicit deferral condition — "Pick up only when the integration layer becomes the binding CI constraint" — is met. Per the per-job timing analysis from today's 12-PR session:
`backend (tests + coverage)`
Backend tests are 80% of wall-clock; everything else finishes in parallel ~6 min earlier. The operator's pain was real: ~10 min wait per push, doubled on the Gemini-fix re-push cycle.
Coverage plumbing
`coverage combine` pattern requires two `pyproject.toml` changes:
- `[tool.coverage.run] parallel = true`: pytest invocations now write `.coverage.<host>.<pid>.<random>` files instead of one `.coverage`. Required so the unit + heavy lanes' data don't collide when combined.
- `[tool.coverage.paths]` section: maps per-runner absolute paths (`/home/runner/work/relyloop/relyloop/backend/...`) back to the canonical `backend/` source. Required for cross-job combine to deduplicate correctly.
Verified locally with coverage.py 7.x: two parallel runs produce 2 uniquely-named `.coverage.*` data files; `coverage combine` merges cleanly into a single combined report.
What stays the same
- `verify_*.sh` shell scripts (enum source-of-truth, demo slug parity, install builds-all-services) stay in `backend-heavy`; they import the project venv that `uv sync` materializes there.
- `SKIP_HEAVY_CI` kill-switch: all three new jobs honor `vars.SKIP_HEAVY_CI != 'true'`. Skip mode → unit lane still runs (preserves fast feedback); heavy + cov-gate skip.
- `pyproject.toml`'s `[tool.coverage.report] fail_under = 80`: `coverage report` honors it from `backend-cov-gate`.
Defers (intentional)
Win 3: split integration tests by required service container (`integration:postgres` / `integration:elastic` / `integration:opensearch` lanes). Higher-value win but larger blast radius (pytest markers + per-shard service-container topologies). idea.md flipped to "Win 2′ shipped; Win 3 deferred." Folder stays in `planned_features/02_mvp2/`.
The one user-facing UX change
Previously the coverage report appeared inline in the heavy backend job's output. Now it appears in the `backend-cov-gate` job's output. A developer hunting a coverage regression in the PR Checks UI clicks one more level to see the missing-line report.
Diff scope
Dashboard regen is the standard side effect of the idea.md status flip; it landed in the same commit per the `mvp1-dashboard-regen` pre-commit hook + the project's two-shot-commit pattern.
Test plan
- `python -c "import yaml; yaml.safe_load(open('.github/workflows/pr.yml'))"`: YAML parses; 12 jobs total.
- `grep` confirms the new triplet (`backend-unit`, `backend-heavy`, `backend-cov-gate`) at expected line numbers.
- (`check toml` + dashboard regen): clean.
- `backend-unit`: should run in <2 min with `-n auto` + cov instrumentation.
- `backend-heavy`: should run in ~7-8 min (down from ~9.5 min after unit tests removed).
- `backend-cov-gate`: should download both artifacts, combine, gate against 80%.
🤖 Generated with Claude Code