
infra(ci): split backend test job into parallel lanes (Win 2′)#531

Merged
SoundMindsAI merged 4 commits into main from infra_pr_yml_split_backend_test_lanes
Jun 16, 2026

@SoundMindsAI

Owner

Summary

Ships Win 2′ of infra_pr_yml_split_backend_test_lanes. Splits the heavy backend (tests + coverage) job into three parallel jobs:

| Old | New | Time | Service containers |
| --- | --- | --- | --- |
| backend-unit-fast | backend-unit (unit with -n auto + cov) | ~1 min | None |
| backend (tests + coverage) | backend-heavy (contract + integration serial + cov) | ~7-8 min | Postgres + Redis + ES + OS |
| (n/a) | backend-cov-gate (combine + 80% gate + XML) | ~30s | None (needs: both above) |

Expected wall-clock recovery: ~1.5 min per run, doubled (~3 min) on the Gemini-fix re-push cycle.

Why now

The idea file's explicit deferral condition — "Pick up only when the integration layer becomes the binding CI constraint" — is met. Per the per-job timing analysis from today's 12-PR session:

| Job | Avg | Critical path? |
| --- | --- | --- |
| backend (tests + coverage) | 9m20s | YES, sole bottleneck |
| frontend (lint + typecheck + tests + build) | 3m32s | No (parallel) |
| static-checks (frontend) | 3m07s | No (parallel) |
| docker buildx (api) | 1m52s | No |
| 6 other jobs (static-checks, license, generated, fast-lane) | ≤45s each | No |

Backend tests are 80% of wall-clock; everything else finishes in parallel ~6 min earlier. The operator's pain was real: ~10 min wait per push, doubled on the Gemini-fix re-push cycle.

Coverage plumbing

coverage combine pattern requires two pyproject.toml changes:

  1. [tool.coverage.run] parallel = true — pytest invocations now write .coverage.<host>.<pid>.<random> files instead of one .coverage. Required so the unit + heavy lanes' data don't collide when combined.
  2. New [tool.coverage.paths] section — maps per-runner absolute paths (/home/runner/work/relyloop/relyloop/backend/...) back to the canonical backend/ source. Required for cross-job combine to deduplicate correctly.

Verified locally with coverage.py 7.x: two parallel runs produce 2 uniquely-named .coverage.* data files; coverage combine merges cleanly into a single combined report.
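
To make the path-mapping requirement concrete, here is a minimal sketch of the idea behind an alias list such as [tool.coverage.paths]. It is a simplified model written for illustration, not coverage.py's implementation. The `remap` helper, the `ALIASES` list, and the `app/main.py` file name are hypothetical; only the runner prefix comes from the description above.

```python
# Illustrative model of path aliasing during a cross-runner combine.
# NOT coverage.py's implementation; remap() and ALIASES are hypothetical names.

# Assumed alias list: the first entry is the canonical location, the
# remaining entries are prefixes that should be rewritten to it.
ALIASES = [
    "backend/",
    "/home/runner/work/relyloop/relyloop/backend/",
]


def remap(path, aliases=ALIASES):
    """Rewrite a recorded file path to the canonical prefix if an alias matches."""
    canonical, *others = aliases
    for prefix in others:
        if path.startswith(prefix):
            return canonical + path[len(prefix):]
    return path


# A path as a CI runner might record it (the file name is made up).
recorded = "/home/runner/work/relyloop/relyloop/backend/app/main.py"
print(remap(recorded))          # backend/app/main.py
print(remap("backend/app/main.py"))  # already canonical, returned unchanged
```

Without a mapping of this kind, the same source file recorded under two different absolute prefixes would be treated as two different files when the data is combined.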

What stays the same

  • The three verify_*.sh shell scripts (enum source-of-truth, demo slug parity, install builds-all-services) stay in backend-heavy — they import the project venv that uv sync materializes there.
  • SKIP_HEAVY_CI kill-switch: all three new jobs honor vars.SKIP_HEAVY_CI != 'true'. Skip mode → unit lane still runs (preserves fast feedback); heavy + cov-gate skip.
  • 80% coverage threshold from pyproject.toml's [tool.coverage.report] fail_under = 80; coverage report honors it from backend-cov-gate.

Defers (intentional)

Win 3 — split integration tests by required service container (integration:postgres / integration:elastic / integration:opensearch lanes). Higher-value win but larger blast radius (pytest markers + per-shard service-container topologies). idea.md flipped to "Win 2′ shipped; Win 3 deferred." Folder stays in planned_features/02_mvp2/.

The one user-facing UX change

Previously the coverage report appeared inline in the heavy backend job's output. Now it appears in the backend-cov-gate job's output. A developer hunting a coverage regression in the PR Checks UI clicks one more level to see the missing-line report.

Diff scope

.github/workflows/pr.yml                       +173/-34
pyproject.toml                                 +20/-0
docs/.../infra_pr_yml_split_backend_test_lanes/idea.md  +4/-2
docs/00_overview/{MVP2_DASHBOARD,backlog_dashboard,...}  auto-regen (10 files)

Dashboard regen is the standard side effect of the idea.md status flip — landed in the same commit per the mvp1-dashboard-regen pre-commit hook + the project's two-shot-commit pattern.

Test plan

  • python -c "import yaml; yaml.safe_load(open('.github/workflows/pr.yml'))" — YAML parses; 12 jobs total.
  • grep confirms the new triplet (backend-unit, backend-heavy, backend-cov-gate) at expected line numbers.
  • Pre-commit (check toml + dashboard regen) — clean.
  • Local coverage parallel + combine smoke test — confirmed.
  • CI backend-unit — should run in <2 min with -n auto + cov instrumentation.
  • CI backend-heavy — should run in ~7-8 min (down from ~9.5 min after unit tests removed).
  • CI backend-cov-gate — should download both artifacts, combine, gate against 80%.
  • CI wall-clock — total ~8 min (down from ~9.5 min).

🤖 Generated with Claude Code

Ships Win 2′ of the infra_pr_yml_split_backend_test_lanes idea (Phase 1).
The deferral condition stated in the idea — "Pick up only when the
integration layer becomes the binding CI constraint" — is met: today's
12-PR session showed the heavy backend (tests + coverage) job at
~9-9.5 min on every run, dominating wall-clock (every other job
completes in ≤4 min). The Gemini-fix re-push pattern doubled the
operator's wait time to ~20 min per PR. This PR recovers ~1.5 min
per run.

Three-job restructure of the backend test pipeline:

1. backend-unit (renamed from backend-unit-fast)
   - pytest backend/tests/unit/ -n auto --cov=backend --cov-report=
   - No service containers (unit tests are pure)
   - Uploads partial .coverage.* data as coverage-data-unit
   - Replaces the prior --no-cov fast lane — does double duty as fast
     unit-test signal AND coverage data source

2. backend-heavy (renamed from backend)
   - pytest backend/tests/contract backend/tests/integration --cov=backend --cov-report=
   - Service containers (Postgres + Redis + ES + OS) — same as before
   - SERIAL (no -n auto) — integration FK-teardown collision is
     non-negotiable per the idea's D-1
   - Contract tests bundle here because several boot the FastAPI app
     via LifespanManager and need the service containers anyway
   - Uploads partial .coverage.* data as coverage-data-heavy
   - Keeps the three verify_*.sh shell scripts (they need the project
     venv that uv-sync materializes here)

3. backend-cov-gate (NEW)
   - needs: [backend-unit, backend-heavy]
   - Downloads both partial coverage artifacts
   - 'coverage combine' merges them using the new [tool.coverage.paths]
     mapping in pyproject.toml
   - 'coverage report --show-missing' honors fail_under=80 from
     pyproject.toml
   - Generates coverage.xml + uploads as coverage-xml (replaces the XML
     upload that used to live in the heavy job)

Coverage plumbing in pyproject.toml:

- [tool.coverage.run] gains 'parallel = true' so each pytest invocation
  writes a uniquely-named .coverage.<host>.<pid>.<random> data file —
  required to keep the unit + heavy lanes' data from colliding when
  combined.
- [tool.coverage.paths] is new — maps per-runner absolute paths back to
  the canonical backend/ source so 'coverage combine' deduplicates
  correctly.

Expected wall-clock recovery:

- Before: backend (tests + coverage) ~9.5 min = critical path
- After:  max(backend-unit ~1 min, backend-heavy ~7-8 min) + cov-gate ~30s = ~8 min
- Savings: ~1.5 min per run, doubled (~3 min) when Gemini findings trigger
  a re-push cycle
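
The wall-clock estimate above can be reproduced with a short sketch that treats the three jobs as a dependency graph and computes the longest chain. The durations are the predictions quoted in this message; taking 7.5 min as the midpoint of "~7-8 min" is an assumption, and runner queueing and artifact transfer time are ignored.

```python
# Predicted durations in seconds. 450 s (7.5 min) is an assumed midpoint
# of the "~7-8 min" range quoted for backend-heavy.
durations = {"backend-unit": 60, "backend-heavy": 450, "backend-cov-gate": 30}

# Mirrors the `needs:` relationship: the gate waits for both lanes.
needs = {
    "backend-unit": [],
    "backend-heavy": [],
    "backend-cov-gate": ["backend-unit", "backend-heavy"],
}


def finish_time(job):
    """Earliest finish time of a job: latest dependency finish plus its own duration."""
    start = max((finish_time(dep) for dep in needs[job]), default=0)
    return start + durations[job]


before = 570  # ~9.5 min for the single pre-split job
after = max(finish_time(job) for job in durations)
print(after)           # 480 seconds, i.e. ~8 min
print(before - after)  # 90 seconds, i.e. ~1.5 min
```

Because the gate starts only after the slower lane finishes, shortening backend-unit further would not change the result; only backend-heavy and the gate sit on the critical path.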

Defers (intentionally NOT in this PR):

- Win 3 — split integration tests by required service container.
  Higher-value win but larger blast radius (pytest markers + per-shard
  service-container topologies). idea.md flipped to "Win 2′ shipped; Win 3 remains deferred." Folder stays in planned_features/02_mvp2/.

Dashboard regeneration (10 files in docs/00_overview/*_DASHBOARD.md +
*_dashboard.html) lands in this same commit per the
mvp1-dashboard-regen pre-commit hook (the idea.md status flip
triggered it).

Validated locally:
- pyproject.toml parse: clean
- pr.yml YAML parse: clean (12 jobs total, new triplet in order)
- coverage parallel + combine pattern: smoke-tested against
  coverage.py 7.x — two parallel runs produce uniquely-named
  .coverage.<host>.<pid>.<random> data files; combine merges cleanly
- All three verify_*.sh CI guards preserved in backend-heavy unchanged

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the RelyLoop release roadmap and backlog dashboards to reflect the shipping of the infra_pr_yml_split_backend_test_lanes feature. It also configures coverage settings in pyproject.toml to support parallel coverage combining across split backend test lanes. Feedback points out that enabling parallel = true globally in pyproject.toml degrades the local developer experience by cluttering the repository root and breaking standard local coverage commands, and suggests handling coverage file renaming within the GitHub Actions workflow instead.


Comment thread pyproject.toml Outdated
Comment on lines +244 to +249
# Parallel mode: each pytest invocation writes a uniquely-named
# `.coverage.<host>.<pid>.<random>` data file so `coverage combine` can
# merge them across the split backend lanes (backend-unit, backend-heavy).
# See infra_pr_yml_split_backend_test_lanes idea Win 2′. Required when
# multiple lanes contribute to the 80% gate via a combine step.
parallel = true


medium

Enabling parallel = true globally in pyproject.toml degrades the local developer experience in two ways:

  1. Running tests locally will generate unique .coverage.<host>.<pid>.<random> files on every run, which will accumulate and clutter the repository root unless manually cleaned up.
  2. Standard local commands like coverage report or coverage html will fail or show no data unless the developer explicitly runs coverage combine first.

Since the parallel lanes (backend-unit and backend-heavy) run on completely separate GitHub Actions runners, they do not need to generate parallel coverage files on the same machine. Instead, you can keep parallel = false (the default) for a clean local developer experience, and handle the naming in CI by renaming the .coverage file before uploading it or after downloading it in the gate job.

For example, in your GHA workflow:

  • In backend-unit: rename .coverage to .coverage.unit before uploading.
  • In backend-heavy: rename .coverage to .coverage.heavy before uploading.
  • In backend-cov-gate: download both as .coverage.unit and .coverage.heavy into the same directory, then run coverage combine.
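
The rename-then-collect flow suggested in the three bullets can be modelled locally as follows. This is only a sketch of the file handling, written in Python rather than as workflow steps; the `collect` helper, the directory layout, and the placeholder file contents are all hypothetical.

```python
import tempfile
from pathlib import Path


def collect(lanes, gate_dir):
    """Copy each lane's .coverage into gate_dir under a lane-specific name."""
    for lane, lane_dir in lanes.items():
        data = (lane_dir / ".coverage").read_bytes()
        (gate_dir / f".coverage.{lane}").write_bytes(data)
    return sorted(p.name for p in gate_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lanes = {}
    for lane in ("unit", "heavy"):
        lane_dir = root / lane
        lane_dir.mkdir()
        # Stand-in for the real data file each lane would produce.
        (lane_dir / ".coverage").write_bytes(b"placeholder")
        lanes[lane] = lane_dir

    gate = root / "gate"
    gate.mkdir()
    names = collect(lanes, gate)
    print(names)  # ['.coverage.heavy', '.coverage.unit']
```

Both files share the `.coverage.` prefix but have distinct suffixes, so they can sit in one directory without overwriting each other.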

SoundMindsAI and others added 3 commits June 16, 2026 16:35
The 1st CI run on this branch (#27645540296) revealed that pytest-cov's
fail_under=80 setting (from pyproject.toml's [tool.coverage.report]) is
honored by EVERY `pytest --cov` invocation by default — so both lanes
failed individually with "Coverage failure: total of N is less than
fail-under=80":

- backend-unit (unit only):            ~64% coverage → FAIL
- backend-heavy (contract+integration): ~61% coverage → FAIL

Both are EXPECTED on their own; neither lane covers the full app. The
80% threshold should only fire against the COMBINED coverage in
backend-cov-gate, not against either lane individually.

Fix: pass `--cov-fail-under=0` on each lane's pytest invocation to
override the pyproject setting just for that run. The cov-gate uses
`coverage report` directly, which still honors fail_under=80 from
pyproject.toml — so the combined gate behavior is unchanged.

The cov-gate then runs against the combine of:
- ~64% unit-lane coverage + ~61% heavy-lane coverage
  → unique-line union → should hit ≥80%, matching today's
    pre-split combined run

(Today's pre-split run also achieved ≥80%; the split moves WHERE the
threshold check fires, not WHAT the threshold is.)
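
A toy model shows why each lane can fail an 80% threshold on its own while the combined data passes. The 100-line codebase and the line ranges below are invented so that the per-lane percentages match the ~64% and ~61% figures above; they are not real project data.

```python
# Hypothetical 100-line codebase; the ranges are chosen only to reproduce
# the per-lane percentages quoted in this commit message.
TOTAL_LINES = 100
unit_lines = set(range(0, 64))    # 64 lines executed by the unit lane
heavy_lines = set(range(21, 82))  # 61 lines executed by the heavy lane


def percent(covered):
    return 100 * len(covered) / TOTAL_LINES


def passes(covered, fail_under):
    return percent(covered) >= fail_under


print(percent(unit_lines), percent(heavy_lines))        # 64.0 61.0
print(passes(unit_lines, 80), passes(heavy_lines, 80))  # False False
print(passes(unit_lines, 0), passes(heavy_lines, 0))    # True True

combined = unit_lines | heavy_lines  # union of executed lines, not a sum
print(percent(combined), passes(combined, 80))          # 82.0 True
```

The per-lane threshold of 0 plays the role of `--cov-fail-under=0`, and the final check plays the role of the gate job applying fail_under=80 to the union.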

Timings from the 1st run already confirm the design works as intended:
- backend-unit:  1m00s  (predicted ~1 min   ✓)
- backend-heavy: 7m55s  (predicted ~7-8 min ✓)
- backend-cov-gate would be ~30s
- Wall-clock: ~8 min vs 9.5 min before = ~1.5 min savings ✓

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Two bugs in the 2nd CI run on this branch (#27646454839):

Bug 1 — duplicate `env:` block in backend-heavy.
The previous commit added a new env block with COVERAGE_FILE inside the
pytest step, but the existing env block (DATABASE_URL_FILE etc.) was
already there. YAML rejects two `env:` keys in the same step. Merged
COVERAGE_FILE into the existing env block.

Bug 2 — artifact upload "No files were found".
The artifact upload path was `.coverage.*` (a dotfile glob), but
pytest-cov's session-end auto-combine merges parallel intermediates
into a single `.coverage` file (no suffix). So the glob matched
nothing → upload errored under `if-no-files-found: error`.

Fix: set `COVERAGE_FILE=.coverage.unit` and `.coverage.heavy` in the
respective lanes. With `parallel = true` in pyproject.toml, the
intermediate xdist-worker files land at `.coverage.unit.<host>.<pid>.<rand>`
and pytest-cov's auto-combine merges them into the lane's BASE file
(`.coverage.unit` or `.coverage.heavy`).

Both lanes now produce uniquely-named files that coexist when downloaded
to the same cwd in `backend-cov-gate`. `coverage combine` matches files
starting with `.coverage.` so both are picked up.
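
The glob mismatch behind Bug 2 can be reproduced with Python's `fnmatch` module. This is an approximation: the artifact upload step uses its own glob implementation, so the sketch only illustrates the pattern logic. The host, pid, and random suffix in the second file name are made up.

```python
from fnmatch import fnmatchcase

# After auto-combine, only the suffix-less file exists, so the old
# `.coverage.*` pattern (which requires a dot after "coverage") finds nothing.
after_auto_combine = [".coverage"]
print([f for f in after_auto_combine if fnmatchcase(f, ".coverage.*")])  # []

# With COVERAGE_FILE=.coverage.unit, both the final per-lane file and any
# intermediate file (hypothetical suffix shown) match `.coverage.unit*`.
unit_files = [".coverage.unit", ".coverage.unit.runner1.1234.abcdef"]
print(all(fnmatchcase(f, ".coverage.unit*") for f in unit_files))   # True

# The heavy lane's pattern does not pick up the unit lane's files.
print(any(fnmatchcase(f, ".coverage.heavy*") for f in unit_files))  # False
```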

Validated:
- bash -n + python yaml.safe_load — clean (12 jobs)
- COVERAGE_FILE appears once per lane at the right scope
- Upload globs (`.coverage.unit*` / `.coverage.heavy*`) match both the
  intermediate parallel files AND the final per-lane file

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…toml

Gemini MEDIUM on pyproject.toml:249 — ACCEPT.

`parallel = true` in [tool.coverage.run] globally degrades the local
dev experience in two ways:

1. Every local `pytest --cov` run litters the repo root with
   `.coverage.<host>.<pid>.<random>` files that accumulate (until the
   dev manually `rm`s them or runs `coverage combine`).
2. Standard local commands like `coverage report` or `coverage html`
   show "No data to report" unless the dev runs `coverage combine`
   first.

And it's UNNECESSARY in CI. The split backend lanes (backend-unit,
backend-heavy) already produce uniquely-named files via per-job
`COVERAGE_FILE=.coverage.{unit,heavy}` env vars. The two artifacts
download into the same cwd in `backend-cov-gate` without colliding
because they have distinct base names — no `parallel = true` mode
needed.

Removed `parallel = true`. Kept [tool.coverage.paths] (still required
for cross-runner combine to map per-runner absolute paths back to
the canonical `backend/` source).

Updated three pr.yml comments that referenced the now-defunct parallel
mode + intermediate files:

- backend-unit's "Coverage feed" comment — now describes the COVERAGE_FILE
  approach instead of parallel mode.
- backend-unit's pytest step env comment — drops the description of
  the (no-longer-relevant) intermediate-files mechanism.
- backend-cov-gate's "Combine coverage data files" comment — now
  explicitly names the two input files (`.coverage.unit` and
  `.coverage.heavy`) instead of describing the parallel-mode glob.

Validated:
- Coverage report from the previous successful run (#27647170477) was
  82% (`Combined 2 files`, `TOTAL ... 82%`) — the design works without
  parallel mode being set globally
- All previous pr.yml jobs passed, so removing parallel = true should
  produce identical behavior

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
@SoundMindsAI

Owner Author

Adjudication of 1 Gemini finding on 3797ef5e

| # | File:Line | Severity | Verdict | Resolution |
| --- | --- | --- | --- | --- |
| 1 | pyproject.toml:249 (parallel=true degrades local dev) | MEDIUM | Accept | Removed parallel = true from [tool.coverage.run]. Per-lane unique filenames are already achieved via COVERAGE_FILE=.coverage.{unit,heavy} env vars in pr.yml; parallel mode globally was unnecessary AND a real DX hazard (.coverage.<host>.<pid>.<rand> clutter on every local pytest --cov run, local coverage report showing no data without a manual coverage combine). Kept [tool.coverage.paths] mapping (still required for cross-runner combine). Updated 3 pr.yml comments that referenced the now-defunct parallel-mode intermediates. Validated against the previous green run (#27647170477), which combined to 82%: the design works without parallel = true set globally. |

Wall-clock data from the previous green run

| Metric | Before (today's earlier runs) | After (this PR) | Δ |
| --- | --- | --- | --- |
| backend (tests + coverage) → split lanes | 9m20s avg | 8m14s heavy + 56s unit + 24s gate | |
| Total wall-clock critical path | 9m20s avg | 8m49s | −31s |
| Combined coverage (whole-suite) | | 82% (2 files combined) | passes 80% gate |

The savings are modest (~30s) — less than the idea file's ~1-1.5 min upper-bound estimate — because the heavy lane (contract + integration alone) took 8m14s vs my prediction of ~7-8m. The unit-tests fraction of the original suite was smaller than I'd modeled.
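
The measured figures can be checked with a few lines of arithmetic. The sketch below uses only the timings reported in this comment; the `mmss` helper is a hypothetical convenience, and the model (slower lane plus gate) deliberately ignores anything outside the three job durations.

```python
def mmss(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


before = mmss(9, 20)  # pre-split average
after = mmss(8, 49)   # reported critical path after the split
print(before - after)  # 31 seconds saved

unit, heavy, gate = 56, mmss(8, 14), 24
modelled = max(unit, heavy) + gate
print(modelled)          # 518 seconds, i.e. 8m38s
print(after - modelled)  # 11 seconds not explained by the three job durations
```

The 11-second difference between the reported critical path and the modelled one is not broken down in the figures above; time between jobs is one possible explanation, but that is an assumption.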

The CI re-run on 81983973 should preserve this 8m49s wall-clock with the cleaner pyproject configuration.

@SoundMindsAI SoundMindsAI merged commit 8dfb774 into main Jun 16, 2026
20 checks passed
@SoundMindsAI SoundMindsAI deleted the infra_pr_yml_split_backend_test_lanes branch June 16, 2026 21:12
SoundMindsAI added a commit that referenced this pull request Jun 16, 2026
…#531) (#532)

Update state.md current-branch + execution context (notes the heavy CI
check rename: `backend (tests + coverage)` → `backend (contract +
integration + cov)` + the new `backend (coverage gate)`), prepend the
new one-line entry to "Last 5 merges", drop the now-6th row
(chore_dockerfile_remove_syntax_directive PR #521) into the
state_history.md older-entries reference, and add the full reasoning
entry to state_history.md.

The narrative covers the three-way split, the coverage plumbing (path
mapping + COVERAGE_FILE-not-parallel rationale), the honest wall-clock
accounting (~31s actual vs 1-1.5 min predicted; less because heavy lane
took longer than modeled), the CI iteration chronology (4 runs with
self-caught + Gemini-driven fixes), the Gemini adjudication, and the
explicit Win 3 deferral with acknowledgment that the binding constraint
has now shifted to backend-heavy.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 16, 2026
…rvice (#533)

Today PR #531 shipped Win 2′ of the parent idea
infra_pr_yml_split_backend_test_lanes — splitting the heavy backend
test job into 3 parallel lanes (backend-unit, backend-heavy,
backend-cov-gate). Win 3 (split integration by required service
container) remained in the parent idea as the "higher-value win if
integration dominates."

Integration DOES now dominate the new critical path
(backend-heavy at 8m14s of 8m49s total wall-clock), so the operator
asked for Win 3 to be promoted to its own idea file — making the
implementable surface easier to find on the next CI-perf cycle.

Spun out as planned_features/02_mvp2/infra_pr_yml_split_integration_by_service/idea.md:

- Real per-engine test-count distribution (87 ES, 47 postgres-only,
  14 OS, 6 Solr out of 139 integration test files) drives the proposed
  4-shard topology (postgres / elastic / opensearch / solr).
- Locked decisions carried from parent (D-1 integration stays serial,
  D-2 coverage combine — both proven in production by Win 2′) PLUS
  two new locked decisions (D-3 per-shard own Postgres container; D-4
  LifespanManager contract tests go in the postgres shard).
- Open questions for /spec-gen — biggest is the total-CI-minutes vs
  wall-clock tradeoff (4 shards × Postgres+Redis each = ~3× container
  count vs current 1×).
- Honest defer-rationale — the marker pass is the load-bearing work
  (mismarks would silently skip tests in CI), and the local-dev
  ergonomics need per-shard `make` targets to avoid friction.

Parent idea unchanged in substance — Win 3 section now opens with a
"spun out" pointer at the top, but the historical framing stays for
reference.

Dashboard regen (MVP2_DASHBOARD.md + mvp2_dashboard.html + the public
website/docs/roadmap.md) lands in this same commit per the
mvp1-dashboard-regen pre-commit hook + the project memory's two-shot
pattern.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
SoundMindsAI added a commit that referenced this pull request Jun 18, 2026
…#563)

Move the bug folder → implemented_features/2026_06_18_bug_reset_demo_no_instant_feedback_poll_race/
(bug_fix.md; Release: mvp2 marker; status stamped PR #562 / bb247a5).

state.md: prepend PR #562 to "Last 5 merges", drop #531 into the older-entries
rollup, refresh branch-context + Last-updated lines (note the queued
scenario-clarity feature). state_history.md: full merge narrative prepended.

Regenerated dashboards + public roadmap for the folder move.

Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>