feat: add cgroup CPU isolation for sandboxed walltime runs #427
GuillaumeLagrange wants to merge 1 commit into
Greptile Summary
This PR adds cgroup-based CPU isolation for sandboxed wall-time benchmark runs. The main changes are:
Confidence Score: 5/5
This looks safe to merge.
Important Files Changed
Reviews (3): Last reviewed commit: "feat: add cgroup CPU isolation for sandb..."
Merging this PR will not alter performance
a15e7ed to 40f9325
IsolationMode::Cgroup { scope_dir } => {
    // Move the runner (and the profiler it later spawns) onto the
    // system cores before wrapping the benchmark for the bench cores.
    place_runner_in_support(scope_dir)?;
Support leaf race
In cgroup mode, this writes the runner into <scope>/support/cgroup.procs before the code waits for any leaf to exist. If the privileged setup creates support and bench asynchronously, the write can fail because support/cgroup.procs is still missing, even though the cgroup layout would be ready shortly after. The run then exits before the benchmark starts, so the support leaf needs the same readiness handling as the bench leaf.
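A minimal sketch of what that readiness handling could look like for the support leaf. This is an illustration, not the PR's code: `wait_for_leaf`, `move_pid_to_leaf`, the 10 ms poll interval, and the caller-supplied timeout are all hypothetical names and values.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll until `<scope_dir>/<leaf>/cgroup.procs` exists, or give up after `timeout`.
/// (Hypothetical helper; the poll interval is an arbitrary choice.)
fn wait_for_leaf(scope_dir: &Path, leaf: &str, timeout: Duration) -> io::Result<()> {
    let procs = scope_dir.join(leaf).join("cgroup.procs");
    let deadline = Instant::now() + timeout;
    loop {
        if procs.exists() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(io::Error::new(
                io::ErrorKind::TimedOut,
                format!("{} did not appear within {:?}", procs.display(), timeout),
            ));
        }
        sleep(Duration::from_millis(10));
    }
}

/// Wait for the leaf, then write `pid` into its `cgroup.procs`.
fn move_pid_to_leaf(scope_dir: &Path, leaf: &str, pid: u32, timeout: Duration) -> io::Result<()> {
    wait_for_leaf(scope_dir, leaf, timeout)?;
    let mut file = OpenOptions::new()
        .write(true)
        .open(scope_dir.join(leaf).join("cgroup.procs"))?;
    writeln!(file, "{pid}")
}
```

With helpers like these, the runner would call `move_pid_to_leaf(scope_dir, "support", std::process::id(), timeout)` and only fail the run once the timeout has elapsed, rather than on the first missing path.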
The walltime executor pins the benchmark to dedicated cores with `systemd-run --scope --slice=codspeed.slice`, but that needs a reachable host systemd. None is available inside the macro-agent sandbox, where the runner is PID 1 of a private namespace with no host authority.

Add a `Cgroup` isolation mode for that case, selected by `CODSPEED_ISOLATION=CGROUP:<dir>`. The privileged side delegates that scope cgroup already split and pinned into `all` / `support` / `bench` leaves; the runner just relocates with two plain `cgroup.procs` writes: itself into `support`, so it and the profiler stay off the bench cores, and the benchmark into `bench` via a small `bash` shim. No cpuset arithmetic and no privilege: the benchmark stays in the profiler's process subtree, so it records unprivileged.

Unlike the systemd scope, this keeps the benchmark a descendant of the profiler, so the `isolate` flag threaded through the profilers becomes `requires_sudo`, which is true only for the systemd mode.

Refs COD-3012
Co-Authored-By: Claude <noreply@anthropic.com>
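For illustration, the `bash` shim could be built roughly as follows. This is a sketch under assumptions, not the code in this commit: the function name `wrap_for_bench_leaf` and the exact shell one-liner are hypothetical. The idea is that the shell writes its own PID into `bench/cgroup.procs` and then `exec`s the benchmark, so the benchmark keeps that PID and therefore stays in the `bench` leaf.

```rust
use std::path::Path;
use std::process::Command;

/// Wrap `program args...` in a bash shim that joins `<scope_dir>/bench`
/// before replacing itself with the benchmark. (Hypothetical helper.)
fn wrap_for_bench_leaf(scope_dir: &Path, program: &str, args: &[&str]) -> Command {
    let procs = scope_dir.join("bench").join("cgroup.procs");
    let mut cmd = Command::new("bash");
    cmd.arg("-c")
        // With `bash -c SCRIPT A B C`, $0 is A and "$@" is B C.
        // Here $0 is the cgroup.procs path and "$@" is the benchmark command line.
        .arg(r#"echo $$ > "$0" && exec "$@""#)
        .arg(procs)
        .arg(program)
        .args(args);
    cmd
}
```

Because the shim is spawned as an ordinary child process, the benchmark remains a descendant of whatever spawned it, which matches the point above about recording unprivileged.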
40f9325 to d2f2793
Add an alternative to `systemd-run` where the caller can prepare cgroups and forward the info through `CODSPEED_WALLTIME_ISOLATION=CGROUP:/path/to/cgroup`.

The runner expects this group to have two leaves: `support` and `bench`.

When spawning the bench process, the runner:
- moves itself into the `support` group
- moves the bench process into the `bench` group
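As a rough sketch of how a `CGROUP:/path/to/cgroup` value might be turned into the `IsolationMode::Cgroup { scope_dir }` variant seen in the diff: the `parse_isolation` function, the `PathBuf` field type, and the choice to reject an empty path are assumptions made for illustration, and the real enum presumably has other variants that are omitted here.

```rust
use std::path::PathBuf;

/// Only the cgroup variant is sketched; other modes are left out.
#[derive(Debug, PartialEq)]
enum IsolationMode {
    Cgroup { scope_dir: PathBuf },
}

/// Parse a value such as `CGROUP:/path/to/cgroup`.
/// Returns `None` for any other value or for an empty path. (Hypothetical helper.)
fn parse_isolation(value: &str) -> Option<IsolationMode> {
    let dir = value.strip_prefix("CGROUP:")?;
    if dir.is_empty() {
        return None;
    }
    Some(IsolationMode::Cgroup {
        scope_dir: PathBuf::from(dir),
    })
}
```

The caller would read the environment variable, pass its value to `parse_isolation`, and derive the `support` and `bench` leaf paths by joining them onto `scope_dir`.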