Add UFFD snapshot pager graduation #272

Draft
sjmiller609 wants to merge 4 commits into main from hypeship/uffd-graduation

@sjmiller609
Collaborator

Summary

Running UFFD-backed VMs are pinned to their snapshot memory pager for the life of the restore. This adds a way to detach a running VM from its pager after it has soaked, so the pool of active pager sessions stays bounded and old pager versions can drain to zero and exit.

Detach happens without touching the VM: a new pager endpoint POST /sessions/{id}/complete populates every outstanding page from the backing file and then unregisters userfaultfd. The guest never pauses and its network is untouched; the VM ends up running on resident memory with no pager dependency.
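
The ordering constraint on completion can be sketched as follows. This is a minimal illustration, not the code in lib/uffdpager: the `session` type, its `populate` and `unregister` hooks (stand-ins for the UFFDIO_COPY and UFFDIO_UNREGISTER calls), and `errShortRead` are all hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// errShortRead is a hypothetical error used to simulate a populate failure.
var errShortRead = errors.New("short read from backing file")

// memRange is one guest memory range registered with userfaultfd.
type memRange struct {
	Start, Len uint64
}

// session is a hypothetical stand-in for a pager session. populate and
// unregister abstract the real UFFDIO_COPY and UFFDIO_UNREGISTER calls.
type session struct {
	ranges     []memRange
	populate   func(memRange) error
	unregister func(memRange) error
	detached   bool
}

// complete populates every range first and unregisters only when all of
// them succeeded. On a populate failure nothing is unregistered, so the
// session can keep serving faults.
func (s *session) complete() error {
	for _, r := range s.ranges {
		if err := s.populate(r); err != nil {
			return fmt.Errorf("populate %#x+%#x: %w", r.Start, r.Len, err)
		}
	}
	for _, r := range s.ranges {
		// How a partial unregister failure is handled is not covered by
		// this sketch; it simply reports the error.
		if err := s.unregister(r); err != nil {
			return fmt.Errorf("unregister %#x+%#x: %w", r.Start, r.Len, err)
		}
	}
	s.detached = true
	return nil
}

func main() {
	s := &session{
		ranges:     []memRange{{Start: 0, Len: 4096}, {Start: 8192, Len: 4096}},
		populate:   func(memRange) error { return errShortRead },
		unregister: func(memRange) error { return nil },
	}
	// The populate failure is reported and the session stays attached.
	fmt.Println(s.complete(), s.detached)
}
```

Keeping the two loops separate is the point of the sketch: no range is unregistered while any page in any range might still be absent.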

Why not migrate UFFD→UFFD or fall back to the file backend: the memory backend is fixed at mmap time when a VM is restored, so reaching the file backend requires a VMM restart, which drops every TCP connection. Graduation (finish the lazy load, then detach) is the only path that is non-interrupting.

What's here

  • Pager (lib/uffdpager): POST /sessions/{id}/complete + Supervisor.CompleteSessionVersion. Completion runs in the fault-loop goroutine (woken via a pipe), populates all pages (reusing the existing read/copy path), then issues UFFDIO_UNREGISTER on the ranges. Unregister happens only after a full populate; otherwise the kernel would zero-fill any still-absent page, corrupting guest memory. On any populate failure the session keeps serving faults and is not torn down.
  • Hypervisor: new Capabilities().UsesDetachableSnapshotMemoryPager (true for Firecracker) so the controller stays hypervisor-agnostic.
  • Manager: GraduateSnapshotMemoryPager performs the detach under the instance lock and clears the session binding.
  • Controller (lib/uffdgraduate): scans for running pager-backed VMs and graduates eligible ones, prioritising outdated pager versions.
  • Config (hypervisor.firecracker_uffd_graduation): enabled (default false), min_session_age (10m), max_concurrent (1), max_active_sessions (0 = time-based weaning), scan_interval (1m), completion_timeout (10m). Wired in main.go via the existing configure/start pattern (no wire regen).
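
As a rough illustration of the config defaults listed above, the block could be modelled like this. The struct and field names are assumptions made for the sketch, and so is the rule that a zero duration or zero max_concurrent means "use the default"; the real Normalize/Validate in cmd/api/config may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// GraduationConfig is a hypothetical mirror of the keys under
// hypervisor.firecracker_uffd_graduation.
type GraduationConfig struct {
	Enabled           bool          // enabled (default false)
	MinSessionAge     time.Duration // min_session_age (default 10m)
	MaxConcurrent     int           // max_concurrent (default 1)
	MaxActiveSessions int           // max_active_sessions (0 = time-based weaning)
	ScanInterval      time.Duration // scan_interval (default 1m)
	CompletionTimeout time.Duration // completion_timeout (default 10m)
}

// Normalize fills unset fields with the defaults from the PR description.
// Treating zero as "unset" is an assumption of this sketch.
func (c *GraduationConfig) Normalize() {
	if c.MinSessionAge == 0 {
		c.MinSessionAge = 10 * time.Minute
	}
	if c.MaxConcurrent == 0 {
		c.MaxConcurrent = 1
	}
	if c.ScanInterval == 0 {
		c.ScanInterval = time.Minute
	}
	if c.CompletionTimeout == 0 {
		c.CompletionTimeout = 10 * time.Minute
	}
}

// Validate rejects values that cannot be meaningful. MaxActiveSessions == 0
// is valid and selects time-based weaning.
func (c GraduationConfig) Validate() error {
	switch {
	case c.MinSessionAge < 0:
		return errors.New("min_session_age must not be negative")
	case c.MaxConcurrent < 1:
		return errors.New("max_concurrent must be at least 1")
	case c.MaxActiveSessions < 0:
		return errors.New("max_active_sessions must not be negative")
	case c.ScanInterval <= 0:
		return errors.New("scan_interval must be positive")
	case c.CompletionTimeout <= 0:
		return errors.New("completion_timeout must be positive")
	}
	return nil
}

func main() {
	var c GraduationConfig
	c.Normalize()
	fmt.Printf("%+v valid=%v\n", c, c.Validate() == nil)
}
```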

Behaviour

  • Disabled by default and only constructed on the uffd backend.
  • max_active_sessions == 0: every session past min_session_age is graduated (time-based weaning). max_active_sessions > 0: only as many of the oldest sessions as needed to get back under the ceiling are graduated; outdated-version sessions are always graduated once they pass the soak.
  • A failed graduation leaves the VM untouched (still on its pager) and is retried on a later scan.
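
The selection rules above can be sketched as a pure function. This is one plausible reading of the behaviour, not the controller in lib/uffdgraduate: `sessionInfo`, `selectForGraduation`, and `mins` are hypothetical names, the soak check uses `>=` as an assumption, and max_concurrent pacing is deliberately left out.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// sessionInfo is a hypothetical view of one running pager-backed VM.
type sessionInfo struct {
	ID       string
	Age      time.Duration
	Outdated bool // session is served by an outdated pager version
}

// mins is a small helper for building durations in examples.
func mins(n int) time.Duration { return time.Duration(n) * time.Minute }

// selectForGraduation returns session IDs to graduate, in priority order:
// outdated pager versions first, then oldest first.
func selectForGraduation(sessions []sessionInfo, minAge time.Duration, maxActive int) []string {
	var soaked []sessionInfo
	for _, s := range sessions {
		if s.Age >= minAge {
			soaked = append(soaked, s)
		}
	}
	sort.SliceStable(soaked, func(i, j int) bool {
		if soaked[i].Outdated != soaked[j].Outdated {
			return soaked[i].Outdated
		}
		return soaked[i].Age > soaked[j].Age
	})
	// Number of sessions above the ceiling; unused when maxActive == 0.
	excess := len(sessions) - maxActive
	var out []string
	for _, s := range soaked {
		// maxActive == 0: graduate every soaked session.
		// Otherwise: always take outdated sessions, and take current ones
		// only until the active count is back at the ceiling.
		if maxActive == 0 || s.Outdated || len(out) < excess {
			out = append(out, s.ID)
		}
	}
	return out
}

func main() {
	sessions := []sessionInfo{
		{ID: "a", Age: mins(30)},
		{ID: "b", Age: mins(20), Outdated: true},
		{ID: "c", Age: mins(5)},
		{ID: "d", Age: mins(15)},
	}
	fmt.Println(selectForGraduation(sessions, mins(10), 0))
	fmt.Println(selectForGraduation(sessions, mins(10), 3))
}
```

In this sketch a graduated outdated session also counts toward returning to the ceiling, so with a ceiling of 3 and four sessions only the outdated one is selected.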

Tradeoff

Graduated pages become resident anonymous memory (reclaimable only to swap, unlike clean file-backed pages), and completion reads the whole remaining image once, which is why graduation is paced by the soak period and the concurrency limit.

Test plan

  • go build ./..., go vet, and unit tests pass for lib/uffdgraduate, lib/uffdpager, cmd/api/config.
  • Controller unit tests cover soak gating, concurrency, the max_active_sessions ceiling, outdated-version priority, and disabled = no-op. Config Normalize/Validate covered.
  • A unit test guards the hand-computed UFFDIO_UNREGISTER ioctl value and the wake pipe.
  • Not validated locally (needs real Firecracker + host kernel): the populate-then-unregister path on a live VM. Three assumptions to confirm before enabling in production:
    1. Firecracker tolerates the handler unregistering + closing the uffd mid-run after a full populate.
    2. Active-ballooning interaction: after unregister, a ballooned-then-reused page re-faults to zero-fill (safe only if genuinely guest-relinquished).
    3. UFFDIO_COPY is dirty-neutral on the host kernel, so the first post-graduation diff snapshot stays small (size regression risk, not correctness).
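
For reference, the hand-computed UFFDIO_UNREGISTER value mentioned in the test plan can be derived from the kernel's `_IOR(UFFDIO, _UFFDIO_UNREGISTER, struct uffdio_range)` definition. The sketch below assumes the generic Linux `_IOC` bit layout (as used on x86-64 and arm64); the Go identifiers are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// Generic Linux _IOC layout: nr in bits 0-7, type in bits 8-15,
// size in bits 16-29, direction in bits 30-31.
const (
	iocNRShift   = 0
	iocTypeShift = 8
	iocSizeShift = 16
	iocDirShift  = 30
	iocRead      = 2 // _IOC_READ in the generic layout
)

const (
	uffdioType         = 0xAA // UFFDIO magic from linux/userfaultfd.h
	uffdioUnregisterNR = 0x01 // _UFFDIO_UNREGISTER
	uffdioRangeSize    = 16   // sizeof(struct uffdio_range): two __u64 fields
)

// ioc packs the four fields the same way the kernel's _IOC macro does.
func ioc(dir, typ, nr, size uintptr) uintptr {
	return dir<<iocDirShift | size<<iocSizeShift | typ<<iocTypeShift | nr<<iocNRShift
}

// uffdioUnregister returns the request number for UFFDIO_UNREGISTER.
func uffdioUnregister() uintptr {
	return ioc(iocRead, uffdioType, uffdioUnregisterNR, uffdioRangeSize)
}

func main() {
	fmt.Printf("%#x\n", uffdioUnregister()) // prints 0x8010aa01
}
```

A guard test can compare a hard-coded constant against this derivation, so a typo in either one is caught at build time rather than on a live VM.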

🤖 Generated with Claude Code

sjmiller609 and others added 4 commits June 6, 2026 18:35
Detach running UFFD-backed VMs from their snapshot memory pager after a
soak period instead of leaving them pinned for the life of the restore.
A new pager /sessions/{id}/complete endpoint populates the remaining
pages from the backing file and unregisters userfaultfd, so the VM keeps
running on resident memory with no pager dependency and no pause or
network interruption. This bounds the number of active pager sessions
and lets old pager versions drain to zero and exit.

A background controller (lib/uffdgraduate) drives graduations subject to
min_session_age, max_concurrent, and an optional max_active_sessions
ceiling, prioritising sessions on outdated pager versions. Disabled by
default and only active on the uffd backend. The detach is gated behind
a new hypervisor capability so the controller stays hypervisor-agnostic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sibling of the UFFD one-shot lifecycle test that detaches a running
UFFD-backed VM from its pager and asserts the VM keeps running with its
guest memory and disk intact, new writes still work, and a later
standby/restore preserves memory. Leaves the existing test unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Overlapping the graduation test's full memory populate with the sibling
UFFD lifecycle test's VMs saturated the CI runner and timed out
guest-agent readiness. Drop t.Parallel so peak concurrent UFFD VM load
matches the pre-existing single-test profile.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>