Add UFFD snapshot pager graduation #272
Draft
sjmiller609 wants to merge 4 commits into
Detach running UFFD-backed VMs from their snapshot memory pager after a
soak period instead of leaving them pinned for the life of the restore.
A new pager /sessions/{id}/complete endpoint populates the remaining
pages from the backing file and unregisters userfaultfd, so the VM keeps
running on resident memory with no pager dependency and no pause or
network interruption. This bounds the number of active pager sessions
and lets old pager versions drain to zero and exit.
A background controller (lib/uffdgraduate) drives graduations subject to
min_session_age, max_concurrent, and an optional max_active_sessions
ceiling, prioritising sessions on outdated pager versions. Disabled by
default and only active on the uffd backend. The detach is gated behind
a new hypervisor capability so the controller stays hypervisor-agnostic.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sibling of the UFFD one-shot lifecycle test that detaches a running UFFD-backed VM from its pager and asserts the VM keeps running with its guest memory and disk intact, new writes still work, and a later standby/restore preserves memory. Leaves the existing test unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Overlapping the graduation test's full memory populate with the sibling UFFD lifecycle test's VMs saturated the CI runner and timed out guest-agent readiness. Drop t.Parallel so peak concurrent UFFD VM load matches the pre-existing single-test profile.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Running UFFD-backed VMs are pinned to their snapshot memory pager for the life of the restore. This adds a way to detach a running VM from its pager after it has soaked, so the pool of active pager sessions stays bounded and old pager versions can drain to zero and exit.
Detach happens without touching the VM: a new pager endpoint, POST /sessions/{id}/complete, populates every outstanding page from the backing file and then unregisters userfaultfd. The guest never pauses and its network is untouched; the VM ends up running on resident memory with no pager dependency.
Why not migrate UFFD→UFFD or fall back to the file backend: the memory backend is fixed at the mmap when a VM is restored, so reaching the file backend requires a VMM restart, which drops every TCP connection. Graduation (finish the lazy load, then detach) is the only path that is non-interrupting.
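The ordering constraint on completion (populate everything, and only then unregister) can be illustrated with a small model. This is a minimal sketch, not the code in lib/uffdpager: the `session` and `pageRange` types, the `populate`/`unregister` hooks, and `errBackingRead` are hypothetical stand-ins for the real read/copy path and the UFFDIO_UNREGISTER ioctl.

```go
package main

import (
	"errors"
	"fmt"
)

// errBackingRead is a hypothetical error used to simulate a failed populate.
var errBackingRead = errors.New("backing file read failed")

// pageRange is a hypothetical stand-in for one registered userfaultfd range.
type pageRange struct {
	start, length uint64
}

// session is a hypothetical model of a pager session. The two hooks stand in
// for the existing read/copy path and for the UFFDIO_UNREGISTER ioctl.
type session struct {
	ranges     []pageRange
	registered bool
	populate   func(pageRange) error
	unregister func(pageRange) error
}

// complete populates every range first and unregisters only if all populates
// succeeded. On a populate failure it returns early and leaves the session
// registered, so it can keep serving faults.
func (s *session) complete() error {
	for _, r := range s.ranges {
		if err := s.populate(r); err != nil {
			return fmt.Errorf("populate %#x+%#x: %w", r.start, r.length, err)
		}
	}
	// Every page is now present, so unregistering cannot expose absent pages.
	// A partial unregister failure is not handled in this sketch.
	for _, r := range s.ranges {
		if err := s.unregister(r); err != nil {
			return fmt.Errorf("unregister %#x+%#x: %w", r.start, r.length, err)
		}
	}
	s.registered = false
	return nil
}

func main() {
	s := &session{
		ranges:     []pageRange{{0x0, 0x1000}, {0x2000, 0x1000}},
		registered: true,
		populate:   func(pageRange) error { return errBackingRead },
		unregister: func(pageRange) error { return nil },
	}
	err := s.complete()
	fmt.Println("error:", err, "still registered:", s.registered)
}
```

Splitting the work into two passes is what makes the failure case safe in this model: no range is unregistered until every range has been populated.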
What's here
- lib/uffdpager: POST /sessions/{id}/complete + Supervisor.CompleteSessionVersion. Completion runs in the fault-loop goroutine (woken via a pipe), populates all pages (reusing the existing read/copy path), then UFFDIO_UNREGISTERs the ranges. Unregister happens only after a full populate; otherwise the kernel zero-fills still-absent pages (corruption). On any populate failure the session keeps serving faults and is not torn down.
- New hypervisor capability Capabilities().UsesDetachableSnapshotMemoryPager (true for Firecracker), so the controller stays hypervisor-agnostic.
- GraduateSnapshotMemoryPager performs the detach under the instance lock and clears the session binding.
- lib/uffdgraduate: scans for running pager-backed VMs and graduates eligible ones, prioritising outdated pager versions.
- hypervisor.firecracker_uffd_graduation: enabled (default false), min_session_age (10m), max_concurrent (1), max_active_sessions (0 = time-based weaning), scan_interval (1m), completion_timeout (10m). Wired in main.go via the existing configure/start pattern (no wire regen).
Behaviour
- Only active on the uffd backend.
- max_active_sessions == 0: every session past min_session_age is graduated (time-based weaning).
- max_active_sessions > 0: only enough of the oldest sessions are graduated to return to the ceiling; outdated-version sessions are always graduated after the soak.
Tradeoff
Graduated pages become resident anonymous memory (reclaimable only to swap, unlike clean file-backed pages), and completion reads the whole remaining image once — hence the soak + concurrency pacing.
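The selection rules listed under Behaviour can be sketched as one pure function. This is a hedged illustration, not the controller in lib/uffdgraduate: the `Config` and `Session` types, the `mins` helper, and `selectForGraduation` are hypothetical names, and the sketch assumes that `sessions` holds every active session that is not already being graduated and that the ceiling is measured against that count.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Config mirrors three of the settings named in this PR; the Go field names
// are assumptions made for the sketch.
type Config struct {
	MinSessionAge     time.Duration
	MaxConcurrent     int
	MaxActiveSessions int // 0 means time-based weaning
}

// Session is a hypothetical view of one active pager session.
type Session struct {
	ID       string
	Age      time.Duration // time since the session started
	Outdated bool          // served by an outdated pager version
}

// mins is a small helper for building durations in examples.
func mins(n int) time.Duration { return time.Duration(n) * time.Minute }

// selectForGraduation returns the sessions to graduate in this scan.
func selectForGraduation(cfg Config, sessions []Session, inFlight int) []Session {
	budget := cfg.MaxConcurrent - inFlight
	if budget <= 0 {
		return nil
	}
	// Only sessions past the soak period are eligible.
	var eligible []Session
	for _, s := range sessions {
		if s.Age >= cfg.MinSessionAge {
			eligible = append(eligible, s)
		}
	}
	// Outdated pager versions first, then oldest first.
	sort.SliceStable(eligible, func(i, j int) bool {
		if eligible[i].Outdated != eligible[j].Outdated {
			return eligible[i].Outdated
		}
		return eligible[i].Age > eligible[j].Age
	})
	// excess is how many sessions must go to return to the ceiling.
	excess := len(sessions) - cfg.MaxActiveSessions
	var picked []Session
	for _, s := range eligible {
		if len(picked) == budget {
			break
		}
		// With a ceiling set, up-to-date sessions graduate only while the
		// pool is above it; outdated sessions always graduate.
		if cfg.MaxActiveSessions > 0 && !s.Outdated && excess <= 0 {
			continue
		}
		picked = append(picked, s)
		excess--
	}
	return picked
}

func main() {
	cfg := Config{MinSessionAge: mins(10), MaxConcurrent: 2, MaxActiveSessions: 3}
	sessions := []Session{
		{ID: "a", Age: mins(30)},
		{ID: "b", Age: mins(20)},
		{ID: "c", Age: mins(15), Outdated: true},
		{ID: "d", Age: mins(5)},
	}
	for _, s := range selectForGraduation(cfg, sessions, 0) {
		fmt.Println("graduate", s.ID)
	}
}
```

Keeping the policy free of I/O makes the ceiling, priority, and pacing cases straightforward to unit test.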
Test plan
- go build ./..., go vet, and unit tests pass for lib/uffdgraduate, lib/uffdpager, cmd/api/config.
- max_active_sessions ceiling, outdated-version priority, and disabled = no-op. Config Normalize/Validate covered.
- UFFDIO_UNREGISTER ioctl value and the wake pipe.
- UFFDIO_COPY is dirty-neutral on the host kernel, so the first post-graduation diff snapshot stays small (size regression risk, not correctness).
🤖 Generated with Claude Code