Vouch request: craig-kindo #2132
craig-kindo
started this conversation in
Vouch Request
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
What do you want to work on?
I want to work on #1447 to add warm pool support.
Why this change?
I have a warm pool implementation that is different (and simpler) than that issue's discussion. The issue has multiple people agreeing to tackle the problem in a straightforward if toilsome way: by extending the gateway <-> supervisor communication channel to establish consensus over the state of the sandbox. This isn't a bad design. It maintains the most separation between OpenShell's ability to have pre-warmed sandboxes, and the underlying provider. Anybody could swap out providers and have the most machinery already available to provide warm pools on the new provider. If that provider has the equivalent of k8s' AgentSandbox, it would be a drop-in replacement.
I'm not in love with that solution due to the simple fact that we're making the cluster state much more complicated. AgentSandbox is supposed to store the canonical claim of a pre-warmed sandbox in k8s custom resources. I don't want to mirror that state into the supervisor, and into the gateway, and create a distributed consensus problem. I also don't really feel comfortable making drastic changes to the gateway <-> supervisor handshake; I feel that handshake needs either radical simplicity or impeccable taste to stay maintainable. Adding an extra combinatorial factor there for warm pools just doesn't seem like the right choice.
The simpler state change I have in mind is to just add a new optional startup phase in the supervisor that waits to be claimed. Only after it's been claimed, it will proceed as normal and use the newly claimed identity to begin the handshake with the gateway. This optional wait-for-claim phase can be set by a command line flag or env variable in the warm pool template. The claim state stays in k8s, and we can pipe that into the supervisor without even including a k8s client (!) by creating a downwardApi volume to project the claim annotation as a sentinel file for the supervisor to watch.
All of this is done in a couple hundred lines, which is dwarfed by the diff size to send SandboxClaims instead of Sandboxes to k8s in the gateway. That, at least, is a very straightforward change.
(Note: I previously requested a vouch here but closed it when the fix I wanted to implement became redundant. I don't believe opening a new vouch request to be against the CONTRIBUTING.md, but please let me know if it would be better to move this request under that discussion.)
Checklist
Beta Was this translation helpful? Give feedback.
All reactions