Add template debugging tooling and surface Jupyter logs #288
Send Jupyter's stdout to the systemd journal instead of /dev/null so startup errors (e.g. failed session creation) are visible via `journalctl -u jupyter`. Add a `make debug-template` workflow (build_debug.py + debug_logs.py) that builds the template via the real systemd start path with a timeout ready-gate, then spawns a sandbox and dumps the jupyter and code-interpreter service journals — for diagnosing a server that fails its readiness check. make_template() gains an optional `ready` override to support this. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
wait_for_timeout takes milliseconds (min 1000ms), so 60 collapsed to a 1s ready-gate. Use 60_000 for the intended 60s. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
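To illustrate the unit mix-up, here is a minimal model of the behavior the commit message describes: a millisecond argument with a 1000 ms floor. `effective_wait_ms` and `MIN_WAIT_MS` are hypothetical names for illustration, and the clamp-rather-than-raise behavior is an assumption; this is not the SDK's implementation.

```python
# Hypothetical model of a millisecond timeout with a 1000 ms minimum.
MIN_WAIT_MS = 1_000  # floor mentioned in the commit message (assumed to clamp)


def effective_wait_ms(timeout_ms: int) -> int:
    """Return the wait that is actually applied for a millisecond argument."""
    return max(timeout_ms, MIN_WAIT_MS)


# Passing 60 (meant as seconds) collapses to the 1 s floor.
short_gate = effective_wait_ms(60)
# Passing 60_000 yields the intended 60 s ready-gate.
intended_gate = effective_wait_ms(60_000)
```

With this model, `short_gate` is 1000 ms and `intended_gate` is 60000 ms, which matches the symptom fixed here.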
Address Cursor Bugbot review on PR #288:
- Bump sandbox TTL 180s -> 600s so the full diagnostic sequence (sleep + per-command 60s budgets) can't outlive the sandbox.
- Wrap each command in try/except so one slow or failing command no longer aborts the loop and skips the remaining journals/probes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
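The per-command isolation described above can be sketched as follows. This is not the code in `debug_logs.py`: `run_all`, `fake_run`, and the command strings are hypothetical, and the `run` callable stands in for whatever executes a command inside the sandbox.

```python
from typing import Callable


def run_all(commands: list[str], run: Callable[[str], str]) -> dict[str, str]:
    """Run every command, recording failures instead of aborting the loop."""
    results: dict[str, str] = {}
    for cmd in commands:
        try:
            results[cmd] = run(cmd)
        except Exception as exc:  # one slow or failing command must not skip the rest
            results[cmd] = f"[failed: {exc!r}]"
    return results


def fake_run(cmd: str) -> str:
    """Illustrative runner: the first journal dump times out, the rest succeed."""
    if cmd == "journalctl -u jupyter":
        raise TimeoutError("command timed out")
    return f"ok: {cmd}"


report = run_all(
    ["systemctl status jupyter", "journalctl -u jupyter", "journalctl -u code-interpreter"],
    fake_run,
)
```

Even though the second command raises, the third still runs and its output lands in `report`.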
The start command runs at build time and the resulting state is snapshotted, so a resumed sandbox already has the services running (and their journals populated). No need to wait after create. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Annotate `ready` as `ReadyCmd | None` (the type returned by wait_for_url/wait_for_timeout and accepted by set_start_cmd). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
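A rough sketch of the optional override, assuming the default falls back to the `/health` gate when `ready` is `None`. The `ReadyCmd` dataclass here is a stand-in for the SDK type, and `TemplateSpec`, `HEALTH_GATE`, `TIMEOUT_GATE`, and the start command string are placeholders rather than the repository's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReadyCmd:
    """Stand-in for the SDK's ready-command type (illustrative only)."""
    description: str


HEALTH_GATE = ReadyCmd("wait until /health responds")  # assumed default gate
TIMEOUT_GATE = ReadyCmd("wait a fixed 60_000 ms")      # what a debug build might pass


@dataclass(frozen=True)
class TemplateSpec:
    start_cmd: str
    ready: ReadyCmd


def make_template(ready: ReadyCmd | None = None) -> TemplateSpec:
    """Build a template spec; callers that omit `ready` keep the /health gate."""
    gate = ready if ready is not None else HEALTH_GATE
    # The start command is a placeholder for the real systemd start path.
    return TemplateSpec(start_cmd="systemctl start jupyter", ready=gate)


prod = make_template()                     # existing callers: unchanged behavior
debug = make_template(ready=TIMEOUT_GATE)  # debug build: finalize without /health
```

Using `None` as the default keeps the signature backward compatible while letting the debug build swap in a gate that does not depend on a healthy server.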
A port that accepts TCP but never sends an HTTP response (the half-broken state this tool diagnoses) would otherwise hang curl until the 60s command timeout. --max-time 3 makes each probe fail fast. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
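As a small illustration of the fail-fast probe, the helper below builds a `curl` invocation capped with `--max-time`. `probe_command` is a hypothetical helper, and the URL and the `-sS` flags are assumptions; only `--max-time 3` comes from the commit message.

```python
import shlex


def probe_command(url: str, max_time_s: int = 3) -> str:
    """Build a curl probe that gives up after `max_time_s` seconds in total."""
    # --max-time bounds the whole transfer, so a port that accepts TCP but
    # never answers fails quickly instead of running into the command timeout.
    return shlex.join(["curl", "-sS", "--max-time", str(max_time_s), url])


# Port 8888 is an illustrative value, not taken from the PR.
cmd = probe_command("http://localhost:8888/")
```

Capping each probe well below the 60s command budget leaves time for the remaining diagnostics to run.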
Keep production at StandardOutput=null. make_template(debug=True) now applies a systemd drop-in (jupyter-debug.conf) that flips Jupyter's stdout to the journal, and build_debug.py opts in. Production template behavior is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
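For readers unfamiliar with systemd drop-ins, the sketch below writes one that overrides only `StandardOutput` for `jupyter.service`. The function name and the way the file gets installed are assumptions (the real build may simply copy a checked-in `jupyter-debug.conf`); the `<unit>.d/*.conf` layout is the standard systemd convention.

```python
import tempfile
from pathlib import Path

# Overrides a single setting; everything else in jupyter.service is untouched.
DROP_IN = "[Service]\nStandardOutput=journal\n"


def install_debug_drop_in(systemd_dir: Path, unit: str = "jupyter.service") -> Path:
    """Write jupyter-debug.conf into the unit's drop-in directory."""
    drop_in_dir = systemd_dir / f"{unit}.d"
    drop_in_dir.mkdir(parents=True, exist_ok=True)
    path = drop_in_dir / "jupyter-debug.conf"
    path.write_text(DROP_IN)
    return path


# Demonstrate against a temporary directory rather than /etc/systemd/system.
with tempfile.TemporaryDirectory() as tmp:
    written = install_debug_drop_in(Path(tmp))
    relative = written.relative_to(tmp).as_posix()
    content = written.read_text()
```

Because the override lives in a separate file, a production build that never installs it keeps `StandardOutput=null`.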
systemd lazily loads the freshly-copied units (and their drop-ins) on the first `systemctl start` at end of build, so the explicit daemon-reload was a no-op. Verified the prod build still reaches a healthy /health gate without it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The published template artifact is unchanged (jupyter.service matches main, daemon-reload removal yields an identical image, the journal drop-in only ships in debug builds). Remaining changes are the build script, dev-only scripts, and docs — no version bump warranted. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

What
Makes the sandbox template easier to debug when the server fails to start, motivated by a recent `jupyter-server` bump whose only failure signal was a 10-minute readiness timeout, because the actual error was being thrown away.

Changes
- `systemd/jupyter.service` now sends `StandardOutput` to the journal instead of `/dev/null`, so `ServerApp` request/error logs survive. Inspect with `journalctl -u jupyter`.
- `make debug-template`: builds the template via the real systemd start path (the one CI/prod use, not the Docker `start-up.sh` path) with a timeout ready-gate, so the build finalizes even while the server is crash-looping, then spawns a sandbox and dumps `systemctl status` + full `journalctl` for both services.
- `template/build_debug.py`, `template/debug_logs.py`: the two scripts the target chains.
- `make_template()` gains an optional `ready=` override (defaults to the `/health` gate, so all existing callers are unaffected).
- `make start-template-server` uses the Docker path while CI/prod use systemd (the divergence that hid the original bug).

Why
When a build fails its readiness check, the real cause lives in the systemd journal, which a failed cloud build never surfaces and which `StandardOutput=null` discarded anyway. These changes turn that investigation into a one-command `journalctl` read.

Notes
- `ruff check` / `ruff format` clean.

🤖 Generated with Claude Code