Add template debugging tooling and surface Jupyter logs #288

Merged: mishushakov merged 9 commits into main from template-debug-tooling on Jun 4, 2026

@mishushakov

Member

What

Makes the sandbox template easier to debug when the server fails to start — motivated by a recent jupyter-server bump whose only failure signal was a 10-minute readiness timeout, because the actual error was being thrown away.

Changes

  • Stop discarding Jupyter's logs: systemd/jupyter.service now sends StandardOutput to the journal instead of /dev/null, so ServerApp request/error logs survive. Inspect with journalctl -u jupyter.
  • make debug-template — builds the template via the real systemd start path (the one CI/prod use, not the Docker start-up.sh path) with a timeout ready-gate, so the build finalizes even while the server is crash-looping, then spawns a sandbox and dumps systemctl status + full journalctl for both services.
    • template/build_debug.py, template/debug_logs.py — the two scripts the target chains.
    • make_template() gains an optional ready= override (defaults to the /health gate — all existing callers unaffected).
  • README — a "Debugging a server that won't start" section, calling out that make start-template-server uses the Docker path while CI/prod use systemd (the divergence that hid the original bug).
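
The optional ready= override can be pictured with a minimal sketch. Everything named here is a hypothetical stand-in: `ReadyGate`, `DEFAULT_GATE`, and `make_template_sketch` are illustrative and are not the repository's code or the SDK's types. Only the pattern comes from the description above: a `None` default that falls back to the /health gate, so existing callers see no change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadyGate:
    """Hypothetical stand-in for an object describing a readiness check."""

    description: str


# Assumed default: the existing /health readiness gate.
DEFAULT_GATE = ReadyGate("wait for /health")


def make_template_sketch(ready: Optional[ReadyGate] = None) -> ReadyGate:
    """Return the gate a build would use; None means 'keep the default'."""
    return DEFAULT_GATE if ready is None else ready


# Existing callers pass nothing and keep the /health gate.
print(make_template_sketch().description)
# A debug build opts into a fixed timeout instead.
print(make_template_sketch(ReadyGate("wait 60 s")).description)
```

Using `None` as the sentinel keeps the default in one place, so a debug build can swap the gate without touching the regular build path.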

Why

When a build fails its readiness check, the real cause lives in the systemd journal, which a failed cloud build never surfaces and which StandardOutput=null was discarding anyway. These changes turn that investigation into a one-command journalctl read.

Notes

  • Pure tooling/observability change; the default build path is unchanged.
  • ruff check / ruff format clean.

🤖 Generated with Claude Code

Send Jupyter's stdout to the systemd journal instead of /dev/null so
startup errors (e.g. failed session creation) are visible via
`journalctl -u jupyter`.

Add a `make debug-template` workflow (build_debug.py + debug_logs.py)
that builds the template via the real systemd start path with a timeout
ready-gate, then spawns a sandbox and dumps the jupyter and
code-interpreter service journals — for diagnosing a server that fails
its readiness check. make_template() gains an optional `ready` override
to support this.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 3, 2026


PR Summary

Low Risk
Observability and dev tooling only; default template builds still use the health gate and unchanged Jupyter logging.

Overview
Adds make debug-template, which builds a debug sandbox image on the systemd startup path with a 60s timeout instead of /health, then prints service status, journals, and quick curl checks. make_template accepts optional ready and debug; when debug is true, a systemd drop-in sends Jupyter stdout to the journal. README documents systemd vs Docker startup divergence and manual journalctl usage.

Reviewed by Cursor Bugbot for commit 179e18f.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72c8ba882e

Comment thread template/build_debug.py Outdated
wait_for_timeout takes milliseconds (min 1000ms), so 60 collapsed to a
1s ready-gate. Use 60_000 for the intended 60s.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
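
The unit mix-up in this commit message can be worked through with a small model. This is a sketch of the behaviour as the message describes it (a millisecond argument with a 1000 ms floor), not the SDK's implementation; `effective_gate_seconds` is a hypothetical helper.

```python
MIN_TIMEOUT_MS = 1000  # floor stated in the commit message


def effective_gate_seconds(timeout_ms: int) -> int:
    """Model a millisecond timeout with a 1000 ms floor, in whole seconds."""
    return max(MIN_TIMEOUT_MS, timeout_ms) // 1000


# Passing 60 (meant as seconds) is read as 60 ms and clamped up to the floor.
print(effective_gate_seconds(60))      # 1
# Passing 60_000 gives the intended 60 s gate.
print(effective_gate_seconds(60_000))  # 60
```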
@mishushakov force-pushed the template-debug-tooling branch from bd905ae to c78eb9b on June 3, 2026 15:02
Comment thread template/debug_logs.py Outdated
Comment thread template/debug_logs.py Outdated
mishushakov and others added 3 commits June 3, 2026 17:06
Address Cursor Bugbot review on PR #288:
- Bump sandbox TTL 180s -> 600s so the full diagnostic sequence (sleep +
  per-command 60s budgets) can't outlive the sandbox.
- Wrap each command in try/except so one slow or failing command no
  longer aborts the loop and skips the remaining journals/probes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
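
The second fix, isolating each command so that one failure cannot skip the rest, might look roughly like the sketch below. The unit names follow the PR text ("jupyter" and "code-interpreter") but are assumptions about the actual unit files; `run_all` and `fake_run` are hypothetical, and `fake_run` stands in for whatever executes a command inside the sandbox.

```python
from typing import Callable, Dict, List

# Assumed unit names, taken from the services the PR text mentions.
SERVICES = ["jupyter", "code-interpreter"]


def diagnostic_commands(services: List[str]) -> List[str]:
    """Build the status and journal commands for each service."""
    commands = []
    for service in services:
        commands.append(f"systemctl status {service} --no-pager")
        commands.append(f"journalctl -u {service} --no-pager")
    return commands


def run_all(commands: List[str], run: Callable[[str], str]) -> Dict[str, str]:
    """Run every command, recording a failure instead of aborting the loop."""
    results = {}
    for command in commands:
        try:
            results[command] = run(command)
        except Exception as exc:
            results[command] = f"[command failed: {exc}]"
    return results


def fake_run(command: str) -> str:
    """Hypothetical runner: one command times out, the others succeed."""
    if command.startswith("systemctl status code-interpreter"):
        raise TimeoutError("timed out after 60s")
    return f"output of: {command}"


results = run_all(diagnostic_commands(SERVICES), fake_run)
for command, output in results.items():
    print(f"$ {command}\n{output}\n")
```

Recording the failure next to the command keeps the dump readable: the later journal is still collected even when an earlier status call stalls.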
The start command runs at build time and the resulting state is
snapshotted, so a resumed sandbox already has the services running (and
their journals populated). No need to wait after create.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Annotate `ready` as `ReadyCmd | None` (the type returned by
wait_for_url/wait_for_timeout and accepted by set_start_cmd).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread template/build_debug.py Outdated
Comment thread template/debug_logs.py Outdated
A port that accepts TCP but never sends an HTTP response (the half-broken
state this tool diagnoses) would otherwise hang curl until the 60s
command timeout. --max-time 3 makes each probe fail fast.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
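
The half-broken state described in this commit message, a port that accepts TCP but never answers, can be reproduced locally. The sketch below is a Python analogue of a bounded probe rather than the PR's curl invocation: it uses a socket timeout, which limits each blocking operation, whereas curl's --max-time limits the whole transfer. The `probe` helper, the `/health` path, and the 0.5 s budget are illustrative assumptions.

```python
import http.client
import socket
import time


def probe(host: str, port: int, path: str, timeout_s: float) -> str:
    """Issue one GET and give up after timeout_s per blocking operation."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", path)
        return f"HTTP {conn.getresponse().status}"
    except (OSError, http.client.HTTPException) as exc:
        return f"failed: {type(exc).__name__}"
    finally:
        conn.close()


# A listener that completes the TCP handshake but never sends a response.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
port = silent.getsockname()[1]

start = time.monotonic()
result = probe("127.0.0.1", port, "/health", timeout_s=0.5)
elapsed = time.monotonic() - start
silent.close()

print(result, f"after {elapsed:.1f}s")
```

Without the timeout the probe would block until some outer limit fired, which is the 60s command budget in the case this commit addresses.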
Comment thread template/systemd/jupyter.service Outdated
mishushakov and others added 2 commits June 3, 2026 17:35
Keep production at StandardOutput=null. make_template(debug=True) now
applies a systemd drop-in (jupyter-debug.conf) that flips Jupyter's
stdout to the journal, and build_debug.py opts in. Production template
behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
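
A drop-in of the kind this commit describes could be generated as in the sketch below. The file name jupyter-debug.conf comes from the commit message; the `/etc/systemd/system` location and the exact file contents are assumptions based on systemd's standard `<unit>.d/*.conf` drop-in convention, and `debug_dropin` is a hypothetical helper rather than the repository's code.

```python
from pathlib import PurePosixPath
from typing import Tuple


def debug_dropin(
    unit: str = "jupyter.service",
    name: str = "jupyter-debug.conf",
    unit_dir: str = "/etc/systemd/system",  # assumed install location
) -> Tuple[PurePosixPath, str]:
    """Return the drop-in path and contents that override StandardOutput."""
    path = PurePosixPath(unit_dir) / f"{unit}.d" / name
    # A drop-in only needs the keys it overrides; the unit file keeps
    # StandardOutput=null for builds that do not ship this file.
    content = "[Service]\nStandardOutput=journal\n"
    return path, content


path, content = debug_dropin()
print(path)
print(content)
```

Keeping the override in a separate file means the production unit stays byte-for-byte identical, and only debug builds copy the extra file.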
systemd lazily loads the freshly-copied units (and their drop-ins) on
the first `systemctl start` at end of build, so the explicit
daemon-reload was a no-op. Verified the prod build still reaches a
healthy /health gate without it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 22747f2.

Comment thread template/template.py
Comment thread template/template.py
Comment thread .changeset/template-debug-tooling.md Outdated
The published template artifact is unchanged (jupyter.service matches
main, daemon-reload removal yields an identical image, the journal
drop-in only ships in debug builds). Remaining changes are the build
script, dev-only scripts, and docs — no version bump warranted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mishushakov merged commit 8115b1d into main on Jun 4, 2026
16 checks passed
@mishushakov deleted the template-debug-tooling branch on June 4, 2026 13:53