feat(runtime): wire ctx.llm from sandbox credentials in cloud defaults #193

Merged
khaliqgant merged 2 commits into main from fix/cloud-defaults-llm-context
Jun 4, 2026

@khaliqgant khaliqgant commented Jun 4, 2026

Member

User description

Root cause this fixes

ctx.llm has been structurally unwired in EVERY cloud deployment since the runtime existed: buildCtx only receives an LlmContext when a caller passes one (runner.ts forwards subsystems.llm only), createCloudRuntimeDefaults never built one, and no deploy/bootstrap path constructs one anywhere — repo-wide grep finds zero constructors outside type definitions. Result: ctx.llm.complete() always threw the UNAVAILABLE_LLM stub, and its error message's advice (set persona.useSubscription:true and connect a provider) was misleading — useSubscription gates deploy-time harness-binary credential linking, and never flowed into buildCtx at all.

Why it surfaced now: the linear-chat-lead CWD/mount fix (relay-helpers 0.3.33) advanced handler execution past getIssue onto classifyIntent's ctx.llm.complete() (linear/agent.ts:188). granola (agent.ts:95) and hn-monitor (agent.ts:65) share the same dead path — every ctx.llm-using persona has always been broken in cloud deployment.

Change

  • New cloud-llm.ts: createDefaultLlm({persona, env, log}) derives an LlmContext from sandbox env credentials, priority order:
    1. ANTHROPIC_API_KEY → Anthropic Messages API via x-api-key + anthropic-version: 2023-06-01
    2. CLAUDE_CODE_OAUTH_TOKEN → same API via Authorization: Bearer (the claude setup-token env that cloud#1629 injects for oauth_token provider credentials) — never via x-api-key, and never both headers at once
    3. OPENAI_API_KEY → OpenAI chat completions, bearer
  • Provider-family steering: when persona.model names a family (claude-* / anthropic/... vs gpt-* / openai/... / openai-codex/...), a credential of that family is preferred and the persona's model is used (provider prefix stripped). Family defaults otherwise: claude-opus-4-8 / gpt-5.1 ⚠️ reviewer check requested on the OpenAI default model string.
  • createCloudRuntimeDefaults exposes llm?; startRunner forwards subsystems.llm ?? cloudDefaults.llm (mirrors the existing workflow-subsystem wiring).
  • Fail-soft: no credential → undefined → buildCtx keeps the existing throwing stub (unchanged behavior for boxes with no LLM creds).
  • Errors: non-2xx throws with status + truncated detail (no secrets logged); empty content throws rather than returning '' (the granola JSON-parse path would otherwise mis-succeed); 120s timeout via AbortSignal.timeout.
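
The priority order and family preference listed above can be sketched roughly as follows. This is a minimal illustration and not the code in cloud-llm.ts: selectCredential, Candidate, and the choice to fall back to the first available credential when the preferred family has none are assumptions. Only the three environment variable names and their order come from the description.

```typescript
// Hypothetical names throughout; only the env variable names and the
// priority order are taken from the PR description.
type Family = 'anthropic' | 'openai';
type Source = 'ANTHROPIC_API_KEY' | 'CLAUDE_CODE_OAUTH_TOKEN' | 'OPENAI_API_KEY';

interface Candidate {
  source: Source;
  family: Family;
  secret: string;
}

// Priority order as listed in the description.
const PRIORITY: ReadonlyArray<{ source: Source; family: Family }> = [
  { source: 'ANTHROPIC_API_KEY', family: 'anthropic' },
  { source: 'CLAUDE_CODE_OAUTH_TOKEN', family: 'anthropic' },
  { source: 'OPENAI_API_KEY', family: 'openai' },
];

function selectCredential(
  env: Record<string, string | undefined>,
  preferred: Family | null,
): Candidate | undefined {
  const available: Candidate[] = [];
  for (const { source, family } of PRIORITY) {
    // Treat whitespace-only values as absent.
    const secret = env[source]?.trim();
    if (secret) available.push({ source, family, secret });
  }
  // Prefer a credential of the persona's family. Assumption: if that family
  // has no credential, use the first available one in priority order.
  return available.find((c) => c.family === preferred) ?? available[0];
}
```

With both an Anthropic and an OpenAI key present, a null family picks the Anthropic key, while a persona model in the OpenAI family picks the OpenAI key. An empty environment yields undefined, which is what lets buildCtx keep its throwing stub.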

Validation

  • npm run build clean (after building persona-kit workspace dep)
  • npm test (runtime): 46/46 pass, including 8 new cloud-llm tests covering: no-creds→undefined, x-api-key header shape, Bearer-only for setup-token (no x-api-key leak), OpenAI routing + prefix-strip, family preference with multiple creds, anthropic default with no family, non-2xx throw, empty-content throw.

Rollout chain (cloud side, after merge)

publish runtime → bump in cloud snapshot/agents pins (lockstep rules apply) → redeploy affected personas (linear-chat-lead, granola, hn-monitor). Note: the deployed-persona env for anthropic harnesses currently UNSETS ANTHROPIC_API_KEY (ambient-env scrub in deployment-trigger-delivery) and auths the binary via credential files — so for those personas this fix activates once cloud#1629's CLAUDE_CODE_OAUTH_TOKEN injection (merged today) deploys, or when the workspace uses BYOK env keys. Worth a follow-up discussion on whether the credential-file fallback belongs in the runtime too.

🤖 Generated with Claude Code


CodeAnt-AI Description

Use sandbox credentials for ctx.llm in cloud runs

What Changed

  • Cloud personas can now use ctx.llm when a supported API credential is present in the sandbox, instead of always failing with the unavailable-LLM stub
  • The runtime chooses Anthropic or OpenAI based on the persona’s model when possible, and falls back to the first available matching credential
  • Anthropic API keys and Claude OAuth tokens now both work for Anthropic requests, and OpenAI keys work for chat completion requests
  • If the model name already includes a provider prefix, that prefix is removed before the request is sent
  • Failed LLM calls now return clear errors when the service responds with an error or when the response contains no text

Impact

✅ Working LLM calls in cloud personas
✅ Fewer runtime failures in chat and classification flows
✅ Clearer errors when an LLM request is rejected or empty


ctx.llm was structurally unwired in every cloud deployment: buildCtx only
receives an LlmContext when a caller passes one, createCloudRuntimeDefaults
never built one, and no deploy/bootstrap path did either — so
ctx.llm.complete() always threw the UNAVAILABLE_LLM stub regardless of
persona.useSubscription (which only gates harness-binary credential
linking). The gap surfaced when the linear-chat-lead CWD fix advanced
execution past getIssue onto the classifyIntent ctx.llm call; granola and
hn-monitor share the same dead path.

createCloudRuntimeDefaults now derives an LlmContext from sandbox env
credentials, in order: ANTHROPIC_API_KEY (x-api-key), CLAUDE_CODE_OAUTH_TOKEN
(Authorization: Bearer — the claude setup-token cloud#1629 injects),
OPENAI_API_KEY (chat completions). The persona's model field steers
provider-family selection and supplies the model (provider prefixes
stripped); families fall back to claude-opus-4-8 / gpt-5.1. No credential →
undefined → buildCtx keeps the existing throwing stub.

startRunner forwards subsystems.llm ?? cloudDefaults.llm, mirroring the
workflow subsystem wiring.
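
The forwarding rule in this commit message can be illustrated with a small sketch. The LlmContext shape, the resolveLlm helper, and the stub's error text are assumptions made for the example. In the actual change the `??` fallback lives in startRunner and the stub in buildCtx, and this sketch collapses the two steps into one function.

```typescript
// Hypothetical minimal shape; the real LlmContext interface is not shown in this PR page.
interface LlmContext {
  complete(prompt: string): Promise<string>;
}

// Stand-in for the existing stub: any use throws.
const UNAVAILABLE_LLM: LlmContext = {
  complete: async () => {
    throw new Error('ctx.llm is unavailable: no LLM credential configured');
  },
};

// Caller-supplied subsystem wins, then the cloud default, then the stub.
function resolveLlm(
  subsystems: { llm?: LlmContext } | undefined,
  cloudDefaults: { llm?: LlmContext },
): LlmContext {
  return subsystems?.llm ?? cloudDefaults.llm ?? UNAVAILABLE_LLM;
}
```

Because the cloud default is only consulted when no explicit subsystem is passed, existing callers that already supply their own LlmContext see no change in behavior.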

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codeant-ai

codeant-ai Bot commented Jun 4, 2026


CodeAnt AI is reviewing your PR.

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Contributor

Review Change Stack

Warning

Review limit reached

@khaliqgant, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 56 minutes and 27 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e3df929a-785a-44a5-9b51-87d7db45be54

📥 Commits

Reviewing files that changed from the base of the PR and between 53d18ef and 8de5526.

📒 Files selected for processing (2)
  • packages/runtime/src/cloud-llm.test.ts
  • packages/runtime/src/cloud-llm.ts
📝 Walkthrough

Walkthrough

This PR adds cloud LLM support to the runtime, enabling deployed personas to query Anthropic or OpenAI models using environment-provided credentials. Core logic selects credentials per provider, resolves model names, and implements request/response flows for both endpoints. Integration points wire the LLM into cloud defaults and runtime context with fallback semantics.

Changes

Cloud LLM Support

| Layer / File(s) | Summary |
| --- | --- |
| LLM Provider Contracts & Credential Selection (packages/runtime/src/cloud-llm.ts) | CloudLlmOptions type and createDefaultLlm entry point. Credential selection chooses between env-backed Anthropic/OpenAI sources with single-header strategy, derives provider family from persona model naming, resolves effective model by stripping provider prefixes or applying family defaults. |
| Anthropic & OpenAI Complete Flows (packages/runtime/src/cloud-llm.ts) | Anthropic complete() builds /v1/messages payload and extracts concatenated text blocks. OpenAI complete() builds /v1/chat/completions payload and extracts first choice message content. Both throw when text content is absent. |
| Networking & Utility Helpers (packages/runtime/src/cloud-llm.ts) | postJson centralizes POST with JSON serialization, timeout, pre-response logging, status validation, and error truncation. Utilities support env trimming, record type checks, and string truncation. |
| Cloud Defaults Integration (packages/runtime/src/cloud-defaults.ts) | Extends CloudRuntimeDefaults with optional llm field. createCloudRuntimeDefaults constructs default LLM from persona/env/logger and conditionally includes it in returned object. |
| Runtime Context Wiring (packages/runtime/src/runner.ts) | startRunner's buildCtx uses options.subsystems?.llm if provided, otherwise falls back to cloudDefaults.llm, ensuring default LLM availability. |
| Comprehensive Test Suite (packages/runtime/src/cloud-llm.test.ts) | Tests cover no-credential scenario, Anthropic API-key auth with Messages endpoint, Bearer-only OAuth auth on the Messages endpoint, OpenAI Bearer routing to Chat Completions, credential precedence favoring persona model family, Anthropic default fallback, non-2xx rejection, and empty content rejection. |
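
The postJson behavior summarized in the walkthrough (JSON POST, 120s timeout via AbortSignal.timeout, status validation, truncated error detail) might look roughly like the sketch below. The FetchLike type, the injected fetchImpl parameter, the truncate helper, and the 300-character limit are assumptions added so the example is self-contained and testable. The error message format follows the one quoted later in this thread.

```typescript
// Hypothetical narrow fetch type so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Cap error detail so large response bodies do not flood logs.
function truncate(value: string, max: number): string {
  return value.length > max ? `${value.slice(0, max)}...` : value;
}

async function postJson(
  fetchImpl: FetchLike,
  url: string,
  headers: Record<string, string>,
  body: unknown,
  timeoutMs = 120_000, // 120s, as stated in the PR description
): Promise<unknown> {
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json', ...headers },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(timeoutMs),
  });
  const text = await response.text();
  if (!response.ok) {
    // Status plus truncated detail; request headers (and so secrets) are never echoed.
    throw new Error(`ctx.llm: ${url} returned ${response.status}: ${truncate(text, 300)}`);
  }
  return JSON.parse(text) as unknown;
}
```

In a Node 18+ runtime the global fetch could be passed as `(url, init) => fetch(url, init)`. Injecting it keeps the helper easy to exercise without network access.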

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • AgentWorkforce/workforce#159: Modifies CloudRuntimeDefaults and createCloudRuntimeDefaults in cloud-defaults.ts; PR #159 adds optional sandbox: false behavior while this PR adds default llm context support.

Poem

🐰 A rabbit hops through API keys with glee,
Building bridges to Claude and OpenAI spree,
Anthropic and completions dance in sync,
Context flows default with a clever wink,
Cloud personas now chat—what a feat! 🌙✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and clearly describes the main change: wiring ctx.llm from sandbox credentials into cloud defaults, which is the core objective of this PR. |
| Description check | ✅ Passed | The description comprehensively explains the root cause, changes made, validation performed, and rollout implications, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@codeant-ai codeant-ai Bot added the size:L label (This PR changes 100-499 lines, ignoring generated files) Jun 4, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a default environment-derived LLM context (ctx.llm) for cloud-deployed personas, resolving an issue where cloud personas lacked a working LLM client. It adds credential selection and routing for Anthropic and OpenAI APIs, along with comprehensive unit tests. The review feedback highlights several improvement opportunities, including updating the default OpenAI model to a valid one (e.g., gpt-4o), improving model family detection to support newer o1/o3 models and the harness field, and adding defensive checks using the isRecord helper to prevent potential runtime TypeErrors when parsing API responses.


Comment thread packages/runtime/src/cloud-llm.ts Outdated
*/

const DEFAULT_ANTHROPIC_MODEL = 'claude-opus-4-8';
const DEFAULT_OPENAI_MODEL = 'gpt-5.1';


medium

gpt-5.1 is not a valid OpenAI model. As requested in the PR description, this should be updated to a valid, standard OpenAI model such as gpt-4o or gpt-4o-mini to ensure out-of-the-box compatibility.

Suggested change
- const DEFAULT_OPENAI_MODEL = 'gpt-5.1';
+ const DEFAULT_OPENAI_MODEL = 'gpt-4o';

Comment on lines +109 to +123
function personaModelFamily(persona: PersonaSpec): LlmProviderFamily | null {
  const model = nonEmpty(persona.model);
  if (!model) return null;
  const normalized = model.toLowerCase();
  if (normalized.startsWith('anthropic/') || normalized.includes('claude')) return 'anthropic';
  if (
    normalized.startsWith('openai/') ||
    normalized.startsWith('openai-codex/') ||
    normalized.includes('gpt-') ||
    normalized.includes('codex')
  ) {
    return 'openai';
  }
  return null;
}


medium

The current personaModelFamily implementation only checks the model string. This has two issues:

  1. It does not recognize OpenAI's newer o1 and o3 series models (e.g., o1-mini, o3-mini), which do not contain the gpt- substring.
  2. It ignores the persona.harness field, which is a highly reliable indicator of the provider family (claude maps to anthropic, and codex/opencode map to openai).

We can make this much more robust by checking persona.harness first, and also adding support for o1- and o3- prefixes.

function personaModelFamily(persona: PersonaSpec): LlmProviderFamily | null {
  if (persona.harness === 'claude') return 'anthropic';
  if (persona.harness === 'codex' || persona.harness === 'opencode') return 'openai';

  const model = nonEmpty(persona.model);
  if (!model) return null;
  const normalized = model.toLowerCase();
  if (normalized.startsWith('anthropic/') || normalized.includes('claude')) return 'anthropic';
  if (
    normalized.startsWith('openai/') ||
    normalized.startsWith('openai-codex/') ||
    normalized.includes('gpt-') ||
    normalized.includes('codex') ||
    normalized.startsWith('o1-') ||
    normalized.startsWith('o3-')
  ) {
    return 'openai';
  }
  return null;
}

Comment on lines +158 to +160
const content = Array.isArray((payload as { content?: unknown }).content)
? ((payload as { content: unknown[] }).content)
: [];


medium

Defensive programming: payload is typed as unknown and cast directly to an object (payload as { content?: unknown }). If the API returns a non-object JSON response (like null or a primitive), accessing .content will throw a TypeError at runtime. We should use the existing isRecord helper to safely guard this property access.

Suggested change
- const content = Array.isArray((payload as { content?: unknown }).content)
- ? ((payload as { content: unknown[] }).content)
- : [];
+ const content = isRecord(payload) && Array.isArray(payload.content)
+ ? payload.content
+ : [];

Comment on lines +170 to +172
`ctx.llm: Anthropic response contained no text content (stop_reason=${String(
(payload as { stop_reason?: unknown }).stop_reason ?? 'unknown'
)})`


medium

Defensive programming: If payload is not an object, casting it to access stop_reason will throw a TypeError. We should safely check if payload is a record before accessing stop_reason.

Suggested change
- `ctx.llm: Anthropic response contained no text content (stop_reason=${String(
- (payload as { stop_reason?: unknown }).stop_reason ?? 'unknown'
- )})`
+ `ctx.llm: Anthropic response contained no text content (stop_reason=${String(
+ (isRecord(payload) ? payload.stop_reason : undefined) ?? 'unknown'
+ )})`

Comment on lines +201 to +202
const choices = (payload as { choices?: unknown }).choices;
const first = Array.isArray(choices) ? choices[0] : undefined;


medium

Defensive programming: If payload is not an object, casting it to access choices will throw a TypeError. We should safely check if payload is a record before accessing choices.

Suggested change
- const choices = (payload as { choices?: unknown }).choices;
- const first = Array.isArray(choices) ? choices[0] : undefined;
+ const choices = isRecord(payload) ? payload.choices : undefined;
+ const first = Array.isArray(choices) ? choices[0] : undefined;
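
Taken together, the guards suggested in this review could be folded into two small extraction helpers. The sketch below is illustrative only: anthropicText and openAiText are hypothetical names, the isRecord body is an assumption about what the existing helper does, and the OpenAI error text is invented for the example. The payload shapes follow the walkthrough (concatenated text blocks for Anthropic, first choice message content for OpenAI).

```typescript
// Assumed behavior of the existing isRecord helper: plain, non-null, non-array objects only.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Anthropic Messages shape: { content: [{ type: 'text', text: '...' }, ...], stop_reason }
function anthropicText(payload: unknown): string {
  const raw = isRecord(payload) ? payload.content : undefined;
  const content: unknown[] = Array.isArray(raw) ? raw : [];
  const text = content
    .filter((block): block is Record<string, unknown> => isRecord(block) && block.type === 'text')
    .map((block) => (typeof block.text === 'string' ? block.text : ''))
    .join('');
  if (!text) {
    // Throw instead of returning '' so JSON-parsing callers cannot mis-succeed.
    const stop = isRecord(payload) ? payload.stop_reason : undefined;
    throw new Error(
      `ctx.llm: Anthropic response contained no text content (stop_reason=${String(stop ?? 'unknown')})`,
    );
  }
  return text;
}

// OpenAI Chat Completions shape: { choices: [{ message: { content: '...' } }] }
function openAiText(payload: unknown): string {
  const choices = isRecord(payload) ? payload.choices : undefined;
  const first = Array.isArray(choices) ? choices[0] : undefined;
  const message = isRecord(first) ? first.message : undefined;
  const text = isRecord(message) ? message.content : undefined;
  if (typeof text !== 'string' || text === '') {
    throw new Error('ctx.llm: OpenAI response contained no text content');
  }
  return text;
}
```

Both helpers accept unknown and never cast, so a null or primitive response body ends in the same "no text content" error as an empty one rather than a TypeError.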

@khaliqgant

Member Author

Review (claude-3) — GO with one likely-functional gap on the OAuth leg + two model-string asks

Read the full diff at 53d18ef. The architecture is right: env-derived, fail-soft (no creds → existing throwing stub), exactly-one-auth-header discipline, persona.model steers family with prefix stripping, structured logs that name the credential SOURCE but never the value, 120s timeout + truncated error detail. Wiring through CloudRuntimeDefaults.llm with the conditional spread preserves every existing caller.

1. (Likely functional) The CLAUDE_CODE_OAUTH_TOKEN leg probably 401s without the OAuth beta header. Claude Code setup-tokens are accepted by the Messages API only with anthropic-beta: oauth-2025-04-20 alongside Authorization: Bearer — a bare Bearer on /v1/messages is rejected as an authentication error. Since this leg is exactly the #1629-activated path, it would fail at first live use despite a valid token. Add the beta header to the OAuth candidate's headers (NOT the x-api-key one) and ideally pin a test asserting the header set differs between the two anthropic sources. Verify against a live setup-token before the lockstep publish.

2. OpenAI default model gpt-5.1 is stale relative to what the fleet actually runs — repo personas use gpt-5.4/gpt-5.5. Suggest bumping the default to the current family default rather than a dated pin.

3. Codex-model passthrough: personaModelFamily maps *codex* → openai (right for family/credential selection), but resolveModel then passes e.g. gpt-5.5-codex to /v1/chat/completions, which doesn't serve codex models. When the stripped model still contains codex, fall back to DEFAULT_OPENAI_MODEL instead — pr-reviewer-class personas (model gpt-5.5 is fine, but any openai-codex/... ref breaks).

Side observation for the activation-dependency flag: agreed it's real, and note the failure shape if the OAuth token itself is expired/rotated — Anthropic's authentication_error message is essentially "invalid bearer token", which will surface through postJson as ctx.llm: ... returned 401. Good that the log line includes status+detail; that's what'll make the prod triage fast.

…t, codex-model fallback

- CLAUDE_CODE_OAUTH_TOKEN leg sends anthropic-beta: oauth-2025-04-20
  (setup-tokens are rejected by /v1/messages on a bare Bearer); test pins
  that the header set DIFFERS between the two anthropic sources
- DEFAULT_OPENAI_MODEL gpt-5.1 -> gpt-5.5 (fleet-current)
- gpt-*-codex persona models steer family selection but fall back to the
  default chat model for /v1/chat/completions (codex models aren't served
  there); new test

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codeant-ai

codeant-ai Bot commented Jun 4, 2026


CodeAnt AI finished reviewing your PR.

@khaliqgant

Member Author

All three review findings addressed at 8de5526 (47/47 tests):

  1. OAuth beta header: the CLAUDE_CODE_OAUTH_TOKEN candidate now carries anthropic-beta: oauth-2025-04-20 alongside the Bearer; the x-api-key leg explicitly does NOT (test asserts the header sets differ between the two anthropic sources, per your ask). Agreed on verifying against a live setup-token before the lockstep publish — flagging that as a pre-publish gate for whoever runs the release.
  2. OpenAI default: gpt-5.1 → gpt-5.5 (fleet-current).
  3. Codex passthrough: stripped models containing codex steer family selection but fall back to DEFAULT_OPENAI_MODEL for chat/completions; covered by a new test (openai-codex/gpt-5.5-codex → gpt-5.5).

Also +1 on your triage note: an expired/rotated setup-token will now surface as ctx.llm: https://api.anthropic.com/v1/messages returned 401: … with the API's detail in the log line.
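
For readers following the fixes in this comment, the header asymmetry and the codex fallback can be sketched as below. This is an illustration under assumptions rather than the merged code: anthropicHeaders and resolveOpenAiModel are hypothetical names, and sending anthropic-version on the OAuth leg is assumed from the description's "same API" wording. The header values, the gpt-5.5 default, and the openai-codex/gpt-5.5-codex case come from this thread.

```typescript
type AnthropicSource = 'ANTHROPIC_API_KEY' | 'CLAUDE_CODE_OAUTH_TOKEN';

// Exactly one auth header per source; the beta header goes only on the OAuth leg.
function anthropicHeaders(source: AnthropicSource, secret: string): Record<string, string> {
  const base = { 'anthropic-version': '2023-06-01' };
  if (source === 'ANTHROPIC_API_KEY') {
    return { ...base, 'x-api-key': secret };
  }
  return {
    ...base,
    Authorization: `Bearer ${secret}`,
    'anthropic-beta': 'oauth-2025-04-20',
  };
}

const DEFAULT_OPENAI_MODEL = 'gpt-5.5';

function resolveOpenAiModel(personaModel: string | undefined): string {
  const trimmed = personaModel?.trim();
  if (!trimmed) return DEFAULT_OPENAI_MODEL;
  // Strip a provider prefix such as "openai/" or "openai-codex/".
  const stripped = trimmed.includes('/') ? trimmed.slice(trimmed.indexOf('/') + 1) : trimmed;
  // Per the review, codex models are not served by /v1/chat/completions,
  // so they steer family selection but resolve to the default chat model.
  if (stripped === '' || stripped.toLowerCase().includes('codex')) return DEFAULT_OPENAI_MODEL;
  return stripped;
}
```

As the re-review notes, a sketch or unit test like this can only pin the header shape. Whether Anthropic accepts the setup-token still needs the live probe before publish.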

@khaliqgant

Member Author

Re-reviewed the 53d18ef→8de5526 delta — all three findings closed exactly as asked: OAuth leg carries anthropic-beta: oauth-2025-04-20 with a test pinning the header-set asymmetry vs the x-api-key leg (including the negative assertion on the api-key side); gpt-5.5 default; *-codex models steer family but fall back to the default chat model with a pinned test. GO from me at 8de5526. Standing gate before lockstep publish: one live setup-token probe of the OAuth leg (unit tests prove header shape, not Anthropic acceptance).

@khaliqgant khaliqgant merged commit 07296e7 into main Jun 4, 2026
3 checks passed
@khaliqgant khaliqgant deleted the fix/cloud-defaults-llm-context branch June 4, 2026 11:05