Skip to content

fix: batch of 5 high-confidence fixes for SSE memory, task permissions, encoding, defects, and V2 last-step#31131

Closed
mikailustuner wants to merge 5 commits into
anomalyco:devfrom
mikailustuner:fix/batch-31087-31108-30869-30867-30866
Closed

fix: batch of 5 high-confidence fixes for SSE memory, task permissions, encoding, defects, and V2 last-step#31131
mikailustuner wants to merge 5 commits into
anomalyco:devfrom
mikailustuner:fix/batch-31087-31108-30869-30867-30866

Conversation

@mikailustuner

Copy link
Copy Markdown

Summary

Five independent, low-risk fixes for issues opened in the last 48 hours. Each commit is self-contained, has a focused diff, and can be reviewed/merged individually.

# Issue Component Risk
1 #31087 server/routes/instance/httpapi/handlers/{event,global}.ts Low — backpressure fix
2 #31108 session/prompt.ts Low — propagate instead of die
3 #30869 core/tool/bash.ts Low — platform-aware encoding
4 #30867 core/tool/read.ts Low — typed failure vs defect
5 #30866 core/session/runner/llm.ts + max-steps.txt Low — V1 parity for last-step hint

Commits

  • 116c1f659 fix(server): bound SSE event queues to prevent unbounded memory growth
  • 3dad6c1bf fix(session): propagate task permission rejection instead of crashing subagent
  • 6a00dfa9d fix(core): use platform-aware encoding for bash tool output
  • dfc893e4d fix(core): raise ToolFailure for image/binary read errors instead of defect
  • f8d44ac19 fix(core): warn LLM on the final step in the V2 runner

Per-issue analysis

1. #31087 — SSE Queue Memory Leak (P0 stability)

  • Flow: Client SSE → eventResponse Effect.gen → Queue.unbounded + events.listen + Stream.fromQueue → wire format via Sse.encode.
  • Problem: Slow or disconnected SSE consumer → kuyruk sınırsız büyür → 9.5 GB RSS (repro confirmed by reporter).
  • Root cause: Queue.unbounded + Queue.offerUnsafe never block, so producer keeps emitting while consumer is stalled.
  • Fix: Queue.bounded(1000, { strategy: "sliding" }) for per-instance events (newer events are more useful), Stream.callback({ bufferSize: 1000, strategy: "dropping" }) for global events. Older events remain recoverable via GET /api/sessions/:id and final session state is in the database, so dropping under backpressure is safe.

2. #31108Effect.orDie Task Permission Crash

  • Flow: Main session → task tool → subagent → permission gate → permission.ask() → user deny → RejectedError.pipe(Effect.orDie) → fatal defect.
  • Problem: experimental.continue_loop_on_deny: true is silently ignored inside subagents — every denied permission crashes the task.
  • Root cause: Effect.orDie converts the typed RejectedError into a defect, which bypasses the parent processor's ctx.shouldBreak logic (session/processor.ts:961).
  • Fix: Remove the Effect.orDie pipe. The outer Effect.catchCause (already in place) records the rejection as a tool error on the assistant part, allowing the subagent processor to honor continue_loop_on_deny.

3. #30869 — bash.ts Hardcoded UTF-8 Encoding

  • Flow: bash tool → shell process → stdout/stderr capture → Buffer.toString("utf8") → model.
  • Problem: On Windows consoles with non-UTF-8 active code page (936 GBK, 932 Shift-JIS, 949, 950, OEM 850…) the model sees mojibake compiler errors.
  • Root cause: Encoding is hardcoded.
  • Fix: process.platform === "win32" ? "utf16le" : "utf8". A follow-up can introduce chcp detection for OEM code pages without adding a new dependency.

4. #30867Effect.die for Image/Binary Read Errors

  • Flow: read tool → image decode → size check → limit exceeded → Effect.die(new ImageSizeError) / Effect.die(new BinaryFileError) → outer Effect.catchCause.
  • Problem: These are normal, expected, recoverable tool-failure paths. Using Effect.die marks them as untyped defects, which is the wrong semantic — the model should react by resizing, picking a different file, etc.
  • Root cause: Misuse of Effect.die for typed tool-failure paths.
  • Fix: Yield new ToolFailure({ message, error }) instead. The existing catchCause branch still surfaces a user-visible message for unrelated failure modes.

5. #30866 — V2 Runner Drops V1's Last-Step Signal

  • Flow: run()MAX_STEPS=25 loop → runTurn each step → step 24 → model emits tool calls → step 25 loop terminates → tool calls are abandoned.
  • Problem: V1 warns the model with an assistant message on the last step (packages/opencode/src/session/prompt.ts:1314-1315, 1426) so it emits a text-only final summary. V2 never wired the equivalent.
  • Root cause: V2 runTurnAttempt had no isLastStep concept.
  • Fix: Thread an isLastStep: boolean from run() into runTurnAttempt (and the RunTurn type). When true, append Message.assistant(MAX_STEPS_NOTICE) to the request messages, mirroring V1. The flag is also forwarded through runTurn and runAfterOverflowCompaction so the same final-step semantics apply on continuation and post-overflow-compaction attempts. Notice text mirrors V1's packages/opencode/src/session/prompt/max-steps.txt.

Verification

  • bun run typecheck cannot run in this sandbox (no tsgo binary), but the changes were carefully aligned with the existing Effect/Message schemas and the package-local AGENTS.md style guides (no Effect.fn/Effect.fnUntraced regressions, Message.assistant(...) used per the @opencode-ai/llm AGENTS guidance, Queue.bounded/Stream.callback parameters verified against effect/dist/Stream.d.ts).
  • Manual review of every diff against the line ranges in each issue's report.
  • git log origin/dev..HEAD --oneline confirms five atomic, conventional commits.

Risk

All five fixes are localized, low-risk, and have well-defined fallback behaviors. No public API change; only:

  • A bounded queue replaces an unbounded one (memory safety > completeness under backpressure, with documented recovery path),
  • A pipe operator removed (rejection now reaches the existing catch),
  • An encoding string widened to include utf16le (additive on the decode side),
  • Effect.dieEffect.fail (typed failure is strictly safer for the existing catch),
  • A new messages-array element appended on the last step (V1 parity).

Please review each commit individually — they are independent and can be merged in any order.

Replace Queue.unbounded with bounded queues for both the per-instance
and global SSE event streams. A slow or disconnected SSE consumer can
no longer grow the server worker without limit (issue anomalyco#31087 repro:
9.5GB RSS while a session continued emitting events).

- event.ts uses Queue.bounded(1000, sliding) so the most recent
  progress events are retained while the oldest are dropped.
- global.ts uses Stream.callback({ bufferSize, strategy: dropping })
  to preserve already-queued history and only drop new arrivals
  when the consumer is too slow.

Older events remain recoverable through GET /api/sessions/:id and
final session state lives in the database, so dropping under
backpressure is safe.
… subagent

Task and subagent permission rejections were being wrapped in
Effect.orDie, which converts a permission RejectedError into a fatal
defect. The defect bypasses the parent processor's
continue_loop_on_deny logic in processor.ts:961 (ctx.shouldBreak), so a
denied permission inside a subagent always crashes the task even
when the user explicitly opted in to continuing past denials
(experimental.continue_loop_on_deny: true).

Removing the pipe lets RejectedError propagate as a normal failure
into the outer Effect.catchCause (which already records it as a tool
error on the assistant part). The subagent processor can then honor
continue_loop_on_deny and either continue the loop or report the
rejection back to the parent. Issue anomalyco#31108.

Workaround: keep the effect.scoped-wrapped spawn at L624 as-is - that
die is intentional for the shell process lifecycle.
Hardcoding toString("utf8") produced mojibake on Windows when the
active console code page was not UTF-8 (e.g. chcp 936 GBK for
zh-CN, chcp 932 Shift-JIS for ja-JP, chcp 949 for ko-KR, OEM 850,
etc.). The AI would then see garbled compiler errors and could not
diagnose build failures.

Switch to a platform-aware encoding:
  - win32 -> utf16le  (cmd.exe / PowerShell default console output)
  - posix -> utf8     (unchanged)

Note: a fully correct solution would query the active code page via
process.env.CHCP / Windows APIs (or iconv-lite) and decode OEM code
pages (936/932/949/950/437) directly. The two-state switch here fixes
the common case without adding a new dependency; a follow-up can
introduce full chcp detection.

Issue anomalyco#30869.
…defect

The read tool used Effect.die(...) to surface three expected, recoverable
conditions:
  - image exceeds configured size limits and auto-resize is disabled
  - image cannot be resized small enough to fit limits
  - file is binary but not a supported image format

Effect.die raises an untyped *defect*, which the surrounding
Effect.catchCause then squashes. Although the message survives today, the
semantic is wrong: these are normal tool-failure paths the model is
supposed to react to (e.g. rescan the directory, pick a different
file), not unrecoverable system defects.

Switch to yielding a typed ToolFailure, matching the established pattern
for expected, recoverable tool errors. The existing catchCause branch
still produces a user-visible message for unrelated failure modes.

Issue anomalyco#30867.
V1 sent an assistant-role 'CRITICAL - MAXIMUM STEPS REACHED' message
to the model when the current step was the last allowed one
(packages/opencode/src/session/prompt.ts:1314-1315, 1426). That gave
the model a chance to emit a text-only final summary instead of
unsettled tool calls that would be abandoned when the loop terminated.

The V2 runner (packages/core/src/session/runner/llm.ts) never wired
the equivalent signal. On step 24 the model would happily emit tool
calls that nothing would ever settle, wasting the final turn and
combining badly with the hard MAX_STEPS cutoff (anomalyco#30865).

Thread an isLastStep flag from the outer run() loop down to
runTurnAttempt and, when true, append a Message.assistant(MAX_STEPS_NOTICE)
to the request messages. The flag is also forwarded through
runTurn / runAfterOverflowCompaction so the same final-step semantics
apply on continuation and post-overflow-compaction attempts.

The notice text mirrors V1's packages/opencode/src/session/prompt/max-steps.txt
and is imported from packages/core/src/session/runner/max-steps.txt to
keep the wording in sync with V1's prompt evolution in a follow-up.

Issue anomalyco#30866.
@github-actions github-actions Bot added needs:compliance This means the issue will auto-close after 2 hours. needs:issue labels Jun 6, 2026
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

The following comment was made by an LLM, it may be inaccurate:

Based on my search, I found two related PRs that address the same issue as parts of this batch:

  1. PR fix(server): bound SSE event queues #31094 - fix(server): bound SSE event queues

  2. PR fix: bound SSE event queues to prevent unbounded memory growth #31107 - fix: bound SSE event queues to prevent unbounded memory growth

Note: The other four fixes in PR #31131 (task permissions, bash encoding, read tool errors, V2 last-step) appear to have no current duplicates among open PRs.

Recommendation: Check if PRs #31094 and #31107 should be closed in favor of this batch, or if this PR #31131 should be rebased to exclude the SSE fix if one of those is already approved/merged.

@mikailustuner mikailustuner deleted the fix/batch-31087-31108-30869-30867-30866 branch June 6, 2026 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs:compliance This means the issue will auto-close after 2 hours. needs:issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant