
feat(providers/openai): add Responses computer use support #27

Draft
ibetitsmike wants to merge 11 commits into coder_2_33 from
mike/openai-computer-use

Conversation

@ibetitsmike

Summary

Add OpenAI Responses computer-use support to the openai provider so
downstream consumers (Coder) can drive OpenAI's computer-use-preview
model through the existing fantasy language-model interface.

Problem

The openai provider only exposed text tools through the Responses
API. There was no way to declare the Responses-native computer-use
tool, parse computer_call outputs, or round-trip screenshots and
safety acknowledgments back to OpenAI. Coder's computer-use subagent
was consequently hardcoded to Anthropic.

Fix

  • New providers/openai/computer_use.go exposes a
    ProviderDefinedTool with id openai.computer, typed
    ComputerUseToolOptions (display dimensions + environment), and a
    local ComputerUseInput representation of batched actions.
  • responses_language_model.go toResponsesTools accepts the new tool
    and converts it to responses.ComputerUsePreviewToolParam. Invalid
    dimensions fail request preparation instead of warning silently.
  • toResponsesPrompt maps tool results back to their originating
    computer_call via OpenAIComputerUseCallMetadata (call id and
    pending safety checks). Results without metadata hard-fail so
    malformed prompts do not reach the API.
  • IsResponsesModel and getResponsesModelConfig share a narrow
    allowlist (computer-use-preview, computer-use-preview-2025-03-11)
    instead of broad strings.Contains.
  • Unit tests cover parsing, tool conversion, validation (including
    zero/negative widths and heights), and batched-actions handling.
  • providertests/openai_responses_test.go adds an integration test
    gated on a VCR cassette (cassette recording is a follow-up; the test
    is currently skipped with a clear message).
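The dimension validation described above ("invalid dimensions fail request preparation instead of warning silently") can be sketched as follows. `ComputerUseToolOptions` and its fields come from the PR summary, but the `Validate` method name, the `Environment` values, and the error wording are illustrative assumptions, not the PR's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sketch: ComputerUseToolOptions carries the display
// geometry and environment the computer-use-preview tool needs.
// Field and method names are illustrative.
type ComputerUseToolOptions struct {
	DisplayWidth  int
	DisplayHeight int
	Environment   string // e.g. "browser", "mac", "windows", "ubuntu"
}

// Validate returns an error for zero or negative dimensions so that
// request preparation hard-fails rather than warning silently.
func (o ComputerUseToolOptions) Validate() error {
	if o.DisplayWidth <= 0 || o.DisplayHeight <= 0 {
		return errors.New("computer use: display dimensions must be positive")
	}
	if o.Environment == "" {
		return errors.New("computer use: environment is required")
	}
	return nil
}

func main() {
	ok := ComputerUseToolOptions{DisplayWidth: 1280, DisplayHeight: 800, Environment: "browser"}
	bad := ComputerUseToolOptions{DisplayWidth: 0, DisplayHeight: 800, Environment: "browser"}
	fmt.Println(ok.Validate() == nil)  // valid options pass
	fmt.Println(bad.Validate() != nil) // zero width is rejected
}
```

Failing early here mirrors the unit tests mentioned above, which cover zero and negative widths and heights.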

store=true is required for this to work because the Responses API
persists computer-use state server-side; Coder enforces that at the
model-config boundary.

Authored by Mux on behalf of Mike.

@hugodutka
Collaborator

My implementation is at https://github.com/hugodutka/fantasy/tree/hugodutka/openai-computer-use.

🤖 I compared this PR against d139114, and I don’t think this is correct as-is.

The main bug is in providers/openai/computer_use.go: computerUseToolResultInput() maps metadata.PendingSafetyChecks straight into acknowledged_safety_checks. That means every pending safety check is auto-acknowledged on replay, without the executor explicitly opting in. I think that has to be executor-driven. d139114 fixes this by adding ToolResponse.ProviderMetadata plumbing in tool.go / agent.go and then reading explicit ComputerUseOutputMetadata in toResponsesPrompt().

I also think d139114 has the better core abstraction overall. This PR builds a local computer-use action model and reconstructs the OpenAI payload from that, while d139114 uses the SDK-native types and preserves the raw computer_call JSON (ComputerUseMetadata{RawJSON: ...}) for round-tripping. That matches the plan much better and avoids having the Fantasy-side representation drift from the wire format.

There’s also a correctness issue in toResponsesTools(): ToolChoice("computer") still falls through to OfFunctionTool instead of selecting the hosted computer tool. d139114 handles that case explicitly by emitting OfHostedTool when a computer-use tool is present.
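The ToolChoice fix can be sketched without the SDK types. `resolveToolChoice` and the `toolChoice` struct below are hypothetical stand-ins for the OfHostedTool/OfFunctionTool union; the logic shown is just the branch d139114 adds, selecting a hosted-tool choice when the named tool is the hosted computer-use tool:

```go
package main

import "fmt"

// Hypothetical sketch of the ToolChoice branch. Kind stands in for
// the OfHostedTool / OfFunctionTool union variants.
type toolChoice struct {
	Kind string // "hosted" or "function"
	Name string
}

// resolveToolChoice emits a hosted-tool choice when the named tool is
// registered as hosted (e.g. the computer-use tool), instead of
// falling through to a function-tool choice.
func resolveToolChoice(name string, hostedTools map[string]bool) toolChoice {
	if hostedTools[name] {
		return toolChoice{Kind: "hosted", Name: name}
	}
	return toolChoice{Kind: "function", Name: name}
}

func main() {
	hosted := map[string]bool{"computer": true}
	fmt.Println(resolveToolChoice("computer", hosted).Kind) // hosted
	fmt.Println(resolveToolChoice("grep", hosted).Kind)     // function
}
```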

If you want a concrete direction, I’d start from d139114’s shape in tool.go, agent.go, providers/openai/computer_use.go, and providers/openai/responses_language_model.go, then port over whichever tests from this PR you want to keep. I think the extra Generate/Stream test coverage here is useful, but the core implementation in d139114 is stronger.
