fix: support image input in OpenAI Chat user messages #26826
Closes anomalyco#20802. Converts `MediaPart` to OpenAI `image_url` content block format in user messages.
The following comment was made by an LLM, it may be inaccurate: Related PR Found:
Why it's related: |
Pull request overview
This PR fixes OpenAI Chat protocol request lowering to support multimodal user messages by translating MediaPart inputs into OpenAI Chat image_url content blocks, allowing image attachments to reach vision-capable OpenAI-compatible /chat/completions backends.
Changes:
- Extended the OpenAI Chat request schema so `user.content` can be either a `string` or an array of `{type: "text" | "image_url"}` content blocks.
- Implemented `lowerUserPart` / updated `lowerUserMessage` to convert `MediaPart` into `image_url` data URLs (base64), and emit content blocks when any media is present.
- Added unit tests to cover media-only, mixed text+media, and text-only user message lowering behavior.
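To make the widened `user.content` shape concrete, here is a minimal sketch of a runtime check for it in plain TypeScript. It is not the schema code from `openai-chat.ts` (which may use a schema library); the names `TextBlock`, `ImageUrlBlock`, `UserContent`, and `isUserContent` are hypothetical and exist only for illustration.

```typescript
// Illustrative only: hypothetical names, not the schemas defined in the PR.
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type UserContent = string | Array<TextBlock | ImageUrlBlock>;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Accepts exactly the two block kinds described in the change list.
function isContentBlock(value: unknown): value is TextBlock | ImageUrlBlock {
  if (!isRecord(value)) return false;
  if (value["type"] === "text") {
    return typeof value["text"] === "string";
  }
  if (value["type"] === "image_url") {
    const imageUrl = value["image_url"];
    return isRecord(imageUrl) && typeof imageUrl["url"] === "string";
  }
  return false;
}

// user.content is valid if it is a plain string or an array of content blocks.
function isUserContent(value: unknown): value is UserContent {
  return (
    typeof value === "string" ||
    (Array.isArray(value) && value.every(isContentBlock))
  );
}
```

A plain string still passes the check, so existing text-only requests would be unaffected by the wider type.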
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/llm/src/protocols/openai-chat.ts | Adds schemas for multimodal content blocks and lowers MediaPart into OpenAI Chat image_url blocks for user messages. |
| packages/llm/test/provider/openai-chat.test.ts | Adds/updates tests asserting correct request-body lowering for media-only, mixed, and text-only user messages. |
This PR fixes a hard blocker for us: Kimi For Coding vision via a static API token (employee subscription, no OAuth). The opencode-kimi-full plugin works but requires device-flow OAuth, which enterprise tokens can't use. The generic openai-compatible path is the only auth shape available to us, and it's broken on images until this lands. We tested the same payload outside opencode (Roo Code in VS Code, same token, same endpoint) and vision works. That confirms the problem is in the opencode adapter, which is exactly what this PR addresses. Please prioritize merging 🙏
Issue for this PR
Closes #20802
Type of change
What does this PR do?
Adds image input support in the OpenAI Chat protocol layer by converting `MediaPart` to OpenAI's `image_url` content block format in user messages.

Root cause: In `packages/llm/src/protocols/openai-chat.ts`, the `lowerUserMessage` function only accepted `TextPart` content. When a `MediaPart` was encountered, it returned an `unsupportedContent` error, preventing image attachments from reaching vision-capable models.

Fix:
- Added `OpenAIChatTextContentBlock` and `OpenAIChatImageUrlContentBlock` schemas
- Extended user message content to accept `string | ContentBlock[]`
- Added a `lowerUserPart` function that converts:
  - `TextPart` → `{ type: "text", text: "..." }`
  - `MediaPart` → `{ type: "image_url", image_url: { url: "data:<mediaType>;base64,<data>" } }`
- Updated `lowerUserMessage` to use content blocks when media is present

This is a protocol-layer fix that complements the provider-layer fix in #21627. While #21627 addresses capability detection, this PR ensures the conversion logic at the protocol level correctly transforms media parts into the OpenAI-compatible format.
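The lowering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the diff: the `TextPart` and `MediaPart` shapes (the `type`, `mediaType`, and `data` fields), the assumption that `data` is already base64-encoded, and the choice to join text parts with an empty string in the text-only case are all hypothetical.

```typescript
// Hypothetical part shapes; the real TextPart/MediaPart in packages/llm may differ.
type TextPart = { type: "text"; text: string };
type MediaPart = { type: "media"; mediaType: string; data: string }; // data assumed to be base64
type UserPart = TextPart | MediaPart;

type OpenAIChatTextContentBlock = { type: "text"; text: string };
type OpenAIChatImageUrlContentBlock = {
  type: "image_url";
  image_url: { url: string };
};
type ContentBlock = OpenAIChatTextContentBlock | OpenAIChatImageUrlContentBlock;
type OpenAIChatUserMessage = { role: "user"; content: string | ContentBlock[] };

// Convert one part into its OpenAI Chat content block.
function lowerUserPart(part: UserPart): ContentBlock {
  if (part.type === "text") {
    return { type: "text", text: part.text };
  }
  // Media is sent inline as a data URL: data:<mediaType>;base64,<data>
  return {
    type: "image_url",
    image_url: { url: `data:${part.mediaType};base64,${part.data}` },
  };
}

// Emit content blocks only when media is present; otherwise keep a plain string.
function lowerUserMessage(parts: UserPart[]): OpenAIChatUserMessage {
  const hasMedia = parts.some((part) => part.type === "media");
  if (!hasMedia) {
    // Assumption: text parts are concatenated without a separator.
    const text = parts
      .map((part) => (part.type === "text" ? part.text : ""))
      .join("");
    return { role: "user", content: text };
  }
  return { role: "user", content: parts.map(lowerUserPart) };
}
```

Keeping the plain-string form for text-only messages means backends that only accept string content continue to receive the same request body as before.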
How did you verify your code works?
- `prepares user message with media as image_url content block`
- `prepares user message with mixed text and media`
- `prepares user message with only text (no content blocks)`

Checklist
Comparison with #21627
#21627 fixes image support at the provider capability detection layer (1-line change in `provider.ts`).

This PR fixes image support at the protocol conversion layer in `openai-chat.ts`, ensuring `MediaPart` is correctly transformed to `image_url` content blocks. Both PRs address the same end goal but at different layers of the stack, and they are complementary.