Skip to content

feat: Phase 1 attachment – per-provider convertDocument wiring#2608

Open
simonferquel-clanker wants to merge 4 commits intodocker:mainfrom
simonferquel-clanker:feat/phase1-attachment-providers
Open

feat: Phase 1 attachment – per-provider convertDocument wiring#2608
simonferquel-clanker wants to merge 4 commits intodocker:mainfrom
simonferquel-clanker:feat/phase1-attachment-providers

Conversation

@simonferquel-clanker
Copy link
Copy Markdown
Contributor

Summary

Stacks on PR #2607 (feat/phase1-attachment-foundation). Merge #2607 first.

Implements per-provider convertDocument wiring for every supported provider.

Refs #2595.

Changes

pkg/model/provider/openai

  • attachments.go: capabilityTableByProvider map (openai, azure, mistral, xai, ollama, nebius, minimax, github-copilot, requesty). SupportedMIMETypes() implements attachment.Advisor. convertDocument() dispatches via attachment.Decide.
    • Note: application/pdf excluded from openai/azure/requesty in Phase 1 (requires Files API — Phase 2).
  • attachments_wire.go: expandDocumentParts pre-processes document parts into synthetic text/image parts, then delegates to existing converters (avoids duplicating Responses API logic).
  • client.go: uses convertMessagesCtx / convertMessagesToResponseInputCtx.

pkg/model/provider/anthropic

  • attachments.go: images→ImageBlock, PDFs→NewDocumentBlock, text→TXTEnvelope. convertBetaDocument() for Beta API.
  • Wired into convertUserMultiContent and convertBetaUserMultiContent.

pkg/model/provider/gemini

  • attachments.go: images/video/audio/PDF as B64 blobs, text as TXTEnvelope.
  • convertMultiContent/convertMessagesToGemini promoted to methods with context.

pkg/model/provider/bedrock

  • attachments.go: images→ContentBlockMemberImage, PDFs→ContentBlockMemberDocument (sanitized name), text→ContentBlockMemberText. Context-aware convertMessagesCtx.

pkg/model/provider/vertexai

  • Comment-only attachments.go (delegates to embedded anthropic/openai client).

Tests

  • Per-provider attachments_test.go: all 6 strategies covered with httptest.Server for fetch strategies.

What this PR does NOT do

  • No pkg/app, pkg/cli, pkg/runtime changes (PR 3)
  • No Files API (Phase 2)

…e, CapabilityTable

- Add MessagePartTypeDocument constant and Document/DocumentSource types to pkg/chat
- Extend MessagePart with Document field
- New pkg/attachment package with Strategy enum, Capability, CapabilityTable,
  Decide function, TXTEnvelope helper, FetchURL, and Advisor interface
- Table-driven tests covering all Decide branches (6 strategy outcomes + MIME miss)
- httptest-based tests for FetchURL (success, 404, 500, cancelled ctx, binary)

Closes docker#2595

Assisted-By: docker-agent
- B1: TXTEnvelope attribute renamed from type= to mime-type= (format
  string, godoc example, and test assertion updated)
- S2: InlineData check reverted to nil comparison (spec behaviour);
  new test case locks in that non-nil empty slice → StrategyB64
- S3: Decide godoc now documents URL > InlineData > InlineText
  precedence, and notes intentional non-empty reason strings for
  FetchAsB64/FetchAsTXT fallbacks (S1 decision)
- S4: double doc.Source.URL check collapsed into single switch block
- N1/N2: TODO comments added to fetch.go for per-runtime timeout
  configurability and SSRF mitigations in Phase 2

Assisted-By: docker-agent
Add document attachment support to every provider package.

## openai
- capabilityTableByProvider: per-alias MIME→Capability tables for openai,
  azure, mistral, xai, ollama, nebius, minimax, github-copilot, requesty.
  application/pdf excluded from openai/azure/requesty in Phase 1 — OpenAI
  requires the Files API for PDFs (Phase 2).
- SupportedMIMETypes() implements attachment.Advisor.
- convertDocument() dispatches via attachment.Decide to convertViaURL,
  convertViaB64 (data-URL), convertViaTXT (TXTEnvelope), convertViaFetch*.
- attachments_wire.go: expandDocumentParts pre-processes MultiContent
  document parts into synthetic text/image MessageParts before delegating
  to oaistream.ConvertMessages and the existing convertMessagesToResponseInput.
  Avoids duplicating the complex Responses API conversion logic.

## anthropic
- capabilityTable: images (B64+URL), PDF (B64), text (TXT).
- convertDocument: images→ImageBlock, PDFs→NewDocumentBlock, text→TextBlock.
- convertBetaDocument: same for the Beta API with Beta* block types.
- Wired into convertUserMultiContent and convertBetaUserMultiContent.

## gemini
- capabilityTable: images, video, audio, PDF (all B64), text (TXT).
- convertDocument: binary→genai.NewPartFromBytes, text→NewPartFromText.
- convertMultiContent and convertMessagesToGemini promoted from package
  functions to methods (c *Client) to carry context for document fetching.
- Client_test updated to call the method form.

## bedrock
- capabilityTable: images (B64), PDF (B64), text (TXT).
- convertDocument: images→ContentBlockMemberImage, PDF→ContentBlockMemberDocument
  (with sanitized name per Bedrock's constraint), text→ContentBlockMemberText.
- convertMessagesCtx method for context-aware document expansion; fast-paths
  to the existing convertMessages for non-document messages.
- buildConverseStreamInput now context-aware (takes ctx, returns error).

## vertexai
- Comment-only attachments.go explaining delegation to the embedded
  anthropic.Client or openai.Client.

## Tests
- Per-provider attachments_test.go covering all 6 strategies
  (TXT, B64, URL, FetchAsB64, FetchAsTXT, Drop) with httptest.Server for
  fetch strategies. Bedrock includes sanitizeDocName unit tests.

Refs docker#2595, PR docker#2607

Assisted-By: docker-agent
@simonferquel-clanker simonferquel-clanker requested a review from a team as a code owner April 30, 2026 17:31
B1: Remove application/pdf from Mistral's Phase 1 capability table.
Mistral requires a document_url content part (not image_url) for PDFs —
a type the openai-go SDK does not model. Deferring to Phase 2 is consistent
with OpenAI/Azure. Added explanatory comment in capabilityTableByProvider.

B2: Fix TestOpenAI_ConvertDocument_FetchAsB64 to actually exercise
StrategyFetchAsB64. The previous test used image/png on the openai profile
(which has URL capability), so it resolved to StrategyURL, not FetchAsB64.
New test uses the ollama profile (image/jpeg: {B64:true}, no URL) with a
URL source, confirming the URL is fetched and returned as a base64 data-URL.

S3: Add guard comment to anthropic convertDocViaURL noting that if PDF URL
capability is ever added, NewDocumentBlock(URLPDFSourceParam) must be used
instead of NewImageBlock.

S6: Deduplicate identical capability tables. Introduce two shared vars:
- capabilityTableImageURLB64 (openai / azure / requesty)
- capabilityTableImageB64Only (ollama / nebius / minimax / github-copilot)
capabilityTableByProvider now references these by value, eliminating
copy-paste drift risk.

Assisted-By: docker-agent
@docker-agent
Copy link
Copy Markdown

docker-agent Bot commented Apr 30, 2026

PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant