feat: Phase 1 attachment – per-provider convertDocument wiring#2608
Open
simonferquel-clanker wants to merge 4 commits intodocker:mainfrom
Open
feat: Phase 1 attachment – per-provider convertDocument wiring#2608simonferquel-clanker wants to merge 4 commits intodocker:mainfrom
simonferquel-clanker wants to merge 4 commits intodocker:mainfrom
Conversation
…e, CapabilityTable - Add MessagePartTypeDocument constant and Document/DocumentSource types to pkg/chat - Extend MessagePart with Document field - New pkg/attachment package with Strategy enum, Capability, CapabilityTable, Decide function, TXTEnvelope helper, FetchURL, and Advisor interface - Table-driven tests covering all Decide branches (6 strategy outcomes + MIME miss) - httptest-based tests for FetchURL (success, 404, 500, cancelled ctx, binary) Closes docker#2595 Assisted-By: docker-agent
- B1: TXTEnvelope attribute renamed from type= to mime-type= (format string, godoc example, and test assertion updated) - S2: InlineData check reverted to nil comparison (spec behaviour); new test case locks in that non-nil empty slice → StrategyB64 - S3: Decide godoc now documents URL > InlineData > InlineText precedence, and notes intentional non-empty reason strings for FetchAsB64/FetchAsTXT fallbacks (S1 decision) - S4: double doc.Source.URL check collapsed into single switch block - N1/N2: TODO comments added to fetch.go for per-runtime timeout configurability and SSRF mitigations in Phase 2 Assisted-By: docker-agent
Add document attachment support to every provider package. ## openai - capabilityTableByProvider: per-alias MIME→Capability tables for openai, azure, mistral, xai, ollama, nebius, minimax, github-copilot, requesty. application/pdf excluded from openai/azure/requesty in Phase 1 — OpenAI requires the Files API for PDFs (Phase 2). - SupportedMIMETypes() implements attachment.Advisor. - convertDocument() dispatches via attachment.Decide to convertViaURL, convertViaB64 (data-URL), convertViaTXT (TXTEnvelope), convertViaFetch*. - attachments_wire.go: expandDocumentParts pre-processes MultiContent document parts into synthetic text/image MessageParts before delegating to oaistream.ConvertMessages and the existing convertMessagesToResponseInput. Avoids duplicating the complex Responses API conversion logic. ## anthropic - capabilityTable: images (B64+URL), PDF (B64), text (TXT). - convertDocument: images→ImageBlock, PDFs→NewDocumentBlock, text→TextBlock. - convertBetaDocument: same for the Beta API with Beta* block types. - Wired into convertUserMultiContent and convertBetaUserMultiContent. ## gemini - capabilityTable: images, video, audio, PDF (all B64), text (TXT). - convertDocument: binary→genai.NewPartFromBytes, text→NewPartFromText. - convertMultiContent and convertMessagesToGemini promoted from package functions to methods (c *Client) to carry context for document fetching. - Client_test updated to call the method form. ## bedrock - capabilityTable: images (B64), PDF (B64), text (TXT). - convertDocument: images→ContentBlockMemberImage, PDF→ContentBlockMemberDocument (with sanitized name per Bedrock's constraint), text→ContentBlockMemberText. - convertMessagesCtx method for context-aware document expansion; fast-paths to the existing convertMessages for non-document messages. - buildConverseStreamInput now context-aware (takes ctx, returns error). ## vertexai - Comment-only attachments.go explaining delegation to the embedded anthropic.Client or openai.Client. ## Tests - Per-provider attachments_test.go covering all 6 strategies (TXT, B64, URL, FetchAsB64, FetchAsTXT, Drop) with httptest.Server for fetch strategies. Bedrock includes sanitizeDocName unit tests. Refs docker#2595, PR docker#2607 Assisted-By: docker-agent
B1: Remove application/pdf from Mistral's Phase 1 capability table.
Mistral requires a document_url content part (not image_url) for PDFs —
a type the openai-go SDK does not model. Deferring to Phase 2 is consistent
with OpenAI/Azure. Added explanatory comment in capabilityTableByProvider.
B2: Fix TestOpenAI_ConvertDocument_FetchAsB64 to actually exercise
StrategyFetchAsB64. The previous test used image/png on the openai profile
(which has URL capability), so it resolved to StrategyURL, not FetchAsB64.
New test uses the ollama profile (image/jpeg: {B64:true}, no URL) with a
URL source, confirming the URL is fetched and returned as a base64 data-URL.
S3: Add guard comment to anthropic convertDocViaURL noting that if PDF URL
capability is ever added, NewDocumentBlock(URLPDFSourceParam) must be used
instead of NewImageBlock.
S6: Deduplicate identical capability tables. Introduce two shared vars:
- capabilityTableImageURLB64 (openai / azure / requesty)
- capabilityTableImageB64Only (ollama / nebius / minimax / github-copilot)
capabilityTableByProvider now references these by value, eliminating
copy-paste drift risk.
Assisted-By: docker-agent
|
❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stacks on PR #2607 (
feat/phase1-attachment-foundation). Merge #2607 first.Implements per-provider
convertDocumentwiring for every supported provider.Refs #2595.
Changes
pkg/model/provider/openaiattachments.go:capabilityTableByProvidermap (openai, azure, mistral, xai, ollama, nebius, minimax, github-copilot, requesty).SupportedMIMETypes()implementsattachment.Advisor.convertDocument()dispatches viaattachment.Decide.application/pdfexcluded from openai/azure/requesty in Phase 1 (requires Files API — Phase 2).attachments_wire.go:expandDocumentPartspre-processes document parts into synthetic text/image parts, then delegates to existing converters (avoids duplicating Responses API logic).client.go: usesconvertMessagesCtx/convertMessagesToResponseInputCtx.pkg/model/provider/anthropicattachments.go: images→ImageBlock, PDFs→NewDocumentBlock, text→TXTEnvelope.convertBetaDocument()for Beta API.convertUserMultiContentandconvertBetaUserMultiContent.pkg/model/provider/geminiattachments.go: images/video/audio/PDF as B64 blobs, text as TXTEnvelope.convertMultiContent/convertMessagesToGeminipromoted to methods with context.pkg/model/provider/bedrockattachments.go: images→ContentBlockMemberImage, PDFs→ContentBlockMemberDocument (sanitized name), text→ContentBlockMemberText. Context-awareconvertMessagesCtx.pkg/model/provider/vertexaiattachments.go(delegates to embedded anthropic/openai client).Tests
attachments_test.go: all 6 strategies covered withhttptest.Serverfor fetch strategies.What this PR does NOT do
pkg/app,pkg/cli,pkg/runtimechanges (PR 3)