🤖 feat: make Anthropic prompt cache TTL configurable #2293
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b9d0ca9b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review Addressed the TTL propagation issue for Anthropic-routed gateway models and pushed a follow-up fix.
Codex Review: Didn't find any major issues. Can't wait for the next one!
FYI: I addressed the Codex review thread, pushed a follow-up fix, and reran failed checks.
Latest failing job: https://github.com/coder/mux/actions/runs/21827954163/job/62979120548
Force-pushed from e45bfa6 to 6fc00d9
Force-pushed from ad46c50 to 643e093
Add Anthropic cache TTL support (`5m` / `1h`) across provider options, cache strategy, stream pipeline, and fetch-level cache_control injection, with tests for TTL propagation.

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$2.50`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=2.50 -->
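The core idea in this commit — an optional TTL on Anthropic's ephemeral cache marker — can be sketched as follows (the types and helper name here are illustrative, not the PR's exact code):

```typescript
// Anthropic cache markers are `{ type: "ephemeral" }`, optionally carrying
// a ttl of "5m" (the default) or "1h".
type AnthropicCacheTtl = "5m" | "1h";

interface CacheControl {
  type: "ephemeral";
  ttl?: AnthropicCacheTtl;
}

// Build a marker, adding ttl only when one is configured so the default
// request shape is unchanged when no TTL has been selected.
function buildCacheControl(cacheTtl?: AnthropicCacheTtl): CacheControl {
  return cacheTtl ? { type: "ephemeral", ttl: cacheTtl } : { type: "ephemeral" };
}
```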
Ensure stream-level system/tool cache markers honor configured Anthropic TTL even for Anthropic-routed gateway models whose providerOptions are not under the anthropic namespace.

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$2.50`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=2.50 -->
Add a provider-level Anthropic setting in Providers to configure prompt cache TTL with defensive value guards and default clearing behavior. Also add a regression test ensuring persisted Anthropic cache TTL is propagated into send options from storage.

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$3.19`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=3.19 -->
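The "defensive value guards" mentioned above can be as simple as a type predicate that rejects anything but the two known TTLs; the implementation below is a plausible sketch, not the PR's actual code:

```typescript
type AnthropicCacheTtl = "5m" | "1h";

// Only accept known TTL values from persisted settings; any other value
// (legacy strings, typos, null) falls back to the default behavior.
function isAnthropicCacheTtl(value: unknown): value is AnthropicCacheTtl {
  return value === "5m" || value === "1h";
}

// Clearing behavior: selecting the default stores no override at all.
function sanitizeCacheTtl(value: unknown): AnthropicCacheTtl | undefined {
  return isAnthropicCacheTtl(value) ? value : undefined;
}
```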
Move Anthropic prompt cache TTL persistence from frontend localStorage to backend providers.jsonc and make backend config authoritative for Anthropic-routed models.

- expose anthropic cacheTtl in provider config IPC schema
- surface valid cacheTtl in ProviderService getConfig with tests
- inject backend cacheTtl in ProviderModelFactory for anthropic and anthropic/* routes
- update Providers settings UI to read/write cacheTtl through provider config API

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$4.21`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=4.21 -->
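After this change the TTL might be persisted in providers.jsonc roughly like this (the surrounding structure is an assumption; only the `cacheTtl` field name and file name come from this PR):

```jsonc
{
  "anthropic": {
    // Prompt cache TTL: omit for the 5m default, or set "1h".
    "cacheTtl": "1h"
  }
}
```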
Fix flaky/deterministic failure in tests/ipc/resumeStream.test.ts where collector1 could observe duplicate user messages.

The first collector started and sent a message immediately without waiting for the onChat subscription caught-up signal. Under CI timing, initial history replay can race with live append and emit the same user message twice.

- await collector1.waitForSubscription(10000) before sending
- add explanatory comment about replay/live race

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$4.21`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=4.21 -->
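The replay/live race can be reproduced in miniature (this is an illustrative model, not mux's actual onChat implementation): a subscription delivers live messages immediately but replays history on a later microtask, so a message sent before the caught-up signal is observed twice.

```typescript
type Listener = (msg: string) => void;

class ChatModel {
  private history: string[] = [];
  private listeners: Listener[] = [];

  send(msg: string): void {
    this.history.push(msg);
    for (const l of this.listeners) l(msg); // live delivery
  }

  // Resolves once history replay has finished (the "caught-up" signal).
  subscribe(listener: Listener): Promise<void> {
    this.listeners.push(listener); // live delivery starts now
    return Promise.resolve().then(() => {
      for (const msg of this.history) listener(msg); // async history replay
    });
  }
}

// With waitForCaughtUp=false, "hello" lands in history before replay runs,
// so it is delivered both live and during replay (count 2). Waiting first
// drains the (empty) replay, so the message is delivered exactly once.
async function countDeliveries(waitForCaughtUp: boolean): Promise<number> {
  const chat = new ChatModel();
  const seen: string[] = [];
  const caughtUp = chat.subscribe((m) => seen.push(m));
  if (waitForCaughtUp) await caughtUp; // the fix: wait before sending
  chat.send("hello");
  await caughtUp;
  return seen.length;
}
```

In the real test, `await collector1.waitForSubscription(10000)` plays the role of awaiting the caught-up signal before the first send.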
Resolve post-rebase CI failures by removing a leftover `anthropicCacheTtlOverride` reference in `buildStreamRequestConfig`. During rebase conflict resolution, StreamManager kept HEAD's request-header-based API but retained one line from an older cache-ttl-override approach, which triggered lint/type errors.

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$4.21`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=4.21 -->
Force-pushed from f853444 to 27153a9
Simplify the Anthropic prompt cache TTL selector by removing the explicit "5 minutes" entry and keeping only:

- Default (5m)
- 1 hour

Also normalize persisted values in the selector so non-1h values map to default behavior.

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$4.21`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=4.21 -->
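The normalization described above reduces to a one-liner (the function name is hypothetical):

```typescript
// The selector persists either "1h" or nothing; any other stored value
// (e.g. a legacy explicit "5m") is treated as the default.
function normalizeCacheTtlSelection(persisted: string | null): "default" | "1h" {
  return persisted === "1h" ? "1h" : "default";
}
```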
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab4a022aac
Address Codex review feedback by restoring TTL override propagation for Anthropic-routed requests that do not carry `providerOptions.anthropic.cacheControl` (notably OpenRouter Anthropic routes).

- thread optional `anthropicCacheTtlOverride` through StreamManager start/create/build request methods
- prefer explicit override over providerOptions-derived TTL when building cached system/tool cache markers
- pass effective mux Anthropic cache TTL from AIService into StreamManager.startStream

This ensures 1h TTL selections are applied consistently across stream cache breakpoints.

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$4.21`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=4.21 -->
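The precedence rule in the second bullet might look like this in isolation (a sketch under assumed names, not the PR's exact code):

```typescript
type AnthropicCacheTtl = "5m" | "1h";

// An explicit override threaded in from backend config wins over a TTL
// derived from providerOptions, which covers gateway routes (e.g. OpenRouter
// Anthropic) whose options do not live under the `anthropic` namespace.
function resolveEffectiveTtl(
  override: AnthropicCacheTtl | undefined,
  fromProviderOptions: AnthropicCacheTtl | undefined,
): AnthropicCacheTtl | undefined {
  return override ?? fromProviderOptions;
}
```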
@codex review Addressed the TTL propagation issue for Anthropic-routed OpenRouter requests and resolved the prior thread. Please re-review.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77df6f7677
Address Codex review feedback by applying configured Anthropic cache TTL even when cache markers are already present in request payloads.

- add mergeAnthropicCacheControl() helper
- merge existing marker values and override ttl when cacheTtl is configured
- replace `??=` cache marker writes for tools, prompt providerOptions, and content parts

This ensures 1h TTL selection is reliably reflected in final Anthropic/OpenRouter HTTP payloads.

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$4.21`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=4.21 -->
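The difference between `??=` and merging is the crux here. A sketch of what `mergeAnthropicCacheControl()` might do (implementation details are assumptions; only the helper name comes from the commit):

```typescript
type AnthropicCacheTtl = "5m" | "1h";

interface CacheControl {
  type: "ephemeral";
  ttl?: AnthropicCacheTtl;
}

// `marker.cacheControl ??= { ... }` only writes when no marker exists, so a
// pre-existing marker keeps its old (or missing) ttl. Merging preserves the
// existing marker's fields but forces the configured ttl onto it.
function mergeAnthropicCacheControl(
  existing: CacheControl | undefined,
  cacheTtl: AnthropicCacheTtl | undefined,
): CacheControl {
  const base: CacheControl = existing ?? { type: "ephemeral" };
  return cacheTtl ? { ...base, ttl: cacheTtl } : base;
}
```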
@codex review Addressed the cache-marker TTL override feedback in providerModelFactory and resolved the thread. Please re-review.
Codex Review: Didn't find any major issues. Hooray!
Summary
Add configurable Anthropic prompt cache TTL support in Mux, allowing `5m` (default) or `1h` to flow through provider options, message/tool cache markers, and fetch-level request patching.

Background
Mux already applies Anthropic prompt caching automatically, but it always used `cache_control: { type: "ephemeral" }` with no TTL selection. Anthropic now supports explicit TTL values of `"5m"` and `"1h"`, with different write pricing. This change makes TTL selectable while preserving existing defaults.

Implementation
- Add `anthropic.cacheTtl` to `MuxProviderOptionsSchema` as `z.enum(["5m", "1h"]).nullish()`.
- `applyCacheControl(..., cacheTtl?)`, `createCachedSystemMessage(..., cacheTtl?)`, and `applyCacheControlToTools(..., cacheTtl?)` set TTL on `cacheControl` when `cacheTtl` is configured.
- `wrapFetchWithAnthropicCacheControl` injects TTL-aware raw `cache_control` on tools/messages for both direct Anthropic and mux-gateway Anthropic routes.
- TTL flows through `aiService -> prepareMessagesForProvider` and `streamManager` tool/system cache control application.
- `streamManager` gains typed guards: `isRecord`, `isAnthropicCacheTtl`, `getAnthropicCacheTtl`.
- Tests: `src/common/utils/ai/cacheStrategy.test.ts`, `src/common/utils/ai/providerOptions.test.ts`.

Validation
- `bun test src/common/utils/ai/cacheStrategy.test.ts src/common/utils/ai/providerOptions.test.ts`
- `make typecheck`
- `bun test src/node/services/streamManager.test.ts src/node/services/aiService.test.ts`
- `make static-check`

Risks
Low-to-medium risk in Anthropic request shaping paths (provider options + fetch wrapper), mitigated by unit coverage and full static checks. Default behavior remains unchanged when `cacheTtl` is unset.

Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$2.50`
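The fetch-level request patching described under Implementation can be illustrated with a miniature wrapper (hypothetical names and a simplified body shape; `wrapFetchWithAnthropicCacheControl`'s real logic handles more cases):

```typescript
type AnthropicCacheTtl = "5m" | "1h";
type FetchLike = (url: string, init?: { body?: string }) => Promise<unknown>;

// Wrap a fetch implementation so that, when a TTL is configured, every raw
// cache_control marker already present in the serialized request body gets
// the ttl stamped onto it before the request goes out.
function wrapFetchWithTtl(fetchImpl: FetchLike, ttl?: AnthropicCacheTtl): FetchLike {
  return (url, init) => {
    if (ttl && init?.body) {
      const payload = JSON.parse(init.body);
      for (const tool of payload.tools ?? []) {
        if (tool.cache_control) tool.cache_control.ttl = ttl;
      }
      for (const message of payload.messages ?? []) {
        for (const part of Array.isArray(message.content) ? message.content : []) {
          if (part.cache_control) part.cache_control.ttl = ttl;
        }
      }
      init = { ...init, body: JSON.stringify(payload) };
    }
    return fetchImpl(url, init);
  };
}
```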