Python: Add prompt caching support to Anthropic connector#13947
Vizhy wants to merge 3 commits into microsoft:main
Conversation
Adds AnthropicCacheSettings and a `cache` field on AnthropicChatPromptExecutionSettings
to enable opt-in prompt caching via the Anthropic cache_control API.
When enabled, prepare_settings_dict() injects cache_control blocks on the system
message and the last tool definition before the request is sent. No changes to
AnthropicChatCompletion — caching is fully contained in the settings layer.
Off by default; opt in with cache=AnthropicCacheSettings.on().
Convenience constructors: .on() .off() .system() .tools() .short() .long()
TTL: "5m" -> {"type":"ephemeral"}, "1h" -> {"type":"ephemeral","ttl":3600}
Includes 16 new unit tests and a usage sample at
samples/concepts/caching/anthropic_prompt_caching.py.
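A minimal sketch of the opt-in flow described above (`.short()`/`.long()` are assumed here to map to the `"5m"`/`"1h"` TTLs; treat the surrounding field values as illustrative):

```python
from semantic_kernel.connectors.ai.anthropic import (
    AnthropicCacheSettings,
    AnthropicChatPromptExecutionSettings,
)

# Off by default; opt in explicitly.
settings = AnthropicChatPromptExecutionSettings(
    max_tokens=1024,
    cache=AnthropicCacheSettings.on(),  # or .short() / .long() for the 5m / 1h TTLs
)

# prepare_settings_dict() injects cache_control blocks into the outbound payload;
# the cache field itself is excluded from serialization.
payload = settings.prepare_settings_dict()
```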
Pull request overview
Adds opt-in prompt caching support to the Python Anthropic connector by introducing a cache settings model and injecting Anthropic cache_control blocks into the serialized request payload (system content block and/or the last tool definition).
Changes:
- Introduces `AnthropicCacheSettings` and exposes it as a public API via `semantic_kernel.connectors.ai.anthropic`.
- Extends `AnthropicChatPromptExecutionSettings` with an excluded `cache` field and injects `cache_control` during `prepare_settings_dict()`.
- Adds unit tests for caching settings/injection behavior and a new sample demonstrating prompt caching usage.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| python/tests/unit/connectors/ai/anthropic/test_anthropic_request_settings.py | Adds unit tests covering cache settings constructors and prepare_settings_dict() injection behavior. |
| python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py | Adds AnthropicCacheSettings, adds cache to execution settings (excluded from serialization), and injects cache_control into outbound payload. |
| python/semantic_kernel/connectors/ai/anthropic/__init__.py | Exports AnthropicCacheSettings as part of the Anthropic connector public surface. |
| python/samples/concepts/caching/anthropic_prompt_caching.py | Adds a runnable sample demonstrating multi-turn Anthropic prompt caching. |
```python
class AnthropicCacheSettings(BaseModel):
    """Configuration for Anthropic prompt caching.

    Controls which parts of the request receive cache_control injection.
    """
```
Fixed in f71799f. AnthropicCacheSettings now inherits KernelBaseSettings with env_prefix = ANTHROPIC_CACHE_, consistent with the rest of the SDK (validate_assignment, populate_by_name, arbitrary_types_allowed). This also unlocks env-var control out of the box: ANTHROPIC_CACHE_ENABLED=true, ANTHROPIC_CACHE_INCLUDE_SYSTEM=true, ANTHROPIC_CACHE_TTL=1h, etc. Took the opportunity to rename cache_system/cache_tools to include_system/include_tools to remove the redundant cache prefix on fields inside a cache settings class.
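A sketch of what that unlocks, assuming the field names implied by the env-var names (`enabled`, `include_system`, `include_tools`, `ttl`) and the usual KernelBaseSettings loading behavior:

```python
import os

# With env_prefix = "ANTHROPIC_CACHE_", KernelBaseSettings should pick these up
# at construction time (field names inferred from the env-var names above).
os.environ["ANTHROPIC_CACHE_ENABLED"] = "true"
os.environ["ANTHROPIC_CACHE_INCLUDE_SYSTEM"] = "true"
os.environ["ANTHROPIC_CACHE_TTL"] = "1h"

settings = AnthropicCacheSettings()
assert settings.enabled and settings.include_system and settings.ttl == "1h"
```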
```python
if self.cache.cache_tools:
    tools: list[dict[str, Any]] | None = data.get("tools")
    if tools:
        tools = copy.deepcopy(tools)
        tools[-1]["cache_control"] = cache_control
        data["tools"] = tools
```
Fixed in f71799f. Replaced copy.deepcopy with a shallow list + dict spread: `[*tools[:-1], {**tools[-1], "cache_control": cache_control}]`. Only the last element is modified, so only that dict needs copying — no full deep clone of the entire tools list.
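In context, the replacement looks roughly like this (a sketch; note the quoted `"cache_control"` key):

```python
if tools:
    # Copy only the last tool dict; earlier entries are shared by reference.
    data["tools"] = [*tools[:-1], {**tools[-1], "cache_control": cache_control}]
```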
```python
tools: list[dict[str, Any]] | None = data.get("tools")
if tools:
    tools = copy.deepcopy(tools)
    tools[-1]["cache_control"] = cache_control
```
Fixed in f71799f. Injection now checks cache_control not in tools[-1] (and the same for system blocks) before writing — existing values are preserved as-is.
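Combined with the shallow-copy change above, the guarded injection looks roughly like this (a sketch):

```python
if tools and "cache_control" not in tools[-1]:
    # Preserve a caller's explicit cache_control; only inject when absent.
    data["tools"] = [*tools[:-1], {**tools[-1], "cache_control": cache_control}]
```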
```python
    ctrl = AnthropicCacheSettings.on(ttl="5m")._cache_control()
    assert ctrl == {"type": "ephemeral"}


def test_cache_control_1h():
    ctrl = AnthropicCacheSettings.on(ttl="1h")._cache_control()
    assert ctrl == {"type": "ephemeral", "ttl": 3600}
```
Fixed in f71799f. Replaced the two _cache_control() direct tests (test_cache_control_5m / test_cache_control_1h) with prepare_settings_dict() equivalents that validate the same TTL output via the public API. The private helper is now only exercised transitively.
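A sketch of what such a public-API test could look like (the tool shape is hypothetical, and the expected value mirrors this point in the thread — a later commit corrects it to the string `"1h"`):

```python
def test_prepare_settings_dict_injects_1h_ttl_on_tools():
    # Exercises the TTL mapping through prepare_settings_dict() instead of
    # calling the private _cache_control() helper directly.
    settings = AnthropicChatPromptExecutionSettings(
        tools=[{"name": "lookup", "input_schema": {"type": "object"}}],  # hypothetical tool
        cache=AnthropicCacheSettings.on(ttl="1h"),
    )
    data = settings.prepare_settings_dict()
    assert data["tools"][-1]["cache_control"] == {"type": "ephemeral", "ttl": 3600}
```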
Automated Code Review
Reviewers: 4 | Confidence: 92%
✓ Correctness
The PR adds Anthropic prompt caching support with a well-structured `AnthropicCacheSettings` model and `prepare_settings_dict` override. There is one correctness bug: the `_cache_control()` method emits `"ttl": 3600` (an integer) for the 1-hour TTL, but the Anthropic SDK's `CacheControlEphemeralParam` type defines `ttl: Literal["5m", "1h"]` — it expects the string `"1h"`, not an integer. This will cause a runtime API error or silent rejection when 1-hour caching is used. The corresponding tests also assert the wrong expected value, so they pass but do not catch the bug.
✓ Security Reliability
This PR adds Anthropic prompt caching support via a new `AnthropicCacheSettings` model and `prepare_settings_dict` override. The implementation is clean from a security and reliability standpoint: TTL values are constrained by `Literal["5m", "1h"]`, the `cache` field is correctly excluded from API serialization (`exclude=True`), tools are deep-copied before mutation to prevent side effects, and edge cases (empty system string, missing tools) are handled properly. No secrets, injection risks, resource leaks, or unsafe deserialization were found.
✓ Test Coverage
The new AnthropicCacheSettings class and its integration into prepare_settings_dict are well-tested, covering factory methods, TTL variants, edge cases (empty system, no tools), mutation protection, and serialization exclusion. However, the PR widens the `system` field type to accept `list[dict[str, Any]]` in addition to `str`, yet there is no test verifying behavior when `system` is passed as a pre-structured list with caching enabled. The code silently skips cache injection in that case (line 158: `isinstance(system, str)`), and a test should document this intended behavior.
✗ Design Approach
I found one design-level issue. The new caching API broadens `system` to accept Anthropic-native block lists, but the caching implementation only injects `cache_control` when `system` is a plain string. That makes the newly supported structured-system form a silent no-op for `cache_system`, which is a contract gap in the core feature rather than a missing edge-case test.
Suggestions
- In python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py:156-159, treat `system` as one normalized content-block sequence for serialization so caching works for both supported input shapes, rather than special-casing only raw strings.
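One possible shape for that normalization (a sketch under the suggestion's assumptions; `_normalize_system` and the surrounding variables are hypothetical names, and the text-block shape follows Anthropic's system content-block format):

```python
from typing import Any


def _normalize_system(system: str | list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize both supported system shapes into Anthropic-native text blocks."""
    if isinstance(system, str):
        return [{"type": "text", "text": system}]
    return system


blocks = _normalize_system(system)
if blocks and "cache_control" not in blocks[-1]:
    # One injection path now covers plain strings and pre-structured block lists.
    data["system"] = [*blocks[:-1], {**blocks[-1], "cache_control": cache_control}]
```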
Automated review by Vizhy's agents
… blocks
- _cache_control() now emits {"ttl": "1h"} string per CacheControlEphemeralParam spec instead of integer 3600
- prepare_settings_dict() now injects cache_control on list[dict] system blocks in addition to plain strings, closing the silent no-op design gap
- add test covering cache injection when system is pre-structured as list[dict]
- update 1h TTL test assertions to match corrected string value
@microsoft-github-policy-service agree
Thanks for the thorough automated review — two valid issues were caught and both are addressed in the follow-up commit (da6de64):
1. TTL value fix (Correctness)
2. Pre-structured system blocks (Design Approach)

A test covering the pre-structured `list[dict]` system case was added as well.
…ow copy, no-overwrite
- AnthropicCacheSettings now inherits KernelBaseSettings (consistent with rest of SDK; enables validate_assignment, populate_by_name)
- Added env_prefix = "ANTHROPIC_CACHE_" so caching can be toggled via environment variables (ANTHROPIC_CACHE_ENABLED, ANTHROPIC_CACHE_TTL, etc.)
- Renamed cache_system/cache_tools fields to include_system/include_tools (removes redundant "cache" prefix on fields inside a cache settings class)
- Replaced copy.deepcopy with shallow list + dict spread — cheaper for large tool catalogs where caching is most beneficial
- inject now skips if cache_control already present on last block — avoids silently clobbering a caller's explicit setting
- Replaced two _cache_control() private-method tests with prepare_settings_dict() equivalents; added env-var tests (monkeypatch) and no-overwrite test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds opt-in prompt caching support to the Python Anthropic connector via a new `AnthropicCacheSettings` model and a `prepare_settings_dict()` override that injects Anthropic `cache_control` blocks into the outbound request payload.

What's included:
- `AnthropicCacheSettings` — inherits `KernelBaseSettings` with `env_prefix = ANTHROPIC_CACHE_`. Caching can be toggled via environment variables (`ANTHROPIC_CACHE_ENABLED`, `ANTHROPIC_CACHE_INCLUDE_SYSTEM`, `ANTHROPIC_CACHE_INCLUDE_TOOLS`, `ANTHROPIC_CACHE_TTL`) or set explicitly in code.
- `AnthropicChatPromptExecutionSettings.cache` field (excluded from serialization) + `prepare_settings_dict()` override that injects `cache_control` on the system message and/or the last tool definition.
- `AnthropicCacheSettings` exported from `semantic_kernel.connectors.ai.anthropic`.
- Sample: `samples/concepts/caching/anthropic_prompt_caching.py`.

Copilot review addressed (commit f71799f):
- Inherits `KernelBaseSettings` (not bare `BaseModel`) — consistent with rest of SDK; enables env-var support.
- `cache_system`/`cache_tools` → `include_system`/`include_tools` — removes redundant cache prefix.
- Replaced `copy.deepcopy` with shallow list + dict spread — cheaper for large tool catalogs.
- Skips injection if `cache_control` already present — avoids clobbering caller's explicit setting.
- Tests now exercise the `prepare_settings_dict()` public surface (no longer calling private `_cache_control()`).

Cache TTL:
- `5m` (default): 1.25x write cost, 0.1x read cost. Breaks even after one cache hit.
- `1h`: 2x write cost, 0.1x read cost. Breaks even after two cache hits.

Usage:
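(A sketch assembled from the pieces above; the service wiring follows the standard Semantic Kernel chat-completion pattern, and the model id is a placeholder.)

```python
import asyncio

from semantic_kernel.connectors.ai.anthropic import (
    AnthropicCacheSettings,
    AnthropicChatCompletion,
    AnthropicChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory


async def main() -> None:
    service = AnthropicChatCompletion(ai_model_id="claude-sonnet-4-5")  # placeholder model id
    settings = AnthropicChatPromptExecutionSettings(
        max_tokens=1024,
        cache=AnthropicCacheSettings.on(ttl="1h"),  # or toggle via ANTHROPIC_CACHE_* env vars
    )
    # Caching pays off when the system prompt is large and reused across turns.
    history = ChatHistory(system_message="<large, stable system prompt worth caching>")
    history.add_user_message("First question against the cached context.")
    print(await service.get_chat_message_content(history, settings))


asyncio.run(main())
```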