
Python: Add prompt caching support to Anthropic connector#13947

Open
Vizhy wants to merge 3 commits into microsoft:main from Vizhy:feature/connectors-ai-anthropic-cache

Conversation

@Vizhy Vizhy commented May 4, 2026

Summary

Adds opt-in prompt caching support to the Python Anthropic connector via a new AnthropicCacheSettings model and a prepare_settings_dict() override that injects Anthropic cache_control blocks into the outbound request payload.

What's included:

  • AnthropicCacheSettings — inherits KernelBaseSettings with env_prefix = ANTHROPIC_CACHE_. Caching can be toggled via environment variables (ANTHROPIC_CACHE_ENABLED, ANTHROPIC_CACHE_INCLUDE_SYSTEM, ANTHROPIC_CACHE_INCLUDE_TOOLS, ANTHROPIC_CACHE_TTL) or set explicitly in code.
  • AnthropicChatPromptExecutionSettings.cache field (excluded from serialization) + prepare_settings_dict() override that injects cache_control on the system message and/or the last tool definition.
  • AnthropicCacheSettings exported from semantic_kernel.connectors.ai.anthropic.
  • Unit tests covering all classmethods, both TTLs, all injection combinations, edge cases, env-var loading, and the no-overwrite guard.
  • Sample: samples/concepts/caching/anthropic_prompt_caching.py.

Copilot review addressed (commit f71799f):

  • Inherits KernelBaseSettings (not bare BaseModel) — consistent with rest of SDK; enables env-var support.
  • Renamed cache_system/cache_tools → include_system/include_tools — removes the redundant cache prefix.
  • Replaced copy.deepcopy with shallow list + dict spread — cheaper for large tool catalogs.
  • Injection skips if cache_control already present — avoids clobbering caller's explicit setting.
  • TTL tests moved to prepare_settings_dict() public surface (no longer calling private _cache_control()).

Cache TTL:

  • 5m (default): 1.25x write cost, 0.1x read cost. Breaks even after one cache hit.
  • 1h: 2x write cost, 0.1x read cost. Breaks even after two cache hits.
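
The break-even arithmetic above can be checked directly. This throwaway helper (not part of the PR) normalizes the uncached input-token cost to 1.0 per call:

```python
# Relative cost of a cached prefix over `calls` requests:
# one cache write on the first call, then cache reads on the rest.
def cached_cost(write_mult: float, read_mult: float, calls: int) -> float:
    return write_mult + read_mult * (calls - 1)

# 5m cache: 1.25 + 0.1 = 1.35 after one hit, vs 2.0 uncached
# 1h cache: 2.0 + 0.2 = 2.2 after two hits, vs 3.0 uncached
```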

Usage:

from semantic_kernel.connectors.ai.anthropic import (
    AnthropicCacheSettings,
    AnthropicChatPromptExecutionSettings,
)

settings = AnthropicChatPromptExecutionSettings(
    cache=AnthropicCacheSettings.on(ttl="1h"),
)
# or via env: ANTHROPIC_CACHE_ENABLED=true ANTHROPIC_CACHE_TTL=1h

Adds AnthropicCacheSettings and a `cache` field on AnthropicChatPromptExecutionSettings
to enable opt-in prompt caching via the Anthropic cache_control API.

When enabled, prepare_settings_dict() injects cache_control blocks on the system
message and the last tool definition before the request is sent. No changes to
AnthropicChatCompletion — caching is fully contained in the settings layer.

Off by default; opt in with cache=AnthropicCacheSettings.on().
Convenience constructors: .on() .off() .system() .tools() .short() .long()
TTL: "5m" -> {"type":"ephemeral"}, "1h" -> {"type":"ephemeral","ttl":3600}

Includes 16 new unit tests and a usage sample at
samples/concepts/caching/anthropic_prompt_caching.py.
Copilot AI review requested due to automatic review settings May 4, 2026 11:57
@Vizhy Vizhy requested a review from a team as a code owner May 4, 2026 11:57
@moonbox3 moonbox3 added the python Pull requests for the Python Semantic Kernel label May 4, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds opt-in prompt caching support to the Python Anthropic connector by introducing a cache settings model and injecting Anthropic cache_control blocks into the serialized request payload (system content block and/or the last tool definition).

Changes:

  • Introduces AnthropicCacheSettings and exposes it as a public API via semantic_kernel.connectors.ai.anthropic.
  • Extends AnthropicChatPromptExecutionSettings with an excluded cache field and injects cache_control during prepare_settings_dict().
  • Adds unit tests for caching settings/injection behavior and a new sample demonstrating prompt caching usage.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

  • python/tests/unit/connectors/ai/anthropic/test_anthropic_request_settings.py — Adds unit tests covering cache settings constructors and prepare_settings_dict() injection behavior.
  • python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py — Adds AnthropicCacheSettings, adds cache to execution settings (excluded from serialization), and injects cache_control into the outbound payload.
  • python/semantic_kernel/connectors/ai/anthropic/__init__.py — Exports AnthropicCacheSettings as part of the Anthropic connector public surface.
  • python/samples/concepts/caching/anthropic_prompt_caching.py — Adds a runnable sample demonstrating multi-turn Anthropic prompt caching.


Comment on lines +16 to +20
class AnthropicCacheSettings(BaseModel):
"""Configuration for Anthropic prompt caching.

Controls which parts of the request receive cache_control injection.

Author


Fixed in f71799f. AnthropicCacheSettings now inherits KernelBaseSettings with env_prefix = ANTHROPIC_CACHE_, consistent with the rest of the SDK (validate_assignment, populate_by_name, arbitrary_types_allowed). This also unlocks env-var control out of the box: ANTHROPIC_CACHE_ENABLED=true, ANTHROPIC_CACHE_INCLUDE_SYSTEM=true, ANTHROPIC_CACHE_TTL=1h, etc. Took the opportunity to rename cache_system/cache_tools to include_system/include_tools to remove the redundant cache prefix on fields inside a cache settings class.
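
The real class builds on KernelBaseSettings (pydantic-settings based), which handles the env_prefix lookup automatically. As a rough, dependency-free illustration of the described behaviour only (class and method names here are stand-ins, not the PR's API), the env-var loading amounts to:

```python
import os
from dataclasses import dataclass


@dataclass
class AnthropicCacheSettingsSketch:
    """Stand-in mirroring the ANTHROPIC_CACHE_* env-var behaviour."""

    enabled: bool = False
    include_system: bool = True
    include_tools: bool = True
    ttl: str = "5m"

    @classmethod
    def from_env(cls, prefix: str = "ANTHROPIC_CACHE_") -> "AnthropicCacheSettingsSketch":
        def flag(name: str, default: bool) -> bool:
            raw = os.environ.get(prefix + name)
            return default if raw is None else raw.strip().lower() in ("1", "true", "yes")

        return cls(
            enabled=flag("ENABLED", False),
            include_system=flag("INCLUDE_SYSTEM", True),
            include_tools=flag("INCLUDE_TOOLS", True),
            ttl=os.environ.get(prefix + "TTL", "5m"),
        )
```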

Comment on lines +161 to +166
if self.cache.cache_tools:
tools: list[dict[str, Any]] | None = data.get("tools")
if tools:
tools = copy.deepcopy(tools)
tools[-1]["cache_control"] = cache_control
data["tools"] = tools
Author


Fixed in f71799f. Replaced copy.deepcopy with a shallow list + dict spread: [*tools[:-1], {**tools[-1], "cache_control": cache_control}]. Only the last element is modified, so only that dict needs copying — no full deep clone of the entire tools list.
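
Concretely, the spread pattern behaves like this (a standalone sketch, not the connector code itself): the last tool dict is copied with cache_control attached, while the earlier entries are shared by reference with the caller's list.

```python
tools = [{"name": "search"}, {"name": "calculator"}]
cache_control = {"type": "ephemeral"}

# Copy only the last dict; reuse the preceding entries as-is.
patched = [*tools[:-1], {**tools[-1], "cache_control": cache_control}]
```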

tools: list[dict[str, Any]] | None = data.get("tools")
if tools:
tools = copy.deepcopy(tools)
tools[-1]["cache_control"] = cache_control
Author


Fixed in f71799f. Injection now checks cache_control not in tools[-1] (and the same for system blocks) before writing — existing values are preserved as-is.
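
A minimal sketch of that guard (the helper name is hypothetical; in the PR the check lives inline in prepare_settings_dict()): a caller-supplied cache_control on the last tool short-circuits the injection.

```python
def inject_tool_cache(tools: list[dict], cache_control: dict) -> list[dict]:
    """Attach cache_control to the last tool unless one is already set."""
    if not tools or "cache_control" in tools[-1]:
        return tools  # respect the caller's explicit setting (or no tools)
    return [*tools[:-1], {**tools[-1], "cache_control": cache_control}]
```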

Comment on lines +178 to +184
ctrl = AnthropicCacheSettings.on(ttl="5m")._cache_control()
assert ctrl == {"type": "ephemeral"}


def test_cache_control_1h():
ctrl = AnthropicCacheSettings.on(ttl="1h")._cache_control()
assert ctrl == {"type": "ephemeral", "ttl": 3600}
Author


Fixed in f71799f. Replaced the two _cache_control() direct tests (test_cache_control_5m / test_cache_control_1h) with prepare_settings_dict() equivalents that validate the same TTL output via the public API. The private helper is now only exercised transitively.

Contributor

@github-actions github-actions Bot left a comment


Automated Code Review

Reviewers: 4 | Confidence: 92%

✓ Correctness

The PR adds Anthropic prompt caching support with a well-structured AnthropicCacheSettings model and prepare_settings_dict override. There is one correctness bug: the _cache_control() method emits "ttl": 3600 (an integer) for the 1-hour TTL, but the Anthropic SDK's CacheControlEphemeralParam type defines ttl: Literal["5m", "1h"] — it expects the string "1h", not an integer. This will cause a runtime API error or silent rejection when 1-hour caching is used. The corresponding tests also assert the wrong expected value, so they pass but do not catch the bug.

✓ Security Reliability

This PR adds Anthropic prompt caching support via a new AnthropicCacheSettings model and prepare_settings_dict override. The implementation is clean from a security and reliability standpoint: TTL values are constrained by Literal["5m", "1h"], the cache field is correctly excluded from API serialization (exclude=True), tools are deep-copied before mutation to prevent side effects, and edge cases (empty system string, missing tools) are handled properly. No secrets, injection risks, resource leaks, or unsafe deserialization were found.

✓ Test Coverage

The new AnthropicCacheSettings class and its integration into prepare_settings_dict are well-tested, covering factory methods, TTL variants, edge cases (empty system, no tools), mutation protection, and serialization exclusion. However, the PR widens the system field type to accept list[dict[str, Any]] in addition to str, yet there is no test verifying behavior when system is passed as a pre-structured list with caching enabled. The code silently skips cache injection in that case (line 158: isinstance(system, str)), and a test should document this intended behavior.

✗ Design Approach

I found one design-level issue. The new caching API broadens system to accept Anthropic-native block lists, but the caching implementation only injects cache_control when system is a plain string. That makes the newly supported structured-system form a silent no-op for cache_system, which is a contract gap in the core feature rather than a missing edge-case test.

Suggestions

  • In python/semantic_kernel/connectors/ai/anthropic/prompt_execution_settings/anthropic_prompt_execution_settings.py:156-159, treat system as one normalized content-block sequence for serialization so caching works for both supported input shapes, rather than special-casing only raw strings.

Automated review by Vizhy's agents

… blocks

- _cache_control() now emits {"ttl":"1h"} string per CacheControlEphemeralParam
  spec instead of integer 3600
- prepare_settings_dict() now injects cache_control on list[dict] system blocks
  in addition to plain strings, closing the silent no-op design gap
- add test covering cache injection when system is pre-structured as list[dict]
- update 1h TTL test assertions to match corrected string value

Vizhy commented May 5, 2026

@microsoft-github-policy-service agree


Vizhy commented May 5, 2026

Thanks for the thorough automated review — two valid issues were caught and both are addressed in the follow-up commit (da6de64):

1. TTL value fix (Correctness)
_cache_control() now emits {"ttl": "1h"} (string) instead of {"ttl": 3600} (integer), correctly matching the CacheControlEphemeralParam SDK type definition. The corresponding test assertions have been updated to match.

2. Pre-structured system blocks (Design Approach)
prepare_settings_dict() now handles both input shapes for system:

  • str → wrapped into a single content block with cache_control
  • list[dict] → cache_control injected on the last block (same pattern used for tools), with a copy.deepcopy to avoid mutation

A test covering the list[dict] case has been added to document this behaviour explicitly.
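
A standalone sketch of the two input shapes (the function name is hypothetical, and the list branch is shown here with the same spread-copy pattern as the tools path, whereas the commit uses copy.deepcopy):

```python
def inject_system_cache(system, cache_control: dict) -> list[dict]:
    """Return system content blocks with cache_control on the last block."""
    if isinstance(system, str):
        # str -> wrap into a single content block carrying cache_control
        return [{"type": "text", "text": system, "cache_control": cache_control}]
    # list[dict] -> copy only the last block and attach cache_control to it
    return [*system[:-1], {**system[-1], "cache_control": cache_control}]
```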

…ow copy, no-overwrite

- AnthropicCacheSettings now inherits KernelBaseSettings (consistent with
  rest of SDK; enables validate_assignment, populate_by_name)
- Added env_prefix = "ANTHROPIC_CACHE_" so caching can be toggled via
  environment variables (ANTHROPIC_CACHE_ENABLED, ANTHROPIC_CACHE_TTL, etc.)
- Renamed cache_system/cache_tools fields to include_system/include_tools
  (removes redundant "cache" prefix on fields inside a cache settings class)
- Replaced copy.deepcopy with shallow list + dict spread — cheaper for large
  tool catalogs where caching is most beneficial
- inject now skips if cache_control already present on last block — avoids
  silently clobbering a caller's explicit setting
- Replaced two _cache_control() private-method tests with prepare_settings_dict()
  equivalents; added env-var tests (monkeypatch) and no-overwrite test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

python Pull requests for the Python Semantic Kernel
