
feat(conversation_manager): add token-aware context management #2147

Open
srbhsrkr wants to merge 3 commits into strands-agents:main from srbhsrkr:feat/token-aware-context-management

Conversation


srbhsrkr commented Apr 17, 2026

Summary

Closes #2146
Addresses #1294, #555, #298
Related to #1295, #1678, #1296, #2048

How this relates to existing issues

| Issue | Status |
| --- | --- |
| #1294 — Token Estimation API | Addressed: `estimate_tokens()` + `TokenCounter` type on conversation managers. Complementary to a future `Model.estimate_tokens()` — ours is the lightweight heuristic, theirs would be model-specific. |
| #555 — Proactive Context Compression | Addressed: `max_context_tokens` + `BeforeModelCallEvent` hook triggers reduction before `ContextWindowOverflowException`. |
| #298 — In-event-loop cycle context management | Addressed: `per_turn` + `compactable_after_messages` + hook-based token budget checks enable within-cycle management. |
| #1295 — Context Limit Property on Model | Complementary: if `model.context_limit` ships, it could auto-configure `max_context_tokens`. |
| #1678, #1296 — Large Content Aliasing/Externalization | Related: micro-compaction replaces stale results with stubs (different strategy, same goal). |
| #2048 — Expose reduce_context() as Hook Event | Related: our hook calls `apply_management()` → `reduce_context()`, but doesn't fire a dedicated event. |

Design Notes

  • _model_call_count only increments when per_turn is enabled (preserves existing per-turn semantics)
  • Summarizing manager's apply_management is an intentional no-op — proactive summarization runs exclusively via hook to prevent double-summarization with the agent's finally block
  • _last_compacted_index tracks compaction progress to avoid re-scanning already-processed messages
  • Hook registration in summarizing manager is guarded by max_context_tokens is not None
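The guarded hook registration and `per_turn` counter semantics above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual code: the class, the plain-list "registry", and the event argument are simplified placeholders for strands' real manager and hook types.

```python
# Illustrative sketch of two Design Notes: the hook is only registered
# when a token budget is configured, and _model_call_count only advances
# when per_turn is enabled.

class SketchManager:
    def __init__(self, max_context_tokens=None, per_turn=None):
        self.max_context_tokens = max_context_tokens
        self.per_turn = per_turn
        self._model_call_count = 0

    def register_hooks(self, registry):
        # Guard: managers without max_context_tokens register nothing,
        # so existing behavior is unchanged for them.
        if self.max_context_tokens is not None:
            registry.append(self._on_before_model_call)

    def _on_before_model_call(self, event=None):
        # Counter only increments when per_turn is enabled, preserving
        # the existing per-turn semantics for callers that never opt in.
        if self.per_turn is not None:
            self._model_call_count += 1
```

A manager constructed without `max_context_tokens` adds nothing to the registry and never advances its counter.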

Test plan

  • 55 new tests in test_token_aware_context_management.py covering:
    • Token estimation for all block types (text, toolResult, toolUse, image, document, video, cachePoint, guardContent)
    • Token budget enforcement via apply_management and BeforeModelCallEvent hook
    • Micro-compaction: replace/preserve/skip-already-processed/image-blocks
    • Parameter validation (max_context_tokens, compactable_after_messages)
    • _model_call_count semantics regression (not incremented when per_turn=False)
    • Integration: hook → apply_management → reduce_context full pipeline
    • _last_compacted_index adjustment after message trimming
  • All 73 existing conversation manager tests pass (no regressions)
  • Lint clean (ruff check), type clean (mypy)

Add token-budget awareness to SlidingWindowConversationManager and
SummarizingConversationManager so context reduction can be driven by
estimated token counts, not just message counts.

Key changes:
- New `_token_utils.py` with `estimate_tokens` (chars/4 heuristic) and
  `TokenCounter` type alias, handling all ContentBlock types (text,
  toolResult, toolUse, image, document, video, reasoningContent, etc.)
- `SlidingWindowConversationManager`: new `max_context_tokens`,
  `token_counter`, and `compactable_after_messages` parameters; proactive
  token-budget enforcement via BeforeModelCallEvent hook; micro-compaction
  of stale tool results with `_last_compacted_index` tracking
- `SummarizingConversationManager`: new `max_context_tokens`,
  `proactive_threshold`, and `token_counter` parameters; proactive
  summarization via hook when token threshold exceeded
- Always uses heuristic estimator (never stale model-reported
  `latest_context_size`) to prevent over-reduction spirals
- 55 new tests covering token estimation, budget enforcement,
  micro-compaction, parameter validation, integration flows, and
  _model_call_count semantics regression
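The chars/4 heuristic described in the key changes can be sketched as below. This is an assumed approximation of `_token_utils.py`, not its actual contents: the dict-shaped content blocks mirror Bedrock-style `ContentBlock`s, the `IMAGE_CHAR_ESTIMATE` value is invented for illustration, and block coverage here is deliberately partial.

```python
# Sketch of a chars/4 token estimator over message content blocks.
import json

IMAGE_CHAR_ESTIMATE = 800  # assumed flat cost for opaque binary blocks


def estimate_tokens(messages):
    chars = 0
    for message in messages:
        for block in message.get("content", []):
            if "text" in block:
                chars += len(block["text"])
            elif "toolUse" in block:
                # Serialize structured blocks to count their chars.
                chars += len(json.dumps(block["toolUse"]))
            elif "toolResult" in block:
                chars += len(json.dumps(block["toolResult"]))
            elif any(k in block for k in ("image", "document", "video")):
                # Binary payloads get a fixed estimate rather than raw size.
                chars += IMAGE_CHAR_ESTIMATE
    # chars/4 is a common rough tokens-per-character ratio for English text.
    return chars // 4
```

Because this never consults the model, it avoids the stale `latest_context_size` problem the PR calls out: the estimate always reflects the messages as they are now.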

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
srbhsrkr and others added 2 commits April 21, 2026 10:34
Add read_only, destructive, and requires_confirmation boolean parameters to
the @tool decorator and corresponding properties on AgentTool, ToolSpec, and
MCPAgentTool. This enables hook-based permission policies to reason about
tool safety without hardcoding tool-name mappings.

Closes strands-agents#2154

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
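The tool-safety-metadata pattern this commit describes might look roughly like the standalone sketch below. The real `@tool` decorator lives in strands and its signature may differ; only the three flag names come from the commit message, and the `tool_spec` attribute and `needs_confirmation` helper here are hypothetical.

```python
# Hypothetical sketch: boolean safety flags attached by a decorator so
# permission hooks can reason about tools without hardcoding tool names.
import functools


def tool(read_only=False, destructive=False, requires_confirmation=False):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            return fn(*args, **kwargs)
        # Expose the safety metadata for policy code to inspect.
        inner.tool_spec = {
            "read_only": read_only,
            "destructive": destructive,
            "requires_confirmation": requires_confirmation,
        }
        return inner
    return wrap


@tool(destructive=True, requires_confirmation=True)
def delete_file(path: str) -> str:
    return f"deleted {path}"


def needs_confirmation(t) -> bool:
    # A hook-based policy checks the flag, not the tool's name.
    return t.tool_spec["requires_confirmation"]
```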
…mmarization

- Loop reduce_context in apply_management until token budget is satisfied
  or no further progress can be made, fixing cases where a single
  reduce_context call was insufficient for large messages under window_size
- Prevent double apply_management calls in _on_before_model_call by
  unifying token-budget and per_turn triggers into a single dispatch
- Fix _micro_compact image reclaimed accounting to use IMAGE_CHAR_ESTIMATE
  instead of hardcoded 200, and subtract stub length from reclaimed_chars
- Add _do_proactive_summarization guard in SummarizingConversationManager
  to prevent hook and apply_management from both triggering summarization
  in the same agent cycle
- Make SummarizingConversationManager.apply_management honor the token
  budget contract instead of being a silent no-op
- Rename _IMAGE_CHAR_ESTIMATE to IMAGE_CHAR_ESTIMATE (cross-module usage)
- Use len(messages) as loop bound instead of fixed constant

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
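The "loop until the budget is satisfied or no progress can be made" fix from this commit can be sketched with stand-in callables. This is a minimal illustration of the termination pattern, not the PR's `apply_management` implementation; `estimate` and `reduce` here are assumed injectable functions.

```python
# Sketch: repeatedly reduce context until the token budget is met,
# bailing out when reduce() stops making progress (so we never spin).


def apply_management(messages, max_context_tokens, estimate, reduce):
    while estimate(messages) > max_context_tokens:
        before = len(messages)
        reduce(messages)  # trims messages in place; a no-op means stuck
        if len(messages) >= before:
            break  # no further progress possible; stop rather than loop
    return messages
```

The message-count guard is what fixes the case the commit describes: a single `reduce_context` call may not free enough tokens for large messages, so the loop keeps going, but only while each pass actually shrinks the conversation.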

Successfully merging this pull request may close these issues.

feat: Token-aware context management for conversation managers