fix(agent): account for tool schemas and cap output tokens in context… #1551

Merged

zerob13 merged 1 commit into dev from feat/agent-tool-context-budget on Apr 28, 2026
Conversation

zhangmo8 (Collaborator) commented Apr 27, 2026

… budget

  • Cap agent-session default maxTokens to 16,384 and never reserve more than half the context window for output
  • Reserve tool-definition token cost when building initial and resume context, preventing silent context overflow on tool-heavy sessions
  • Preflight-fit request messages to the effective budget before every provider-loop call, protecting the active tool-continuation tail
  • Resolve a per-request effective output cap from fitted message size and tool-schema cost rather than using the raw maxTokens value
  • Harden legacy function-call parsing: tolerate JSON wrapped in Markdown fences, a trailing unclosed <function_call> tag at end of stream, and extra finish-reason aliases from non-standard providers

Closes the first increment of docs/specs/agent-tool-context-budget.
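
A minimal sketch of the capping rule described above (the constant names mirror the review discussion further down; the exact values, the floor, and the signature in contextBudget.ts are assumptions here):

```ts
// Hedged sketch, not the shipped implementation.
const AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP = 16_384
const AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS = 1_024 // assumed floor

function capAgentRequestMaxTokens(maxTokens: number, contextLength: number): number {
  const requested = Number.isFinite(maxTokens)
    ? Math.max(AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS, Math.floor(maxTokens))
    : AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP
  // Never exceed the 16,384-token default cap, and never reserve more than
  // half the context window for output.
  return Math.min(requested, AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP, Math.floor(contextLength / 2))
}

// e.g. capAgentRequestMaxTokens(100_000, 8_192) === 4_096
```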

Summary by CodeRabbit

Release Notes

  • Improvements

    • Agents now more efficiently manage context budgets during conversations with multiple tools, improving reliability
    • Enhanced tool call detection in streaming scenarios with better completion signal recognition
  • Bug Fixes

    • Tool call parser now gracefully handles incomplete tags and markdown-fenced JSON, reducing parsing failures

coderabbitai Bot (Contributor) commented Apr 27, 2026

📝 Walkthrough

This PR implements an agent context budgeting system to manage token allocation in tool-heavy conversations. It adds specification documentation, introduces token-capping utilities, threads tool-reserve budgeting through request preparation and compaction, and enhances legacy function-call parsing to handle incomplete tags and markdown-fenced JSON.
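
The arithmetic behind "compute effective max output token budgets" can be pictured roughly as follows. Only the function name resolveEffectiveRequestMaxTokens comes from this PR; the input shape and the floor of 1 are illustrative assumptions:

```ts
// Rough sketch of resolving a per-request output budget after accounting
// for fitted message cost and tool-schema cost.
interface EffectiveMaxTokensInput {
  cappedMaxTokens: number     // request maxTokens after the agent cap
  contextLength: number       // model context window
  fittedMessageTokens: number // estimated cost of the fitted messages
  toolReserveTokens: number   // estimated cost of the serialized tool schemas
}

function resolveEffectiveRequestMaxTokens(input: EffectiveMaxTokensInput): number {
  const remaining =
    input.contextLength - input.fittedMessageTokens - input.toolReserveTokens
  // Clamp to what actually remains in the window instead of trusting
  // the raw maxTokens value.
  return Math.max(1, Math.min(input.cappedMaxTokens, remaining))
}
```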

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Specification & Documentation**<br>`docs/specs/agent-tool-context-budget/plan.md`, `docs/specs/agent-tool-context-budget/spec.md`, `docs/specs/agent-tool-context-budget/tasks.md` | New specification files documenting the agent context-budgeting approach, an implementation plan with concrete steps, failure modes, follow-up work, and a task checklist tracking completion status. |
| **Core Token Budgeting Utilities**<br>`src/main/presenter/agentRuntimePresenter/contextBudget.ts` | New module providing token-cap constants plus functions to cap output tokens, estimate tool-reserve tokens, build request budgets, fit messages into context windows while protecting recent message patterns, and compute effective max-output-token budgets after accounting for message and tool costs. |
| **Context & Compaction Integration**<br>`src/main/presenter/agentRuntimePresenter/contextBuilder.ts`, `src/main/presenter/agentRuntimePresenter/compactionService.ts` | Extended to thread an `extraReserveTokens` parameter through context/resume preparation and compaction-eligibility checks; adds an `estimateToolDefinitionTokens` helper to compute token usage from tool definitions (sketched after this table). |
| **Runtime Request Flow**<br>`src/main/presenter/agentRuntimePresenter/index.ts` | Major changes integrating tool-aware token budgeting: caps generation `maxTokens` at the request and default levels, estimates and threads tool-reserve tokens through compaction/context/resume calls, builds request context budgets, protects message tails during fitting, and resolves effective max tokens for the provider from fitted message costs plus tool reserves. |
| **Tool Protocol Parsing**<br>`src/main/presenter/llmProviderPresenter/aiSdk/streamAdapter.ts`, `src/main/presenter/llmProviderPresenter/aiSdk/toolProtocol.ts` | Enhanced legacy function-call handling: `mapFinishReason` recognizes multiple tool-related finish-reason variants; `parseLegacyFunctionCalls` extracts multiple candidates using `matchAll`, normalizes payloads by unwrapping markdown fences around JSON, and handles unterminated closing tags when flushing at stream end. |
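
The `estimateToolDefinitionTokens` helper referenced above could look roughly like this. An illustrative sketch only: the real serialization and the 4-chars-per-token ratio are assumptions, not taken from this PR:

```ts
interface ToolDefinition {
  name: string
  description?: string
  parameters?: unknown // JSON Schema for the tool's arguments
}

const CHARS_PER_TOKEN = 4 // common rough heuristic, assumed here

function estimateToolDefinitionTokens(tools: ToolDefinition[]): number {
  if (tools.length === 0) return 0
  // Tool schemas ride along with every request, so their serialized size has
  // to be reserved out of the context budget before messages are fitted.
  return Math.ceil(JSON.stringify(tools).length / CHARS_PER_TOKEN)
}
```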

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client Request
    participant Runtime as AgentRuntimePresenter
    participant Budget as ContextBudget
    participant Context as ContextBuilder
    participant Compact as CompactionService
    participant Provider as LLM Provider

    Client->>Runtime: Request with tools
    activate Runtime

    Runtime->>Budget: estimateToolReserveTokens(tools)
    Budget-->>Runtime: toolReserveTokens

    Runtime->>Runtime: capAgentRequestMaxTokens(maxTokens)

    loop Each iteration
        Runtime->>Context: buildContext(..., extraReserveTokens)
        Context->>Budget: Tool reserves subtracted<br/>from available budget
        Context-->>Runtime: Available tokens reduced

        Runtime->>Budget: buildRequestContextBudget(maxTokens, contextLength, tools)
        Budget-->>Runtime: RequestContextBudget

        Runtime->>Budget: fitRequestMessagesToContextWindow(messages, reserves)
        Budget->>Budget: Fit messages while protecting<br/>recent message patterns
        Budget-->>Runtime: Fitted messages

        Runtime->>Budget: resolveEffectiveRequestMaxTokens(fitted messages, tool reserves)
        Budget-->>Runtime: Effective maxTokens for provider

        Runtime->>Compact: prepareForNextUserTurn(..., extraReserveTokens)
        Compact->>Compact: Check eligibility using<br/>reduced budget
        Compact-->>Runtime: Compaction decision

        Runtime->>Provider: Send request with effective maxTokens
        Provider-->>Runtime: Response (with tool/finish reason)
    end

    deactivate Runtime
```
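
The "fit messages while protecting recent message patterns" step in the diagram amounts to head-first trimming with a protected tail. A simplified sketch, where the message shape, token estimator, and trimming order are assumptions (the real implementation also protects tool_calls/tool pairs, as the review comments below show):

```ts
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  tool_calls?: unknown[]
}

interface FitOptions {
  messages: ChatMessage[]
  contextLength: number
  reserveTokens: number // output budget + tool-schema reserve
  minimumProtectedTailCount: number
}

// Hypothetical rough estimator: ~4 characters per token.
function estimateTokens(messages: ChatMessage[]): number {
  return Math.ceil(JSON.stringify(messages).length / 4)
}

function fitRequestMessagesToContextWindow(opts: FitOptions): ChatMessage[] {
  const budget = opts.contextLength - opts.reserveTokens
  const fitted = [...opts.messages]
  // Drop the oldest messages first, but never cut into the protected tail
  // (e.g. steer inputs or an assistant tool_calls message and its results).
  while (
    fitted.length > Math.max(1, opts.minimumProtectedTailCount) &&
    estimateTokens(fitted) > budget
  ) {
    fitted.shift()
  }
  return fitted
}
```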

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • fix: context trim remove tools first #1196 — Refines context-trimming logic to prefer removal of tool-call blocks first when fitting messages, directly complementing this PR's tool-reserve budgeting and message-tail protection.
  • feat(llm): migrate runtime to ai sdk #1449 — Modifies legacy function-call parsing and stream-adapter tool finish-reason mapping within toolProtocol and streamAdapter, with overlapping code changes to those same files in this PR.
  • fix(agent): guard large tool outputs #1333 — Adds tool-output validation guards and threads context/token parameters through tool execution, aligning with this PR's tool-aware context budgeting architecture.

Poem

🐰 A rabbit hops through token fields,
Measuring what the toolbox yields,
With reserves and caps so carefully planned,
Each message fits where it should stand,
Now tools and context live as one! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main changes: accounting for tool schemas and capping output tokens in the context budget, the core objectives addressed throughout the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai Bot left a comment (Contributor)
Actionable comments posted: 3

🧹 Nitpick comments (1)
src/main/presenter/agentRuntimePresenter/contextBudget.ts (1)

27-30: Upstream validation prevents non-finite maxTokens from reaching this function, but defensive fallback is recommended.

maxTokens is always validated through sanitizeGenerationSettings(), which uses parseFiniteNumericValue() to explicitly reject Infinity and NaN, before it reaches capAgentRequestMaxTokens(). However, capAgentRequestMaxTokens() itself has no guard against non-finite inputs, making it fragile if validation is bypassed or new callers are added in the future. Falling back to AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP instead of AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS is the more forgiving default and improves the defensive design.

♻️ Suggested fallback
```diff
-  const normalizedMaxTokens = Number.isFinite(maxTokens)
-    ? Math.floor(maxTokens)
-    : AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS
-  const requested = Math.max(AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS, normalizedMaxTokens)
+  const normalizedMaxTokens = Number.isFinite(maxTokens)
+    ? Math.floor(maxTokens)
+    : AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP
+  const requested = Math.max(AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS, normalizedMaxTokens)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/presenter/agentRuntimePresenter/contextBudget.ts` around lines 27 -
30, capAgentRequestMaxTokens lacks a defensive guard for non-finite maxTokens;
update capAgentRequestMaxTokens to check Number.isFinite(maxTokens) and when
non-finite fall back to AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP (instead of
AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS), and keep existing Math.floor/Math.max logic
for finite values; reference sanitizeGenerationSettings and
parseFiniteNumericValue as the upstream validators but ensure
capAgentRequestMaxTokens protects itself against bypassed callers.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 18762c62-b2d9-4d62-b12a-6fdeb8ad942a

📥 Commits

Reviewing files that changed from the base of the PR and between 8b43cf6 and 011b95a.

📒 Files selected for processing (9)
  • docs/specs/agent-tool-context-budget/plan.md
  • docs/specs/agent-tool-context-budget/spec.md
  • docs/specs/agent-tool-context-budget/tasks.md
  • src/main/presenter/agentRuntimePresenter/compactionService.ts
  • src/main/presenter/agentRuntimePresenter/contextBudget.ts
  • src/main/presenter/agentRuntimePresenter/contextBuilder.ts
  • src/main/presenter/agentRuntimePresenter/index.ts
  • src/main/presenter/llmProviderPresenter/aiSdk/streamAdapter.ts
  • src/main/presenter/llmProviderPresenter/aiSdk/toolProtocol.ts

Comment on lines +110 to +124 in src/main/presenter/agentRuntimePresenter/contextBudget.ts:

```ts
  let toolTailStart = messages.length - 1
  while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
    toolTailStart -= 1
  }

  if (
    toolTailStart < messages.length - 1 &&
    messages[toolTailStart]?.role === 'assistant' &&
    Array.isArray(messages[toolTailStart]?.tool_calls) &&
    messages[toolTailStart]?.tool_calls?.length
  ) {
    return messages.length - toolTailStart
  }

  return 1
```
⚠️ Potential issue | 🟡 Minor

Orphan trailing tool messages are only partially protected.

If the message tail is a run of tool messages whose preceding message is not an assistant with non-empty tool_calls (e.g., the assistant message was already trimmed in a prior round, or the history is otherwise malformed), the walk-back exits without satisfying the condition on lines 115–120 and the function falls through to return 1. Only the very last tool message is then protected, and the rest of the tool chain becomes eligible for trimming — which can produce an OpenAI-style "tool message must follow a tool_calls assistant" rejection downstream.

If you want to be defensive here, return the full trailing tool run even without a matching assistant so the chain is preserved (or have the caller normalize/drop orphan tools earlier).

♻️ Optional defensive tweak
```diff
   let toolTailStart = messages.length - 1
   while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
     toolTailStart -= 1
   }
 
   if (
     toolTailStart < messages.length - 1 &&
     messages[toolTailStart]?.role === 'assistant' &&
     Array.isArray(messages[toolTailStart]?.tool_calls) &&
     messages[toolTailStart]?.tool_calls?.length
   ) {
     return messages.length - toolTailStart
   }
 
+  // Orphan trailing tool messages: still protect the whole run so providers
+  // that require tool_calls/tool pairing don't reject the request after trimming.
+  if (toolTailStart < messages.length - 1) {
+    return messages.length - 1 - toolTailStart
+  }
+
   return 1
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
   let toolTailStart = messages.length - 1
   while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
     toolTailStart -= 1
   }
 
   if (
     toolTailStart < messages.length - 1 &&
     messages[toolTailStart]?.role === 'assistant' &&
     Array.isArray(messages[toolTailStart]?.tool_calls) &&
     messages[toolTailStart]?.tool_calls?.length
   ) {
     return messages.length - toolTailStart
   }
 
+  // Orphan trailing tool messages: still protect the whole run so providers
+  // that require tool_calls/tool pairing don't reject the request after trimming.
+  if (toolTailStart < messages.length - 1) {
+    return messages.length - 1 - toolTailStart
+  }
+
   return 1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/presenter/agentRuntimePresenter/contextBudget.ts` around lines 110 -
124, The code only protects the last tool message when trailing tool messages
don't have a preceding assistant with non-empty tool_calls; update the logic
around toolTailStart/messages/tool_calls so that if toolTailStart <
messages.length - 1 but the preceding message is not an assistant with
tool_calls, you still return the full trailing tool-run length (e.g., compute
tailLen = messages.length - (toolTailStart + 1) and return tailLen) instead of
returning 1; change the conditional/return that currently checks
messages[toolTailStart]?.role === 'assistant' and falls through to return 1 so
orphan trailing tool messages are preserved as a block.

Comment on lines +1649 to +1663 in src/main/presenter/agentRuntimePresenter/index.ts:

```ts
const requestBudget = buildRequestContextBudget(
  requestMaxTokens,
  requestModelConfig.contextLength,
  requestTools
)
const protectedSteerTailCount =
  claimedSteerBatch.length > 0
    ? claimedSteerBatch.length + (requestMessages.at(-1)?.role === 'user' ? 1 : 0)
    : 0
const fittedMessages = fitRequestMessagesToContextWindow({
  messages: injectedMessages,
  contextLength: requestModelConfig.contextLength,
  reserveTokens: requestBudget.totalReserveTokens,
  minimumProtectedTailCount: protectedSteerTailCount
})
```

⚠️ Potential issue | 🟠 Major

The protected-tail fit happens too late to save tool-continuation turns.

This new pass runs after injectSteerInputsIntoRequest() has already called fitMessagesToContextWindow(). When a steer message is injected during a tool-continuation loop, that earlier truncation can evict the assistant tool_calls message and its matching tool result before fitRequestMessagesToContextWindow() runs, so the new protection logic never sees them.

Make steer injection a pure splice and let this budget-aware fit be the only truncation step in the loop, or teach the earlier fit about the protected tool tail too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/presenter/agentRuntimePresenter/index.ts` around lines 1649 - 1663,
The steer-injection truncation happens before your protected-tail calculation,
so ensure steer injection does not drop assistant tool_calls/tool results:
modify injectSteerInputsIntoRequest() so it performs a pure splice (insert steer
messages without calling fitMessagesToContextWindow()), or alternatively update
the earlier call to fitMessagesToContextWindow() to accept and preserve a
minimumProtectedTailCount (the same logic used by
fitRequestMessagesToContextWindow()) so claimedSteerBatch and assistant
tool_call/tool_result pairs are reserved; adjust callers so
buildRequestContextBudget(), claimedSteerBatch, protectedSteerTailCount,
fitRequestMessagesToContextWindow(), and injectSteerInputsIntoRequest()
coordinate on the same protected-tail contract.
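
For reference, the "pure splice" option suggested above could be as small as this (a hypothetical helper; the real injectSteerInputsIntoRequest() signature and message shape may differ):

```ts
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  tool_calls?: unknown[]
}

// Append the claimed steer batch without calling fitMessagesToContextWindow(),
// so the budget-aware fitRequestMessagesToContextWindow() pass that follows is
// the only truncation step and the tool_calls/tool-result tail survives intact.
function spliceSteerInputs(
  messages: ChatMessage[],
  steerBatch: ChatMessage[]
): ChatMessage[] {
  return [...messages, ...steerBatch]
}
```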

Comment on lines +128 to +136 in src/main/presenter/llmProviderPresenter/aiSdk/toolProtocol.ts:

````ts
function normalizeLegacyFunctionCallContent(content: string): string {
  let normalized = content.replace(/<\/?function_call>/g, '').trim()

  const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
  if (fenced?.[1]) {
    normalized = fenced[1].trim()
  }

  return normalized
````

⚠️ Potential issue | 🟠 Major

Don't strip <function_call> substrings out of the JSON payload.

At this point the wrapper is already removed by matchAll(...)[1] and by the trailing-open-tag slice, so the global replace can only mutate payload data. A valid argument like {"text":"<function_call>"} will be silently rewritten before parsing.

Suggested fix
````diff
 function normalizeLegacyFunctionCallContent(content: string): string {
-  let normalized = content.replace(/<\/?function_call>/g, '').trim()
+  let normalized = content.trim()
 
   const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
   if (fenced?.[1]) {
     normalized = fenced[1].trim()
   }
````
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

````diff
 function normalizeLegacyFunctionCallContent(content: string): string {
-  let normalized = content.replace(/<\/?function_call>/g, '').trim()
+  let normalized = content.trim()
 
   const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
   if (fenced?.[1]) {
     normalized = fenced[1].trim()
   }
 
   return normalized
````
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/presenter/llmProviderPresenter/aiSdk/toolProtocol.ts` around lines
128 - 136, The function normalizeLegacyFunctionCallContent is improperly
stripping all "<function_call>" tags from the payload which can corrupt valid
JSON like {"text":"<function_call>"}; remove the global replace that removes
those tags and rely on the existing fenced code block extraction and any prior
wrapper-slicing logic (i.e., delete the line using
content.replace(/<\/?function_call>/g, '') and operate on the original content
variable, keeping the fenced block extraction and trimming logic intact in
normalizeLegacyFunctionCallContent).

zerob13 merged commit d770406 into dev on Apr 28, 2026
3 checks passed
zhangmo8 deleted the feat/agent-tool-context-budget branch on April 28, 2026 at 01:54