fix(agent): account for tool schemas and cap output tokens in context…#1551
… budget

- Cap agent-session default `maxTokens` to 16 384 and never reserve more than half the context window for output
- Reserve tool-definition token cost when building initial and resume context, preventing silent context overflow on tool-heavy sessions
- Preflight-fit request messages to the effective budget before every provider-loop call, protecting the active tool-continuation tail
- Resolve a per-request effective output cap from fitted message size and tool-schema cost rather than using the raw `maxTokens` value
- Harden legacy function-call parsing: tolerate JSON wrapped in Markdown fences, a trailing unclosed `<function_call>` tag at end of stream, and extra finish-reason aliases from non-standard providers

Closes the first increment of docs/specs/agent-tool-context-budget.
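The output-token cap described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the constant and function names are assumptions modeled on the description, and the real logic lives in `src/main/presenter/agentRuntimePresenter/contextBudget.ts`.

```typescript
// Assumed ceiling from the PR description ("cap default maxTokens to 16 384").
const AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP = 16_384

// Cap the requested output tokens: never exceed the session default ceiling,
// and never reserve more than half the model's context window for output.
function capAgentRequestMaxTokens(maxTokens: number, contextLength: number): number {
  const halfWindow = Math.floor(contextLength / 2)
  return Math.min(Math.floor(maxTokens), AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP, halfWindow)
}
```

On a small 8 192-token window, a 32 000-token request would be clamped to 4 096 (half the window); on a large window the 16 384 ceiling dominates.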
📝 Walkthrough

This PR implements an agent context budgeting system to manage token allocation in tool-heavy conversations. It adds specification documentation, introduces token-capping utilities, threads tool-reserve budgeting through request preparation and compaction, and enhances legacy function-call parsing to handle incomplete tags and markdown-fenced JSON.
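The tool-reserve accounting threaded through request preparation can be approximated like this. The `ToolDefinition` shape and the ~4-characters-per-token heuristic are illustrative assumptions, not the PR's actual code:

```typescript
interface ToolDefinition {
  name: string
  description?: string
  parameters?: unknown // JSON Schema for the tool's arguments
}

// Serialized tool schemas are sent with every request, so their token cost
// must be subtracted from the usable context budget up front.
function estimateToolReserveTokens(tools: ToolDefinition[]): number {
  const serialized = JSON.stringify(tools ?? [])
  return Math.ceil(serialized.length / 4)
}
```

Whatever estimator is used, the key property is that tool-heavy sessions see a smaller message budget instead of silently overflowing the window.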
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client Request
    participant Runtime as AgentRuntimePresenter
    participant Budget as ContextBudget
    participant Context as ContextBuilder
    participant Compact as CompactionService
    participant Provider as LLM Provider
    Client->>Runtime: Request with tools
    activate Runtime
    Runtime->>Budget: estimateToolReserveTokens(tools)
    Budget-->>Runtime: toolReserveTokens
    Runtime->>Runtime: capAgentRequestMaxTokens(maxTokens)
    loop Each iteration
        Runtime->>Context: buildContext(..., extraReserveTokens)
        Context->>Budget: Tool reserves subtracted<br/>from available budget
        Context-->>Runtime: Available tokens reduced
        Runtime->>Budget: buildRequestContextBudget(maxTokens, contextLength, tools)
        Budget-->>Runtime: RequestContextBudget
        Runtime->>Budget: fitRequestMessagesToContextWindow(messages, reserves)
        Budget->>Budget: Fit messages while protecting<br/>recent message patterns
        Budget-->>Runtime: Fitted messages
        Runtime->>Budget: resolveEffectiveRequestMaxTokens(fitted messages, tool reserves)
        Budget-->>Runtime: Effective maxTokens for provider
        Runtime->>Compact: prepareForNextUserTurn(..., extraReserveTokens)
        Compact->>Compact: Check eligibility using<br/>reduced budget
        Compact-->>Runtime: Compaction decision
        Runtime->>Provider: Send request with effective maxTokens
        Provider-->>Runtime: Response (with tool/finish reason)
    end
    deactivate Runtime
```
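The resolution step near the end of the loop can be sketched as follows. Names, the options shape, and the minimum-output floor are assumptions for illustration; the intent is that the provider receives a cap the window can actually hold:

```typescript
// Derive the output cap actually sent to the provider from what remains
// after the fitted prompt and the tool schemas are accounted for, rather
// than forwarding the raw requested maxTokens.
function resolveEffectiveRequestMaxTokens(opts: {
  requestedMaxTokens: number
  contextLength: number
  fittedMessageTokens: number
  toolReserveTokens: number
  minimumOutputTokens?: number
}): number {
  const floor = opts.minimumOutputTokens ?? 256
  const remaining =
    opts.contextLength - opts.fittedMessageTokens - opts.toolReserveTokens
  // Honor the requested value only when the window has room for it.
  return Math.max(floor, Math.min(opts.requestedMaxTokens, remaining))
}
```

For example, with an 8 192-token window, 6 000 tokens of fitted messages, and 1 000 tokens of tool schemas, a 16 384-token request resolves to 1 192.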
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/main/presenter/agentRuntimePresenter/contextBudget.ts (1)
27-30: Upstream validation prevents non-finite `maxTokens` from reaching this function, but a defensive fallback is recommended.

`maxTokens` is always validated through `sanitizeGenerationSettings()` before reaching `capAgentRequestMaxTokens()`, which uses `parseFiniteNumericValue()` to explicitly reject `Infinity` and `NaN`. However, the function itself has no guard against non-finite inputs, making it fragile if validation is bypassed or if callers are added in the future. Falling back to `AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP` instead of `AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS` is a more forgiving default and improves defensive design.

♻️ Suggested fallback

```diff
-  const normalizedMaxTokens = Number.isFinite(maxTokens)
-    ? Math.floor(maxTokens)
-    : AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS
+  const normalizedMaxTokens = Number.isFinite(maxTokens)
+    ? Math.floor(maxTokens)
+    : AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP
   const requested = Math.max(AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS, normalizedMaxTokens)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/presenter/agentRuntimePresenter/contextBudget.ts` around lines 27 - 30, capAgentRequestMaxTokens lacks a defensive guard for non-finite maxTokens; update capAgentRequestMaxTokens to check Number.isFinite(maxTokens) and when non-finite fall back to AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP (instead of AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS), and keep existing Math.floor/Math.max logic for finite values; reference sanitizeGenerationSettings and parseFiniteNumericValue as the upstream validators but ensure capAgentRequestMaxTokens protects itself against bypassed callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/presenter/agentRuntimePresenter/contextBudget.ts`:
- Around line 110-124: The code only protects the last tool message when
trailing tool messages don't have a preceding assistant with non-empty
tool_calls; update the logic around toolTailStart/messages/tool_calls so that if
toolTailStart < messages.length - 1 but the preceding message is not an
assistant with tool_calls, you still return the full trailing tool-run length
(e.g., compute tailLen = messages.length - (toolTailStart + 1) and return
tailLen) instead of returning 1; change the conditional/return that currently
checks messages[toolTailStart]?.role === 'assistant' and falls through to return
1 so orphan trailing tool messages are preserved as a block.
In `@src/main/presenter/agentRuntimePresenter/index.ts`:
- Around line 1649-1663: The steer-injection truncation happens before your
protected-tail calculation, so ensure steer injection does not drop assistant
tool_calls/tool results: modify injectSteerInputsIntoRequest() so it performs a
pure splice (insert steer messages without calling
fitMessagesToContextWindow()), or alternatively update the earlier call to
fitMessagesToContextWindow() to accept and preserve a minimumProtectedTailCount
(the same logic used by fitRequestMessagesToContextWindow()) so
claimedSteerBatch and assistant tool_call/tool_result pairs are reserved; adjust
callers so buildRequestContextBudget(), claimedSteerBatch,
protectedSteerTailCount, fitRequestMessagesToContextWindow(), and
injectSteerInputsIntoRequest() coordinate on the same protected-tail contract.
In `@src/main/presenter/llmProviderPresenter/aiSdk/toolProtocol.ts`:
- Around line 128-136: The function normalizeLegacyFunctionCallContent is
improperly stripping all "<function_call>" tags from the payload which can
corrupt valid JSON like {"text":"<function_call>"}; remove the global replace
that removes those tags and rely on the existing fenced code block extraction
and any prior wrapper-slicing logic (i.e., delete the line using
content.replace(/<\/?function_call>/g, '') and operate on the original content
variable, keeping the fenced block extraction and trimming logic intact in
normalizeLegacyFunctionCallContent).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 18762c62-b2d9-4d62-b12a-6fdeb8ad942a
📒 Files selected for processing (9)
- docs/specs/agent-tool-context-budget/plan.md
- docs/specs/agent-tool-context-budget/spec.md
- docs/specs/agent-tool-context-budget/tasks.md
- src/main/presenter/agentRuntimePresenter/compactionService.ts
- src/main/presenter/agentRuntimePresenter/contextBudget.ts
- src/main/presenter/agentRuntimePresenter/contextBuilder.ts
- src/main/presenter/agentRuntimePresenter/index.ts
- src/main/presenter/llmProviderPresenter/aiSdk/streamAdapter.ts
- src/main/presenter/llmProviderPresenter/aiSdk/toolProtocol.ts
```ts
let toolTailStart = messages.length - 1
while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
  toolTailStart -= 1
}

if (
  toolTailStart < messages.length - 1 &&
  messages[toolTailStart]?.role === 'assistant' &&
  Array.isArray(messages[toolTailStart]?.tool_calls) &&
  messages[toolTailStart]?.tool_calls?.length
) {
  return messages.length - toolTailStart
}

return 1
```
Orphan trailing tool messages are only partially protected.
If the message tail is a run of tool messages whose preceding message is not an assistant with non-empty tool_calls (e.g., the assistant message was already trimmed in a prior round, or the history is otherwise malformed), the walk-back exits without satisfying the condition on lines 115–120 and the function falls through to return 1. Only the very last tool message is then protected, and the rest of the tool chain becomes eligible for trimming — which can produce an OpenAI-style "tool message must follow a tool_calls assistant" rejection downstream.
If you want to be defensive here, return the full trailing tool run even without a matching assistant so the chain is preserved (or have the caller normalize/drop orphan tools earlier).
♻️ Optional defensive tweak

```diff
   let toolTailStart = messages.length - 1
   while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
     toolTailStart -= 1
   }
   if (
     toolTailStart < messages.length - 1 &&
     messages[toolTailStart]?.role === 'assistant' &&
     Array.isArray(messages[toolTailStart]?.tool_calls) &&
     messages[toolTailStart]?.tool_calls?.length
   ) {
     return messages.length - toolTailStart
   }
+  // Orphan trailing tool messages: still protect the whole run so providers
+  // that require tool_calls/tool pairing don't reject the request after trimming.
+  if (toolTailStart < messages.length - 1) {
+    return messages.length - 1 - toolTailStart
+  }
+
   return 1
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
let toolTailStart = messages.length - 1
while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
  toolTailStart -= 1
}
if (
  toolTailStart < messages.length - 1 &&
  messages[toolTailStart]?.role === 'assistant' &&
  Array.isArray(messages[toolTailStart]?.tool_calls) &&
  messages[toolTailStart]?.tool_calls?.length
) {
  return messages.length - toolTailStart
}
// Orphan trailing tool messages: still protect the whole run so providers
// that require tool_calls/tool pairing don't reject the request after trimming.
if (toolTailStart < messages.length - 1) {
  return messages.length - toolTailStart
}
return 1
```
```ts
const requestBudget = buildRequestContextBudget(
  requestMaxTokens,
  requestModelConfig.contextLength,
  requestTools
)
const protectedSteerTailCount =
  claimedSteerBatch.length > 0
    ? claimedSteerBatch.length + (requestMessages.at(-1)?.role === 'user' ? 1 : 0)
    : 0
const fittedMessages = fitRequestMessagesToContextWindow({
  messages: injectedMessages,
  contextLength: requestModelConfig.contextLength,
  reserveTokens: requestBudget.totalReserveTokens,
  minimumProtectedTailCount: protectedSteerTailCount
})
```
The protected-tail fit happens too late to save tool-continuation turns.
This new pass runs after injectSteerInputsIntoRequest() has already called fitMessagesToContextWindow(). When a steer message is injected during a tool-continuation loop, that earlier truncation can evict the assistant tool_calls message and its matching tool result before fitRequestMessagesToContextWindow() runs, so the new protection logic never sees them.
Make steer injection a pure splice and let this budget-aware fit be the only truncation step in the loop, or teach the earlier fit about the protected tool tail too.
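A pure-splice version of the injection could look like this. The `ChatMessage` shape and function signature are simplified assumptions; the point is that injection only splices, leaving all truncation to the single budget-aware fit that runs afterwards:

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
}

// Append steer inputs after the tool-continuation tail, with no call to
// fitMessagesToContextWindow(): assistant tool_calls and their tool results
// are never separated, and fitRequestMessagesToContextWindow() remains the
// only truncation step in the loop.
function injectSteerInputsIntoRequest(
  messages: ChatMessage[],
  steerInputs: ChatMessage[]
): ChatMessage[] {
  return [...messages, ...steerInputs]
}
```

Appending after the tool results (rather than splicing between the assistant `tool_calls` message and its results) is one placement that keeps provider-required pairing intact.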
````ts
function normalizeLegacyFunctionCallContent(content: string): string {
  let normalized = content.replace(/<\/?function_call>/g, '').trim()

  const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
  if (fenced?.[1]) {
    normalized = fenced[1].trim()
  }

  return normalized
}
````
Don't strip <function_call> substrings out of the JSON payload.
At this point the wrapper is already removed by matchAll(...)[1] and by the trailing-open-tag slice, so the global replace can only mutate payload data. A valid argument like {"text":"<function_call>"} will be silently rewritten before parsing.
Suggested fix

````diff
 function normalizeLegacyFunctionCallContent(content: string): string {
-  let normalized = content.replace(/<\/?function_call>/g, '').trim()
+  let normalized = content.trim()
   const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
   if (fenced?.[1]) {
     normalized = fenced[1].trim()
   }
````

📝 Committable suggestion
````ts
function normalizeLegacyFunctionCallContent(content: string): string {
  let normalized = content.trim()
  const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
  if (fenced?.[1]) {
    normalized = fenced[1].trim()
  }
  return normalized
}
````
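The corruption the reviewer describes is easy to reproduce. This snippet applies the original global replace to a valid payload whose string value legitimately contains the tag text:

```typescript
// A legitimate legacy function-call argument payload.
const payload = '{"text":"<function_call>"}'

// The old normalization rewrites the argument before JSON.parse ever runs:
const stripped = payload.replace(/<\/?function_call>/g, '')
// stripped is now '{"text":""}' — still valid JSON, but the value is silently lost.
const corrupted = JSON.parse(stripped) as { text: string }
```

Since the wrapper tags are already removed by the earlier `matchAll` capture and trailing-open-tag slice, the replace can only ever touch payload data.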