fix(agent): account for tool schemas and cap output tokens in context…#1551
… budget

- Cap agent-session default `maxTokens` to 16 384 and never reserve more than half the context window for output
- Reserve tool-definition token cost when building initial and resume context, preventing silent context overflow on tool-heavy sessions
- Preflight-fit request messages to the effective budget before every provider-loop call, protecting the active tool-continuation tail
- Resolve a per-request effective output cap from fitted message size and tool-schema cost rather than using the raw `maxTokens` value
- Harden legacy function-call parsing: tolerate JSON wrapped in Markdown fences, a trailing unclosed `<function_call>` tag at end of stream, and extra finish-reason aliases from non-standard providers

Closes the first increment of docs/specs/agent-tool-context-budget.
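The output-token cap described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the constant and function names are assumptions modeled on the description, and the real logic lives in `src/main/presenter/agentRuntimePresenter/contextBudget.ts`.

```typescript
// Assumed ceiling from the PR description ("cap default maxTokens to 16 384").
const AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP = 16_384

// Cap the requested output tokens: never exceed the session default ceiling,
// and never reserve more than half the model's context window for output.
function capAgentRequestMaxTokens(maxTokens: number, contextLength: number): number {
  const halfWindow = Math.floor(contextLength / 2)
  return Math.min(Math.floor(maxTokens), AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP, halfWindow)
}
```

On a small 8 192-token window, a 32 000-token request would be clamped to 4 096 (half the window); on a large window the 16 384 ceiling dominates.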
📝 Walkthrough

This PR implements an agent context budgeting system to manage token allocation in tool-heavy conversations. It adds specification documentation, introduces token-capping utilities, threads tool-reserve budgeting through request preparation and compaction, and enhances legacy function-call parsing to handle incomplete tags and markdown-fenced JSON.
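The tool-reserve accounting threaded through request preparation can be approximated like this. The `ToolDefinition` shape and the ~4-characters-per-token heuristic are illustrative assumptions, not the PR's actual code:

```typescript
interface ToolDefinition {
  name: string
  description?: string
  parameters?: unknown // JSON Schema for the tool's arguments
}

// Serialized tool schemas are sent with every request, so their token cost
// must be subtracted from the usable context budget up front.
function estimateToolReserveTokens(tools: ToolDefinition[]): number {
  const serialized = JSON.stringify(tools ?? [])
  return Math.ceil(serialized.length / 4)
}
```

Whatever estimator is used, the key property is that tool-heavy sessions see a smaller message budget instead of silently overflowing the window.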
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client Request
    participant Runtime as AgentRuntimePresenter
    participant Budget as ContextBudget
    participant Context as ContextBuilder
    participant Compact as CompactionService
    participant Provider as LLM Provider
    Client->>Runtime: Request with tools
    activate Runtime
    Runtime->>Budget: estimateToolReserveTokens(tools)
    Budget-->>Runtime: toolReserveTokens
    Runtime->>Runtime: capAgentRequestMaxTokens(maxTokens)
    loop Each iteration
        Runtime->>Context: buildContext(..., extraReserveTokens)
        Context->>Budget: Tool reserves subtracted<br/>from available budget
        Context-->>Runtime: Available tokens reduced
        Runtime->>Budget: buildRequestContextBudget(maxTokens, contextLength, tools)
        Budget-->>Runtime: RequestContextBudget
        Runtime->>Budget: fitRequestMessagesToContextWindow(messages, reserves)
        Budget->>Budget: Fit messages while protecting<br/>recent message patterns
        Budget-->>Runtime: Fitted messages
        Runtime->>Budget: resolveEffectiveRequestMaxTokens(fitted messages, tool reserves)
        Budget-->>Runtime: Effective maxTokens for provider
        Runtime->>Compact: prepareForNextUserTurn(..., extraReserveTokens)
        Compact->>Compact: Check eligibility using<br/>reduced budget
        Compact-->>Runtime: Compaction decision
        Runtime->>Provider: Send request with effective maxTokens
        Provider-->>Runtime: Response (with tool/finish reason)
    end
    deactivate Runtime
```
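The resolution step near the end of the loop can be sketched as follows. Names, the options shape, and the minimum-output floor are assumptions for illustration; the intent is that the provider receives a cap the window can actually hold:

```typescript
// Derive the output cap actually sent to the provider from what remains
// after the fitted prompt and the tool schemas are accounted for, rather
// than forwarding the raw requested maxTokens.
function resolveEffectiveRequestMaxTokens(opts: {
  requestedMaxTokens: number
  contextLength: number
  fittedMessageTokens: number
  toolReserveTokens: number
  minimumOutputTokens?: number
}): number {
  const floor = opts.minimumOutputTokens ?? 256
  const remaining =
    opts.contextLength - opts.fittedMessageTokens - opts.toolReserveTokens
  // Honor the requested value only when the window has room for it.
  return Math.max(floor, Math.min(opts.requestedMaxTokens, remaining))
}
```

For example, with an 8 192-token window, 6 000 tokens of fitted messages, and 1 000 tokens of tool schemas, a 16 384-token request resolves to 1 192.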
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/main/presenter/agentRuntimePresenter/contextBudget.ts (1)
27-30: Upstream validation prevents non-finite `maxTokens` from reaching this function, but a defensive fallback is recommended.

`maxTokens` is always validated through `sanitizeGenerationSettings()` before reaching `capAgentRequestMaxTokens()`, which uses `parseFiniteNumericValue()` to explicitly reject `Infinity` and `NaN`. However, the function itself has no guard against non-finite inputs, making it fragile if validation is bypassed or if callers are added in the future. Falling back to `AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP` instead of `AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS` is a more forgiving default and improves defensive design.

♻️ Suggested fallback

```diff
-  const normalizedMaxTokens = Number.isFinite(maxTokens)
-    ? Math.floor(maxTokens)
-    : AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS
+  const normalizedMaxTokens = Number.isFinite(maxTokens)
+    ? Math.floor(maxTokens)
+    : AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP
   const requested = Math.max(AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS, normalizedMaxTokens)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/presenter/agentRuntimePresenter/contextBudget.ts` around lines 27 - 30, capAgentRequestMaxTokens lacks a defensive guard for non-finite maxTokens; update capAgentRequestMaxTokens to check Number.isFinite(maxTokens) and when non-finite fall back to AGENT_REQUEST_MAX_OUTPUT_TOKENS_CAP (instead of AGENT_MIN_EFFECTIVE_OUTPUT_TOKENS), and keep existing Math.floor/Math.max logic for finite values; reference sanitizeGenerationSettings and parseFiniteNumericValue as the upstream validators but ensure capAgentRequestMaxTokens protects itself against bypassed callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/presenter/agentRuntimePresenter/contextBudget.ts`:
- Around line 110-124: The code only protects the last tool message when
trailing tool messages don't have a preceding assistant with non-empty
tool_calls; update the logic around toolTailStart/messages/tool_calls so that if
toolTailStart < messages.length - 1 but the preceding message is not an
assistant with tool_calls, you still return the full trailing tool-run length
(e.g., compute tailLen = messages.length - (toolTailStart + 1) and return
tailLen) instead of returning 1; change the conditional/return that currently
checks messages[toolTailStart]?.role === 'assistant' and falls through to return
1 so orphan trailing tool messages are preserved as a block.
In `@src/main/presenter/agentRuntimePresenter/index.ts`:
- Around line 1649-1663: The steer-injection truncation happens before your
protected-tail calculation, so ensure steer injection does not drop assistant
tool_calls/tool results: modify injectSteerInputsIntoRequest() so it performs a
pure splice (insert steer messages without calling
fitMessagesToContextWindow()), or alternatively update the earlier call to
fitMessagesToContextWindow() to accept and preserve a minimumProtectedTailCount
(the same logic used by fitRequestMessagesToContextWindow()) so
claimedSteerBatch and assistant tool_call/tool_result pairs are reserved; adjust
callers so buildRequestContextBudget(), claimedSteerBatch,
protectedSteerTailCount, fitRequestMessagesToContextWindow(), and
injectSteerInputsIntoRequest() coordinate on the same protected-tail contract.
In `@src/main/presenter/llmProviderPresenter/aiSdk/toolProtocol.ts`:
- Around line 128-136: The function normalizeLegacyFunctionCallContent is
improperly stripping all "<function_call>" tags from the payload which can
corrupt valid JSON like {"text":"<function_call>"}; remove the global replace
that removes those tags and rely on the existing fenced code block extraction
and any prior wrapper-slicing logic (i.e., delete the line using
content.replace(/<\/?function_call>/g, '') and operate on the original content
variable, keeping the fenced block extraction and trimming logic intact in
normalizeLegacyFunctionCallContent).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 18762c62-b2d9-4d62-b12a-6fdeb8ad942a
📒 Files selected for processing (9)
- docs/specs/agent-tool-context-budget/plan.md
- docs/specs/agent-tool-context-budget/spec.md
- docs/specs/agent-tool-context-budget/tasks.md
- src/main/presenter/agentRuntimePresenter/compactionService.ts
- src/main/presenter/agentRuntimePresenter/contextBudget.ts
- src/main/presenter/agentRuntimePresenter/contextBuilder.ts
- src/main/presenter/agentRuntimePresenter/index.ts
- src/main/presenter/llmProviderPresenter/aiSdk/streamAdapter.ts
- src/main/presenter/llmProviderPresenter/aiSdk/toolProtocol.ts
```ts
let toolTailStart = messages.length - 1
while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
  toolTailStart -= 1
}

if (
  toolTailStart < messages.length - 1 &&
  messages[toolTailStart]?.role === 'assistant' &&
  Array.isArray(messages[toolTailStart]?.tool_calls) &&
  messages[toolTailStart]?.tool_calls?.length
) {
  return messages.length - toolTailStart
}

return 1
```
Orphan trailing tool messages are only partially protected.
If the message tail is a run of tool messages whose preceding message is not an assistant with non-empty tool_calls (e.g., the assistant message was already trimmed in a prior round, or the history is otherwise malformed), the walk-back exits without satisfying the condition on lines 115–120 and the function falls through to return 1. Only the very last tool message is then protected, and the rest of the tool chain becomes eligible for trimming — which can produce an OpenAI-style "tool message must follow a tool_calls assistant" rejection downstream.
If you want to be defensive here, return the full trailing tool run even without a matching assistant so the chain is preserved (or have the caller normalize/drop orphan tools earlier).
♻️ Optional defensive tweak

```diff
   let toolTailStart = messages.length - 1
   while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
     toolTailStart -= 1
   }
   if (
     toolTailStart < messages.length - 1 &&
     messages[toolTailStart]?.role === 'assistant' &&
     Array.isArray(messages[toolTailStart]?.tool_calls) &&
     messages[toolTailStart]?.tool_calls?.length
   ) {
     return messages.length - toolTailStart
   }
+  // Orphan trailing tool messages: still protect the whole run so providers
+  // that require tool_calls/tool pairing don't reject the request after trimming.
+  if (toolTailStart < messages.length - 1) {
+    return messages.length - 1 - toolTailStart
+  }
+
   return 1
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
let toolTailStart = messages.length - 1
while (toolTailStart >= 0 && messages[toolTailStart]?.role === 'tool') {
  toolTailStart -= 1
}
if (
  toolTailStart < messages.length - 1 &&
  messages[toolTailStart]?.role === 'assistant' &&
  Array.isArray(messages[toolTailStart]?.tool_calls) &&
  messages[toolTailStart]?.tool_calls?.length
) {
  return messages.length - toolTailStart
}
// Orphan trailing tool messages: still protect the whole run so providers
// that require tool_calls/tool pairing don't reject the request after trimming.
if (toolTailStart < messages.length - 1) {
  return messages.length - toolTailStart
}
return 1
```
```ts
const requestBudget = buildRequestContextBudget(
  requestMaxTokens,
  requestModelConfig.contextLength,
  requestTools
)
const protectedSteerTailCount =
  claimedSteerBatch.length > 0
    ? claimedSteerBatch.length + (requestMessages.at(-1)?.role === 'user' ? 1 : 0)
    : 0
const fittedMessages = fitRequestMessagesToContextWindow({
  messages: injectedMessages,
  contextLength: requestModelConfig.contextLength,
  reserveTokens: requestBudget.totalReserveTokens,
  minimumProtectedTailCount: protectedSteerTailCount
})
```
The protected-tail fit happens too late to save tool-continuation turns.
This new pass runs after injectSteerInputsIntoRequest() has already called fitMessagesToContextWindow(). When a steer message is injected during a tool-continuation loop, that earlier truncation can evict the assistant tool_calls message and its matching tool result before fitRequestMessagesToContextWindow() runs, so the new protection logic never sees them.
Make steer injection a pure splice and let this budget-aware fit be the only truncation step in the loop, or teach the earlier fit about the protected tool tail too.
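A pure-splice version of the injection could look like this. The `ChatMessage` shape and function signature are simplified assumptions; the point is that injection only splices, leaving all truncation to the single budget-aware fit that runs afterwards:

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
}

// Append steer inputs after the tool-continuation tail, with no call to
// fitMessagesToContextWindow(): assistant tool_calls and their tool results
// are never separated, and fitRequestMessagesToContextWindow() remains the
// only truncation step in the loop.
function injectSteerInputsIntoRequest(
  messages: ChatMessage[],
  steerInputs: ChatMessage[]
): ChatMessage[] {
  return [...messages, ...steerInputs]
}
```

Appending after the tool results (rather than splicing between the assistant `tool_calls` message and its results) is one placement that keeps provider-required pairing intact.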
````ts
function normalizeLegacyFunctionCallContent(content: string): string {
  let normalized = content.replace(/<\/?function_call>/g, '').trim()

  const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
  if (fenced?.[1]) {
    normalized = fenced[1].trim()
  }

  return normalized
}
````
Don't strip <function_call> substrings out of the JSON payload.
At this point the wrapper is already removed by matchAll(...)[1] and by the trailing-open-tag slice, so the global replace can only mutate payload data. A valid argument like {"text":"<function_call>"} will be silently rewritten before parsing.
Suggested fix

````diff
 function normalizeLegacyFunctionCallContent(content: string): string {
-  let normalized = content.replace(/<\/?function_call>/g, '').trim()
+  let normalized = content.trim()
   const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
   if (fenced?.[1]) {
     normalized = fenced[1].trim()
   }
````

📝 Committable suggestion
````ts
function normalizeLegacyFunctionCallContent(content: string): string {
  let normalized = content.trim()
  const fenced = normalized.match(/^```(?:json|JSON)?\s*([\s\S]*?)\s*```$/)
  if (fenced?.[1]) {
    normalized = fenced[1].trim()
  }
  return normalized
}
````
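The corruption the reviewer describes is easy to reproduce. This snippet applies the original global replace to a valid payload whose string value legitimately contains the tag text:

```typescript
// A legitimate legacy function-call argument payload.
const payload = '{"text":"<function_call>"}'

// The old normalization rewrites the argument before JSON.parse ever runs:
const stripped = payload.replace(/<\/?function_call>/g, '')
// stripped is now '{"text":""}' — still valid JSON, but the value is silently lost.
const corrupted = JSON.parse(stripped) as { text: string }
```

Since the wrapper tags are already removed by the earlier `matchAll` capture and trailing-open-tag slice, the replace can only ever touch payload data.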