fix: handle compaction truncation and output budgets#267

Open
kermanx wants to merge 2 commits into main from xtr/named-compaction-truncation

kermanx (Collaborator) commented Jun 1, 2026

Related Issue

No linked issue. This fixes compaction diagnostics and completion budget propagation issues found while handling long-context model requests.

Problem

Full compaction used a generic local error when the model response was truncated, so exhausted retries surfaced as a plain Error instead of a useful compaction-specific failure. The compaction path could also try to apply a zero completion budget when the selected model's context window was unknown. Separately, only some providers implemented the shared completion-budget hook, so ordinary turns and compaction did not consistently send an explicit output-token budget across supported provider backends.
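
As a minimal sketch of the first problem, the retry loop below raises a named error once truncation exhausts the retries, and treats truncation and context overflow as the same "shrink the input and try again" case. Every identifier here (`CompactionTruncatedError`, `compactWithRetry`, the `SummaryResult` shape, the halving strategy) is hypothetical; the PR's real names and reduction policy are not visible on this page.

```typescript
// Hypothetical names throughout; this is not the PR's actual code.
class CompactionTruncatedError extends Error {
  attempts: number;
  constructor(attempts: number) {
    super(`compaction summary was truncated after ${attempts} attempt(s)`);
    this.name = "CompactionTruncatedError";
    this.attempts = attempts;
  }
}

type SummaryResult =
  | { kind: "ok"; summary: string }
  | { kind: "truncated" }
  | { kind: "context_overflow" };

// Real model calls are asynchronous; a synchronous callback keeps the sketch short.
function compactWithRetry(
  messages: string[],
  summarize: (input: string[]) => SummaryResult,
  maxAttempts = 3,
): string {
  let input = messages;
  let last: SummaryResult["kind"] = "ok";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = summarize(input);
    if (result.kind === "ok") return result.summary;
    last = result.kind;
    // Truncation and overflow share one path: drop the oldest half and retry.
    input = input.slice(Math.ceil(input.length / 2));
  }
  if (last === "truncated") throw new CompactionTruncatedError(maxAttempts);
  throw new Error(`compaction failed after ${maxAttempts} attempts: ${last}`);
}
```

A caller can then branch on `error.name === "CompactionTruncatedError"` instead of matching on the message of a plain Error.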

What changed

  • Added a named compaction truncation error and kept truncated summaries on the same reduction-and-retry path as context overflow.
  • Stopped applying a compaction completion budget when the model's context window is unknown.
  • Added provider-level completion-budget support for Anthropic, OpenAI Chat Completions, OpenAI Responses, Google GenAI, and Vertex AI.
  • Added regression coverage for compaction unknown-context behavior and provider request bodies.
  • Merged the changeset into a single entry covering agent-core, kosong, and the CLI bundle.
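
To illustrate the provider-level support listed above, here is a rough sketch of how one shared budget could map onto each backend's request field, and how an unknown context window could result in no field being sent. The field names are the ones the public APIs use for output limits; the `ProviderKind` labels and both function names are hypothetical, and which Chat Completions field the PR actually serializes (`max_tokens` or `max_completion_tokens`) is an assumption.

```typescript
// Provider kinds named in the PR description; the string labels are hypothetical.
type ProviderKind =
  | "anthropic"
  | "openai-chat"
  | "openai-responses"
  | "google-genai"
  | "vertex-ai";

function completionBudgetField(kind: ProviderKind, budget: number): Record<string, number> {
  switch (kind) {
    case "anthropic":
      return { max_tokens: budget };
    case "openai-chat":
      // Assumption: the PR may use the older `max_tokens` field instead.
      return { max_completion_tokens: budget };
    case "openai-responses":
      return { max_output_tokens: budget };
    case "google-genai":
    case "vertex-ai":
      // The Gemini APIs carry this inside the generation config.
      return { maxOutputTokens: budget };
  }
}

function requestBudgetFields(kind: ProviderKind, maxContextTokens: number): Record<string, number> {
  // Unknown context window (zero or negative): send no explicit budget at all.
  if (!(maxContextTokens > 0)) return {};
  return completionBudgetField(kind, maxContextTokens);
}
```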

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked a related issue, or explained the problem above.
  • I have added tests that prove my feature works.
  • Ran gen-changesets skill, or this PR needs no changeset.
  • Ran gen-docs skill, or this PR needs no doc update.

Validation:

pnpm exec vitest run packages/agent-core/test/agent/compaction/full.test.ts packages/kosong/test/openai-legacy.test.ts packages/kosong/test/openai-responses.test.ts packages/kosong/test/google-genai.test.ts packages/kosong/test/anthropic.test.ts
pnpm --filter @moonshot-ai/agent-core run typecheck
pnpm --filter @moonshot-ai/kosong run typecheck

changeset-bot Bot commented Jun 1, 2026

🦋 Changeset detected

Latest commit: 6a15625

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name                     Type
@moonshot-ai/agent-core  Patch
@moonshot-ai/kosong      Patch
@moonshot-ai/kimi-code   Patch

pkg-pr-new Bot commented Jun 1, 2026

pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@6a15625
npx https://pkg.pr.new/@moonshot-ai/kimi-code@6a15625

commit: 6a15625

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1fe86cf7fb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/agent-core/src/agent/compaction/full.ts Outdated
RealKai42 changed the title from "fix(agent-core): name truncated compaction errors" to "fix: handle compaction truncation and output budgets" Jun 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a15625d40

Comment on lines +1085 to +1088
  withMaxCompletionTokens(maxCompletionTokens: number): AnthropicChatProvider {
    return this._withGenerationKwargs({
      max_tokens: resolveDefaultMaxTokens(this._model, maxCompletionTokens),
    });

[P2] Preserve lower Anthropic max token caps

When an Anthropic model alias sets maxOutputSize, provider construction passes it as defaultMaxTokens. Now that this method exists, the agent's per-request completion-budget path calls withMaxCompletionTokens(maxContextSize) for both normal and compaction requests. Because this assignment recomputes max_tokens from the larger context-window cap, a user-configured lower output limit is raised back to the model ceiling, so aliases meant to limit cost or force shorter responses no longer do so. Please keep the existing max_tokens when it is lower than maxCompletionTokens.
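
One possible shape for that fix, as a sketch only: the clone keeps whichever limit is lower. `SketchProvider` and its `kwargs` field are hypothetical stand-ins for AnthropicChatProvider and its generation kwargs, and the sketch deliberately leaves out resolveDefaultMaxTokens, whose behavior is not shown here.

```typescript
// Hypothetical stand-in for a provider that clones itself with new kwargs.
class SketchProvider {
  readonly kwargs: { max_tokens?: number };

  constructor(kwargs: { max_tokens?: number } = {}) {
    this.kwargs = { ...kwargs };
  }

  withMaxCompletionTokens(maxCompletionTokens: number): SketchProvider {
    const existing = this.kwargs.max_tokens;
    // Keep a previously configured lower cap; only ever tighten, never raise.
    const capped =
      existing !== undefined ? Math.min(existing, maxCompletionTokens) : maxCompletionTokens;
    return new SketchProvider({ ...this.kwargs, max_tokens: capped });
  }
}
```

With this shape, an alias configured with a 4096-token limit would keep that limit even when the agent passes a 200000-token context window as the budget.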

Comment on lines +236 to +240
  const maxContextTokens = this.agent.config.modelCapabilities.max_context_tokens;
  const provider =
    maxContextTokens > 0
      ? baseProvider.withMaxCompletionTokens?.(maxContextTokens) ?? baseProvider
      : baseProvider;

[P2] Honor completion-budget env caps during compaction

When an operator sets KIMI_MODEL_MAX_COMPLETION_TOKENS (or the legacy env var) to a lower hard cap, or to a non-positive value to opt out, normal turns go through resolveCompletionBudget/applyCompletionBudget, but this compaction path always clones the provider with max_context_tokens. On large-window models, compaction therefore still serializes a much larger completion budget than the configured cap (or sends one despite the opt-out), so requests that operators explicitly capped can fail or become unexpectedly expensive during compaction. Please route compaction through the same budget resolution, or skip this clone when the budget is disabled.
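
The resolution rule this comment describes could look roughly like the sketch below. It is not the repo's resolveCompletionBudget: the function name, the handling of unparsable values, and the choice to take the env value as a parameter are all assumptions made for illustration.

```typescript
// Hypothetical resolver: returns the budget to send, or undefined to send none.
function resolveBudgetSketch(
  maxContextTokens: number,
  envValue: string | undefined,
): number | undefined {
  // Unknown context window: never apply a budget.
  if (!(maxContextTokens > 0)) return undefined;
  // No operator override: fall back to the context window.
  if (envValue === undefined || envValue.trim() === "") return maxContextTokens;
  const cap = Math.floor(Number(envValue));
  // Assumption: an unparsable override is ignored rather than treated as opt-out.
  if (!Number.isFinite(cap)) return maxContextTokens;
  // Non-positive override means the operator opted out of explicit budgets.
  if (cap <= 0) return undefined;
  // A positive override acts as a hard cap below the context window.
  return Math.min(cap, maxContextTokens);
}
```

Compaction could call this with `process.env.KIMI_MODEL_MAX_COMPLETION_TOKENS` and skip the provider clone whenever the result is `undefined`, which would keep it consistent with normal turns.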

Comment on lines +978 to +979
  withMaxCompletionTokens(maxCompletionTokens: number): OpenAIResponsesChatProvider {
    return this.withGenerationKwargs({ max_output_tokens: maxCompletionTokens });

[P2] Clamp OpenAI output caps below context size

For OpenAI Responses aliases whose max_context_size exceeds the model's maximum output size, adding this method activates the generic completion-budget path and sends the context window as max_output_tokens on every turn. The repo already treats upstream messages such as "max_output_tokens must not exceed 8192" as plain APIStatusErrors rather than context-overflow retries, so a correctly configured large-context OpenAI model can start failing before generation instead of merely allowing a large completion. Please clamp to the provider/model output ceiling, or preserve an existing lower cap, before serializing this value.
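
A worked instance of the clamp, using the 8192 ceiling from the error message quoted above and an assumed 128000-token context window. `clampMaxOutputTokens` and its `outputCeiling` parameter are hypothetical; where the ceiling would come from (model metadata, alias config) is not something this page shows.

```typescript
// Hypothetical helper: `outputCeiling` stands in for per-model output metadata.
function clampMaxOutputTokens(requested: number, outputCeiling?: number): number {
  // Without a known ceiling there is nothing to clamp against.
  if (outputCeiling === undefined || !(outputCeiling > 0)) return requested;
  return Math.min(requested, outputCeiling);
}

// Context window of 128000 (assumed) against an output ceiling of 8192.
const responsesBody = { max_output_tokens: clampMaxOutputTokens(128_000, 8_192) };
```

Here the serialized value stays at the ceiling, so the request would not trip the upstream "must not exceed" validation.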
