fix: handle compaction truncation and output budgets #267
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| --- | ||
| "@moonshot-ai/agent-core": patch | ||
| "@moonshot-ai/kosong": patch | ||
| "@moonshot-ai/kimi-code": patch | ||
| --- | ||
|
|
||
| Report truncated compaction summaries clearly and apply valid completion token budgets across supported providers. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -815,6 +815,7 @@ export class AnthropicChatProvider implements ChatProvider { | |
| private _defaultHeaders: Record<string, string> | undefined; | ||
| private _clientFactory: ((auth: ProviderRequestAuth) => Anthropic) | undefined; | ||
| private _adaptiveThinking: boolean | undefined; | ||
| private _explicitMaxTokens: boolean; | ||
|
|
||
| constructor(options: AnthropicOptions) { | ||
| this._model = options.model; | ||
|
|
@@ -827,6 +828,7 @@ export class AnthropicChatProvider implements ChatProvider { | |
| this._defaultHeaders = options.defaultHeaders; | ||
| this._clientFactory = options.clientFactory; | ||
| this._client = this._apiKey === undefined ? undefined : this._buildClient(this._apiKey); | ||
| this._explicitMaxTokens = options.defaultMaxTokens !== undefined; | ||
| this._generationKwargs = { | ||
| max_tokens: resolveDefaultMaxTokens(options.model, options.defaultMaxTokens), | ||
| betaFeatures: options.betaFeatures ?? [INTERLEAVED_THINKING_BETA], | ||
|
|
@@ -1082,9 +1084,25 @@ export class AnthropicChatProvider implements ChatProvider { | |
| return this._withGenerationKwargs(kwargs); | ||
| } | ||
|
|
||
| withMaxCompletionTokens(maxCompletionTokens: number): AnthropicChatProvider { | ||
| const requestedCap = resolveDefaultMaxTokens(this._model, maxCompletionTokens); | ||
| const existingCap = this._generationKwargs.max_tokens; | ||
| const clone = this._withGenerationKwargs({ | ||
| max_tokens: | ||
| existingCap === undefined || this._explicitMaxTokens | ||
| ? existingCap ?? requestedCap | ||
| : Math.min(existingCap, requestedCap), | ||
| }); | ||
|
Comment on lines +1087 to +1095:
When an Anthropic model alias sets |
||
| clone._explicitMaxTokens = this._explicitMaxTokens; | ||
| return clone; | ||
| } | ||
|
|
||
| private _withGenerationKwargs(kwargs: Partial<AnthropicGenerationKwargs>): AnthropicChatProvider { | ||
| const clone = this._clone(); | ||
| clone._generationKwargs = { ...clone._generationKwargs, ...kwargs }; | ||
| if ('max_tokens' in kwargs) { | ||
| clone._explicitMaxTokens = kwargs.max_tokens !== undefined; | ||
| } | ||
| return clone; | ||
| } | ||
|
|
||
|
|
||
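The cap selection inside `withMaxCompletionTokens` above can be read as a three-way rule. The following is a minimal standalone sketch of that rule, not the provider's actual code: `resolveCompletionCap` and its parameter names are hypothetical, and it assumes the requested cap has already been passed through `resolveDefaultMaxTokens`.

```typescript
// Hypothetical helper restating the ternary in withMaxCompletionTokens.
// Assumes requestedCap is already resolved for the model.
function resolveCompletionCap(
  existingCap: number | undefined,
  existingIsExplicit: boolean,
  requestedCap: number,
): number {
  // No cap configured yet: adopt the requested one.
  if (existingCap === undefined) return requestedCap;
  // The caller set max_tokens explicitly: leave it untouched.
  if (existingIsExplicit) return existingCap;
  // The existing cap is only a model default: take the tighter of the two.
  return Math.min(existingCap, requestedCap);
}

// A model-default cap is lowered to the requested budget.
console.log(resolveCompletionCap(32000, false, 8192));
// An explicit cap wins even when it is larger than the requested budget.
console.log(resolveCompletionCap(32000, true, 8192));
```

Written this way, it is easier to see that an explicit `defaultMaxTokens` is never lowered by the budget hook, which is the case the review comment on this hunk appears to address.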
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -888,6 +888,10 @@ export class GoogleGenAIChatProvider implements ChatProvider { | |
| return clone; | ||
| } | ||
|
|
||
| withMaxCompletionTokens(maxCompletionTokens: number): GoogleGenAIChatProvider { | ||
| return this.withGenerationKwargs({ max_output_tokens: maxCompletionTokens }); | ||
|
When a google-genai/vertexai alias has |
||
| } | ||
|
|
||
| private _clone(): GoogleGenAIChatProvider { | ||
| const clone = Object.assign( | ||
| Object.create(Object.getPrototypeOf(this) as object) as GoogleGenAIChatProvider, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -476,6 +476,10 @@ export class OpenAILegacyChatProvider implements ChatProvider { | |
| return clone; | ||
| } | ||
|
|
||
| withMaxCompletionTokens(maxCompletionTokens: number): OpenAILegacyChatProvider { | ||
| return this.withGenerationKwargs({ max_tokens: maxCompletionTokens }); | ||
|
Comment on lines +479 to +480:
When an OpenAI Chat Completions provider is constructed with
When this provider is used with an o-series Chat Completions model such as
For non-o-series Chat Completions aliases, the default budget path passes the full configured context window into this hook (for env models that is 262144 unless overridden), so even a tiny prompt is sent with |
||
| } | ||
|
|
||
| private _clone(): OpenAILegacyChatProvider { | ||
| const clone = Object.assign( | ||
| Object.create(Object.getPrototypeOf(this) as object) as OpenAILegacyChatProvider, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -975,6 +975,10 @@ export class OpenAIResponsesChatProvider implements ChatProvider { | |
| return clone; | ||
| } | ||
|
|
||
| withMaxCompletionTokens(maxCompletionTokens: number): OpenAIResponsesChatProvider { | ||
| return this.withGenerationKwargs({ max_output_tokens: maxCompletionTokens }); | ||
|
Comment on lines +978 to +979:
For OpenAI Responses aliases whose |
||
| } | ||
|
Comment on lines +978 to +980:
When an OpenAI Responses provider is constructed with |
||
|
|
||
| private _clone(): OpenAIResponsesChatProvider { | ||
| const clone = Object.assign( | ||
| Object.create(Object.getPrototypeOf(this) as object) as OpenAIResponsesChatProvider, | ||
|
|
||
For fixed-budget Anthropic thinking (for example `withThinking('high')` on pre-adaptive Claude models) plus a lower completion cap such as `KIMI_MODEL_MAX_COMPLETION_TOKENS=8192`, this branch lowers `max_tokens` to the cap but leaves the existing `thinking.budget_tokens` at 32000. Anthropic's extended-thinking docs require `budget_tokens` to be less than `max_tokens` outside the interleaved-tools exception (https://platform.claude.com/docs/en/build-with-claude/extended-thinking), so simple Anthropic requests in that configuration become invalid; please lower or disable the thinking budget, or avoid applying an incompatible cap.
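One possible way to address this is to reconcile the thinking budget whenever `max_tokens` is lowered. The sketch below is illustrative only: `ThinkingConfig`, `reconcileThinking`, `MIN_THINKING_BUDGET`, and the `reserveForAnswer` parameter are hypothetical names that do not appear in this PR, and the 1024-token minimum is an assumption based on Anthropic's documented minimum budget at the time of writing.

```typescript
// Hypothetical shape; the provider's real thinking config may differ.
type ThinkingConfig =
  | { type: 'enabled'; budget_tokens: number }
  | { type: 'disabled' };

// Assumed minimum thinking budget accepted by the API.
const MIN_THINKING_BUDGET = 1024;

function reconcileThinking(
  maxTokens: number,
  thinking: ThinkingConfig,
  reserveForAnswer = 1024, // hypothetical headroom left for the visible answer
): ThinkingConfig {
  if (thinking.type !== 'enabled') return thinking;
  // Already valid: the budget is strictly below max_tokens.
  if (thinking.budget_tokens < maxTokens) return thinking;
  // Lower the budget so the cap still leaves room for output.
  const lowered = maxTokens - reserveForAnswer;
  // If nothing usable remains, disable thinking rather than send an invalid request.
  if (lowered < MIN_THINKING_BUDGET) return { type: 'disabled' };
  return { type: 'enabled', budget_tokens: lowered };
}

// The configuration from the comment: an 8192 cap with a 32000 thinking budget.
console.log(reconcileThinking(8192, { type: 'enabled', budget_tokens: 32000 }));
```

This sketch does not model the interleaved-tools exception mentioned above; a real fix would likely need to skip the adjustment in that mode.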