Skip to content

feat(events): emit 5-way token breakdown + context-window utilization in message_complete#87

Open
byapparov wants to merge 2 commits into
mainfrom
feat/usage-token-breakdown
Open

feat(events): emit 5-way token breakdown + context-window utilization in message_complete#87
byapparov wants to merge 2 commits into
mainfrom
feat/usage-token-breakdown

Conversation

@byapparov

Copy link
Copy Markdown
Contributor

Summary

  • 5-way token breakdown in message_complete: expands tokens from an opaque info.tokens passthrough to an explicit object — input / output / reasoning / cache.read / cache.write — mirroring upstream LLM.Usage. The data was already captured in MessageV2.Assistant.tokens via StepFinishPart accumulation; this change surfaces all fields explicitly.
  • Context-window utilization added as context: { used, limit, ratio } where used = input + cache.read, limit comes from Provider.getModel() → model.limit.context (models.dev registry), and ratio = used / limit. Emits null when the model's context limit is not known (unregistered custom endpoint).
  • cost: 0 trap avoided: info.cost is kept as-is — it accumulates real per-step cost from StepFinishPart. The new upstream step.ended event emits cost: 0 (reconciled later by a projector); we do not touch that path.
  • EVENTS.md updated with the extended schema, field-by-field documentation, and non-overlap invariant note.

How the cache split was already captured

MessageV2.StepFinishPart (the legacy step-finish message part) already has tokens: { input, output, reasoning, cache: { read, write } }. The assistant message info.tokens is accumulated from these step parts — cache split included. No new provider-level capture was needed; we just stop dropping it in the emit call.

Context limit source

Provider.getModel(providerID, modelID) returns the model record from the models.dev registry, which has model.limit.context. The lookup is wrapped in a .catch(() => null) so an unknown model (custom endpoint) gracefully emits context: null rather than throwing.

Test plan

  • TDD: wrote failing test in packages/cli/test/cli/usage-token-breakdown.test.ts (8 cases) — confirmed RED before implementation
  • Implemented changes in packages/cli/src/cli/cmd/run.ts
  • bun test test/cli/usage-token-breakdown.test.ts — 8/8 GREEN
  • bun test test/cli/ — 77/77 GREEN (no regressions)
  • bun run typecheck — clean
  • Pre-push hook ran bun turbo typecheck across all 5 packages — 6/6 tasks successful

Closes #86

🤖 Generated with Claude Code

https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ

… in message_complete (#86)

- Expand `tokens` in `message_complete` from an opaque `info.tokens` passthrough
  to an explicit object with all 5 fields: input / output / reasoning /
  cache.read / cache.write — mirroring upstream LLM.Usage shape.
  The data was already captured in MessageV2.Assistant.tokens via StepFinishPart
  accumulation; this change surfaces it explicitly.

- Add `context: { used, limit, ratio }` to `message_complete`:
  - `used = input + cache.read` (tokens occupying the context window this turn)
  - `limit` sourced from Provider.getModel() → model.limit.context (models.dev)
  - `ratio = used / limit`; emits `null` when limit is unknown (unregistered endpoint)

- Cost kept correct: `info.cost` accumulates real per-step cost from StepFinishPart,
  NOT from the new step.ended event which emits cost:0 (the cost:0 trap).

- Update EVENTS.md with the extended schema and field-by-field documentation.

- Add TDD test file (RED→GREEN): `test/cli/usage-token-breakdown.test.ts`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ
Comment thread packages/cli/src/cli/cmd/run.ts Outdated
Comment thread packages/cli/test/cli/usage-token-breakdown.test.ts Outdated
@aictrl-dev

aictrl-dev Bot commented Jun 22, 2026

Copy link
Copy Markdown

Code review

Verdict: Address the major findings before merging. · 🔴 0 · 🟠 1 · 🟡 1 · ⚪ 0 · 0/2 resolved

  • 🟠 packages/cli/src/cli/cmd/run.ts:488-494 — limit:0 yields Infinity ratio, breaks null contract
  • 🟡 packages/cli/test/cli/usage-token-breakdown.test.ts:63 — Tests grep source text instead of running emit path
🤖 Fix all 2 open findings with your agent
Fix the following code review findings on aictrl-dev/cli PR #87 (head branch).
Run the relevant tests/linters after each change.

1. packages/cli/src/cli/cmd/run.ts:488-494 — limit:0 yields Infinity ratio, breaks null contract
   Detail: The guard `contextLimit != null` does not catch the case where the model's context limit is `0`. `Provider.Model.limit.context` is a required `z.number()` (provider.ts:703), and config-defined custom models without a limit default `context` to `0` (provider.ts:929: `context: model.limit?.context ?? existingModel?.limit?.context ?? 0`). For such models `Provider.getModel` succeeds and returns `context: 0`, so `contextLimit` is `0` (not `null`), the guard passes, and `ratio = contextUsed / 0` evaluates to `Infinity` (or `NaN` when `contextUsed` is also `0`). `JSON.stringify` then serializes `Infinity`/`NaN` as `null` *inside* the context object, yielding `{ used: <N>, limit: 0, ratio: null }` instead of the documented top-level `null`. EVENTS.md explicitly promises: "null — emitted when the model's context limit is not known (e.g. unregistered custom endpoint)". This diverges from the documented contract and emits a misleading object for the exact scenario it claims to handle. Fix: also require `contextLimit > 0` before emitting the object.
   Suggested fix: Change the guard from `contextLimit != null` to `contextLimit != null && contextLimit > 0` so a context limit of `0` (config/custom models without a known limit) is treated as "unknown" and emits `null` as documented, avoiding `Infinity`/`NaN` in the ratio.
2. packages/cli/test/cli/usage-token-breakdown.test.ts:63 — Tests grep source text instead of running emit path
   Detail: The new tests assert behavior by reading `run.ts` as a string and matching regexes against the source text rather than exercising the emit path. Several assertions are loose enough to pass even if the logic is wrong:\n\n- `test("context.ratio is used / limit")` (line 63) uses `/ratio.*\\/|\\/.*ratio|used\\s*\\/\\s*limit|contextLimit/`. The `contextLimit` alternative matches the bare variable declaration on line 483, so the test stays green even if `ratio` is changed to a constant (e.g. `ratio: 0`).\n- `test("context.limit is sourced from Provider.getModel context limit")` (line 57) uses `/Provider\\.getModel|limit\\.context|contextLimit/` — any one of those tokens anywhere in the file passes; it does not verify the value is actually wired into `context.limit`.\n\nThese give a false sense of coverage: a refactor that renames a variable or removes the real computation but leaves a stray token would not be caught. Prefer asserting against actual emitted JSON (e.g. stub `Provider.getModel` and capture `process.stdout` writes from `emit("message_complete", …)`) so the 5-way `tokens` and `context` shapes and the `used = input + cache.read` / `ratio = used/limit` computations are verified at runtime.
📋 Out-of-diff findings (2)
Sev Location Finding
🟠 packages/cli/src/cli/cmd/run.ts:488-494 limit:0 yields Infinity ratio, breaks null contract
🟡 packages/cli/test/cli/usage-token-breakdown.test.ts:63 Tests grep source text instead of running emit path

Reviewed 3 files · 0 inline · view all 2 findings ↗


aictrl · AI code review for fast-moving teams · aictrl.dev

…dContextWindow

Custom models without a registered context limit default to `limit.context = 0`
(provider.ts:929). The old guard `contextLimit != null` passed for 0, causing
`ratio = used / 0 = Infinity`, which JSON.stringify serialises as `null` *inside*
the context object — diverging from the documented top-level null contract in
EVENTS.md ("null — emitted when the model's context limit is not known").

Fix: extract pure helper `buildContextWindow(limit, used)` that returns null when
limit is null or <=0. This also makes the computation unit-testable.

Replace source-grep tests (which could pass even with wrong logic, per bot review)
with 10 behavioural unit tests of `buildContextWindow` covering: null limit, zero
limit (🟠 regression case), ratio computation, JSON-serialisability, and the
top-level-null contract. Retain slim source-text checks for structural wiring.

Fixes review findings from PR #87 (aictrl-dev bot):
- 🟠 limit:0 yields Infinity ratio, breaks null contract
- 🟡 Tests grep source text instead of running emit path
@byapparov

Copy link
Copy Markdown
Contributor Author

Review response — PR #87

Triaged 2 findings (🟠 1, 🟡 1). Both verified TRUE; both fixed.

Issues addressed (pushed to this PR)

  • limit:0 yields Infinity ratio, breaks null contractpackages/cli/src/cli/cmd/run.ts:488: Verified TRUE. provider.ts:929 confirms custom models default context to 0. The old contextLimit != null guard passes for 0, so ratio = used / 0 = Infinity, which JSON.stringify serialises as null inside the object — breaking the documented top-level null contract in EVENTS.md line 113. Fixed by extracting buildContextWindow(limit, used) which returns null when limit == null || limit <= 0. (commit 7bbf2045d)

  • Tests grep source text instead of running emit pathpackages/cli/test/cli/usage-token-breakdown.test.ts:63: Verified TRUE. context.ratio is used / limit matched the bare contextLimit variable declaration, not the computation — the test stayed green with ratio: 0 hardcoded. Fixed by replacing the 5 loose regex tests with 10 behavioural unit tests of the extracted buildContextWindow helper: null limit, zero limit (regression case), ratio math, JSON-serialisability, top-level-null contract. Source-text checks retained only for structural wiring that can't be covered by the pure helper. (commit 7bbf2045d)

Review claims verified false (no change needed)

(none — both findings were genuine)

Not addressed here

(none — all findings fixed)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Emit 5-way token breakdown + context-window utilization in usage events (align with upstream LLM.Usage)

1 participant