Skip to content

feat(run): emit tool_catalog NDJSON event at session start#88

Open
byapparov wants to merge 3 commits into
mainfrom
feat/tool-catalog-session-start
Open

feat(run): emit tool_catalog NDJSON event at session start#88
byapparov wants to merge 3 commits into
mainfrom
feat/tool-catalog-session-start

Conversation

@byapparov

Copy link
Copy Markdown
Contributor

Summary

  • Emits a tool_catalog NDJSON event immediately after session_start and before the first model turn, listing every resolved builtin tool, MCP tool, and skill with version.
  • Adds MCP.toolEntries() — a lightweight sibling to MCP.tools() that returns {toolKey, serverName}[] without building full AI-SDK tool objects.
  • Threads version from SKILL.md frontmatter through Skill.Info (new optional field; null in the event when not declared).
  • New src/cli/cmd/tool-catalog.ts with injectable deps for testing; 9 TDD tests cover builtin/mcp shape, the consumer gate ("tool exposed vs missing"), skill version presence/absence, and tools/skills separation.
  • EVENTS.md updated with full schema documentation.

Test plan

  • bun test test/tool-catalog.test.ts — 9/9 pass (RED→GREEN TDD)
  • bun test full suite — 1294 pass, 0 fail
  • bun run typecheck (turbo, all packages) — clean

Skill version gap

Skill.Info.version is now threaded through from the SKILL.md frontmatter version field. Skills that don't declare a version emit "version": null. No fabricated versions — if a skill pack needs version tracking, add version: x.y.z to its frontmatter.

Closes #85

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ

Emits a single `tool_catalog` event immediately after `session_start` and
before the first model turn so the server-side completion gate can structurally
verify that required MCP tools (e.g. `record_finding`) were actually exposed to
the model — without string-matching prose output.

Changes:
- `src/cli/cmd/tool-catalog.ts` (new): `buildToolCatalogItems()` collects
  builtin tool IDs via `ToolRegistry.ids()` and MCP entries via the new
  `MCP.toolEntries()`, returning `{ tools, skills }` with injectable deps for
  unit testing.
- `src/mcp/index.ts`: adds `MCP.toolEntries()` — lightweight sibling to
  `tools()` that returns `{ toolKey, serverName }[]` without building full
  AI-SDK tool objects; used by `buildToolCatalogItems()`.
- `src/skill/skill.ts`: extends `Skill.Info` with optional `version` field
  threaded from SKILL.md frontmatter; `null` when not declared.
- `src/cli/cmd/run.ts`: `await buildToolCatalogItems()` + `emit("tool_catalog",
  ...)` inserted between `session_start` and the first `loop()` call.
- `test/tool-catalog.test.ts` (new, TDD): 9 tests covering builtin/mcp
  tool shape, the "tool exposed vs missing" consumer gate, skill version
  presence/absence, and skills/tools separation.
- `EVENTS.md`: documents the new event schema including the version gap note.

Skill version gap: `Skill.Info.version` is now threaded through from
frontmatter. Skills without a `version` field in their SKILL.md emit
`version: null`.

Closes #85

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ
@byapparov byapparov self-assigned this Jun 22, 2026
@byapparov byapparov added this to the Enterprise Observability milestone Jun 22, 2026
…capability (#88)

Instead of a one-off `version` field bolt-on, `Skill.Info` now carries a
`metadata: Record<string, string>` field that retains every SKILL.md
frontmatter key beyond `name`/`description`, with values string-coerced
(handles YAML numbers like `version: 1.0` → `"1.0"`).

`version` becomes a derived convenience accessor (`metadata.version`) rather
than a parallel extraction path. The dual parse path is gone — one coherent
pass builds `metadata`, then derives `version` from it.

Changes:
- `Skill.Info`: add `metadata: z.record(z.string(), z.string()).default({})`;
  keep `version: z.string().optional()` as derived accessor
- Loader: replace separate `"version" in md.data` extraction with a single
  `Object.fromEntries(…filter name/description…map String)` pass
- Tests: add "non-version frontmatter keys survive in skill.metadata" test
  (license+author keys survive; name/description excluded; version still
  resolves via metadata.version) — written RED first, then GREEN

`tool_catalog` event payload unchanged — still emits `version` sourced from
`metadata.version`. `metadata` not added to event payload (kept lean per spec).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ
Comment thread packages/cli/src/cli/cmd/run.ts Outdated
Comment thread packages/cli/src/cli/cmd/tool-catalog.ts
Comment thread packages/cli/src/cli/cmd/run.ts
Comment thread packages/cli/src/mcp/index.ts
Comment thread packages/cli/src/mcp/index.ts
@aictrl-dev

aictrl-dev Bot commented Jun 22, 2026

Copy link
Copy Markdown

Code review

Verdict: Address the major findings before merging. · 🔴 0 · 🟠 2 · 🟡 3 · ⚪ 0 · 0/5 resolved

  • 🟡 packages/cli/src/cli/cmd/run.ts:710 — Catalog built even when output is not JSON
  • 🟠 packages/cli/src/cli/cmd/run.ts:716-718 — Catalog errors swallowed with no logging
  • 🟠 packages/cli/src/cli/cmd/tool-catalog.ts:90 — Catalog builtins are unfiltered, not the dispatched set
  • 🟡 packages/cli/src/mcp/index.ts:670-677 — listTools() runs twice per session start
  • 🟡 packages/cli/src/mcp/index.ts:680-687 — MCP key derivation duplicated in toolEntries() and tools()
🤖 Fix all 5 open findings with your agent
Fix the following code review findings on aictrl-dev/cli PR #88 (head branch).
Run the relevant tests/linters after each change.

1. packages/cli/src/cli/cmd/run.ts:710 — Catalog built even when output is not JSON
   Detail: `emit()` only writes when `args.format === "json"`, but the catalog block unconditionally awaits `buildToolCatalogItems()` — which enumerates every MCP server via `listTools()` and scans the skill filesystem — even in the default human-readable mode where the result is never emitted. For interactive (non-JSON) sessions this adds unnecessary MCP round-trips and startup latency before the first turn.

Guard the collection on `args.format === "json"` (matching how the result is consumed), e.g. `if (args.format === "json") { await buildToolCatalogItems().then(...).catch(...) }`.
2. packages/cli/src/cli/cmd/run.ts:716-718 — Catalog errors swallowed with no logging
   Detail: The `.catch(() => {})` handler silently drops every error from `buildToolCatalogItems()`. The entire purpose of this event (per EVENTS.md) is to let the server-side gate detect "silent success" failure modes — but if catalog collection itself fails (e.g. `Skill.all()` throws on a malformed SKILL.md, `MCP.clients()` rejects, or `ToolRegistry.ids()` fails), the run completes with NO `tool_catalog` event at all and NO signal that anything went wrong. That is a brand-new silent-success path created by the very feature meant to eliminate them.

Because `buildToolCatalogItems` uses `Promise.all` over builtins, MCP, and skills, a single failing source (most likely `Skill.all()` hitting the filesystem) suppresses the whole catalog including the unrelated, healthy tool lists.

Please log the error (at least `log.warn`/`log.error`) inside the catch so operators can diagnose missing catalogs, while still keeping the session alive.
   Suggested fix:         .catch((err) => {
          // Never let catalog collection block or crash the session, but
          // surface the failure so a missing event is diagnosable.
          log.error("failed to emit tool_catalog", { error: err instanceof Error ? err.message : String(err) })
        })
3. packages/cli/src/cli/cmd/tool-catalog.ts:90 — Catalog builtins are unfiltered, not the dispatched set
   Detail: `buildToolCatalogItems()` resolves builtins via `ToolRegistry.ids()`, which returns the instance-level superset with NO model/agent filtering. But `resolveTools` (session/prompt.ts) dispatches via `ToolRegistry.tools({modelID, providerID}, agent, sessionID)`, which applies real per-model filters:

- `apply_patch` is only exposed for gpt-* models, and `edit`/`write` are HIDDEDEN for those same models — they are mutually exclusive at dispatch, yet the catalog will always report BOTH.
- `codesearch`/`websearch` are gated on `providerID === "aictrl" || AICTRL_ENABLE_EXA`, but the catalog always lists them.

EVENTS.md overstates the guarantee: it claims the gate "uses `tools[]` to verify that [tools] were actually in the model's function list at dispatch time." That is true for MCP tools (record_finding, etc.), but NOT for the builtin portion. A consumer that structurally checks for `edit` (or absence of `apply_patch`) on a gpt run will get wrong answers.

The author already notes "instance-level superset, without model-specific filtering" in a code comment — so the fix is either (a) pass model+agent into `buildToolCatalogItems` and call `ToolRegistry.tools(...)` so the catalog matches dispatch, or (b) correct EVENTS.md to scope the claim to MCP tools only.
4. packages/cli/src/mcp/index.ts:670-677 — listTools() runs twice per session start
   Detail: `MCP.toolEntries()` calls `client.listTools()` on every connected client at catalog time (before the first turn). Moments later, the first turn's `resolveTools` → `MCP.tools()` calls `client.listTools()` again on the same clients. For sessions with several MCP servers this doubles the round-trips to each server at session start, and for slow/remote MCP servers it adds latency directly on the critical path before the first model turn (the catalog `await` blocks `loop()`).

Consider memoizing the listTools result on the client/state for the session, or having `tools()` produce the entries so the work happens once.
5. packages/cli/src/mcp/index.ts:680-687 — MCP key derivation duplicated in toolEntries() and tools()
   Detail: `MCP.toolEntries()` re-implements the connected-client iteration and `sanitizedClientName + "_" + sanitizedToolName` key derivation that already lives in `MCP.tools()`. The catalog's value depends entirely on those keys matching what `resolveTools` exposes to the model (which comes from `MCP.tools()`'s keys). The two implementations must stay byte-for-byte in sync; if either sanitization changes, the catalog silently stops matching dispatch and the consumer gate produces false negatives/positives.

Extract a single shared helper (e.g. `mcpToolKey(clientName, toolName)` plus a shared `connectedClients()` iterator) so there is one source of truth, or have `tools()` delegate to `toolEntries()` for keying.
📋 Out-of-diff findings (5)
Sev Location Finding
🟡 packages/cli/src/cli/cmd/run.ts:710 Catalog built even when output is not JSON
🟠 packages/cli/src/cli/cmd/run.ts:716-718 Catalog errors swallowed with no logging
🟠 packages/cli/src/cli/cmd/tool-catalog.ts:90 Catalog builtins are unfiltered, not the dispatched set
🟡 packages/cli/src/mcp/index.ts:670-677 listTools() runs twice per session start
🟡 packages/cli/src/mcp/index.ts:680-687 MCP key derivation duplicated in toolEntries() and tools()

Reviewed 6 files · 0 inline · view all 5 findings ↗


aictrl · AI code review for fast-moving teams · aictrl.dev

- run.ts: guard buildToolCatalogItems on args.format==="json" so MCP
  listTools + skill scan are skipped for interactive sessions (#88 🟡)
- run.ts: log catalog build failures in catch instead of silencing them
  so a missing tool_catalog event is diagnosable (#88 🟠)
- mcp/index.ts: extract mcpToolKey() helper shared by tools() and
  toolEntries() — single source of truth for key derivation (#88 🟡)
- EVENTS.md: document that builtin tools[] reflects the instance-level
  superset, not the per-model filtered dispatch set (#88 🟠)
- test: regression tests for error propagation and key sanitization

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ
@byapparov

Copy link
Copy Markdown
Contributor Author

Review response — PR #88

All 5 findings triaged. 2 fixed directly, 1 fixed via EVENTS.md correction, 1 fixed by extracting shared key helper (🟡 duplication), 1 deferred (🟡 double-listTools — substantially mitigated by the JSON-mode guard).

Issues addressed (pushed to this PR)

  • 🟠 Catalog errors swallowed with no loggingpackages/cli/src/cli/cmd/run.ts:715-718: replaced .catch(() => {}) with .catch((err) => { console.error("failed to emit tool_catalog", ...) }). Error is now surfaced so a missing tool_catalog event is diagnosable, while still keeping the session alive. (commit 1a10f2306)

  • 🟠 Catalog builtins are unfiltered, not the dispatched setEVENTS.md: option (b) — corrected the documentation to explicitly state that source:"builtin" entries reflect the instance-level superset and that per-model filters (apply_patch/edit/write gating, codesearch/websearch providerID gate) are applied at dispatch time and not reflected in the catalog. The strong structural guarantee now scopes correctly to MCP tools only. (commit 1a10f2306)

  • 🟡 Catalog built even when output is not JSONpackages/cli/src/cli/cmd/run.ts:711: wrapped the entire buildToolCatalogItems() block in if (args.format === "json"). MCP listTools() round-trips and skill filesystem scan no longer run in interactive (non-JSON) sessions. (commit 1a10f2306)

  • 🟡 MCP key derivation duplicated in toolEntries() and tools()packages/cli/src/mcp/index.ts: extracted mcpToolKey(clientName, toolName) private helper; both tools() and toolEntries() now call it. One source of truth — if the sanitization regex ever changes, both paths update together. (commit 1a10f2306)

Review claims verified false (no change needed)

None — all 5 findings verified TRUE.

Not addressed here

  • 🟡 listTools() runs twice per session startpackages/cli/src/mcp/index.ts:670-677: TRUE but substantially mitigated by the JSON-format guard (fix feat: lean headless flags #3 above). toolEntries() now only runs when args.format==="json", so the double listTools() only affects json-mode sessions. Proper fix (memoizing listTools or having toolEntries derive from tools()) requires more careful coupling of tools()'s error-handling side-effects (s.status[clientName]=failed/delete s.clients[clientName]) with the catalog path. Deferring to a follow-up — the regression impact is now limited to json-mode startup latency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Emit resolved tool catalog in session_start NDJSON (tool_catalog event)

1 participant