Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 39 additions & 0 deletions EVENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,45 @@ Emitted once when the session begins.
- `schemaVersion` (string, **required**) — pinned contract version. Currently `"1"`.
- `permissions` (array, **required**) — the fully resolved permission ruleset for the session (merge of agent + session rules). In headless mode, any permission not matching an `allow` rule is auto-rejected.

### `tool_catalog`
Comment thread
byapparov marked this conversation as resolved.

Emitted once per session, immediately after `session_start` and before the first model turn (present even for zero-turn or immediately-failing runs). Lists every tool and skill that was resolved for the session so consumers can structurally verify tool exposure without string-matching the model's prose.

```json
{
"type": "tool_catalog",
"sessionID": "ses_…",
"timestamp": 1741500000000,
"tools": [
{ "name": "record_finding", "source": "mcp", "server": "aictrl" },
{ "name": "record_review_completed", "source": "mcp", "server": "aictrl" },
{ "name": "bash", "source": "builtin" },
{ "name": "read", "source": "builtin" }
],
"skills": [
{ "name": "code-review", "version": "1.4.0" },
{ "name": "fullstack-code-review", "version": null }
]
}
```

#### `tools[]` — builtin and MCP tools resolved for this session

- **`name`** (string, **required**) — the tool/function name as exposed to the model. Mirrors upstream `ToolListItem.id`. For MCP tools this is `{serverName}_{toolName}` (both sanitised to `[a-zA-Z0-9_-]`).
- **`source`** (string, **required**) — `"builtin"` for tools built into the CLI; `"mcp"` for tools provided by a connected MCP server.
- **`server`** (string, optional) — the MCP server (client name from config) that provides this tool. Present only when `source` is `"mcp"`.

`description` and `parameters` are intentionally omitted to keep the event lean. The primary consumer only needs `name` + `source` to verify tool exposure.

#### `skills[]` — skill packs resolved for this session (separate from `tools[]`)

- **`name`** (string, **required**) — skill name from `SKILL.md` frontmatter.
- **`version`** (string or null, **required**) — version string from the `version` field in `SKILL.md` frontmatter. `null` when no version is declared.

> **Use case:** The server-side completion gate (aictrl-dev/aictrl #3216) uses `tools[]` to verify that `record_finding` and `record_review_completed` were actually in the model's function list at dispatch time, and `skills[]` to record which skill version produced the review. Detects the "silent success" failure mode where a missing MCP server lets a run complete green without ever having the review tools available.
>
> **Note on builtin tool filtering:** The `source: "builtin"` entries in `tools[]` reflect the _instance-level superset_ registered in `ToolRegistry`. Per-model filters (e.g. `apply_patch` only on gpt-*, `codesearch`/`websearch` only for aictrl provider) are applied at dispatch time inside `resolveTools` and are **not** reflected here. Consumers should treat the presence of a builtin tool name as "registered and potentially available", not as "guaranteed to appear in the model's function list". The strong structural guarantee (tool present ↔ tool in dispatch list) applies only to `source: "mcp"` entries.

### `session_complete`

Emitted once when the session ends (success or failure).
Expand Down
30 changes: 30 additions & 0 deletions packages/cli/src/cli/cmd/run.ts
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ import { pathToFileURL } from "bun"
import { UI } from "../ui"
import { cmd } from "./cmd"
import { classifySessionError, SCHEMA_VERSION } from "./run.errors"
import { buildToolCatalogItems } from "./tool-catalog"
import { withTimeout } from "../../util/timeout"
import { Flag } from "../../flag/flag"
import { bootstrap } from "../bootstrap"
import { EOL } from "os"
Expand Down Expand Up @@ -704,6 +706,34 @@ export const RunCommand = cmd({
permissions: PermissionNext.merge(agentInfo.permission, rules),
})

// Emit the resolved tool catalog (builtin + MCP tools) and available
// skills before the first model turn. Enables structural detection of
// "tool was not exposed" failure modes (issue #85).
Comment thread
byapparov marked this conversation as resolved.
// Only build the catalog when output is JSON — emit() is a no-op
Comment thread
byapparov marked this conversation as resolved.
// otherwise, and the catalog construction (MCP listTools + skill scan)
// adds unnecessary latency for interactive sessions.
//
// Bounded by TOOL_CATALOG_TIMEOUT_MS so a hanging MCP server cannot
// indefinitely stall session start. On timeout or error the failure is
// emitted as a structured stdout event (tool_catalog_error) so JSON
// consumers can distinguish "catalog failed" from "catalog succeeded with
// no tools" — stderr is invisible to consumers reading the NDJSON stream.
const TOOL_CATALOG_TIMEOUT_MS = 10_000
if (args.format === "json") {
await withTimeout(buildToolCatalogItems(), TOOL_CATALOG_TIMEOUT_MS)
.then(({ tools, skills }) => {
emit("tool_catalog", { tools, skills })
})
Comment thread
byapparov marked this conversation as resolved.
.catch((err) => {
// Never let catalog collection block or crash the session.
// Emit a structured event so consumers reading stdout can detect
// the failure rather than silently missing the tool_catalog event.
emit("tool_catalog_error", {
error: err instanceof Error ? err.message : String(err),
})
})
}

const loopDone = loop()
.then(() => {
emit("session_complete", {
Expand Down
113 changes: 113 additions & 0 deletions packages/cli/src/cli/cmd/tool-catalog.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
/**
* tool-catalog.ts — builds the resolved tool/skill inventory emitted in the
* `tool_catalog` NDJSON event (issue #85).
*
* Exported as a standalone module so it can be unit-tested independently of
* the full run.ts wiring.
*/
import { ToolRegistry } from "../../tool/registry"
import { MCP } from "../../mcp"
import { Skill } from "../../skill"

/** A single entry in the `tools[]` array of the `tool_catalog` event. */
export interface ToolCatalogEntry {
/** Tool name as exposed to the model. Mirrors upstream ToolListItem.id. */
name: string
/** Whether the tool is a builtin (local) or an MCP tool. */
source: "builtin" | "mcp"
/**
* The MCP server name for source=mcp tools (sanitized client name).
* Omitted for builtin tools.
*/
server?: string
}

/** A single entry in the `skills[]` array of the `tool_catalog` event. */
export interface SkillCatalogEntry {
/** Skill name from SKILL.md frontmatter. */
name: string
/**
* Skill version from SKILL.md frontmatter `version` field.
* null when no version is declared in frontmatter.
*/
version: string | null
}

/** Result of buildToolCatalogItems(). */
export interface ToolCatalogItems {
tools: ToolCatalogEntry[]
skills: SkillCatalogEntry[]
}

/**
* Dependency injection shape, used for testing to avoid touching real MCP
* state and the real filesystem.
*/
export interface ToolCatalogDeps {
/**
* Returns builtin tool IDs (the resolved set for the current instance).
* Defaults to ToolRegistry.ids().
*/
getBuiltinIds?: () => Promise<string[]>

/**
* Returns connected MCP tools as { toolKey, serverName } pairs.
* toolKey = the combined `serverName_toolName` key given to the model.
* Defaults to an internal implementation using MCP.clients() + listTools().
*/
getMcpTools?: () => Promise<{ toolKey: string; serverName: string }[]>

/**
* Returns the list of resolved skills.
* Defaults to Skill.all().
*/
getSkills?: () => Promise<Skill.Info[]>
}

/**
* Collects the resolved tool catalog and skill list for the current session.
*
* Uses ToolRegistry.ids() for builtins (the instance-level superset, without
* model-specific filtering which only applies per-turn inside resolveTools).
* Uses MCP connected clients for MCP tools.
* Uses Skill.all() for skills with optional version from frontmatter.
*/
export async function buildToolCatalogItems(deps: ToolCatalogDeps = {}): Promise<ToolCatalogItems> {
const getBuiltinIds = deps.getBuiltinIds ?? (() => ToolRegistry.ids())
const getMcpTools = deps.getMcpTools ?? defaultGetMcpTools
const getSkills = deps.getSkills ?? (() => Skill.all())

const [builtinIds, mcpTools, skills] = await Promise.all([getBuiltinIds(), getMcpTools(), getSkills()])

const tools: ToolCatalogEntry[] = [
...builtinIds.map(
(name): ToolCatalogEntry => ({
name,
source: "builtin",
}),
),
...mcpTools.map(
({ toolKey, serverName }): ToolCatalogEntry => ({
Comment thread
byapparov marked this conversation as resolved.
name: toolKey,
source: "mcp",
server: serverName,
}),
),
]

const skillEntries: SkillCatalogEntry[] = skills.map((s) => ({
name: s.name,
version: s.version ?? null,
}))

return { tools, skills: skillEntries }
}

/**
* Default MCP tool resolver: delegates to MCP.toolEntries() which queries
* each connected client's tool list and returns { toolKey, serverName } pairs
* (matching the key format used in SessionPrompt.resolveTools).
*/
async function defaultGetMcpTools(): Promise<{ toolKey: string; serverName: string }[]> {
return MCP.toolEntries()
}
55 changes: 52 additions & 3 deletions packages/cli/src/mcp/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -600,6 +600,17 @@ export namespace MCP {
s.status[name] = { status: "disabled" }
}

/**
* Single source of truth for the MCP tool key used to identify a tool
* both in `tools()` (dispatch) and `toolEntries()` (catalog). Both paths
* must produce byte-identical keys so catalog consumers get correct answers.
*/
function mcpToolKey(clientName: string, toolName: string): string {
const sanitizedClientName = clientName.replace(/[^a-zA-Z0-9_-]/g, "_")
const sanitizedToolName = toolName.replace(/[^a-zA-Z0-9_-]/g, "_")
return sanitizedClientName + "_" + sanitizedToolName
}

export async function tools() {
const result: Record<string, Tool> = {}
const s = await state()
Expand Down Expand Up @@ -634,11 +645,49 @@ export namespace MCP {
const entry = isMcpConfigured(mcpConfig) ? mcpConfig : undefined
const timeout = entry?.timeout ?? defaultTimeout
for (const mcpTool of toolsResult.tools) {
const sanitizedClientName = clientName.replace(/[^a-zA-Z0-9_-]/g, "_")
const sanitizedToolName = mcpTool.name.replace(/[^a-zA-Z0-9_-]/g, "_")
result[sanitizedClientName + "_" + sanitizedToolName] = await convertMcpTool(mcpTool, client, timeout)
result[mcpToolKey(clientName, mcpTool.name)] = await convertMcpTool(mcpTool, client, timeout)
}
}
return result
}

/**
* Returns the resolved MCP tool list as lightweight entries for the
* `tool_catalog` NDJSON event (issue #85). Each entry carries the tool key
* (as exposed to the model) and the originating server name.
*
* Uses `mcpToolKey()` — the same helper as `tools()` — so the key format is
* guaranteed identical to what `resolveTools` dispatches to the model.
*/
export async function toolEntries(): Promise<{ toolKey: string; serverName: string }[]> {
const result: { toolKey: string; serverName: string }[] = []
const s = await state()
const clientsSnapshot = await clients()

const connectedClients = Object.entries(clientsSnapshot).filter(
([clientName]) => s.status[clientName]?.status === "connected",
Comment thread
byapparov marked this conversation as resolved.
)

const toolsResults = await Promise.all(
connectedClients.map(async ([clientName, client]) => {
const toolsResult = await client.listTools().catch((e) => {
log.error("failed to get tool entries", { clientName, error: e.message })
return undefined
})
return { clientName, toolsResult }
}),
)
Comment thread
byapparov marked this conversation as resolved.

for (const { clientName, toolsResult } of toolsResults) {
if (!toolsResult) continue
for (const mcpTool of toolsResult.tools) {
result.push({
toolKey: mcpToolKey(clientName, mcpTool.name),
serverName: clientName,
})
Comment thread
byapparov marked this conversation as resolved.
}
}

return result
}

Expand Down
33 changes: 33 additions & 0 deletions packages/cli/src/skill/skill.ts
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,25 @@ export namespace Skill {
description: z.string(),
location: z.string(),
content: z.string(),
/**
* All frontmatter keys beyond `name`/`description`, with values
* string-coerced. Retains arbitrary skill metadata (e.g. version, license,
* author) without requiring per-field changes to the schema.
*
* NOTE: `metadata` and `version` below are TYPE-ONLY fields — they are NOT
* validated by the `Info.pick({ name, description })` Zod parse in `addSkill`.
* Both are populated manually from raw frontmatter after the parse succeeds
* (see the `Object.fromEntries` block below). The schema fields exist solely
* to give `Skill.Info` a correct TypeScript type; runtime coercion is
* intentional to handle e.g. `version: 1.0` (YAML number) → `"1.0"` (string).
*/
metadata: z.record(z.string(), z.string()).default({}),
/**
* Convenience accessor derived from `metadata.version`.
* undefined when the frontmatter has no `version` key.
* TYPE-ONLY — see note on `metadata` above.
*/
version: z.string().optional(),
Comment thread
byapparov marked this conversation as resolved.
})
export type Info = z.infer<typeof Info>

Expand Down Expand Up @@ -79,11 +98,25 @@ export namespace Skill {

dirs.add(path.dirname(match))

// Build metadata from every frontmatter key except name/description,
// string-coercing values (handles YAML numbers like `version: 1.0` → "1.0").
// version is derived from metadata.version as a convenience accessor.
const metadata: Record<string, string> =
typeof md.data === "object" && md.data !== null
? Object.fromEntries(
Object.entries(md.data)
.filter(([k]) => k !== "name" && k !== "description")
.map(([k, v]) => [k, String(v)]),
)
: {}

skills[parsed.data.name] = {
name: parsed.data.name,
description: parsed.data.description,
location: match,
content: md.content,
metadata,
version: metadata.version,
}
}

Expand Down
Loading