diff --git a/.github/workflows/publish-npm-on-release.yml b/.github/workflows/publish-npm-on-release.yml
index be62c13..023126a 100644
--- a/.github/workflows/publish-npm-on-release.yml
+++ b/.github/workflows/publish-npm-on-release.yml
@@ -8,9 +8,9 @@ on:
   workflow_dispatch:
     inputs:
       tag:
-        description: 'Tag to publish (e.g. v1.6.2)'
+        description: 'Tag to publish (e.g. v2.2.0)'
         required: true
-        default: 'v1.6.2'
+        default: 'v2.2.0'
 
 permissions:
   contents: read
diff --git a/.release-please-manifest.json b/.release-please-manifest.json
index d9246dd..a5d1cf2 100644
--- a/.release-please-manifest.json
+++ b/.release-please-manifest.json
@@ -1,3 +1,3 @@
 {
-  ".": "1.10.0"
+  ".": "2.2.0"
 }
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6bfbaab..dbff7fc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,18 @@
 # Changelog
 
-## Unreleased
+## [2.2.0](https://github.com/PatrickSys/codebase-context/compare/v1.10.0...v2.2.0) (2026-04-17)
+
+### Features
+
+* relaunch around a bounded conventions map and local-pattern discovery for `map + find`
+* add explicit full-map resources while keeping the default first-call map bounded and action-oriented
+* align public proof surfaces to the discovery-only benchmark posture (`pending_evidence`, `claimAllowed: false`)
+
+### Bug Fixes
+
+* make the packaged README tarball-safe by sending benchmark, demo, motivation, and contributing links to stable GitHub URLs
+* quarantine historical v1.8.x launch-planning docs so they no longer read as current release guidance
+* stop the built CLI entrypoint from eagerly importing MCP server runtime modules before CLI subcommand dispatch
 
 ## [1.10.0](https://github.com/PatrickSys/codebase-context/compare/v1.9.0...v1.10.0) (2026-04-14)
 
diff --git a/README.md b/README.md
index 9ec1d4c..92453b4 100644
--- a/README.md
+++ b/README.md
@@ -1,26 +1,26 @@
 # codebase-context
 
-## Stop paying for AI agents to explore your codebase. 
codebase-context pre-maps the architecture, conventions, and team memory so they don't have to.
+## Map your team's conventions before your AI agent starts searching.
 
-[![npm version](https://img.shields.io/npm/v/codebase-context)](https://www.npmjs.com/package/codebase-context) [![license](https://img.shields.io/npm/l/codebase-context)](./LICENSE) [![node](https://img.shields.io/node/v/codebase-context)](./package.json)
+[![npm version](https://img.shields.io/npm/v/codebase-context)](https://www.npmjs.com/package/codebase-context) [![license](https://img.shields.io/npm/l/codebase-context)](./LICENSE) [![node](https://img.shields.io/node/v/codebase-context)](https://github.com/PatrickSys/codebase-context/blob/master/package.json)
 
-You're tired of AI agents writing code that 'just works' but fits like a square peg in a round hole - not your conventions, not your architecture, not your repo. Even with well-curated instructions. You correct the agent, it doesn't remember. Next session, same mistakes.
+You're tired of AI agents writing code that "just works" but still misses how your team actually builds things. They search too broadly, pick generic examples, and spend tokens exploring before they understand the shape of the repo.
 
-This MCP gives agents _just enough_ context so they match _how_ your team codes, know _why_, and _remember_ every correction.
+`codebase-context` changes the first step. Start with a bounded conventions map that shows the architecture, dominant patterns, and strongest local examples. Then search for the exact file, symbol, or workflow you need.
 
 Here's what codebase-context does:
 
-**Finds the right context** - Search that doesn't just return code. Each result comes back with analyzed and quantified coding patterns and conventions, related team memories, file relationships, and quality indicators. 
It knows whether you're looking for a specific file, a concept, or how things wire together - and filters out the noise (test files, configs, old utilities) before the agent sees them. The agent gets curated context, not raw hits.
+**Starts with a bounded conventions map** - The first call shows architecture layers, active patterns, golden files, and next calls without dumping vendored repos, fixtures, generated output, or oversized entrypoint lists into the default surface.
 
-**Knows your conventions** - Detected from your code and git history, not only from rules you wrote. Seeks team consensus and direction by adoption percentages and trends (rising/declining), golden files. Tells the difference between code that's _common_ and code that's _current_ - what patterns the team is moving toward and what's being left behind.
+**Finds the right local example** - Search does not just return code. Each result comes back with pattern signals, file relationships, and quality indicators so the agent can move from the map to the most relevant local example instead of wandering through raw hits.
 
-**Remembers across sessions** - Decisions, failures, workarounds that look wrong but exist for a reason - the battle scars that aren't in the comments. Recorded once, surfaced automatically so the agent doesn't "clean up" something you spent a week getting right. Conventional git commits (`refactor:`, `migrate:`, `fix:`) auto-extract into memory with zero effort. Stale memories decay and get flagged instead of blindly trusted.
+**Knows what is current** - Conventions are detected from your code and git history, not only from rules you wrote. The map distinguishes what is common from what is rising or declining, and points at the files that best represent the current direction.
 
-**Checks before editing** - Before editing something, you get a decision card showing whether there's enough evidence to proceed. 
If a symbol has four callers and only two appear in your search results, the card shows that coverage gap. If coverage is low, `whatWouldHelp` lists the specific searches to run before you touch anything.
+**Adds support signals when you need them** - Team memory and edit-readiness checks stay available, but as supporting context after the map and search have already narrowed the work.
 
-One tool call returns all of it. Local-first - your code never leaves your machine by default.
+Map first, search second, local-first throughout. Your code never leaves your machine by default.
 
-See the [current discovery benchmark](./docs/benchmark.md) for the checked-in proof results and current gate truth.
+See the [current discovery benchmark](https://github.com/PatrickSys/codebase-context/blob/master/docs/benchmark.md) for the checked-in discovery-only proof. The gate is still `pending_evidence`, and `claimAllowed` remains `false`.
 
 ### What it looks like
 
@@ -38,7 +38,7 @@ This is the part most tools miss: what the team is doing now, what it is moving
 
 When the agent searches with edit intent, it gets a compact decision card: confidence, whether it's safe to proceed, which patterns apply, the best example, and which files are likely to be affected.
 
-More CLI examples in [`docs/cli.md`](./docs/cli.md). Full walkthrough: [`docs/demo.md`](./docs/demo.md).
+More CLI examples in [`docs/cli.md`](./docs/cli.md). Full walkthrough: [demo.md on GitHub](https://github.com/PatrickSys/codebase-context/blob/master/docs/demo.md).
 
 ## Quick Start
 
@@ -71,7 +71,7 @@ Full per-client setup, HTTP server instructions, and local build testing: [`docs
 
 ## First Use
 
-Get a conventions map of your codebase before exploring or searching:
+Get a conventions map of your codebase before exploring or editing:
 
 ```bash
 # See your codebase conventions — architecture layers, patterns, golden files
@@ -85,20 +85,20 @@ Your AI agent uses the same map via the `codebase://context` MCP resource on fir
 
 ## Common First Commands
 
-Three commands to get what usually takes a new developer weeks to piece together:
+Three commands to understand a repo before you edit it:
 
 ```bash
-# What tech stack, architecture, and file count?
-npx -y codebase-context metadata
+# What are the main conventions and best examples?
+npx -y codebase-context map
 
-# What does the team actually code like right now?
-npx -y codebase-context patterns
+# Then search for the local example you need
+npx -y codebase-context search --query "auth middleware"
 
-# What team decisions were made (and why)?
-npx -y codebase-context memory list
+# What patterns is the team actually using right now?
+npx -y codebase-context patterns
 ```
 
-This is also what your AI agent consumes automatically via MCP tools; the CLI is the human-readable version.
+This is also what your AI agent consumes automatically via MCP tools; the CLI is the human-readable version of the same map-plus-search flow.
 
 ## What it does
 
@@ -224,14 +224,14 @@ These are the behaviors that make the most difference day-to-day. 
Copy, trim wha
 
 ## Links
 
-- [Benchmark](./docs/benchmark.md) — current discovery suite results and gate truth
-- [Demo](./docs/demo.md) — real CLI walkthrough
+- [Benchmark](https://github.com/PatrickSys/codebase-context/blob/master/docs/benchmark.md) — current discovery suite results and gate truth
+- [Demo](https://github.com/PatrickSys/codebase-context/blob/master/docs/demo.md) — real CLI walkthrough
 - [Client Setup](./docs/client-setup.md) — per-client config, HTTP setup, local build testing
 - [Capabilities Reference](./docs/capabilities.md) — tool API, retrieval pipeline, decision card schema
 - [CLI Gallery](./docs/cli.md) — formatted command output examples
-- [Motivation](./MOTIVATION.md) — research and design rationale
-- [Contributing](./CONTRIBUTING.md) — dev setup and eval harness
-- [Changelog](./CHANGELOG.md)
+- [Motivation](https://github.com/PatrickSys/codebase-context/blob/master/MOTIVATION.md) — research and design rationale
+- [Contributing](https://github.com/PatrickSys/codebase-context/blob/master/CONTRIBUTING.md) — dev setup and eval harness
+- [Changelog](https://github.com/PatrickSys/codebase-context/blob/master/CHANGELOG.md)
 
 ## License
 
diff --git a/docs/benchmark.md b/docs/benchmark.md
index 4b9a5e8..d48f807 100644
--- a/docs/benchmark.md
+++ b/docs/benchmark.md
@@ -37,7 +37,7 @@ From `results/gate-evaluation.json`:
 - `claimAllowed`: `false`
 - `totalTasks`: `24`
 - `averageUsefulness`: `0.75`
-- `averageEstimatedTokens`: `1822.25`
+- `averageEstimatedTokens`: `1827.0833`
 - `bestExampleUsefulnessRate`: `0.125`
 
 Repo-level outputs from the same rerun:
@@ -53,8 +53,10 @@ The gate is intentionally still blocked.
 
 - The combined suite covers both public repos.
 - `claimAllowed` remains `false` because comparator evidence still does not support a benchmark-win claim.
-- Two comparator lanes now return `status: "ok"`, but both are effectively near-empty on the frozen tasks and contribute `0` average usefulness.
-- Three comparator lanes still fail setup entirely.
+- Two comparator artifacts now return `status: "ok"`, but that does not yet close the gate:
+  - `raw Claude Code` still leaves the baseline `pending_evidence` because `averageFirstRelevantHit` is `null`
+  - `codebase-memory-mcp` now has real current metrics, but the gate still marks it `failed` on the frozen tolerance rule
+- Three comparator lanes still fail setup entirely: `GrepAI`, `jCodeMunch`, and `CodeGraphContext`.
 
 ## Comparator Reality
 
@@ -62,11 +64,11 @@ The current comparator artifact records incomplete comparator evidence, not benc
 
 | Comparator | Status | Current reason |
 | --- | --- | --- |
-| `codebase-memory-mcp` | `ok` | Runs, but the checked-in artifact still averages `0` usefulness and `5` estimated tokens per task, so it does not yet contribute meaningful benchmark evidence |
+| `codebase-memory-mcp` | comparator artifact: `ok`; gate: `failed` | Runs through the repaired graph-backed path and now records real metrics (`averageUsefulness: 0.1875`, `averageFirstRelevantHit: 1.2857`, `bestExampleUsefulnessRate: 0.5`), but the frozen gate still fails it on the required usefulness comparisons |
 | `jCodeMunch` | `setup_failed` | `MCP error -32000: Connection closed` |
 | `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present |
 | `CodeGraphContext` | `setup_failed` | `MCP error -32000: Connection closed` |
-| `raw Claude Code` | `ok` | Runs, but the checked-in artifact still averages `0` usefulness and only `18.5` estimated tokens per task, so it does not yet contribute meaningful benchmark evidence |
+| `raw Claude Code` | comparator artifact: `ok`; gate: `pending_evidence` | The explicit Haiku CLI runner now returns current metrics (`averageUsefulness: 0.0278`, `averageEstimatedTokens: 32.1667`), but the baseline still lacks `averageFirstRelevantHit`, so the gate keeps this lane as missing evidence |
 
 `CodeGraphContext` remains part of the frozen comparison frame. 
It is not omitted from the public story just because the lane still fails to start.
 
@@ -74,9 +76,9 @@ The current comparator artifact records incomplete comparator evidence, not benc
 
 - This benchmark measures discovery usefulness and payload cost only.
 - It does not measure implementation correctness, patch quality, or end-to-end task completion.
-- Comparator setup remains environment-sensitive, and the checked-in comparator outputs are still too weak to justify a claim.
+- Comparator setup remains environment-sensitive, and the checked-in comparator outputs still do not satisfy the frozen claim gate.
 - The reranker cache is currently corrupted on this machine. During the proof rerun, search fell back to original ordering after `Protobuf parsing failed` while still completing the harness.
-- `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set.
+- `averageFirstRelevantHit` remains `null` in the current gate output, which is enough to keep the raw-Claude baseline in `pending_evidence`.
 
 ## What This Proof Can Support
 
diff --git a/docs/capabilities.md b/docs/capabilities.md
index 03db7ca..9085a07 100644
--- a/docs/capabilities.md
+++ b/docs/capabilities.md
@@ -1,6 +1,6 @@
 # Capabilities Reference
 
-Technical reference for what `codebase-context` ships today. For the user-facing overview, see [README.md](../README.md).
+Technical reference for what `codebase-context` ships today. The public product posture is map first, find second: the bounded conventions map is the first-call surface, and search narrows to the right local example after that. For the user-facing overview, see [README.md](../README.md).
 
 ## Transport Modes
 
@@ -298,6 +298,7 @@ Reproducible evaluation is shipped as a CLI entrypoint backed by shared scoring/
 - **Retrieval metrics:** Top-1 accuracy, Top-3 recall, spec contamination rate, and a gate pass/fail
 - **Discovery metrics:** usefulness score, payload bytes, estimated tokens, first relevant hit, and best-example usefulness
 - **Discovery gate:** discovery mode evaluates the frozen ship gate only when the full public suite and comparator metrics are available; missing comparator evidence is reported as pending, not silently treated as pass/fail
+- **Current checked-in gate truth:** `results/gate-evaluation.json` remains `pending_evidence` with `claimAllowed: false`; the raw-Claude baseline still lacks `averageFirstRelevantHit`, `codebase-memory-mcp` still fails the frozen usefulness comparisons, and the remaining named lanes are still `setup_failed`
 - **Limits:** discovery mode is discovery-only, uses current shipped surfaces only, and does not claim implementation quality; named competitor runs remain a documented hybrid/manual lane rather than a built-in automated benchmark
 
 ## Limitations
diff --git a/docs/cli.md b/docs/cli.md
index d50814d..5025f65 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -1,8 +1,9 @@
 # CLI Gallery (Human-readable)
 
-`codebase-context` exposes its tools as a local CLI so humans can:
+`codebase-context` exposes its tools as a local CLI so humans can follow the same map-first workflow the MCP server gives to agents:
 
-- Get the conventions map before exploring or editing (`map`)
+- Get the bounded conventions map before exploring or editing (`map`)
+- Search for the right local example after the map narrows the repo shape
 - Onboard themselves onto an unfamiliar repo
 - Debug what the MCP server is doing
 - Use outputs in CI/scripts (via `--json`)
@@ -50,7 +51,7 @@ CODEBASE_CONTEXT_ASCII=1 npx -y codebase-context patterns
 npx -y codebase-context map
 ```
 
-The conventions map — run this first on an unfamiliar repo. 
Shows architecture layers, active patterns with adoption rates and trend direction, and the golden files the team treats as the strongest examples. This is also what the MCP server delivers to AI agents via the `codebase://context` resource on first call.
+The conventions map - run this first on an unfamiliar repo. It shows architecture layers, active patterns with adoption rates and trend direction, and the golden files the team treats as the strongest examples. This is also what the MCP server delivers to AI agents via the `codebase://context` resource on first call, before search narrows to a specific local example.
 
 Example output (truncated):
 
diff --git a/docs/client-setup.md b/docs/client-setup.md
index 23304e6..032b0cf 100644
--- a/docs/client-setup.md
+++ b/docs/client-setup.md
@@ -1,6 +1,6 @@
 # Client Setup
 
-Full setup instructions for each AI client. For the quick-start summary, see [README.md](../README.md).
+Full setup instructions for each AI client. This guide is about transport and wiring, not a different product mode: each client gets the same bounded conventions map first and local-pattern discovery second. For the quick-start summary, see [README.md](../README.md).
 
 ## Transport modes
 
diff --git a/docs/comparison-table.md b/docs/comparison-table.md
index 2d30c95..aee2eaa 100644
--- a/docs/comparison-table.md
+++ b/docs/comparison-table.md
@@ -5,17 +5,19 @@ It is a setup-status table first, not a marketing scoreboard.
 
 | Comparator | Intended role in gate | Current status | Evidence summary |
 | --- | --- | --- | --- |
-| `raw Claude Code` | Baseline for payload cost and at least one usefulness comparison | `setup_failed` | The local `claude` CLI baseline is unavailable in this environment, so the gate records missing baseline metrics. 
|
+| `raw Claude Code` | Baseline for payload cost and at least one usefulness comparison | comparator artifact: `ok`; gate: `pending_evidence` | The Haiku-backed Claude CLI runner now returns current payloads, but the checked-in baseline still has `averageFirstRelevantHit: null`, so the gate still records missing baseline metrics. |
 | `GrepAI` | Named MCP comparator | `setup_failed` | Requires the GrepAI binary plus a local Ollama embedding setup that is not present in this proof environment. |
 | `jCodeMunch` | Named MCP comparator | `setup_failed` | The MCP server still closes on startup during the current rerun, so no comparable discovery metrics were produced. |
-| `codebase-memory-mcp` | Named MCP comparator | `setup_failed` | The documented install path still depends on the external shell installer instead of a working local benchmark path. |
+| `codebase-memory-mcp` | Named MCP comparator | comparator artifact: `ok`; gate: `failed` | The repaired graph-backed runner now produces real current metrics, but the frozen gate still fails this lane because `codebase-context` does not stay within tolerance on every required usefulness metric. |
 | `CodeGraphContext` | Graph-native comparator in the relaunch frame | `setup_failed` | The MCP server still closes on startup during the current rerun, so this lane remains missing evidence. |
 
 ## Reading This Table
 
 - `setup_failed` means the lane was attempted and did not reach a credible metric-producing state.
+- `pending_evidence` in the gate means the lane is still missing one or more required metrics.
+- `failed` in the gate means the lane has real metrics, but the frozen comparison rule still does not pass.
 - A missing metric is not treated as a win for `codebase-context`.
-- The combined gate in `results/gate-evaluation.json` remains `pending_evidence` until these lanes produce real metrics.
+- The combined gate in `results/gate-evaluation.json` remains `pending_evidence`, and `claimAllowed` stays `false`, until these lanes produce real metrics.
 
 ## Current codebase-context result
 
@@ -25,8 +27,8 @@ For reference, the current combined discovery output across `angular-spotify` an
 | --- | ---: |
 | `totalTasks` | 24 |
 | `averageUsefulness` | 0.75 |
-| `averagePayloadBytes` | 3613.6667 |
-| `averageEstimatedTokens` | 903.7083 |
+| `averagePayloadBytes` | 7306.4583 |
+| `averageEstimatedTokens` | 1827.0833 |
 | `bestExampleUsefulnessRate` | 0.125 |
 | `gate.status` | `pending_evidence` |
 
diff --git a/docs/demo.md b/docs/demo.md
index 8c6285e..825b2ce 100644
--- a/docs/demo.md
+++ b/docs/demo.md
@@ -1,7 +1,7 @@
 # Demo Script
 
 This walkthrough uses real CLI output captured against `repos/angular-spotify` during the Phase 10 proof rerun.
-Run it from the repo root with `CODEBASE_ROOT` pointed at the frozen sample repo.
+Run it from the repo root with `CODEBASE_ROOT` pointed at the frozen sample repo. The public flow is simple: start with the conventions map, then search for the local example you need.
 
 ## 1. Start With The Conventions Map
 
@@ -75,7 +75,7 @@ Captured output excerpt:
 
 What this shows:
 
-- Search remains the second step after the map.
+- Search is the second step after the map, not a separate headline workflow.
 - `intent=edit` adds preflight evidence instead of forcing a separate call.
 - The response stays compact while still surfacing a best example and impact hints.
 
@@ -117,4 +117,4 @@ What this shows:
 ## Caveats
 
 - These excerpts were captured from the current local proof run and will change if the frozen sample repo or index state changes.
-- The benchmark gate is still `pending_evidence`, so this walkthrough demonstrates shipped behavior, not a released performance claim.
+- The discovery benchmark gate is still `pending_evidence`, and `claimAllowed` remains `false`, so this walkthrough demonstrates shipped behavior, not a released performance claim.
diff --git a/docs/registry-sync-checklist.md b/docs/registry-sync-checklist.md
index 0a1d5a9..5cb5a59 100644
--- a/docs/registry-sync-checklist.md
+++ b/docs/registry-sync-checklist.md
@@ -1,6 +1,6 @@
 # Registry Sync Checklist
 
-Use this checklist before publishing any Phase 10-facing metadata or registry copy.
+Use this checklist before publishing any relaunch-facing metadata or registry copy.
 The purpose is to keep the public surface aligned with the current proof bundle.
 
 ## Required Artifacts
@@ -23,9 +23,12 @@ The purpose is to keep the public surface aligned with the current proof bundle.
 ## Required Truth Checks
 
 - If the gate is `pending_evidence`, say so explicitly.
+- If the raw-Claude baseline still has `averageFirstRelevantHit: null`, say the baseline remains `pending_evidence`.
+- If `codebase-memory-mcp` still reads comparator artifact `ok` but gate `failed`, say so explicitly.
 - If any comparator lane is `setup_failed`, say so explicitly.
 - Do not claim benchmark wins against `raw Claude Code`, `GrepAI`, `jCodeMunch`, `codebase-memory-mcp`, or `CodeGraphContext` without real metrics in `results/comparator-evidence.json`.
 - Do not claim implementation quality from this discovery benchmark.
+- Do not turn this discovery-only proof into relaunch-release, risky-edit, or patch-quality proof language.
 - Do not omit the current reranker fallback limitation if the proof run still shows `Protobuf parsing failed`.
 
 ## Before Registry Or README Updates
diff --git a/package.json b/package.json
index 68780ad..20ad3c6 100644
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
   "name": "codebase-context",
-  "version": "1.10.0",
-  "description": "Pre-maps your codebase architecture, conventions, and team memory so AI agents navigate with precision instead of exploring. Local-first MCP server with AST-backed hybrid search.",
+  "version": "2.2.0",
+  "description": "Bounded conventions map and local-pattern discovery for AI coding agents. 
Local-first MCP server with AST-backed hybrid search.", "type": "module", "main": "./dist/lib.js", "types": "./dist/lib.d.ts", @@ -61,9 +61,9 @@ "mcp-server", "model-context-protocol", "codebase-context", - "code-intelligence", "code-patterns", "team-conventions", + "conventions-map", "pattern-detection", "semantic-search", "vector-search", @@ -76,8 +76,6 @@ "local-first", "privacy-first", "embeddings", - "preflight", - "evidence-scoring", "golden-files", "ai-coding", "ai-agents", @@ -93,9 +91,7 @@ "developer-tools", "static-analysis", "code-quality", - "team-memory", - "code-search", - "codebase-intelligence" + "code-search" ], "repository": { "type": "git", diff --git a/results/comparator-evidence.json b/results/comparator-evidence.json index ac47bfe..d7ce191 100644 --- a/results/comparator-evidence.json +++ b/results/comparator-evidence.json @@ -1,12 +1,12 @@ { "codebase-memory-mcp": { - "averageUsefulness": 0, - "averagePayloadBytes": 19, - "averageEstimatedTokens": 5, - "averageFirstRelevantHit": null, - "bestExampleUsefulnessRate": null, + "averageUsefulness": 0.1875, + "averagePayloadBytes": 2151.0416666666665, + "averageEstimatedTokens": 538.2916666666666, + "averageFirstRelevantHit": 1.2857142857142858, + "bestExampleUsefulnessRate": 0.5, "averageToolCallCount": 1, - "averageElapsedMs": 0.3333333333333333, + "averageElapsedMs": 161.375, "status": "ok", "taskResults": [ { @@ -20,10 +20,10 @@ "patterns", "generated:" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 938, + "estimatedTokens": 235, "toolCallCount": 1, - "elapsedMs": 2 + "elapsedMs": 17 }, { "taskId": "as-map-02", @@ -36,10 +36,10 @@ "architecture", "statistics" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 938, + "estimatedTokens": 235, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 3 }, { "taskId": "as-map-03", @@ -52,10 +52,10 @@ "patterns", "libraries actually used" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 938, + 
"estimatedTokens": 235, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 2 }, { "taskId": "as-map-04", @@ -67,10 +67,10 @@ "import aliases", "tsconfig" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 938, + "estimatedTokens": 235, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 3 }, { "taskId": "as-find-01", @@ -81,10 +81,11 @@ "missingSignals": [ "dependencyInjection" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 62, + "estimatedTokens": 16, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 55, + "bestExampleUseful": false }, { "taskId": "as-find-02", @@ -95,10 +96,11 @@ "missingSignals": [ "stateManagement" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3450, + "estimatedTokens": 863, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 45, + "bestExampleUseful": false }, { "taskId": "as-find-03", @@ -111,10 +113,11 @@ "bestExample", "patterns" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3395, + "estimatedTokens": 849, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 30, + "bestExampleUseful": true }, { "taskId": "as-find-04", @@ -126,70 +129,78 @@ "unitTestFramework", "test" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 2067, + "estimatedTokens": 517, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 30, + "bestExampleUseful": false }, { "taskId": "as-search-01", "job": "search", "surface": "search_codebase", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "results" + ], "missingSignals": [ - "results", "searchQuality" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3430, + "estimatedTokens": 858, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 25, + "firstRelevantHit": 1 }, { "taskId": "as-search-02", "job": "search", "surface": "search_codebase", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "results" + ], 
"missingSignals": [ - "results", "searchQuality" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3421, + "estimatedTokens": 856, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 69 }, { "taskId": "as-search-03", "job": "search", "surface": "search_codebase", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "results" + ], "missingSignals": [ - "results", "searchQuality" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3389, + "estimatedTokens": 848, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 39, + "firstRelevantHit": 3 }, { "taskId": "as-search-04", "job": "search", "surface": "search_codebase", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "results" + ], "missingSignals": [ - "results", "searchQuality" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3535, + "estimatedTokens": 884, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 29, + "firstRelevantHit": 1 }, { "taskId": "ex-map-01", @@ -202,10 +213,10 @@ "architecture", "statistics" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 1009, + "estimatedTokens": 253, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 68 }, { "taskId": "ex-map-02", @@ -218,10 +229,10 @@ "libraries actually used", "patterns" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 1009, + "estimatedTokens": 253, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 15 }, { "taskId": "ex-map-03", @@ -233,10 +244,10 @@ "import aliases", "tsconfig" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 1009, + "estimatedTokens": 253, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 14 }, { "taskId": "ex-map-04", @@ -249,10 +260,10 @@ "libraries actually used", "generated:" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 1009, + "estimatedTokens": 253, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 15 }, { 
"taskId": "ex-find-01", @@ -263,10 +274,11 @@ "missingSignals": [ "stateManagement" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 2959, + "estimatedTokens": 740, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 505, + "bestExampleUseful": false }, { "taskId": "ex-find-02", @@ -279,25 +291,28 @@ "bestExample", "patterns" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 2709, + "estimatedTokens": 678, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 288, + "bestExampleUseful": true }, { "taskId": "ex-find-03", "job": "find", "surface": "get_team_patterns", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "test" + ], "missingSignals": [ - "test", "framework" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 343, + "estimatedTokens": 86, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 291, + "bestExampleUseful": true }, { "taskId": "ex-find-04", @@ -308,70 +323,79 @@ "missingSignals": [ "dependencyInjection" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3051, + "estimatedTokens": 763, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 457, + "bestExampleUseful": true }, { "taskId": "ex-search-01", "job": "search", "surface": "search_codebase", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "results" + ], "missingSignals": [ - "results", "searchQuality" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3009, + "estimatedTokens": 753, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 527, + "firstRelevantHit": 1 }, { "taskId": "ex-search-02", "job": "search", "surface": "search_codebase", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "results" + ], "missingSignals": [ - "results", "searchQuality" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 2867, + "estimatedTokens": 717, "toolCallCount": 1, - 
"elapsedMs": 0 + "elapsedMs": 553, + "firstRelevantHit": 1 }, { "taskId": "ex-search-03", "job": "search", "surface": "search_codebase", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "results" + ], "missingSignals": [ - "results", "searchQuality" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3061, + "estimatedTokens": 766, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 418, + "firstRelevantHit": 1 }, { "taskId": "ex-search-04", "job": "search", "surface": "search_codebase", - "usefulnessScore": 0, - "matchedSignals": [], + "usefulnessScore": 0.5, + "matchedSignals": [ + "results" + ], "missingSignals": [ - "results", "searchQuality" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 3089, + "estimatedTokens": 773, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 375, + "firstRelevantHit": 1 } ] }, @@ -388,13 +412,13 @@ "reason": "MCP error -32000: Connection closed" }, "raw Claude Code": { - "averageUsefulness": 0, - "averagePayloadBytes": 71.54166666666667, - "averageEstimatedTokens": 18.5, + "averageUsefulness": 0.027777777777777776, + "averagePayloadBytes": 127.54166666666667, + "averageEstimatedTokens": 32.166666666666664, "averageFirstRelevantHit": null, - "bestExampleUsefulnessRate": null, + "bestExampleUsefulnessRate": 0, "averageToolCallCount": null, - "averageElapsedMs": 9590.208333333334, + "averageElapsedMs": 8270.875, "status": "ok", "taskResults": [ { @@ -408,26 +432,27 @@ "patterns", "generated:" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 1082, + "estimatedTokens": 271, "toolCallCount": null, - "elapsedMs": 12461 + "elapsedMs": 23238 }, { "taskId": "as-map-02", "job": "map", "surface": "get_codebase_metadata", - "usefulnessScore": 0, - "matchedSignals": [], - "missingSignals": [ + "usefulnessScore": 0.6666666666666666, + "matchedSignals": [ "framework", - "architecture", + "architecture" + ], + "missingSignals": [ "statistics" ], - 
"payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 857, + "estimatedTokens": 215, "toolCallCount": null, - "elapsedMs": 9390 + "elapsedMs": 19844 }, { "taskId": "as-map-03", @@ -440,10 +465,10 @@ "patterns", "libraries actually used" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 9836 + "elapsedMs": 35243 }, { "taskId": "as-map-04", @@ -455,10 +480,10 @@ "import aliases", "tsconfig" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 10098 + "elapsedMs": 5566 }, { "taskId": "as-find-01", @@ -469,10 +494,11 @@ "missingSignals": [ "dependencyInjection" ], - "payloadBytes": 70, - "estimatedTokens": 18, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8937 + "elapsedMs": 5474, + "bestExampleUseful": false }, { "taskId": "as-find-02", @@ -483,10 +509,11 @@ "missingSignals": [ "stateManagement" ], - "payloadBytes": 75, - "estimatedTokens": 19, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8747 + "elapsedMs": 5550, + "bestExampleUseful": false }, { "taskId": "as-find-03", @@ -499,10 +526,11 @@ "bestExample", "patterns" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8747 + "elapsedMs": 5919, + "bestExampleUseful": false }, { "taskId": "as-find-04", @@ -514,10 +542,11 @@ "unitTestFramework", "test" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 9351 + "elapsedMs": 5777, + "bestExampleUseful": false }, { "taskId": "as-search-01", @@ -529,10 +558,10 @@ "results", "searchQuality" ], - "payloadBytes": 73, - "estimatedTokens": 19, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 9376 + "elapsedMs": 5595 }, { "taskId": 
"as-search-02", @@ -544,10 +573,10 @@ "results", "searchQuality" ], - "payloadBytes": 70, - "estimatedTokens": 18, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 9891 + "elapsedMs": 5690 }, { "taskId": "as-search-03", @@ -559,10 +588,10 @@ "results", "searchQuality" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 11377 + "elapsedMs": 5911 }, { "taskId": "as-search-04", @@ -574,10 +603,10 @@ "results", "searchQuality" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8972 + "elapsedMs": 5507 }, { "taskId": "ex-map-01", @@ -590,10 +619,10 @@ "architecture", "statistics" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 10195 + "elapsedMs": 6352 }, { "taskId": "ex-map-02", @@ -606,10 +635,10 @@ "libraries actually used", "patterns" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8753 + "elapsedMs": 5791 }, { "taskId": "ex-map-03", @@ -621,10 +650,10 @@ "import aliases", "tsconfig" ], - "payloadBytes": 71, - "estimatedTokens": 18, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8860 + "elapsedMs": 6010 }, { "taskId": "ex-map-04", @@ -637,10 +666,10 @@ "libraries actually used", "generated:" ], - "payloadBytes": 75, - "estimatedTokens": 19, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8623 + "elapsedMs": 5791 }, { "taskId": "ex-find-01", @@ -651,10 +680,11 @@ "missingSignals": [ "stateManagement" ], - "payloadBytes": 150, - "estimatedTokens": 38, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 12098 + "elapsedMs": 5813, + "bestExampleUseful": false }, { "taskId": "ex-find-02", @@ 
-667,10 +697,11 @@ "bestExample", "patterns" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8783 + "elapsedMs": 5574, + "bestExampleUseful": false }, { "taskId": "ex-find-03", @@ -682,10 +713,11 @@ "test", "framework" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8785 + "elapsedMs": 5672, + "bestExampleUseful": false }, { "taskId": "ex-find-04", @@ -696,10 +728,11 @@ "missingSignals": [ "dependencyInjection" ], - "payloadBytes": 83, - "estimatedTokens": 21, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8912 + "elapsedMs": 5809, + "bestExampleUseful": false }, { "taskId": "ex-search-01", @@ -711,10 +744,10 @@ "results", "searchQuality" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8043 + "elapsedMs": 5590 }, { "taskId": "ex-search-02", @@ -726,10 +759,10 @@ "results", "searchQuality" ], - "payloadBytes": 75, - "estimatedTokens": 19, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8755 + "elapsedMs": 5591 }, { "taskId": "ex-search-03", @@ -741,10 +774,10 @@ "results", "searchQuality" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 12373 + "elapsedMs": 5653 }, { "taskId": "ex-search-04", @@ -756,11 +789,11 @@ "results", "searchQuality" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 51, + "estimatedTokens": 13, "toolCallCount": null, - "elapsedMs": 8802 + "elapsedMs": 5541 } ] } -} \ No newline at end of file +} diff --git a/results/gate-evaluation.json b/results/gate-evaluation.json index d3e5788..59ccbf9 100644 --- a/results/gate-evaluation.json +++ b/results/gate-evaluation.json @@ -1,14 +1,14 @@ { "totalTasks": 24, 
"averageUsefulness": 0.75, - "averagePayloadBytes": 7287.625, - "averageEstimatedTokens": 1822.25, + "averagePayloadBytes": 7306.458333333333, + "averageEstimatedTokens": 1827.0833333333333, "searchTasks": 8, "findTasks": 8, "mapTasks": 8, "averageFirstRelevantHit": null, "bestExampleUsefulnessRate": 0.125, - "averageElapsedMs": 546.75, + "averageElapsedMs": 473.5833333333333, "averageToolCallCount": 1, "results": [ { @@ -25,9 +25,9 @@ "generated:" ], "forbiddenHits": [], - "payloadBytes": 23720, - "estimatedTokens": 5930, - "elapsedMs": 74, + "payloadBytes": 23694, + "estimatedTokens": 5924, + "elapsedMs": 30, "toolCallCount": 1 }, { @@ -43,9 +43,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 5751, - "estimatedTokens": 1438, - "elapsedMs": 29, + "payloadBytes": 5906, + "estimatedTokens": 1477, + "elapsedMs": 25, "toolCallCount": 1 }, { @@ -62,9 +62,9 @@ "libraries actually used" ], "forbiddenHits": [], - "payloadBytes": 23720, - "estimatedTokens": 5930, - "elapsedMs": 18, + "payloadBytes": 23694, + "estimatedTokens": 5924, + "elapsedMs": 15, "toolCallCount": 1 }, { @@ -79,9 +79,9 @@ "tsconfig" ], "forbiddenHits": [], - "payloadBytes": 23720, - "estimatedTokens": 5930, - "elapsedMs": 13, + "payloadBytes": 23694, + "estimatedTokens": 5924, + "elapsedMs": 11, "toolCallCount": 1 }, { @@ -98,7 +98,7 @@ "payloadBytes": 1802, "estimatedTokens": 451, "bestExampleUseful": true, - "elapsedMs": 4, + "elapsedMs": 2, "toolCallCount": 1 }, { @@ -115,7 +115,7 @@ "payloadBytes": 1727, "estimatedTokens": 432, "bestExampleUseful": false, - "elapsedMs": 2, + "elapsedMs": 1, "toolCallCount": 1 }, { @@ -131,10 +131,10 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 4960, - "estimatedTokens": 1240, + "payloadBytes": 5095, + "estimatedTokens": 1274, "bestExampleUseful": false, - "elapsedMs": 6310, + "elapsedMs": 1981, "toolCallCount": 1 }, { @@ -167,9 +167,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 3695, - 
"estimatedTokens": 924, - "elapsedMs": 130, + "payloadBytes": 3721, + "estimatedTokens": 931, + "elapsedMs": 123, "toolCallCount": 1 }, { @@ -184,9 +184,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 4627, - "estimatedTokens": 1157, - "elapsedMs": 378, + "payloadBytes": 4654, + "estimatedTokens": 1164, + "elapsedMs": 1817, "toolCallCount": 1 }, { @@ -201,9 +201,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 3981, - "estimatedTokens": 996, - "elapsedMs": 303, + "payloadBytes": 4008, + "estimatedTokens": 1002, + "elapsedMs": 371, "toolCallCount": 1 }, { @@ -218,9 +218,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 4402, - "estimatedTokens": 1101, - "elapsedMs": 187, + "payloadBytes": 4429, + "estimatedTokens": 1108, + "elapsedMs": 215, "toolCallCount": 1 }, { @@ -239,7 +239,7 @@ "forbiddenHits": [], "payloadBytes": 4268, "estimatedTokens": 1067, - "elapsedMs": 148, + "elapsedMs": 100, "toolCallCount": 1 }, { @@ -258,7 +258,7 @@ "forbiddenHits": [], "payloadBytes": 15329, "estimatedTokens": 3833, - "elapsedMs": 63, + "elapsedMs": 79, "toolCallCount": 1 }, { @@ -275,7 +275,7 @@ "forbiddenHits": [], "payloadBytes": 15329, "estimatedTokens": 3833, - "elapsedMs": 52, + "elapsedMs": 64, "toolCallCount": 1 }, { @@ -294,7 +294,7 @@ "forbiddenHits": [], "payloadBytes": 15329, "estimatedTokens": 3833, - "elapsedMs": 48, + "elapsedMs": 64, "toolCallCount": 1 }, { @@ -311,7 +311,7 @@ "payloadBytes": 298, "estimatedTokens": 75, "bestExampleUseful": false, - "elapsedMs": 3, + "elapsedMs": 4, "toolCallCount": 1 }, { @@ -328,10 +328,10 @@ "bestExample" ], "forbiddenHits": [], - "payloadBytes": 4570, - "estimatedTokens": 1143, + "payloadBytes": 4597, + "estimatedTokens": 1150, "bestExampleUseful": false, - "elapsedMs": 1018, + "elapsedMs": 1323, "toolCallCount": 1 }, { @@ -381,9 +381,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 4033, - "estimatedTokens": 1009, - "elapsedMs": 920, + 
"payloadBytes": 4060, + "estimatedTokens": 1015, + "elapsedMs": 1086, "toolCallCount": 1 }, { @@ -398,9 +398,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 3440, - "estimatedTokens": 860, - "elapsedMs": 1369, + "payloadBytes": 3466, + "estimatedTokens": 867, + "elapsedMs": 1640, "toolCallCount": 1 }, { @@ -415,9 +415,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 4391, - "estimatedTokens": 1098, - "elapsedMs": 1269, + "payloadBytes": 4418, + "estimatedTokens": 1105, + "elapsedMs": 1510, "toolCallCount": 1 }, { @@ -432,9 +432,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 3607, - "estimatedTokens": 902, - "elapsedMs": 775, + "payloadBytes": 3633, + "estimatedTokens": 909, + "elapsedMs": 896, "toolCallCount": 1 } ], @@ -447,22 +447,22 @@ "payloadMetric": "averageEstimatedTokens", "payloadMetricPassed": false, "beatenUsefulnessMetrics": [ - "averageUsefulness" + "averageUsefulness", + "bestExampleUsefulnessRate" ], "missingMetrics": [ - "averageFirstRelevantHit", - "bestExampleUsefulnessRate" + "averageFirstRelevantHit" ], "comparisons": [ { "metric": "averageEstimatedTokens", - "comparatorValue": 18.5, - "actualValue": 1822.25, + "comparatorValue": 32.166666666666664, + "actualValue": 1827.0833333333333, "passes": false }, { "metric": "averageUsefulness", - "comparatorValue": 0, + "comparatorValue": 0.027777777777777776, "actualValue": 0.75, "passes": true }, @@ -474,9 +474,9 @@ }, { "metric": "bestExampleUsefulnessRate", - "comparatorValue": null, + "comparatorValue": 0, "actualValue": 0.125, - "passes": false + "passes": true } ] }, @@ -543,28 +543,25 @@ }, { "comparatorName": "codebase-memory-mcp", - "status": "pending_evidence", + "status": "failed", "tolerancePercent": 15, - "missingMetrics": [ - "averageFirstRelevantHit", - "bestExampleUsefulnessRate" - ], + "missingMetrics": [], "comparisons": [ { "metric": "averageUsefulness", - "comparatorValue": 0, + "comparatorValue": 0.1875, "actualValue": 
0.75, "passes": true }, { "metric": "averageFirstRelevantHit", - "comparatorValue": null, + "comparatorValue": 1.2857142857142858, "actualValue": null, "passes": false }, { "metric": "bestExampleUsefulnessRate", - "comparatorValue": null, + "comparatorValue": 0.5, "actualValue": 0.125, "passes": false } @@ -605,7 +602,6 @@ "raw Claude Code baseline metrics missing", "GrepAI comparator metrics missing", "jCodeMunch comparator metrics missing", - "codebase-memory-mcp comparator metrics missing", "CodeGraphContext comparator metrics missing" ], "claimAllowed": false diff --git a/scripts/benchmark-comparators.mjs b/scripts/benchmark-comparators.mjs index 0fbe5ca..e0a8e37 100644 --- a/scripts/benchmark-comparators.mjs +++ b/scripts/benchmark-comparators.mjs @@ -1,4 +1,3 @@ -#!/usr/bin/env node /** * Automated comparator benchmark runner for codebase-context discovery benchmark. * @@ -12,7 +11,7 @@ */ import path from 'path'; -import { fileURLToPath } from 'url'; +import { fileURLToPath, pathToFileURL } from 'url'; import { readFileSync, writeFileSync, mkdirSync, existsSync } from 'fs'; import { execSync, execFile } from 'child_process'; import { parseArgs } from 'util'; @@ -51,6 +50,224 @@ function normalizeText(value) { return value.toLowerCase().replace(/\\/g, '/'); } +function normalizeRelativePath(candidate) { + if (typeof candidate !== 'string') return null; + const trimmed = candidate.trim().replace(/^["']|["']$/g, ''); + if (!trimmed) return null; + const normalized = trimmed.replace(/\\/g, '/').replace(/^\.\//, ''); + if (/^[A-Za-z]:\//.test(normalized)) { + return normalized.replace(/^[A-Za-z]:\//, ''); + } + return normalized; +} + +function normalizeFilesystemPath(candidate) { + if (typeof candidate !== 'string') return null; + return candidate.trim().replace(/\\/g, '/').replace(/\/+$/, '').toLowerCase(); +} + +function isLikelyCodePath(candidate) { + if (typeof candidate !== 'string') return false; + if (!candidate.includes('/')) return false; + const 
lastSegment = candidate.split('/').pop() ?? ''; + return /\.[A-Za-z0-9]+$/.test(lastSegment); +} + +function collectTopFiles(value, sink = []) { + if (Array.isArray(value)) { + for (const item of value) { + collectTopFiles(item, sink); + } + return sink; + } + + if (value && typeof value === 'object') { + for (const [key, nested] of Object.entries(value)) { + if ( + (key === 'file' || key === 'filePath' || key === 'path' || key === 'source') && + typeof nested === 'string' + ) { + const normalized = normalizeRelativePath(nested); + if (normalized && isLikelyCodePath(normalized) && !sink.includes(normalized)) { + sink.push(normalized); + } + } + collectTopFiles(nested, sink); + } + return sink; + } + + if (typeof value === 'string') { + const matches = value.match(/[A-Za-z0-9_.-]+(?:\/[A-Za-z0-9_.-]+)+\.[A-Za-z0-9]+/g) ?? []; + for (const match of matches) { + const normalized = normalizeRelativePath(match); + if (normalized && !sink.includes(normalized)) { + sink.push(normalized); + } + } + } + + return sink; +} + +function extractBestExample(value) { + if (!value || typeof value !== 'object') return null; + if (Array.isArray(value)) { + for (const item of value) { + const candidate = extractBestExample(item); + if (candidate) return candidate; + } + return null; + } + + for (const [key, nested] of Object.entries(value)) { + if ( + (key === 'bestExample' || key === 'best_example' || key === 'goldenFile' || key === 'example') && + typeof nested === 'string' + ) { + const normalized = normalizeRelativePath(nested); + if (normalized) return normalized; + } + const candidate = extractBestExample(nested); + if (candidate) return candidate; + } + + return null; +} + +function extractPayloadText(result) { + const parts = []; + if (Array.isArray(result?.content)) { + for (const item of result.content) { + if (typeof item?.text === 'string' && item.text.trim()) { + parts.push(item.text.trim()); + } + } + } + if (result?.structuredContent !== undefined) { + 
parts.push(JSON.stringify(result.structuredContent, null, 2)); + } + if (parts.length === 0) { + parts.push(JSON.stringify(result)); + } + return parts.join('\n'); +} + +function extractMcpResponse(result) { + const topFiles = collectTopFiles(result?.structuredContent ?? result); + const bestExample = extractBestExample(result?.structuredContent ?? result) ?? topFiles[0] ?? null; + return { + payload: extractPayloadText(result), + ...(topFiles.length > 0 && { topFiles }), + ...(bestExample && { bestExample }) + }; +} + +function parseToolTextPayload(result) { + const textParts = Array.isArray(result?.content) + ? result.content + .map((item) => (typeof item?.text === 'string' ? item.text.trim() : '')) + .filter(Boolean) + : []; + return textParts.join('\n'); +} + +function extractIndexedProjectName(listProjectsResult, rootPath) { + const payload = parseToolTextPayload(listProjectsResult); + if (!payload) return null; + + try { + const parsed = JSON.parse(payload); + const projects = Array.isArray(parsed.projects) ? parsed.projects : []; + const normalizedRootPath = normalizeFilesystemPath(rootPath); + const match = projects.find( + (project) => normalizeFilesystemPath(project.root_path) === normalizedRootPath + ); + return typeof match?.name === 'string' ? match.name : null; + } catch { + return null; + } +} + +function matchPatterns(candidates, patterns) { + if (!patterns || patterns.length === 0) return null; + const normalizedPatterns = patterns.map(normalizeText); + for (let index = 0; index < candidates.length; index++) { + const normalizedCandidate = normalizeText(candidates[index]); + if (normalizedPatterns.some((pattern) => normalizedCandidate.includes(pattern))) { + return index + 1; + } + } + return null; +} + +export function buildRawClaudePrompt(task, rootPath) { + const query = task.args?.query ?? task.prompt; + const intent = + task.surface === 'search_codebase' + ? 'search' + : task.surface === 'get_team_patterns' + ? 
'find local conventions' + : 'map/orient to the repository'; + + return [ + `You are exploring a codebase at ${path.resolve(rootPath)}.`, + `Use only Read, Grep, and Glob tools to ${intent}.`, + `Question: ${query}`, + 'Return strict JSON with this shape:', + '{"answer":"short concrete answer with repo terms","files":["repo-relative path in relevance order"],"bestExample":"repo-relative path or null"}', + 'Rules:', + '- files must be repo-relative and ordered most relevant first', + '- answer must include concrete identifiers, files, or patterns from the repo, not generic advice', + '- bestExample must be the strongest local example if one exists, otherwise null', + '- Output JSON only' + ].join('\n'); +} + +export function parseRawClaudeStructuredResult(resultText) { + const topFiles = []; + let bestExample = null; + let payload = resultText; + const trimmed = typeof resultText === 'string' ? resultText.trim() : ''; + const fencedJsonMatch = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/i); + const candidateJson = fencedJsonMatch ? fencedJsonMatch[1].trim() : trimmed; + + try { + const parsed = JSON.parse(candidateJson); + if (parsed && typeof parsed === 'object') { + if (Array.isArray(parsed.files)) { + for (const file of parsed.files) { + const normalized = normalizeRelativePath(file); + if (normalized && isLikelyCodePath(normalized) && !topFiles.includes(normalized)) { + topFiles.push(normalized); + } + } + } + const normalizedBestExample = normalizeRelativePath(parsed.bestExample); + if (normalizedBestExample) { + bestExample = normalizedBestExample; + } else if (topFiles.length > 0) { + bestExample = topFiles[0]; + } + payload = JSON.stringify(parsed); + } + } catch { + const fallbackFiles = collectTopFiles(resultText); + for (const file of fallbackFiles) { + if (!topFiles.includes(file)) { + topFiles.push(file); + } + } + bestExample = topFiles[0] ?? 
null; + } + + return { + payload, + ...(topFiles.length > 0 && { topFiles }), + ...(bestExample && { bestExample }) + }; +} + function matchSignals(payload, expectedSignals, forbiddenSignals) { const normalizedPayload = normalizeText(payload); const matchedSignals = expectedSignals.filter((s) => @@ -124,16 +341,26 @@ const COMPARATOR_ADAPTERS = [ serverArgs: ['--yes', 'codebase-memory-mcp'], serverEnv: {}, initTimeout: 10000, + resolveProjectName: true, indexTool: null, // auto-indexes on first query - searchTool: 'search_code', - searchArgs(task) { - return { query: task.prompt, mode: 'compact' }; - }, - extractPayload(result) { - if (Array.isArray(result?.content)) { - return result.content.map((c) => (typeof c?.text === 'string' ? c.text : JSON.stringify(c))).join('\n'); + buildTaskCall(task, { projectName }) { + const query = task.args?.query ?? task.prompt; + if (task.job === 'map') { + return { + name: 'get_architecture', + arguments: { project: projectName } + }; } - return JSON.stringify(result); + + return { + name: 'search_graph', + arguments: { + project: projectName, + query, + include_connected: true, + limit: 10 + } + }; } }, { @@ -170,12 +397,7 @@ const COMPARATOR_ADAPTERS = [ detail_level: 'compact' }; }, - extractPayload(result) { - if (Array.isArray(result?.content)) { - return result.content.map((c) => (typeof c?.text === 'string' ? c.text : JSON.stringify(c))).join('\n'); - } - return JSON.stringify(result); - } + extractPayload: null }, { name: 'GrepAI', @@ -208,12 +430,7 @@ const COMPARATOR_ADAPTERS = [ searchArgs(task) { return { query: task.prompt }; }, - extractPayload(result) { - if (Array.isArray(result?.content)) { - return result.content.map((c) => (typeof c?.text === 'string' ? 
c.text : JSON.stringify(c))).join('\n'); - } - return JSON.stringify(result); - } + extractPayload: null }, { name: 'CodeGraphContext', @@ -249,12 +466,7 @@ const COMPARATOR_ADAPTERS = [ // CodeGraphContext uses cypher-based queries; approximate with a search tool return { query: task.prompt }; }, - extractPayload(result) { - if (Array.isArray(result?.content)) { - return result.content.map((c) => (typeof c?.text === 'string' ? c.text : JSON.stringify(c))).join('\n'); - } - return JSON.stringify(result); - } + extractPayload: null }, { name: 'raw Claude Code', @@ -281,9 +493,7 @@ const COMPARATOR_ADAPTERS = [ searchArgs(task) { return { prompt: task.prompt }; }, - extractPayload(result) { - return typeof result === 'string' ? result : JSON.stringify(result); - } + extractPayload: null } ]; @@ -297,6 +507,7 @@ async function runComparatorViaMcp(adapter, rootPath, tasks) { serverCommand: adapter.serverCommand, serverArgs: adapter.serverArgs, serverEnv: adapter.serverEnv, + cwd: path.resolve(rootPath), connectTimeoutMs: adapter.connectTimeout ?? 
15_000 }, async ({ client }) => { @@ -312,6 +523,25 @@ async function runComparatorViaMcp(adapter, rootPath, tasks) { throw new Error(`Failed to list tools from ${adapter.name}: ${err.message}`); } + let projectName = null; + if (adapter.resolveProjectName && availableTools.some((tool) => tool.name === 'list_projects')) { + try { + const listProjectsResult = await client.callTool({ + name: 'list_projects', + arguments: {} + }); + projectName = extractIndexedProjectName(listProjectsResult, rootPath); + } catch (err) { + throw new Error(`Failed to resolve indexed project for ${adapter.name}: ${err.message}`); + } + + if (!projectName) { + throw new Error( + `Could not resolve indexed project for ${adapter.name} at ${path.resolve(rootPath)}` + ); + } + } + const toolNames = availableTools.map((t) => t.name); let searchToolName = adapter.searchTool; if (!searchToolName) { @@ -348,15 +578,33 @@ async function runComparatorViaMcp(adapter, rootPath, tasks) { for (const task of tasks) { const startMs = Date.now(); let payload = ''; + let topFiles = []; + let bestExample = null; let toolCallCount = totalToolCalls; try { - const result = await client.callTool({ - name: searchToolName, - arguments: adapter.searchArgs(task) - }); + const request = + typeof adapter.buildTaskCall === 'function' + ? adapter.buildTaskCall(task, { rootPath, projectName, toolNames }) + : { + name: searchToolName, + arguments: adapter.searchArgs(task) + }; + const result = await client.callTool(request); toolCallCount++; - payload = adapter.extractPayload(result); + const extracted = + typeof adapter.extractPayload === 'function' + ? adapter.extractPayload(result) + : extractMcpResponse(result); + payload = typeof extracted === 'string' ? extracted : extracted.payload; + topFiles = + extracted && typeof extracted === 'object' && Array.isArray(extracted.topFiles) + ? extracted.topFiles + : []; + bestExample = + extracted && typeof extracted === 'object' && typeof extracted.bestExample === 'string' + ? 
extracted.bestExample + : topFiles[0] ?? null; } catch (err) { console.warn(` [${adapter.name}] Task ${task.id} failed: ${err.message}`); payload = ''; @@ -370,6 +618,13 @@ async function runComparatorViaMcp(adapter, rootPath, tasks) { task.expectedSignals, task.forbiddenSignals ); + const firstRelevantHit = matchPatterns(topFiles, task.expectedFilePatterns); + const bestExampleUseful = + task.expectedBestExamplePatterns && task.expectedBestExamplePatterns.length > 0 + ? task.expectedBestExamplePatterns.some((pattern) => + normalizeText(bestExample ?? '').includes(normalizeText(pattern)) + ) + : undefined; taskResults.push({ taskId: task.id, @@ -381,7 +636,9 @@ async function runComparatorViaMcp(adapter, rootPath, tasks) { payloadBytes, estimatedTokens, toolCallCount, - elapsedMs + elapsedMs, + ...(firstRelevantHit !== null ? { firstRelevantHit } : {}), + ...(typeof bestExampleUseful === 'boolean' ? { bestExampleUseful } : {}) }); } @@ -406,25 +663,76 @@ async function runRawClaudeCode(rootPath, tasks) { for (const task of tasks) { const startMs = Date.now(); let payload = ''; + let topFiles = []; + let bestExample = null; try { - const prompt = `You are exploring a codebase at ${path.resolve(rootPath)}. Answer this question using only grep, glob, and read file operations: ${task.prompt}`; - const { stdout } = await execFileAsync( - 'claude', - ['-p', prompt, '--output-format', 'json', '--allowedTools', 'Read,Grep,Glob'], - { timeout: 120000, cwd: path.resolve(rootPath), shell: process.platform === 'win32' } - ); + const prompt = buildRawClaudePrompt(task, rootPath); + const commandArgs = + process.platform === 'win32' + ? 
[ + 'powershell.exe', + [ + '-NoProfile', + '-Command', + 'claude -p $env:CLAUDE_BENCHMARK_PROMPT --model haiku --effort low --output-format json --allowedTools Read,Grep,Glob' + ], + { + timeout: 120000, + cwd: path.resolve(rootPath), + windowsHide: true, + env: { + ...process.env, + CLAUDE_BENCHMARK_PROMPT: prompt + } + } + ] + : [ + 'claude', + ['-p', prompt, '--model', 'haiku', '--effort', 'low', '--output-format', 'json', '--allowedTools', 'Read,Grep,Glob'], + { + timeout: 120000, + cwd: path.resolve(rootPath), + windowsHide: true + } + ]; + const { stdout } = await execFileAsync(commandArgs[0], commandArgs[1], commandArgs[2]); try { const parsed = JSON.parse(stdout); - payload = parsed.result ?? stdout; + const extracted = parseRawClaudeStructuredResult(parsed.result ?? stdout); + payload = extracted.payload; + topFiles = extracted.topFiles ?? []; + bestExample = extracted.bestExample ?? null; } catch { - payload = stdout; + const extracted = parseRawClaudeStructuredResult(stdout); + payload = extracted.payload; + topFiles = extracted.topFiles ?? []; + bestExample = extracted.bestExample ?? null; } } catch (err) { if (err.code === 'ENOENT' || err.message?.includes('command not found')) { throw new Error('claude CLI not found'); } - console.warn(` [raw Claude Code] Task ${task.id} error: ${err.message}`); + const fallbackStdout = typeof err.stdout === 'string' ? err.stdout.trim() : ''; + if (fallbackStdout) { + try { + const parsed = JSON.parse(fallbackStdout); + const extracted = parseRawClaudeStructuredResult(parsed.result ?? fallbackStdout); + payload = extracted.payload; + topFiles = extracted.topFiles ?? []; + bestExample = extracted.bestExample ?? null; + } catch { + const extracted = parseRawClaudeStructuredResult(fallbackStdout); + payload = extracted.payload; + topFiles = extracted.topFiles ?? []; + bestExample = extracted.bestExample ?? null; + } + } + + if (!payload) { + const stderr = typeof err.stderr === 'string' ? 
err.stderr.trim() : ''; + console.warn(` [raw Claude Code] Task ${task.id} error: ${stderr || err.message}`); + } } const elapsedMs = Date.now() - startMs; @@ -435,6 +743,13 @@ async function runRawClaudeCode(rootPath, tasks) { task.expectedSignals, task.forbiddenSignals ); + const firstRelevantHit = matchPatterns(topFiles, task.expectedFilePatterns); + const bestExampleUseful = + task.expectedBestExamplePatterns && task.expectedBestExamplePatterns.length > 0 + ? task.expectedBestExamplePatterns.some((pattern) => + normalizeText(bestExample ?? '').includes(normalizeText(pattern)) + ) + : undefined; taskResults.push({ taskId: task.id, @@ -446,7 +761,9 @@ async function runRawClaudeCode(rootPath, tasks) { payloadBytes, estimatedTokens, toolCallCount: null, - elapsedMs + elapsedMs, + ...(firstRelevantHit !== null ? { firstRelevantHit } : {}), + ...(typeof bestExampleUseful === 'boolean' ? { bestExampleUseful } : {}) }); } @@ -457,26 +774,56 @@ async function runRawClaudeCode(rootPath, tasks) { // Aggregate task results into DiscoveryComparatorMetrics shape // --------------------------------------------------------------------------- -function aggregateResults(taskResults) { +export function aggregateResults(taskResults) { const n = taskResults.length; - if (n === 0) return { averageUsefulness: null, averagePayloadBytes: null, averageEstimatedTokens: null, averageFirstRelevantHit: null, bestExampleUsefulnessRate: null }; + if (n === 0) { + return { + averageUsefulness: null, + averagePayloadBytes: null, + averageEstimatedTokens: null, + averageFirstRelevantHit: null, + bestExampleUsefulnessRate: null, + status: 'pending_evidence', + reason: 'No comparator task results were produced' + }; + } const avgUsefulness = taskResults.reduce((s, r) => s + r.usefulnessScore, 0) / n; const avgBytes = taskResults.reduce((s, r) => s + r.payloadBytes, 0) / n; const avgTokens = taskResults.reduce((s, r) => s + r.estimatedTokens, 0) / n; + const searchHits = taskResults + .map((r) => 
r.firstRelevantHit) + .filter((value) => typeof value === 'number'); + const bestExampleResults = taskResults + .map((r) => r.bestExampleUseful) + .filter((value) => typeof value === 'boolean'); const toolCallCounts = taskResults.map((r) => r.toolCallCount).filter((v) => typeof v === 'number'); const elapsedMsList = taskResults.map((r) => r.elapsedMs).filter((v) => typeof v === 'number'); + const hasMeaningfulEvidence = taskResults.some( + (result) => + result.usefulnessScore > 0 || + typeof result.firstRelevantHit === 'number' || + result.bestExampleUseful === true + ); + const status = hasMeaningfulEvidence ? 'ok' : 'pending_evidence'; return { averageUsefulness: avgUsefulness, averagePayloadBytes: avgBytes, averageEstimatedTokens: avgTokens, - averageFirstRelevantHit: null, // comparators don't expose ranked file lists in standard MCP responses - bestExampleUsefulnessRate: null, + averageFirstRelevantHit: + searchHits.length > 0 ? searchHits.reduce((sum, value) => sum + value, 0) / searchHits.length : null, + bestExampleUsefulnessRate: + bestExampleResults.length > 0 + ? bestExampleResults.filter(Boolean).length / bestExampleResults.length + : null, averageToolCallCount: toolCallCounts.length > 0 ? toolCallCounts.reduce((s, v) => s + v, 0) / toolCallCounts.length : null, averageElapsedMs: elapsedMsList.length > 0 ? elapsedMsList.reduce((s, v) => s + v, 0) / elapsedMsList.length : null, - status: 'ok', + status, + ...(status === 'pending_evidence' + ? 
{ reason: 'Comparator returned task payloads, but none contained usable benchmark evidence' } + : {}), taskResults }; } @@ -680,7 +1027,13 @@ async function main() { } } -main().catch((err) => { - console.error('Fatal:', err); - process.exit(2); -}); +const isMain = + process.argv[1] && + import.meta.url === pathToFileURL(path.resolve(process.argv[1])).href; + +if (isMain) { + main().catch((err) => { + console.error('Fatal:', err); + process.exit(2); + }); +} diff --git a/scripts/lib/managed-mcp-session.mjs b/scripts/lib/managed-mcp-session.mjs index 5f54e55..17a6e5a 100644 --- a/scripts/lib/managed-mcp-session.mjs +++ b/scripts/lib/managed-mcp-session.mjs @@ -1,5 +1,9 @@ +import { execFile } from 'node:child_process'; +import { promisify } from 'node:util'; import process from 'node:process'; +const execFileAsync = promisify(execFile); + async function loadSdkClient() { const [{ Client }, { StdioClientTransport }] = await Promise.all([ import('@modelcontextprotocol/sdk/client/index.js'), @@ -39,6 +43,82 @@ function delay(timeoutMs) { }); } +function isProcessAlive(pid) { + if (!Number.isInteger(pid) || pid <= 0) { + return false; + } + + try { + process.kill(pid, 0); + return true; + } catch (error) { + return error?.code !== 'ESRCH'; + } +} + +async function waitForProcessExit(pid, timeoutMs) { + const deadline = Date.now() + timeoutMs; + + while (Date.now() < deadline) { + if (!isProcessAlive(pid)) { + return true; + } + await delay(50); + } + + return !isProcessAlive(pid); +} + +async function killProcessTree(pid) { + if (!isProcessAlive(pid)) { + return; + } + + if (process.platform === 'win32') { + try { + await execFileAsync('taskkill', ['/PID', String(pid), '/T', '/F'], { + windowsHide: true, + timeout: 10_000 + }); + } catch { + // Best-effort fallback below. 
+ } + } + + if (!isProcessAlive(pid)) { + return; + } + + try { + process.kill(pid, 'SIGTERM'); + } catch { + return; + } + + if (await waitForProcessExit(pid, 1_000)) { + return; + } + + try { + process.kill(pid, 'SIGKILL'); + } catch { + // Best-effort. + } +} + +async function ensureProcessTreeExit(pid, timeoutMs = 1_500) { + if (!Number.isInteger(pid) || pid <= 0) { + return; + } + + if (await waitForProcessExit(pid, timeoutMs)) { + return; + } + + await killProcessTree(pid); + await waitForProcessExit(pid, 5_000); +} + async function safeClose(client, transport, connected) { const closeAttempts = []; @@ -72,22 +152,21 @@ export async function withManagedStdioClientSession(options, callback) { let connected = false; let settling = false; + let spawnedPid = null; const connectPromise = client.connect(transport); - const spawnNotification = (async () => { - if (typeof onSpawn !== 'function') { - return; - } - + const observeSpawn = (async () => { while (!settling) { if (transport.pid !== null) { - onSpawn(transport.pid); + spawnedPid = transport.pid; + onSpawn?.(transport.pid); return; } - await new Promise((resolve) => setTimeout(resolve, 10)); + await delay(10); } if (transport.pid !== null) { - onSpawn(transport.pid); + spawnedPid = transport.pid; + onSpawn?.(transport.pid); } })(); @@ -97,8 +176,10 @@ export async function withManagedStdioClientSession(options, callback) { return await callback({ client, transport }); } finally { settling = true; + await observeSpawn.catch(() => undefined); + const pidToKill = spawnedPid ?? 
transport.pid; await safeClose(client, transport, connected); - await spawnNotification.catch(() => undefined); + await ensureProcessTreeExit(pidToKill); await Promise.race([connectPromise, delay(5_000)]).catch(() => undefined); } } diff --git a/scripts/run-eval.mjs b/scripts/run-eval.mjs index ef82204..6f5b706 100644 --- a/scripts/run-eval.mjs +++ b/scripts/run-eval.mjs @@ -11,12 +11,18 @@ import { analyzerRegistry } from '../dist/core/analyzer-registry.js'; import { AngularAnalyzer } from '../dist/analyzers/angular/index.js'; import { GenericAnalyzer } from '../dist/analyzers/generic/index.js'; import { evaluateFixture, formatEvalReport } from '../dist/eval/harness.js'; +import { + combineEditPreflightSummaries, + evaluateEditPreflightFixture, + formatEditPreflightReport +} from '../dist/eval/edit-preflight-harness.js'; import { combineDiscoverySummaries, evaluateDiscoveryGate, evaluateDiscoveryFixture, formatDiscoveryReport } from '../dist/eval/discovery-harness.js'; +import { getDefaultFixturePaths, resolveEvalMode } from '../dist/eval/run-config.js'; const __dirname = path.dirname(fileURLToPath(import.meta.url)); const projectRoot = path.join(__dirname, '..'); @@ -24,20 +30,6 @@ const packageJsonPath = path.join(projectRoot, 'package.json'); const packageJson = JSON.parse(readFileSync(packageJsonPath, 'utf-8')); -const defaultFixtureA = path.join(projectRoot, 'tests', 'fixtures', 'eval-angular-spotify.json'); -const defaultFixtureB = path.join(projectRoot, 'tests', 'fixtures', 'eval-controlled.json'); -const defaultDiscoveryFixtureA = path.join( - projectRoot, - 'tests', - 'fixtures', - 'discovery-angular-spotify.json' -); -const defaultDiscoveryFixtureB = path.join( - projectRoot, - 'tests', - 'fixtures', - 'discovery-excalidraw.json' -); const defaultDiscoveryProtocol = path.join( projectRoot, 'tests', @@ -49,7 +41,7 @@ const usage = [ `Usage: node scripts/run-eval.mjs [codebaseB] [options]`, ``, `Options:`, - ` --mode= Select benchmark mode (default: 
retrieval)`, + ` --mode= Select benchmark mode (default: retrieval)`, ` --fixture-a= Override fixture for codebaseA`, ` --fixture-b= Override fixture for codebaseB`, ` --protocol= Override discovery benchmark protocol`, @@ -151,6 +143,17 @@ async function runSingleEvaluation({ fixturePath: resolvedFixture, summary }); + } else if (mode === 'edit-preflight') { + console.log(`\n--- Phase 2: Running ${fixture.tasks.length}-task edit-preflight harness ---`); + summary = await evaluateEditPreflightFixture({ + fixture, + rootPath: resolvedCodebase + }); + report = formatEditPreflightReport({ + codebaseLabel: label, + fixturePath: resolvedFixture, + summary + }); } else { console.log(`\n--- Phase 2: Running ${fixture.queries.length}-query eval harness ---`); const searcher = new CodebaseSearcher(resolvedCodebase); @@ -202,6 +205,31 @@ function printCombinedSummary(summaries, mode) { return; } + if (mode === 'edit-preflight') { + const combined = combineEditPreflightSummaries(summaries); + console.log(`\n=== Combined Edit Preflight Summary ===`); + console.log( + `Top-target in top-3: ${combined.topTargetInTop3Count}/${combined.targetableTasks} (${combined.topTargetInTop3Rate === null ? 'n/a' : (combined.topTargetInTop3Rate * 100).toFixed(0) + '%'})` + ); + console.log( + `Average first relevant hit: ${combined.averageFirstRelevantHit === null ? 'n/a' : combined.averageFirstRelevantHit.toFixed(2)}` + ); + console.log( + `Best-example hit rate: ${combined.bestExampleHitCount}/${combined.bestExampleTasks} (${combined.bestExampleHitRate === null ? 'n/a' : (combined.bestExampleHitRate * 100).toFixed(0) + '%'})` + ); + console.log( + `Safe ready rate: ${combined.safeTaskReadyCount}/${combined.safeTasks} (${combined.safeTaskReadyRate === null ? 'n/a' : (combined.safeTaskReadyRate * 100).toFixed(0) + '%'})` + ); + console.log( + `Unsafe abstain rate: ${combined.unsafeTaskAbstainCount}/${combined.unsafeTasks} (${combined.unsafeTaskAbstainRate === null ? 
'n/a' : (combined.unsafeTaskAbstainRate * 100).toFixed(0) + '%'})` + ); + console.log( + `Unsafe ready=true false positives: ${combined.unsafeReadyFalsePositiveCount}/${combined.unsafeTasks} (${combined.unsafeReadyFalsePositiveRate === null ? 'n/a' : (combined.unsafeReadyFalsePositiveRate * 100).toFixed(0) + '%'})` + ); + console.log(`=======================================\n`); + return; + } + const total = summaries.reduce((sum, summary) => sum + summary.total, 0); const top1Correct = summaries.reduce((sum, summary) => sum + summary.top1Correct, 0); const top3RecallCount = summaries.reduce((sum, summary) => sum + summary.top3RecallCount, 0); @@ -254,17 +282,14 @@ async function main() { const codebaseA = positionals[0]; const codebaseB = positionals[1]; - const mode = values.mode === 'discovery' ? 'discovery' : 'retrieval'; + const mode = resolveEvalMode(values.mode); + const defaultFixtures = getDefaultFixturePaths(projectRoot, mode); const fixtureA = values['fixture-a'] ? path.resolve(values['fixture-a']) - : mode === 'discovery' - ? defaultDiscoveryFixtureA - : defaultFixtureA; + : defaultFixtures.fixtureA; const fixtureB = values['fixture-b'] ? path.resolve(values['fixture-b']) - : mode === 'discovery' - ? defaultDiscoveryFixtureB - : defaultFixtureB; + : defaultFixtures.fixtureB; const protocolPath = values.protocol ? path.resolve(values.protocol) : defaultDiscoveryProtocol; @@ -326,6 +351,25 @@ async function main() { process.exit(gate.status === 'failed' ? 1 : 0); } + if (mode === 'edit-preflight') { + const combinedSummary = combineEditPreflightSummaries(summaries); + printCombinedSummary(summaries, mode); + console.log( + formatEditPreflightReport({ + codebaseLabel: 'combined-suite', + fixturePath: codebaseB ? 
`${fixtureA}, ${fixtureB}` : fixtureA, + summary: combinedSummary + }) + ); + if (outputPath) { + const outputDir = path.dirname(outputPath); + if (!existsSync(outputDir)) mkdirSync(outputDir, { recursive: true }); + writeFileSync(outputPath, JSON.stringify(combinedSummary, null, 2)); + console.log(`\nResults written to: ${outputPath}`); + } + process.exit(0); + } + if (outputPath && mode === 'discovery' && summaries.length === 1) { const outputDir = path.dirname(outputPath); if (!existsSync(outputDir)) mkdirSync(outputDir, { recursive: true }); diff --git a/src/cli-map.ts b/src/cli-map.ts index 0f13606..4479afe 100644 --- a/src/cli-map.ts +++ b/src/cli-map.ts @@ -32,7 +32,8 @@ function printMapUsage(): void { console.log('Output the conventions map for the current codebase.'); console.log(''); console.log('Options:'); - console.log(' --export Write CODEBASE_MAP.md to project root (overrides other flags)'); + console.log(' --export Write CODEBASE_MAP.md to project root (still honors --full)'); + console.log(' --full Output the exhaustive map instead of the bounded default'); console.log(' --json Output raw JSON (CodebaseMapSummary)'); console.log(' --pretty Terminal-friendly box layout'); console.log(' --help Show this help'); @@ -44,6 +45,7 @@ export async function handleMapCli(args: string[]): Promise { const useJson = args.includes('--json'); const usePretty = args.includes('--pretty'); const useExport = args.includes('--export'); + const useFull = args.includes('--full'); const showHelp = args.includes('--help') || args.includes('-h'); if (showHelp) { @@ -77,7 +79,7 @@ export async function handleMapCli(args: string[]): Promise { project.indexState = indexState; try { - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: useFull ? 
'full' : 'bounded' }); if (useExport) { const outPath = path.join(rootPath, 'CODEBASE_MAP.md'); diff --git a/src/cli-memory.ts b/src/cli-memory.ts index 6921338..ef66dc5 100644 --- a/src/cli-memory.ts +++ b/src/cli-memory.ts @@ -3,13 +3,15 @@ */ import path from 'path'; -import type { Memory } from './types/index.js'; +import type { Memory, MemoryScope } from './types/index.js'; import { CODEBASE_CONTEXT_DIRNAME, MEMORY_FILENAME } from './constants/codebase-context.js'; import { appendMemoryFile, + buildMemoryIdentityParts, readMemoriesFile, removeMemory, filterMemories, + normalizeMemoryScope, withConfidence } from './memory/store.js'; @@ -45,7 +47,7 @@ export async function handleMemoryCli(args: string[]): Promise { const listUsage = 'Usage: codebase-context memory list [--category ] [--type ] [--query ] [--json]'; const addUsage = - 'Usage: codebase-context memory add --type --category --memory --reason [--json]'; + 'Usage: codebase-context memory add --type --category --memory --reason [--scope-kind global|file|symbol] [--scope-file ] [--scope-symbol ] [--json]'; const removeUsage = 'Usage: codebase-context memory remove [--json]'; const exitWithUsageError = (message: string, usage?: string): never => { @@ -134,6 +136,13 @@ export async function handleMemoryCli(args: string[]): Promise { const staleTag = m.stale ? 
' [STALE]' : ''; console.log(`[${m.id}] ${m.type}/${m.category}: ${m.memory}${staleTag}`); console.log(` Reason: ${m.reason}`); + if (m.scope && m.scope.kind !== 'global') { + if (m.scope.kind === 'file') { + console.log(` Scope: file ${m.scope.file}`); + } else { + console.log(` Scope: symbol ${m.scope.file}#${m.scope.symbol}`); + } + } console.log(` Date: ${m.date} | Confidence: ${m.effectiveConfidence}`); console.log(''); } @@ -145,6 +154,9 @@ export async function handleMemoryCli(args: string[]): Promise { let category: CliMemoryCategory | undefined; let memory: string | undefined; let reason: string | undefined; + let scopeKind: MemoryScope['kind'] | undefined; + let scopeFile: string | undefined; + let scopeSymbol: string | undefined; for (let i = 1; i < args.length; i++) { if (args[i] === '--type') { @@ -197,6 +209,34 @@ export async function handleMemoryCli(args: string[]): Promise { } reason = value; i++; + } else if (args[i] === '--scope-kind') { + const value = args[i + 1]; + if (!value || value.startsWith('--')) { + exitWithUsageError('Error: --scope-kind requires a value.', addUsage); + } + if (value === 'global' || value === 'file' || value === 'symbol') { + scopeKind = value; + } else { + exitWithUsageError( + 'Error: invalid --scope-kind. 
Allowed: global, file, symbol.', + addUsage + ); + } + i++; + } else if (args[i] === '--scope-file') { + const value = args[i + 1]; + if (!value || value.startsWith('--')) { + exitWithUsageError('Error: --scope-file requires a value.', addUsage); + } + scopeFile = value; + i++; + } else if (args[i] === '--scope-symbol') { + const value = args[i + 1]; + if (!value || value.startsWith('--')) { + exitWithUsageError('Error: --scope-symbol requires a value.', addUsage); + } + scopeSymbol = value; + i++; } else if (args[i] === '--json') { // handled above } @@ -210,9 +250,30 @@ export async function handleMemoryCli(args: string[]): Promise { const requiredCategory = category; const requiredMemory = memory; const requiredReason = reason; + const scope = normalizeMemoryScope({ + kind: scopeKind, + file: scopeFile, + symbol: scopeSymbol + }); + + if (scopeKind === 'file' && !scope) { + exitWithUsageError('Error: --scope-kind file requires --scope-file.', addUsage); + } + if (scopeKind === 'symbol' && !scope) { + exitWithUsageError( + 'Error: --scope-kind symbol requires --scope-file and --scope-symbol.', + addUsage + ); + } const crypto = await import('crypto'); - const hashContent = `${type}:${requiredCategory}:${requiredMemory}:${requiredReason}`; + const hashContent = buildMemoryIdentityParts({ + type, + category: requiredCategory, + memory: requiredMemory, + reason: requiredReason, + scope + }); const hash = crypto.createHash('sha256').update(hashContent).digest('hex'); const id = hash.substring(0, 12); @@ -222,7 +283,8 @@ export async function handleMemoryCli(args: string[]): Promise { category: requiredCategory, memory: requiredMemory, reason: requiredReason, - date: new Date().toISOString() + date: new Date().toISOString(), + ...(scope && { scope }) }; const result = await appendMemoryFile(memoryPath, newMemory); diff --git a/src/cli.ts b/src/cli.ts index fe6f57f..1bc89f4 100644 --- a/src/cli.ts +++ b/src/cli.ts @@ -10,6 +10,7 @@ import { CODEBASE_CONTEXT_DIRNAME, 
MEMORY_FILENAME, INTELLIGENCE_FILENAME, + HEALTH_FILENAME, KEYWORD_INDEX_FILENAME, VECTOR_DB_DIRNAME } from './constants/codebase-context.js'; @@ -107,6 +108,7 @@ async function initToolContext(): Promise { baseDir: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME), memory: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, MEMORY_FILENAME), intelligence: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, INTELLIGENCE_FILENAME), + health: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, HEALTH_FILENAME), keywordIndex: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, KEYWORD_INDEX_FILENAME), vectorDb: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, VECTOR_DB_DIRNAME) }; diff --git a/src/constants/codebase-context.ts b/src/constants/codebase-context.ts index 3f57bfa..473748f 100644 --- a/src/constants/codebase-context.ts +++ b/src/constants/codebase-context.ts @@ -20,6 +20,7 @@ export const INDEX_META_FILENAME = 'index-meta.json' as const; export const MEMORY_FILENAME = 'memory.json' as const; export const INTELLIGENCE_FILENAME = 'intelligence.json' as const; +export const HEALTH_FILENAME = 'health.json' as const; export const KEYWORD_INDEX_FILENAME = 'index.json' as const; export const INDEXING_STATS_FILENAME = 'indexing-stats.json' as const; export const VECTOR_DB_DIRNAME = 'index' as const; diff --git a/src/core/codebase-map.ts b/src/core/codebase-map.ts index 90a8487..61a4495 100644 --- a/src/core/codebase-map.ts +++ b/src/core/codebase-map.ts @@ -24,7 +24,11 @@ import type { PatternsData, CodeChunk } from '../types/index.js'; -import { RELATIONSHIPS_FILENAME, KEYWORD_INDEX_FILENAME } from '../constants/codebase-context.js'; +import { + EXCLUDED_DIRECTORY_NAMES, + RELATIONSHIPS_FILENAME, + KEYWORD_INDEX_FILENAME +} from '../constants/codebase-context.js'; // --------------------------------------------------------------------------- // Internal types for relationships.json @@ -50,12 +54,36 @@ interface RelationshipsData { }; } -// 
--------------------------------------------------------------------------- -// Entrypoint exclusion pattern -// --------------------------------------------------------------------------- - -const ENTRYPOINT_EXCLUSION_RE = - /(?:^|\/)(?:tests?|__tests__|fixtures?|scripts?)\/|\.test\.|\.spec\./; +type CodebaseMapMode = 'bounded' | 'full'; + +type BuildCodebaseMapOptions = { + mode?: CodebaseMapMode; +}; + +const BOUNDED_SECTION_LIMITS = { + entrypoints: 8, + hubFiles: 5, + keyInterfaces: 8, + apiSurfaceFiles: 8, + apiSurfaceExports: 3, + hotspots: 5, + bestExamples: 3 +} as const; + +const MAP_EXCLUDED_PATH_PATTERNS = [ + /(?:^|\/)(?:tests?|__tests__|specs?|__specs__)(?:\/|$)/i, + /\.(?:test|spec)\.[^/]+$/i, + /(?:^|\/)(?:fixtures?|__fixtures__)(?:\/|$)/i, + /(?:^|\/)(?:generated|__generated__)(?:\/|$)/i, + /(?:^|\/)[^/]*\.(?:generated|gen|min)\.[^/]+$/i, + /\.snap$/i +] as const; + +const MAP_EXCLUDED_DIRECTORY_NAMES = new Set( + [...EXCLUDED_DIRECTORY_NAMES, '__fixtures__', '__generated__', 'fixtures', 'generated'].map( + (segment) => segment.toLowerCase() + ) +); // --------------------------------------------------------------------------- // Builder @@ -66,7 +94,11 @@ const ENTRYPOINT_EXCLUSION_RE = * Reads `intelligence.json`, `relationships.json`, and `index.json` from project paths. * Degrades gracefully when artifacts are missing. */ -export async function buildCodebaseMap(project: ProjectState): Promise { +export async function buildCodebaseMap( + project: ProjectState, + options: BuildCodebaseMapOptions = {} +): Promise { + const mode = options.mode ?? 'bounded'; const projectName = path.basename(project.rootPath); // Read intelligence.json @@ -100,9 +132,10 @@ export async function buildCodebaseMap(project: ProjectState): Promise isMapEligiblePath(chunk.relativePath, mode)); // relationships.json has stats at top level OR inside graph const statsSource = relationships.stats ?? 
@@ -133,13 +166,13 @@ export async function buildCodebaseMap(project: ProjectState): Promise = Object.entries( @@ -150,17 +183,17 @@ export async function buildCodebaseMap(project: ProjectState): Promise x.count, (x) => x.file ) - .slice(0, 5) + .slice(0, mode === 'bounded' ? BOUNDED_SECTION_LIMITS.hubFiles : undefined) .map((x) => x.file); // --- Key interfaces --- - const keyInterfaces = deriveKeyInterfaces(chunks, graphImportedBy); + const keyInterfaces = deriveKeyInterfaces(filteredChunks, graphImportedBy, mode); // --- API surface --- - const apiSurface = deriveApiSurface(entrypoints, graphExports); + const apiSurface = deriveApiSurface(boundedEntrypoints, graphExports, mode); // --- Dependency hotspots --- - const hotspots = deriveHotspots(graphImports, graphImportedBy); + const hotspots = deriveHotspots(graphImports, graphImportedBy, mode); // --- Active patterns --- const patterns: PatternsData = intelligence.patterns ?? {}; @@ -187,8 +220,14 @@ export async function buildCodebaseMap(project: ProjectState): Promise 0 ? activePatterns[0].name : 'high-quality example'; - const goldenFiles = intelligence.goldenFiles ?? []; - const bestExamples: CodebaseMapExample[] = goldenFiles.slice(0, 3).map((gf) => ({ + const goldenFiles = (intelligence.goldenFiles ?? []).filter((gf) => + isMapEligiblePath(gf.file, mode) + ); + const bestExamples: CodebaseMapExample[] = maybeLimit( + goldenFiles, + BOUNDED_SECTION_LIMITS.bestExamples, + mode + ).map((gf) => ({ file: gf.file, score: gf.score, reason: dominantPatternName @@ -210,7 +249,14 @@ export async function buildCodebaseMap(project: ProjectState): Promise + graphImportedBy: Record, + mode: CodebaseMapMode ): CodebaseMapKeyInterface[] { const symbolChunks = chunks.filter( (c) => c.metadata?.symbolAware === true && SYMBOL_KINDS.has(c.metadata.symbolKind ?? 
'') @@ -251,18 +298,21 @@ function deriveKeyInterfaces( if (lenDiff !== 0) return lenDiff; return a.chunk.relativePath.localeCompare(b.chunk.relativePath); }); - return scored.slice(0, 10).map(({ chunk, importerCount }) => ({ - name: chunk.metadata.symbolName ?? path.basename(chunk.relativePath), - kind: chunk.metadata.symbolKind ?? 'unknown', - file: chunk.relativePath, - importerCount, - signatureHint: buildSignatureHint(chunk.content) - })); + return maybeLimit(scored, BOUNDED_SECTION_LIMITS.keyInterfaces, mode).map( + ({ chunk, importerCount }) => ({ + name: chunk.metadata.symbolName ?? path.basename(chunk.relativePath), + kind: chunk.metadata.symbolKind ?? 'unknown', + file: chunk.relativePath, + importerCount, + signatureHint: buildSignatureHint(chunk.content) + }) + ); } function deriveApiSurface( entrypoints: string[], - graphExports: Record> + graphExports: Record>, + mode: CodebaseMapMode ): CodebaseMapApiSurface[] { const results: CodebaseMapApiSurface[] = []; for (const ep of entrypoints) { @@ -271,16 +321,17 @@ function deriveApiSurface( const names = exps .map((e) => e.name) .filter((n) => n && n !== 'default') - .slice(0, 5); + .slice(0, mode === 'bounded' ? 
BOUNDED_SECTION_LIMITS.apiSurfaceExports : undefined); if (names.length === 0) continue; results.push({ file: ep, exports: names }); } - return results; + return maybeLimit(results, BOUNDED_SECTION_LIMITS.apiSurfaceFiles, mode); } function deriveHotspots( graphImports: Record, - graphImportedBy: Record + graphImportedBy: Record, + mode: CodebaseMapMode ): CodebaseMapHotspot[] { const allFiles = new Set([...Object.keys(graphImports), ...Object.keys(graphImportedBy)]); const hotspots: CodebaseMapHotspot[] = []; @@ -295,7 +346,7 @@ function deriveHotspots( if (b.combined !== a.combined) return b.combined - a.combined; return a.file.localeCompare(b.file); }); - return hotspots.slice(0, 5); + return maybeLimit(hotspots, BOUNDED_SECTION_LIMITS.hotspots, mode); } function enrichLayers( @@ -642,3 +693,61 @@ function sortByCountThenAlpha( return getName(a).localeCompare(getName(b)); }); } + +function maybeLimit(items: T[], limit: number, mode: CodebaseMapMode): T[] { + return mode === 'bounded' ? 
items.slice(0, limit) : items; +} + +function filterAdjacencyGraph( + graph: Record, + mode: CodebaseMapMode +): Record { + if (mode === 'full') { + return graph; + } + + return Object.fromEntries( + Object.entries(graph) + .filter(([file]) => isMapEligiblePath(file, mode)) + .map(([file, related]) => [file, related.filter((item) => isMapEligiblePath(item, mode))]) + ); +} + +function filterExportGraph( + graph: Record>, + mode: CodebaseMapMode +): Record> { + if (mode === 'full') { + return graph; + } + + return Object.fromEntries( + Object.entries(graph).filter(([file]) => isMapEligiblePath(file, mode)) + ); +} + +function isMapEligiblePath(filePath: string, mode: CodebaseMapMode): boolean { + if (mode === 'full') { + return true; + } + + const normalizedPath = normalizeMapPath(filePath); + if (!normalizedPath) { + return false; + } + + const segments = normalizedPath + .split('/') + .map((segment) => segment.toLowerCase()) + .filter(Boolean); + + if (segments.some((segment) => MAP_EXCLUDED_DIRECTORY_NAMES.has(segment))) { + return false; + } + + return !MAP_EXCLUDED_PATH_PATTERNS.some((pattern) => pattern.test(normalizedPath)); +} + +function normalizeMapPath(filePath: string): string { + return filePath.replace(/\\/g, '/').replace(/^\.\//, '').trim(); +} diff --git a/src/core/index-meta.ts b/src/core/index-meta.ts index 8353dba..4994235 100644 --- a/src/core/index-meta.ts +++ b/src/core/index-meta.ts @@ -4,6 +4,7 @@ import { z } from 'zod'; import { CODEBASE_CONTEXT_DIRNAME, + HEALTH_FILENAME, INDEX_FORMAT_VERSION, INDEX_META_FILENAME, INDEX_META_VERSION, @@ -41,6 +42,30 @@ const RelationshipsFileSchema = z }) .passthrough(); +const HealthFileSchema = z.object({ + header: ArtifactHeaderSchema, + generatedAt: z.string().datetime(), + summary: z + .object({ + files: z.number().int().nonnegative(), + highRiskFiles: z.number().int().nonnegative(), + mediumRiskFiles: z.number().int().nonnegative(), + lowRiskFiles: z.number().int().nonnegative() + }) + 
.passthrough(), + files: z.array( + z + .object({ + file: z.string().min(1), + level: z.enum(['low', 'medium', 'high']), + score: z.number().nonnegative(), + reasons: z.array(z.string()), + signals: z.record(z.string(), z.number()).optional() + }) + .passthrough() + ) +}); + export const IndexMetaSchema = z.object({ metaVersion: z.number().int().positive(), formatVersion: z.number().int().nonnegative(), @@ -59,6 +84,11 @@ export const IndexMetaSchema = z.object({ embeddingModel: z.string().optional() }), intelligence: z + .object({ + path: z.string().min(1) + }) + .optional(), + health: z .object({ path: z.string().min(1) }) @@ -270,4 +300,34 @@ export async function validateIndexArtifacts(rootDir: string, meta: IndexMeta): throw asIndexCorrupted('Relationships sidecar corrupted (rebuild required)', error); } } + + // Optional health sidecar: validate if present, but do not require. + const healthPath = path.join(contextDir, HEALTH_FILENAME); + if (await pathExists(healthPath)) { + try { + const raw = await fs.readFile(healthPath, 'utf-8'); + const json = JSON.parse(raw); + const parsed = HealthFileSchema.safeParse(json); + if (!parsed.success) { + throw new IndexCorruptedError( + `Health schema mismatch (rebuild required): ${parsed.error.message}` + ); + } + + const { buildId, formatVersion } = parsed.data.header; + if (formatVersion !== meta.formatVersion) { + throw new IndexCorruptedError( + `Health formatVersion mismatch (rebuild required): meta=${meta.formatVersion}, health.json=${formatVersion}` + ); + } + if (buildId !== meta.buildId) { + throw new IndexCorruptedError( + `Health buildId mismatch (rebuild required): meta=${meta.buildId}, health.json=${buildId}` + ); + } + } catch (error) { + if (error instanceof IndexCorruptedError) throw error; + throw asIndexCorrupted('Health sidecar corrupted (rebuild required)', error); + } + } } diff --git a/src/core/indexer.ts b/src/core/indexer.ts index d6bb842..9530576 100644 --- a/src/core/indexer.ts +++ 
b/src/core/indexer.ts @@ -42,6 +42,7 @@ import { getFileCommitDates } from '../utils/git-dates.js'; import { CODEBASE_CONTEXT_DIRNAME, EXCLUDED_GLOB_PATTERNS, + HEALTH_FILENAME, INDEX_FORMAT_VERSION, INDEXING_STATS_FILENAME, INDEX_META_FILENAME, @@ -52,6 +53,7 @@ import { RELATIONSHIPS_FILENAME, VECTOR_DB_DIRNAME } from '../constants/codebase-context.js'; +import { deriveCodebaseHealth } from '../health/derive.js'; const STAGING_DIRNAME = '.staging'; const PREVIOUS_DIRNAME = '.previous'; @@ -104,6 +106,7 @@ async function atomicSwapStagingToActive( const activeManifestPath = path.join(contextDir, MANIFEST_FILENAME); const activeStatsPath = path.join(contextDir, INDEXING_STATS_FILENAME); const activeRelationshipsPath = path.join(contextDir, RELATIONSHIPS_FILENAME); + const activeHealthPath = path.join(contextDir, HEALTH_FILENAME); const stagingMetaPath = path.join(stagingDir, INDEX_META_FILENAME); const stagingIndexPath = path.join(stagingDir, KEYWORD_INDEX_FILENAME); @@ -112,6 +115,7 @@ async function atomicSwapStagingToActive( const stagingManifestPath = path.join(stagingDir, MANIFEST_FILENAME); const stagingStatsPath = path.join(stagingDir, INDEXING_STATS_FILENAME); const stagingRelationshipsPath = path.join(stagingDir, RELATIONSHIPS_FILENAME); + const stagingHealthPath = path.join(stagingDir, HEALTH_FILENAME); // Step 1: Create .previous directory and move current active there await fs.mkdir(previousDir, { recursive: true }); @@ -149,6 +153,7 @@ async function atomicSwapStagingToActive( await moveIfExists(activeManifestPath, path.join(previousDir, MANIFEST_FILENAME)); await moveIfExists(activeStatsPath, path.join(previousDir, INDEXING_STATS_FILENAME)); await moveIfExists(activeRelationshipsPath, path.join(previousDir, RELATIONSHIPS_FILENAME)); + await moveIfExists(activeHealthPath, path.join(previousDir, HEALTH_FILENAME)); await moveDirIfExists(activeVectorDir, path.join(previousDir, VECTOR_DB_DIRNAME)); // Step 2: Move staging artifacts to active location @@ 
-159,6 +164,7 @@ async function atomicSwapStagingToActive( await moveIfExists(stagingManifestPath, activeManifestPath); await moveIfExists(stagingStatsPath, activeStatsPath); await moveIfExists(stagingRelationshipsPath, activeRelationshipsPath); + await moveIfExists(stagingHealthPath, activeHealthPath); await moveDirIfExists(stagingVectorDir, activeVectorDir); // Step 3: Clean up .previous and staging directories @@ -188,6 +194,7 @@ async function atomicSwapStagingToActive( await moveIfExists(path.join(previousDir, MANIFEST_FILENAME), activeManifestPath); await moveIfExists(path.join(previousDir, INDEXING_STATS_FILENAME), activeStatsPath); await moveIfExists(path.join(previousDir, RELATIONSHIPS_FILENAME), activeRelationshipsPath); + await moveIfExists(path.join(previousDir, HEALTH_FILENAME), activeHealthPath); await moveDirIfExists(path.join(previousDir, VECTOR_DB_DIRNAME), activeVectorDir); console.error('Rollback successful'); } catch (rollbackError) { @@ -980,6 +987,16 @@ export class CodebaseIndexer { }; await fs.writeFile(relationshipsPath, JSON.stringify(relationships, null, 2)); + const healthPath = path.join(activeContextDir, HEALTH_FILENAME); + const health = deriveCodebaseHealth({ + buildId, + formatVersion: INDEX_FORMAT_VERSION, + generatedAt, + chunks: allChunks, + graph: internalFileGraph + }); + await fs.writeFile(healthPath, JSON.stringify(health, null, 2)); + // Write manifest (both full and incremental) // For full rebuild, write to staging; for incremental, write to active const activeManifestPath = path.join(activeContextDir, MANIFEST_FILENAME); @@ -1021,7 +1038,8 @@ export class CodebaseIndexer { intelligence: { path: INTELLIGENCE_FILENAME }, manifest: { path: MANIFEST_FILENAME }, indexingStats: { path: INDEXING_STATS_FILENAME }, - relationships: { path: RELATIONSHIPS_FILENAME } + relationships: { path: RELATIONSHIPS_FILENAME }, + health: { path: HEALTH_FILENAME } } }, null, diff --git a/src/eval/edit-preflight-harness.ts 
b/src/eval/edit-preflight-harness.ts new file mode 100644 index 0000000..7a48cbb --- /dev/null +++ b/src/eval/edit-preflight-harness.ts @@ -0,0 +1,271 @@ +import { createProjectState } from '../project-state.js'; +import { handle as searchCodebaseHandle } from '../tools/search-codebase.js'; +import type { + EditPreflightFixture, + EditPreflightResponse, + EditPreflightRunner, + EditPreflightSummary, + EditPreflightTask, + EditPreflightTaskResult, + EvaluateEditPreflightFixtureParams, + FormatEditPreflightReportParams +} from './types.js'; + +function normalizeText(value: string): string { + return value.toLowerCase().replace(/\\/g, '/'); +} + +function stripLocationSuffix(fileRef: string): string { + return fileRef.replace(/:(\d+)(?:-\d+)?$/, ''); +} + +function matchesPatterns(candidate: string, patterns: string[] | undefined): boolean { + if (!patterns || patterns.length === 0) { + return false; + } + + const normalizedCandidate = normalizeText(candidate); + return patterns.some((pattern) => normalizedCandidate.includes(normalizeText(pattern))); +} + +function findFirstRelevantHit(topFiles: string[], patterns: string[] | undefined): number | null { + if (!patterns || patterns.length === 0) { + return null; + } + + for (let index = 0; index < topFiles.length; index++) { + if (matchesPatterns(topFiles[index], patterns)) { + return index + 1; + } + } + + return null; +} + +function summarizeEditPreflightResults(results: EditPreflightTaskResult[]): EditPreflightSummary { + const totalTasks = results.length; + const safeResults = results.filter((result) => result.risk === 'safe'); + const unsafeResults = results.filter((result) => result.risk === 'unsafe'); + const targetableResults = results.filter((result) => result.topTargetInTop3 !== null); + const bestExampleResults = results.filter((result) => result.bestExampleHit !== null); + const firstRelevantHits = results + .map((result) => result.firstRelevantHit) + .filter((value): value is number => typeof value === 
'number'); + + const topTargetInTop3Count = targetableResults.filter((result) => result.topTargetInTop3).length; + const bestExampleHitCount = bestExampleResults.filter((result) => result.bestExampleHit).length; + const safeTaskReadyCount = safeResults.filter((result) => result.ready).length; + const unsafeTaskAbstainCount = unsafeResults.filter((result) => result.abstain).length; + const unsafeReadyFalsePositiveCount = unsafeResults.filter((result) => result.ready).length; + + return { + totalTasks, + safeTasks: safeResults.length, + unsafeTasks: unsafeResults.length, + targetableTasks: targetableResults.length, + bestExampleTasks: bestExampleResults.length, + topTargetInTop3Count, + topTargetInTop3Rate: + targetableResults.length > 0 ? topTargetInTop3Count / targetableResults.length : null, + averageFirstRelevantHit: + firstRelevantHits.length > 0 + ? firstRelevantHits.reduce((sum, value) => sum + value, 0) / firstRelevantHits.length + : null, + bestExampleHitCount, + bestExampleHitRate: + bestExampleResults.length > 0 ? bestExampleHitCount / bestExampleResults.length : null, + safeTaskReadyCount, + safeTaskReadyRate: safeResults.length > 0 ? safeTaskReadyCount / safeResults.length : null, + unsafeTaskAbstainCount, + unsafeTaskAbstainRate: + unsafeResults.length > 0 ? unsafeTaskAbstainCount / unsafeResults.length : null, + unsafeReadyFalsePositiveCount, + unsafeReadyFalsePositiveRate: + unsafeResults.length > 0 ? unsafeReadyFalsePositiveCount / unsafeResults.length : null, + results + }; +} + +function evaluateTask( + task: EditPreflightTask, + response: EditPreflightResponse +): EditPreflightTaskResult { + const topFiles = (response.results ?? []) + .map((result) => (typeof result.file === 'string' ? 
stripLocationSuffix(result.file) : '')) + .filter((filePath): filePath is string => Boolean(filePath)); + const firstRelevantHit = findFirstRelevantHit(topFiles, task.expectedTargetPatterns); + const bestExample = + typeof response.preflight?.bestExample === 'string' ? response.preflight.bestExample : null; + const bestExampleHit = + task.expectedBestExamplePatterns && task.expectedBestExamplePatterns.length > 0 + ? bestExample !== null && matchesPatterns(bestExample, task.expectedBestExamplePatterns) + : null; + + return { + taskId: task.id, + title: task.title, + query: task.query, + risk: task.risk, + ready: response.preflight?.ready === true, + abstain: response.preflight?.abstain === true, + searchQualityStatus: response.searchQuality?.status ?? 'unknown', + topFiles, + firstRelevantHit, + topTargetInTop3: + task.expectedTargetPatterns && task.expectedTargetPatterns.length > 0 + ? firstRelevantHit !== null && firstRelevantHit <= 3 + : null, + bestExample, + bestExampleHit, + ...(typeof response.preflight?.nextAction === 'string' && { + nextAction: response.preflight.nextAction + }), + ...(Array.isArray(response.preflight?.warnings) && + response.preflight.warnings.length > 0 && { warnings: response.preflight.warnings }), + ...(Array.isArray(response.preflight?.whatWouldHelp) && + response.preflight.whatWouldHelp.length > 0 && { + whatWouldHelp: response.preflight.whatWouldHelp + }) + }; +} + +async function runSearchPreflight( + task: EditPreflightTask, + rootPath: string +): Promise<EditPreflightResponse> { + const project = createProjectState(rootPath); + project.indexState.status = 'ready'; + + const response = await searchCodebaseHandle( + { + query: task.query, + intent: 'edit', + limit: task.limit ?? 5 + }, + { + indexState: project.indexState, + paths: project.paths, + rootPath: project.rootPath, + performIndexing: () => undefined + } + ); + const payload = response.content?.[0]?.text ?? 
'{}'; + const parsed = JSON.parse(payload) as unknown; + + if (typeof parsed === 'object' && parsed !== null) { + return parsed as EditPreflightResponse; + } + + return {}; +} + +export async function evaluateEditPreflightFixture({ + fixture, + rootPath, + runner = runSearchPreflight +}: EvaluateEditPreflightFixtureParams): Promise<EditPreflightSummary> { + const results: EditPreflightTaskResult[] = []; + + for (const task of fixture.tasks) { + const response = await runner(task, rootPath); + results.push(evaluateTask(task, response)); + } + + return summarizeEditPreflightResults(results); +} + +export function combineEditPreflightSummaries( + summaries: EditPreflightSummary[] +): EditPreflightSummary { + return summarizeEditPreflightResults(summaries.flatMap((summary) => summary.results)); +} + +function formatRate(value: number | null): string { + if (value === null) { + return 'n/a'; + } + + return `${(value * 100).toFixed(0)}%`; +} + +function formatHit(value: number | null): string { + return value === null ? 
'n/a' : value.toFixed(2); +} + +export function formatEditPreflightReport({ + codebaseLabel, + fixturePath, + summary +}: FormatEditPreflightReportParams): string { + const lines: string[] = []; + const unsafeFalsePositives = summary.results.filter( + (result) => result.risk === 'unsafe' && result.ready + ); + const safeMisses = summary.results.filter((result) => result.risk === 'safe' && !result.ready); + + lines.push(`\n=== Edit Preflight Eval Report: ${codebaseLabel} ===`); + lines.push(`Fixture: ${fixturePath}`); + lines.push( + `Tasks: ${summary.totalTasks} (${summary.safeTasks} safe, ${summary.unsafeTasks} unsafe)` + ); + lines.push( + `Top-target in top-3: ${summary.topTargetInTop3Count}/${summary.targetableTasks} (${formatRate(summary.topTargetInTop3Rate)})` + ); + lines.push(`Average first relevant hit: ${formatHit(summary.averageFirstRelevantHit)}`); + lines.push( + `Best-example hit rate: ${summary.bestExampleHitCount}/${summary.bestExampleTasks} (${formatRate(summary.bestExampleHitRate)})` + ); + lines.push( + `Safe-task ready rate: ${summary.safeTaskReadyCount}/${summary.safeTasks} (${formatRate(summary.safeTaskReadyRate)})` + ); + lines.push( + `Unsafe-task abstain rate: ${summary.unsafeTaskAbstainCount}/${summary.unsafeTasks} (${formatRate(summary.unsafeTaskAbstainRate)})` + ); + lines.push( + `Unsafe ready=true false-positive rate: ${summary.unsafeReadyFalsePositiveCount}/${summary.unsafeTasks} (${formatRate(summary.unsafeReadyFalsePositiveRate)})` + ); + lines.push(''); + lines.push('Task results:'); + + for (const result of summary.results) { + const taskLine = [ + `- ${result.taskId}`, + `[${result.risk}]`, + `ready=${result.ready ? 'yes' : 'no'}`, + `abstain=${result.abstain ? 'yes' : 'no'}`, + `firstRelevant=${result.firstRelevantHit ?? 'n/a'}`, + `top3=${result.topTargetInTop3 === null ? 'n/a' : result.topTargetInTop3 ? 'hit' : 'miss'}`, + `bestExample=${result.bestExampleHit === null ? 'n/a' : result.bestExampleHit ? 
'hit' : 'miss'}`, + `quality=${result.searchQualityStatus}` + ]; + lines.push(taskLine.join(' ')); + } + + lines.push(''); + lines.push('Unsafe false positives:'); + if (unsafeFalsePositives.length === 0) { + lines.push(' (none)'); + } else { + for (const result of unsafeFalsePositives) { + lines.push(` - ${result.taskId}: "${result.query}"`); + } + } + + lines.push(''); + lines.push('Safe misses:'); + if (safeMisses.length === 0) { + lines.push(' (none)'); + } else { + for (const result of safeMisses) { + lines.push(` - ${result.taskId}: "${result.query}"`); + if (result.nextAction) { + lines.push(` next: ${result.nextAction}`); + } + } + } + + lines.push('================================'); + return lines.join('\n'); +} + +export type { EditPreflightRunner }; diff --git a/src/eval/run-config.ts b/src/eval/run-config.ts new file mode 100644 index 0000000..3484a2f --- /dev/null +++ b/src/eval/run-config.ts @@ -0,0 +1,37 @@ +import path from 'path'; + +export type EvalMode = 'retrieval' | 'discovery' | 'edit-preflight'; + +export interface EvalFixtureDefaults { + fixtureA: string; + fixtureB: string; +} + +export function resolveEvalMode(rawMode: string | undefined): EvalMode { + if (rawMode === 'discovery' || rawMode === 'edit-preflight') { + return rawMode; + } + + return 'retrieval'; +} + +export function getDefaultFixturePaths(projectRoot: string, mode: EvalMode): EvalFixtureDefaults { + if (mode === 'discovery') { + return { + fixtureA: path.join(projectRoot, 'tests', 'fixtures', 'discovery-angular-spotify.json'), + fixtureB: path.join(projectRoot, 'tests', 'fixtures', 'discovery-excalidraw.json') + }; + } + + if (mode === 'edit-preflight') { + return { + fixtureA: path.join(projectRoot, 'tests', 'fixtures', 'edit-preflight-angular-spotify.json'), + fixtureB: path.join(projectRoot, 'tests', 'fixtures', 'edit-preflight-excalidraw.json') + }; + } + + return { + fixtureA: path.join(projectRoot, 'tests', 'fixtures', 'eval-angular-spotify.json'), + fixtureB: 
path.join(projectRoot, 'tests', 'fixtures', 'eval-controlled.json') + }; +} diff --git a/src/eval/types.ts b/src/eval/types.ts index a4ced39..d80e777 100644 --- a/src/eval/types.ts +++ b/src/eval/types.ts @@ -64,6 +64,102 @@ export interface FormatEvalReportParams { redactPaths?: boolean; } +export type EditPreflightRisk = 'safe' | 'unsafe'; + +export interface EditPreflightTask { + id: string; + title: string; + query: string; + risk: EditPreflightRisk; + expectedTargetPatterns?: string[]; + expectedBestExamplePatterns?: string[]; + limit?: number; + notes?: string; +} + +export interface EditPreflightFixture { + description?: string; + codebase?: string; + repository?: string; + repositoryUrl?: string; + repositoryRef?: string; + frozenDate?: string; + notes?: string; + tasks: EditPreflightTask[]; +} + +export interface EditPreflightTaskResult { + taskId: string; + title: string; + query: string; + risk: EditPreflightRisk; + ready: boolean; + abstain: boolean; + searchQualityStatus: 'ok' | 'low_confidence' | 'unknown'; + topFiles: string[]; + firstRelevantHit: number | null; + topTargetInTop3: boolean | null; + bestExample: string | null; + bestExampleHit: boolean | null; + nextAction?: string; + warnings?: string[]; + whatWouldHelp?: string[]; +} + +export interface EditPreflightSummary { + totalTasks: number; + safeTasks: number; + unsafeTasks: number; + targetableTasks: number; + bestExampleTasks: number; + topTargetInTop3Count: number; + topTargetInTop3Rate: number | null; + averageFirstRelevantHit: number | null; + bestExampleHitCount: number; + bestExampleHitRate: number | null; + safeTaskReadyCount: number; + safeTaskReadyRate: number | null; + unsafeTaskAbstainCount: number; + unsafeTaskAbstainRate: number | null; + unsafeReadyFalsePositiveCount: number; + unsafeReadyFalsePositiveRate: number | null; + results: EditPreflightTaskResult[]; +} + +export interface EvaluateEditPreflightFixtureParams { + fixture: EditPreflightFixture; + rootPath: string; + 
runner?: EditPreflightRunner; +} + +export interface FormatEditPreflightReportParams { + codebaseLabel: string; + fixturePath: string; + summary: EditPreflightSummary; +} + +export interface EditPreflightResponse { + preflight?: { + ready?: boolean; + abstain?: boolean; + bestExample?: string; + nextAction?: string; + warnings?: string[]; + whatWouldHelp?: string[]; + }; + searchQuality?: { + status?: 'ok' | 'low_confidence'; + }; + results?: Array<{ + file?: string; + }>; +} + +export type EditPreflightRunner = ( + task: EditPreflightTask, + rootPath: string +) => Promise<EditPreflightResponse>; + export type DiscoveryJob = 'map' | 'find' | 'search'; export type DiscoverySurface = diff --git a/src/health/derive.ts b/src/health/derive.ts new file mode 100644 index 0000000..27b5c78 --- /dev/null +++ b/src/health/derive.ts @@ -0,0 +1,207 @@ +import type { CodeChunk, CodebaseHealthArtifact, CodebaseHealthFile } from '../types/index.js'; +import { InternalFileGraph } from '../utils/usage-tracker.js'; + +interface DeriveCodebaseHealthParams { + buildId: string; + formatVersion: number; + generatedAt: string; + chunks: CodeChunk[]; + graph: InternalFileGraph; +} + +interface FileMetrics { + importCount: number; + importerCount: number; + cycleCount: number; + maxCyclomaticComplexity: number; + hotspotRank?: number; +} + +type FileMetricsMap = Map<string, FileMetrics>; + +function normalizePathLike(filePath: string): string { + return filePath.replace(/\\/g, '/').replace(/^\.\//, ''); +} + +function collectFileMetrics(chunks: CodeChunk[], graph: InternalFileGraph): FileMetricsMap { + const metrics = new Map<string, FileMetrics>(); + const graphJson = graph.toJSON(); + const reverseImports = new Map<string, Set<string>>(); + + for (const [file, deps] of Object.entries(graphJson.imports)) { + const normalizedFile = normalizePathLike(file); + const fileMetrics = metrics.get(normalizedFile) ?? 
{ + importCount: 0, + importerCount: 0, + cycleCount: 0, + maxCyclomaticComplexity: 0 + }; + fileMetrics.importCount = deps.length; + metrics.set(normalizedFile, fileMetrics); + + for (const dependency of deps) { + const normalizedDependency = normalizePathLike(dependency); + const importers = reverseImports.get(normalizedDependency) ?? new Set<string>(); + importers.add(normalizedFile); + reverseImports.set(normalizedDependency, importers); + } + } + + for (const [file, importers] of reverseImports.entries()) { + const fileMetrics = metrics.get(file) ?? { + importCount: 0, + importerCount: 0, + cycleCount: 0, + maxCyclomaticComplexity: 0 + }; + fileMetrics.importerCount = importers.size; + metrics.set(file, fileMetrics); + } + + for (const chunk of chunks) { + const file = normalizePathLike(chunk.relativePath || chunk.filePath); + const fileMetrics = metrics.get(file) ?? { + importCount: 0, + importerCount: 0, + cycleCount: 0, + maxCyclomaticComplexity: 0 + }; + const chunkComplexity = + typeof chunk.metadata?.cyclomaticComplexity === 'number' + ? chunk.metadata.cyclomaticComplexity + : typeof chunk.metadata?.complexity === 'number' + ? chunk.metadata.complexity + : 0; + fileMetrics.maxCyclomaticComplexity = Math.max( + fileMetrics.maxCyclomaticComplexity, + chunkComplexity + ); + metrics.set(file, fileMetrics); + } + + const hotspotRanks = Array.from(metrics.entries()) + .map(([file, fileMetrics]) => ({ + file, + combined: fileMetrics.importCount + fileMetrics.importerCount + })) + .filter((entry) => entry.combined > 0) + .sort((a, b) => b.combined - a.combined || a.file.localeCompare(b.file)); + + hotspotRanks.forEach((entry, index) => { + const fileMetrics = metrics.get(entry.file); + if (fileMetrics) { + fileMetrics.hotspotRank = index + 1; + } + }); + + for (const cycle of graph.findCycles()) { + for (const file of cycle.files.slice(0, -1)) { + const normalizedFile = normalizePathLike(file); + const fileMetrics = metrics.get(normalizedFile) ?? 
{ + importCount: 0, + importerCount: 0, + cycleCount: 0, + maxCyclomaticComplexity: 0 + }; + fileMetrics.cycleCount += 1; + metrics.set(normalizedFile, fileMetrics); + } + } + + return metrics; +} + +function getHealthLevel(fileMetrics: FileMetrics): CodebaseHealthFile { + const reasons: string[] = []; + let score = 0; + + if (fileMetrics.cycleCount > 0) { + score += 3; + reasons.push( + `Participates in ${fileMetrics.cycleCount} circular dependenc${fileMetrics.cycleCount === 1 ? 'y' : 'ies'}` + ); + } + + if (fileMetrics.importerCount >= 8) { + score += 2; + reasons.push(`High fan-in: ${fileMetrics.importerCount} files depend on it`); + } else if (fileMetrics.importerCount >= 4) { + score += 1; + reasons.push(`Shared dependency for ${fileMetrics.importerCount} files`); + } + + if (fileMetrics.hotspotRank && fileMetrics.hotspotRank <= 5) { + score += 2; + reasons.push(`Hotspot rank #${fileMetrics.hotspotRank} by graph centrality`); + } else if (fileMetrics.hotspotRank && fileMetrics.hotspotRank <= 10) { + score += 1; + reasons.push(`Top-10 hotspot by graph centrality`); + } + + if (fileMetrics.maxCyclomaticComplexity >= 18) { + score += 2; + reasons.push(`Complex implementation (cyclomatic ${fileMetrics.maxCyclomaticComplexity})`); + } else if (fileMetrics.maxCyclomaticComplexity >= 10) { + score += 1; + reasons.push(`Moderate code complexity (cyclomatic ${fileMetrics.maxCyclomaticComplexity})`); + } + + const level = score >= 4 ? 'high' : score >= 2 ? 'medium' : ('low' as const); + + return { + file: '', + level, + score, + reasons: reasons.slice(0, 3), + signals: { + ...(fileMetrics.hotspotRank ? { hotspotRank: fileMetrics.hotspotRank } : {}), + ...(fileMetrics.importerCount > 0 ? { importerCount: fileMetrics.importerCount } : {}), + ...(fileMetrics.importCount > 0 ? { importCount: fileMetrics.importCount } : {}), + ...(fileMetrics.cycleCount > 0 ? { cycleCount: fileMetrics.cycleCount } : {}), + ...(fileMetrics.maxCyclomaticComplexity > 0 + ? 
{ maxCyclomaticComplexity: fileMetrics.maxCyclomaticComplexity } + : {}) + } + }; +} + +export function deriveCodebaseHealth({ + buildId, + formatVersion, + generatedAt, + chunks, + graph +}: DeriveCodebaseHealthParams): CodebaseHealthArtifact { + const fileMetrics = collectFileMetrics(chunks, graph); + const files = Array.from(fileMetrics.entries()) + .map(([file, metrics]) => { + const health = getHealthLevel(metrics); + return { + ...health, + file + }; + }) + .sort((a, b) => { + const priority = { high: 0, medium: 1, low: 2 }; + const levelDelta = priority[a.level] - priority[b.level]; + if (levelDelta !== 0) return levelDelta; + if (b.score !== a.score) return b.score - a.score; + return a.file.localeCompare(b.file); + }); + + const highRiskFiles = files.filter((file) => file.level === 'high').length; + const mediumRiskFiles = files.filter((file) => file.level === 'medium').length; + const lowRiskFiles = files.length - highRiskFiles - mediumRiskFiles; + + return { + header: { buildId, formatVersion }, + generatedAt, + summary: { + files: files.length, + highRiskFiles, + mediumRiskFiles, + lowRiskFiles + }, + files + }; +} diff --git a/src/health/store.ts b/src/health/store.ts new file mode 100644 index 0000000..f49e151 --- /dev/null +++ b/src/health/store.ts @@ -0,0 +1,126 @@ +import { promises as fs } from 'fs'; +import type { CodebaseHealthArtifact, CodebaseHealthFile } from '../types/index.js'; + +function isRecord(value: unknown): value is Record<string, unknown> { + return typeof value === 'object' && value !== null; +} + +function normalizePathLike(filePath: string): string { + return filePath.replace(/\\/g, '/').replace(/^\.\//, ''); +} + +function normalizeHealthFile(raw: unknown): CodebaseHealthFile | null { + if (!isRecord(raw)) return null; + const file = typeof raw.file === 'string' ? normalizePathLike(raw.file) : undefined; + const level = + raw.level === 'low' || raw.level === 'medium' || raw.level === 'high' ? 
raw.level : undefined; + const score = typeof raw.score === 'number' ? raw.score : undefined; + const reasons = Array.isArray(raw.reasons) + ? raw.reasons.filter((value): value is string => typeof value === 'string') + : []; + + if (!file || !level || score === undefined) return null; + + const rawSignals = isRecord(raw.signals) ? raw.signals : undefined; + const signals = rawSignals + ? { + ...(typeof rawSignals.hotspotRank === 'number' + ? { hotspotRank: rawSignals.hotspotRank } + : {}), + ...(typeof rawSignals.importerCount === 'number' + ? { importerCount: rawSignals.importerCount } + : {}), + ...(typeof rawSignals.importCount === 'number' + ? { importCount: rawSignals.importCount } + : {}), + ...(typeof rawSignals.cycleCount === 'number' ? { cycleCount: rawSignals.cycleCount } : {}), + ...(typeof rawSignals.maxCyclomaticComplexity === 'number' + ? { maxCyclomaticComplexity: rawSignals.maxCyclomaticComplexity } + : {}) + } + : undefined; + return { + file, + level, + score, + reasons, + ...(signals && Object.keys(signals).length > 0 && { signals }) + }; +} + +export function normalizeHealthArtifact(raw: unknown): CodebaseHealthArtifact | null { + if ( + !isRecord(raw) || + !isRecord(raw.header) || + !isRecord(raw.summary) || + !Array.isArray(raw.files) + ) { + return null; + } + + const buildId = + typeof raw.header.buildId === 'string' && raw.header.buildId ? raw.header.buildId : undefined; + const formatVersion = + typeof raw.header.formatVersion === 'number' ? raw.header.formatVersion : undefined; + const generatedAt = + typeof raw.generatedAt === 'string' && raw.generatedAt ? raw.generatedAt : undefined; + + if (!buildId || formatVersion === undefined || !generatedAt) { + return null; + } + + const files = raw.files + .map((entry) => normalizeHealthFile(entry)) + .filter((entry): entry is CodebaseHealthFile => entry !== null); + + const summary = raw.summary; + const filesCount = typeof summary.files === 'number' ? 
summary.files : files.length; + const highRiskFiles = typeof summary.highRiskFiles === 'number' ? summary.highRiskFiles : 0; + const mediumRiskFiles = typeof summary.mediumRiskFiles === 'number' ? summary.mediumRiskFiles : 0; + const lowRiskFiles = typeof summary.lowRiskFiles === 'number' ? summary.lowRiskFiles : 0; + + return { + header: { buildId, formatVersion }, + generatedAt, + summary: { + files: filesCount, + highRiskFiles, + mediumRiskFiles, + lowRiskFiles + }, + files + }; +} + +export async function readHealthFile(healthPath: string): Promise<CodebaseHealthArtifact | null> { + try { + const content = await fs.readFile(healthPath, 'utf-8'); + return normalizeHealthArtifact(JSON.parse(content)); + } catch { + return null; + } +} + +export function normalizeHealthLookupKey(filePath: string, rootPath?: string): string { + const normalized = filePath.replace(/\\/g, '/').replace(/^\.\//, ''); + if (!rootPath) { + return normalized; + } + const normalizedRoot = rootPath.replace(/\\/g, '/').replace(/\/$/, ''); + if (normalized.startsWith(normalizedRoot)) { + return normalized.slice(normalizedRoot.length).replace(/^\//, ''); + } + return normalized; +} + +export function indexHealthByFile( + artifact: CodebaseHealthArtifact | null, + rootPath?: string +): Map<string, CodebaseHealthFile> { + const map = new Map<string, CodebaseHealthFile>(); + if (!artifact) return map; + for (const fileHealth of artifact.files) { + map.set(normalizeHealthLookupKey(fileHealth.file, rootPath), fileHealth); + } + return map; +} diff --git a/src/index.ts b/src/index.ts index a4d7c73..8d7386b 100644 --- a/src/index.ts +++ b/src/index.ts @@ -9,20 +9,9 @@ import { promises as fs } from 'fs'; import path from 'path'; import { fileURLToPath } from 'url'; -import { Server } from '@modelcontextprotocol/sdk/server/index.js'; -import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'; -import { createServer } from './server/factory.js'; -import { startHttpServer } from './server/http.js'; -import { loadServerConfig } from './server/config.js'; +import type 
{ Server } from '@modelcontextprotocol/sdk/server/index.js'; import type { ProjectConfig } from './server/config.js'; -import { - CallToolRequestSchema, - ListToolsRequestSchema, - ListResourcesRequestSchema, - ReadResourceRequestSchema, - RootsListChangedNotificationSchema, - Resource -} from '@modelcontextprotocol/sdk/types.js'; +import type { Resource } from '@modelcontextprotocol/sdk/types.js'; import { CodebaseIndexer } from './core/indexer.js'; import { analyzerRegistry } from './core/analyzer-registry.js'; @@ -37,11 +26,18 @@ import { startFileWatcher } from './core/file-watcher.js'; import { parseGitLogLineToMemory } from './memory/git-memory.js'; import { CONTEXT_RESOURCE_URI, + FULL_CONTEXT_RESOURCE_URI, buildProjectContextResourceUri, + buildProjectFullContextResourceUri, getProjectPathFromContextResourceUri, - isContextResourceUri + getProjectPathFromFullContextResourceUri, + isContextResourceUri, + isFullContextResourceUri } from './resources/uri.js'; -import { generateCodebaseIntelligence } from './resources/codebase-intelligence.js'; +import { + generateCodebaseIntelligence, + generateFullCodebaseIntelligence +} from './resources/codebase-intelligence.js'; import { EXCLUDED_GLOB_PATTERNS } from './constants/codebase-context.js'; import { discoverProjectsWithinRoot, @@ -69,8 +65,44 @@ analyzerRegistry.register(new NextJsAnalyzer()); analyzerRegistry.register(new ReactAnalyzer()); analyzerRegistry.register(new GenericAnalyzer()); +let createServer!: typeof import('./server/factory.js').createServer; +let startHttpServer!: typeof import('./server/http.js').startHttpServer; +let loadServerConfig!: typeof import('./server/config.js').loadServerConfig; +let StdioServerTransport!: typeof import('@modelcontextprotocol/sdk/server/stdio.js').StdioServerTransport; +let CallToolRequestSchema!: typeof import('@modelcontextprotocol/sdk/types.js').CallToolRequestSchema; +let ListToolsRequestSchema!: typeof 
import('@modelcontextprotocol/sdk/types.js').ListToolsRequestSchema; +let ListResourcesRequestSchema!: typeof import('@modelcontextprotocol/sdk/types.js').ListResourcesRequestSchema; +let ReadResourceRequestSchema!: typeof import('@modelcontextprotocol/sdk/types.js').ReadResourceRequestSchema; +let RootsListChangedNotificationSchema!: typeof import('@modelcontextprotocol/sdk/types.js').RootsListChangedNotificationSchema; +let server!: Server; +let mcpRuntimeReady = false; +let mcpRuntimePromise: Promise<void> | undefined; + // Flags that are NOT project paths — skip them when resolving the bootstrap root. const KNOWN_FLAGS = new Set(['--http', '--port', '--help']); +const CLI_SUBCOMMANDS = [ + 'memory', + 'search', + 'metadata', + 'status', + 'reindex', + 'style-guide', + 'patterns', + 'refs', + 'cycles', + 'init', + 'map' +]; + +// Check if this module is the entry point. +const isDirectRun = + process.argv[1]?.replace(/\\/g, '/').endsWith('index.js') || + process.argv[1]?.replace(/\\/g, '/').endsWith('index.ts'); +const directSubcommand = process.argv[2]; +const isDirectCliSubcommand = + isDirectRun && + typeof directSubcommand === 'string' && + (CLI_SUBCOMMANDS.includes(directSubcommand) || directSubcommand === '--help'); // Resolve optional bootstrap root with validation handled later in main(). function resolveRootPath(): string | undefined { @@ -115,6 +147,59 @@ const MAX_WATCHED_PROJECTS = 5; const PROJECT_DISCOVERY_MAX_DEPTH = 4; const debounceEnv = Number.parseInt(process.env.CODEBASE_CONTEXT_DEBOUNCE_MS ?? '', 10); const watcherDebounceMs = Number.isFinite(debounceEnv) && debounceEnv >= 0 ? debounceEnv : 2000; +const stdioIdleTimeoutEnv = Number.parseInt( + process.env.CODEBASE_CONTEXT_STDIO_IDLE_TIMEOUT_MS ?? '', + 10 +); +const STDIO_IDLE_TIMEOUT_MS = + Number.isFinite(stdioIdleTimeoutEnv) && stdioIdleTimeoutEnv >= 0 + ? 
stdioIdleTimeoutEnv + : 10 * 60 * 1000; +let noteSessionActivity: () => void = () => undefined; +let beginTrackedSessionWork: () => void = () => undefined; +let endTrackedSessionWork: () => void = () => undefined; + +async function withTrackedSessionActivity<T>(handler: () => Promise<T>): Promise<T> { + noteSessionActivity(); + beginTrackedSessionWork(); + try { + return await handler(); + } finally { + noteSessionActivity(); + endTrackedSessionWork(); + } +} + +async function ensureMcpRuntimeLoaded(): Promise<void> { + if (mcpRuntimeReady) { + return; + } + + mcpRuntimePromise ??= (async () => { + const [factoryModule, httpModule, configModule, stdioModule, sdkTypesModule] = + await Promise.all([ + import('./server/factory.js'), + import('./server/http.js'), + import('./server/config.js'), + import('@modelcontextprotocol/sdk/server/stdio.js'), + import('@modelcontextprotocol/sdk/types.js') + ]); + + createServer = factoryModule.createServer; + startHttpServer = httpModule.startHttpServer; + loadServerConfig = configModule.loadServerConfig; + StdioServerTransport = stdioModule.StdioServerTransport; + CallToolRequestSchema = sdkTypesModule.CallToolRequestSchema; + ListToolsRequestSchema = sdkTypesModule.ListToolsRequestSchema; + ListResourcesRequestSchema = sdkTypesModule.ListResourcesRequestSchema; + ReadResourceRequestSchema = sdkTypesModule.ReadResourceRequestSchema; + RootsListChangedNotificationSchema = sdkTypesModule.RootsListChangedNotificationSchema; + server = createServer({ name: 'codebase-context', version: PKG_VERSION }, registerHandlers); + mcpRuntimeReady = true; + })(); + + await mcpRuntimePromise; +} type ProjectResolution = | { ok: true; project: ProjectState } @@ -558,7 +643,8 @@ export const INDEX_CONSUMING_TOOL_NAMES = [ 'get_symbol_references', 'detect_circular_dependencies', 'get_team_patterns', - 'get_codebase_metadata' + 'get_codebase_metadata', + 'get_codebase_health' ] as const; export const INDEX_CONSUMING_RESOURCE_NAMES = ['Codebase Intelligence'] as 
const; @@ -835,172 +921,207 @@ const PKG_VERSION: string = JSON.parse( * same handler logic that closes over module-level state. */ export function registerHandlers(target: Server): void { - target.setRequestHandler(ListToolsRequestSchema, async () => { - return { tools: TOOLS }; - }); - - target.setRequestHandler(ListResourcesRequestSchema, async () => { - return { resources: buildResources() }; - }); - - target.setRequestHandler(ReadResourceRequestSchema, async (request) => { - const uri = request.params.uri; - const explicitProjectPath = getProjectPathFromContextResourceUri(uri); + target.setRequestHandler(ListToolsRequestSchema, async () => + withTrackedSessionActivity(async () => ({ tools: TOOLS })) + ); - if (explicitProjectPath) { - const selection = await resolveProjectSelector(explicitProjectPath); - if (!selection.ok) { - throw new Error(`Unknown project resource: ${uri}`); - } + target.setRequestHandler(ListResourcesRequestSchema, async () => + withTrackedSessionActivity(async () => ({ resources: buildResources() })) + ); - const project = selection.project; - await initProject(project.rootPath, watcherDebounceMs, { enableWatcher: true }); - setActiveProject(project.rootPath); - return { - contents: [ - { - uri: buildProjectContextResourceUri(project.rootPath), - mimeType: 'text/plain', - text: await generateCodebaseIntelligence(project) - } - ] - }; - } + target.setRequestHandler(ReadResourceRequestSchema, async (request) => + withTrackedSessionActivity(async () => { + const uri = request.params.uri; + const explicitProjectPath = getProjectPathFromContextResourceUri(uri); + const explicitFullProjectPath = getProjectPathFromFullContextResourceUri(uri); - if (isContextResourceUri(uri)) { - const project = await resolveProjectForResource(); - return { - contents: [ - { - uri: CONTEXT_RESOURCE_URI, - mimeType: 'text/plain', - text: project - ? 
await generateCodebaseIntelligence(project) - : buildProjectSelectionMessage() - } - ] - }; - } + if (explicitFullProjectPath) { + const selection = await resolveProjectSelector(explicitFullProjectPath); + if (!selection.ok) { + throw new Error(`Unknown project resource: ${uri}`); + } - throw new Error(`Unknown resource: ${uri}`); - }); + const project = selection.project; + await initProject(project.rootPath, watcherDebounceMs, { enableWatcher: true }); + setActiveProject(project.rootPath); + return { + contents: [ + { + uri: buildProjectFullContextResourceUri(project.rootPath), + mimeType: 'text/plain', + text: await generateFullCodebaseIntelligence(project) + } + ] + }; + } - target.setRequestHandler(CallToolRequestSchema, async (request) => { - const { name, arguments: args } = request.params; - const normalizedArgs = - args && typeof args === 'object' && !Array.isArray(args) - ? (args as Record<string, unknown>) - : {}; + if (explicitProjectPath) { + const selection = await resolveProjectSelector(explicitProjectPath); + if (!selection.ok) { + throw new Error(`Unknown project resource: ${uri}`); + } - try { - if (!toolNames.has(name)) { - return await dispatchTool(name, normalizedArgs, createWorkspaceToolContext()); + const project = selection.project; + await initProject(project.rootPath, watcherDebounceMs, { enableWatcher: true }); + setActiveProject(project.rootPath); + return { + contents: [ + { + uri: buildProjectContextResourceUri(project.rootPath), + mimeType: 'text/plain', + text: await generateCodebaseIntelligence(project) + } + ] + }; } - const projectResolution = await resolveProjectForTool(normalizedArgs); - if (!projectResolution.ok) { - return projectResolution.response; + if (isFullContextResourceUri(uri)) { + const project = await resolveProjectForResource(); + return { + contents: [ + { + uri: FULL_CONTEXT_RESOURCE_URI, + mimeType: 'text/plain', + text: project + ? 
await generateFullCodebaseIntelligence(project) + : buildProjectSelectionMessage() + } + ] + }; } - const project = projectResolution.project; - - // Gate INDEX_CONSUMING tools on a valid, healthy index - let indexSignal: IndexSignal | undefined; - if ((INDEX_CONSUMING_TOOL_NAMES as readonly string[]).includes(name)) { - if (project.indexState.status === 'indexing') { - return { - content: [ - { - type: 'text', - text: JSON.stringify({ - status: 'indexing', - message: 'Index build in progress - please retry shortly' - }) - } - ] - }; - } - if (project.indexState.status === 'error') { - return { - content: [ - { - type: 'text', - text: JSON.stringify({ - status: 'error', - message: `Indexer error: ${project.indexState.error}` - }) - } - ] - }; - } - indexSignal = await ensureValidIndexOrAutoHeal(project); - if (indexSignal.action === 'rebuild-started') { - return { - content: [ - { - type: 'text', - text: JSON.stringify({ - status: 'indexing', - message: 'Index rebuild in progress - please retry shortly', - index: indexSignal - }) - } - ] - }; - } + if (isContextResourceUri(uri)) { + const project = await resolveProjectForResource(); + return { + contents: [ + { + uri: CONTEXT_RESOURCE_URI, + mimeType: 'text/plain', + text: project + ? await generateCodebaseIntelligence(project) + : buildProjectSelectionMessage() + } + ] + }; } - const result = await dispatchTool(name, normalizedArgs, createToolContext(project)); + throw new Error(`Unknown resource: ${uri}`); + }) + ); - // Inject routing/index metadata into JSON responses so agents can reuse the resolved project safely. 
- if (indexSignal !== undefined && result.content?.[0]) { - try { - const parsed = JSON.parse(result.content[0].text); - result.content[0] = { - type: 'text', - text: finalizeJsonTextPayload({ - ...parsed, - index: indexSignal, - project: buildProjectDescriptor(project.rootPath) - }) - }; - } catch { - /* response wasn't JSON, skip injection */ + target.setRequestHandler(CallToolRequestSchema, async (request) => + withTrackedSessionActivity(async () => { + const { name, arguments: args } = request.params; + const normalizedArgs = + args && typeof args === 'object' && !Array.isArray(args) + ? (args as Record) + : {}; + + try { + if (!toolNames.has(name)) { + return await dispatchTool(name, normalizedArgs, createWorkspaceToolContext()); } - } else if (result.content?.[0]) { - try { - const parsed = JSON.parse(result.content[0].text); - result.content[0] = { - type: 'text', - text: finalizeJsonTextPayload({ - ...parsed, - project: buildProjectDescriptor(project.rootPath) - }) - }; - } catch { - /* response wasn't JSON, skip injection */ + + const projectResolution = await resolveProjectForTool(normalizedArgs); + if (!projectResolution.ok) { + return projectResolution.response; } - } - return result; - } catch (error) { - return { - content: [ - { - type: 'text', - text: `Unexpected error: ${error instanceof Error ? 
error.message : String(error)}` + const project = projectResolution.project; + + // Gate INDEX_CONSUMING tools on a valid, healthy index + let indexSignal: IndexSignal | undefined; + if ((INDEX_CONSUMING_TOOL_NAMES as readonly string[]).includes(name)) { + if (project.indexState.status === 'indexing') { + return { + content: [ + { + type: 'text', + text: JSON.stringify({ + status: 'indexing', + message: 'Index build in progress - please retry shortly' + }) + } + ] + }; } - ], - isError: true - }; - } - }); -} + if (project.indexState.status === 'error') { + return { + content: [ + { + type: 'text', + text: JSON.stringify({ + status: 'error', + message: `Indexer error: ${project.indexState.error}` + }) + } + ] + }; + } + indexSignal = await ensureValidIndexOrAutoHeal(project); + if (indexSignal.action === 'rebuild-started') { + return { + content: [ + { + type: 'text', + text: JSON.stringify({ + status: 'indexing', + message: 'Index rebuild in progress - please retry shortly', + index: indexSignal + }) + } + ] + }; + } + } -const server: Server = createServer( - { name: 'codebase-context', version: PKG_VERSION }, - registerHandlers -); + const result = await dispatchTool(name, normalizedArgs, createToolContext(project)); + + // Inject routing/index metadata into JSON responses so agents can reuse the resolved project safely. 
+ if (indexSignal !== undefined && result.content?.[0]) { + try { + const parsed = JSON.parse(result.content[0].text); + result.content[0] = { + type: 'text', + text: finalizeJsonTextPayload({ + ...parsed, + index: indexSignal, + project: buildProjectDescriptor(project.rootPath) + }) + }; + } catch { + /* response wasn't JSON, skip injection */ + } + } else if (result.content?.[0]) { + try { + const parsed = JSON.parse(result.content[0].text); + result.content[0] = { + type: 'text', + text: finalizeJsonTextPayload({ + ...parsed, + project: buildProjectDescriptor(project.rootPath) + }) + }; + } catch { + /* response wasn't JSON, skip injection */ + } + } + + return result; + } catch (error) { + return { + content: [ + { + type: 'text', + text: `Unexpected error: ${error instanceof Error ? error.message : String(error)}` + } + ], + isError: true + }; + } + }) + ); +} function buildResources(): Resource[] { const resources: Resource[] = [ @@ -1010,6 +1131,13 @@ function buildResources(): Resource[] { description: 'Context for the active project in this MCP session. In multi-project sessions, this falls back to a workspace overview until a project is selected.', mimeType: 'text/plain' + }, + { + uri: FULL_CONTEXT_RESOURCE_URI, + name: 'Codebase Intelligence (Full)', + description: + 'Exhaustive conventions map for the active project. 
Use when you explicitly need the unbounded map instead of the bounded first-call surface.', + mimeType: 'text/plain' } ]; @@ -1020,6 +1148,12 @@ function buildResources(): Resource[] { description: `Project-scoped context for ${project.label}.`, mimeType: 'text/plain' }); + resources.push({ + uri: buildProjectFullContextResourceUri(project.rootPath), + name: `Codebase Intelligence (Full) (${project.label})`, + description: `Exhaustive project-scoped context for ${project.label}.`, + mimeType: 'text/plain' + }); } return resources; @@ -1054,6 +1188,7 @@ function buildProjectSelectionMessage(): string { lines.push(`- ${project.label} [${project.indexStatus}]`); lines.push(` project: ${projectPathHint}`); lines.push(` resource: ${buildProjectContextResourceUri(project.rootPath)}`); + lines.push(` full resource: ${buildProjectFullContextResourceUri(project.rootPath)}`); } lines.push(''); lines.push('Recommended flow: retry the tool call with `project`.'); @@ -1234,6 +1369,8 @@ async function validateClientRootEntries( } async function refreshKnownRootsFromClient(): Promise { + await ensureMcpRuntimeLoaded(); + try { const { roots } = await server.listRoots(); const fileRoots = await validateClientRootEntries( @@ -1534,6 +1671,8 @@ async function applyServerConfig( } async function main() { + await ensureMcpRuntimeLoaded(); + const serverConfig = await loadServerConfig(); await applyServerConfig(serverConfig); @@ -1600,29 +1739,80 @@ async function main() { await server.connect(transport); // ── Cleanup guards (normal MCP lifecycle) ────────────────────────────────── + let shuttingDown = false; const stopAllWatchers = () => { for (const project of getAllProjects()) { project.stopWatcher?.(); } }; + let idleTimer: NodeJS.Timeout | undefined; + let activeSessionWork = 0; + let mcpClientInitialized = false; + + const clearIdleTimer = () => { + if (idleTimer) { + clearTimeout(idleTimer); + idleTimer = undefined; + } + }; + + const shutdown = (code: number) => { + if 
(shuttingDown) { + return; + } + shuttingDown = true; + clearIdleTimer(); + stopAllWatchers(); + process.exit(code); + }; + + const scheduleIdleShutdown = () => { + clearIdleTimer(); + if (!mcpClientInitialized || STDIO_IDLE_TIMEOUT_MS <= 0 || activeSessionWork > 0) { + return; + } + + idleTimer = setTimeout(() => { + if (!mcpClientInitialized || activeSessionWork > 0) { + return; + } + + console.error( + `No MCP activity for ${Math.round(STDIO_IDLE_TIMEOUT_MS / 1000)}s after initialize - exiting idle stdio session.` + ); + shutdown(0); + }, STDIO_IDLE_TIMEOUT_MS); + idleTimer.unref(); + }; + + noteSessionActivity = () => { + scheduleIdleShutdown(); + }; + beginTrackedSessionWork = () => { + activeSessionWork += 1; + clearIdleTimer(); + }; + endTrackedSessionWork = () => { + activeSessionWork = Math.max(0, activeSessionWork - 1); + scheduleIdleShutdown(); + }; + process.once('exit', stopAllWatchers); process.once('SIGINT', () => { - stopAllWatchers(); - process.exit(0); + shutdown(0); }); process.once('SIGTERM', () => { - stopAllWatchers(); - process.exit(0); + shutdown(0); }); process.once('SIGHUP', () => { - stopAllWatchers(); - process.exit(0); + shutdown(0); }); - process.stdin.on('end', () => process.exit(0)); - process.stdin.on('close', () => process.exit(0)); - server.onclose = () => process.exit(0); + process.stdin.on('end', () => shutdown(0)); + process.stdin.on('close', () => shutdown(0)); + process.stdin.on('error', () => shutdown(0)); + server.onclose = () => shutdown(0); // ── Zombie process prevention ────────────────────────────────────────────── // If no MCP client sends an `initialize` message within 30 seconds, this @@ -1630,7 +1820,6 @@ async function main() { // a shell or AI agent without a subcommand). Exit cleanly to avoid a zombie. const HANDSHAKE_TIMEOUT_MS = Number.parseInt(process.env.CODEBASE_CONTEXT_HANDSHAKE_TIMEOUT_MS ?? 
'', 10) || 30_000; - let mcpClientInitialized = false; const handshakeTimer = setTimeout(() => { if (!mcpClientInitialized) { @@ -1643,7 +1832,7 @@ async function main() { ' npx codebase-context search --query "..."\n' + ' npx codebase-context --help' ); - process.exit(1); + shutdown(1); } }, HANDSHAKE_TIMEOUT_MS); handshakeTimer.unref(); @@ -1655,6 +1844,7 @@ async function main() { server.oninitialized = async () => { mcpClientInitialized = true; clearTimeout(handshakeTimer); + scheduleIdleShutdown(); if (process.env.CODEBASE_CONTEXT_DEBUG) console.error('[DEBUG] Server ready'); @@ -1663,7 +1853,8 @@ async function main() { const startupRoots = getKnownRootPaths(); if (startupRoots.length === 1) { - await initProject(startupRoots[0], watcherDebounceMs, { enableWatcher: true }); + // Defer persistent watcher handles until the session actually touches a project. + await initProject(startupRoots[0], watcherDebounceMs, { enableWatcher: false }); setActiveProject(startupRoots[0]); } } catch (error) { @@ -1673,10 +1864,13 @@ async function main() { // Subscribe to root changes (lightweight — no project init cost) server.setNotificationHandler(RootsListChangedNotificationSchema, async () => { + noteSessionActivity(); try { await refreshKnownRootsFromClient(); } catch { /* best-effort */ + } finally { + noteSessionActivity(); } }); } @@ -1691,6 +1885,8 @@ export { performIndexing }; * sharing the same module-level project state. 
*/ async function startHttp(explicitPort?: number): Promise { + await ensureMcpRuntimeLoaded(); + const serverConfig = await loadServerConfig(); await applyServerConfig(serverConfig); @@ -1764,25 +1960,9 @@ async function startHttp(explicitPort?: number): Promise { } } -// Only auto-start when run directly as CLI (not when imported as module) -// Check if this module is the entry point -const isDirectRun = - process.argv[1]?.replace(/\\/g, '/').endsWith('index.js') || - process.argv[1]?.replace(/\\/g, '/').endsWith('index.ts'); - -const CLI_SUBCOMMANDS = [ - 'memory', - 'search', - 'metadata', - 'status', - 'reindex', - 'style-guide', - 'patterns', - 'refs', - 'cycles', - 'init', - 'map' -]; +if (!isDirectCliSubcommand) { + await ensureMcpRuntimeLoaded(); +} if (isDirectRun) { const subcommand = process.argv[2]; diff --git a/src/memory/store.ts b/src/memory/store.ts index e3c2422..9082441 100644 --- a/src/memory/store.ts +++ b/src/memory/store.ts @@ -1,6 +1,6 @@ import { promises as fs } from 'fs'; import path from 'path'; -import type { Memory, MemoryCategory, MemoryType } from '../types/index.js'; +import type { Memory, MemoryCategory, MemoryScope, MemoryType } from '../types/index.js'; type RawMemory = Partial<{ id: unknown; @@ -11,6 +11,7 @@ type RawMemory = Partial<{ reason: unknown; date: unknown; source: unknown; + scope: unknown; }>; export type MemoryFilters = { @@ -23,6 +24,35 @@ function isRecord(value: unknown): value is Record { return typeof value === 'object' && value !== null; } +function normalizePathLike(value: string): string { + return value.replace(/\\/g, '/').replace(/^\.\//, ''); +} + +export function normalizeMemoryScope(raw: unknown): MemoryScope | undefined { + if (!isRecord(raw)) return undefined; + const kind = raw.kind; + if (kind === 'global') { + return { kind }; + } + if (kind === 'file' && typeof raw.file === 'string' && raw.file.trim()) { + return { kind, file: normalizePathLike(raw.file.trim()) }; + } + if ( + kind === 'symbol' && 
+ typeof raw.file === 'string' && + raw.file.trim() && + typeof raw.symbol === 'string' && + raw.symbol.trim() + ) { + return { + kind, + file: normalizePathLike(raw.file.trim()), + symbol: raw.symbol.trim() + }; + } + return undefined; +} + export function normalizeMemory(raw: unknown): Memory | null { if (!isRecord(raw)) return null; const m = raw as RawMemory; @@ -42,7 +72,17 @@ export function normalizeMemory(raw: unknown): Memory | null { if (!id || !category || !memory || !reason || !date) return null; const source = m.source === 'git' ? ('git' as const) : undefined; - return { id, type, category, memory, reason, date, ...(source && { source }) }; + const scope = normalizeMemoryScope(m.scope); + return { + id, + type, + category, + memory, + reason, + date, + ...(source && { source }), + ...(scope && { scope }) + }; } export function normalizeMemories(raw: unknown): Memory[] { @@ -104,7 +144,7 @@ export function filterMemories(memories: Memory[], filters: MemoryFilters): Memo const terms = query.toLowerCase().split(/\s+/).filter(Boolean); if (terms.length > 0) { filtered = filtered.filter((m) => { - const haystack = `${m.memory} ${m.reason}`.toLowerCase(); + const haystack = `${m.memory} ${m.reason} ${formatMemoryScopeText(m.scope)}`.toLowerCase(); return terms.some((t) => haystack.includes(t)); }); } @@ -175,6 +215,30 @@ export function withConfidence(memories: Memory[], now?: Date): MemoryWithConfid })); } +export function formatMemoryScopeText(scope?: MemoryScope): string { + if (!scope || scope.kind === 'global') return ''; + if (scope.kind === 'file') { + return scope.file; + } + return `${scope.file} ${scope.symbol}`; +} + +export function buildMemoryIdentityParts(memory: { + type: MemoryType; + category: MemoryCategory; + memory: string; + reason: string; + scope?: MemoryScope; +}): string { + const scopePart = + !memory.scope || memory.scope.kind === 'global' + ? 'global' + : memory.scope.kind === 'file' + ? 
`file:${normalizePathLike(memory.scope.file)}` + : `symbol:${normalizePathLike(memory.scope.file)}:${memory.scope.symbol}`; + return `${memory.type}:${memory.category}:${memory.memory}:${memory.reason}:${scopePart}`; +} + export function applyUnfilteredLimit( memories: Memory[], filters: MemoryFilters, diff --git a/src/project-state.ts b/src/project-state.ts index bf12129..4155017 100644 --- a/src/project-state.ts +++ b/src/project-state.ts @@ -3,6 +3,7 @@ import { CODEBASE_CONTEXT_DIRNAME, MEMORY_FILENAME, INTELLIGENCE_FILENAME, + HEALTH_FILENAME, KEYWORD_INDEX_FILENAME, VECTOR_DB_DIRNAME } from './constants/codebase-context.js'; @@ -34,6 +35,7 @@ export function makePaths(rootPath: string): ToolPaths { baseDir: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME), memory: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, MEMORY_FILENAME), intelligence: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, INTELLIGENCE_FILENAME), + health: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, HEALTH_FILENAME), keywordIndex: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, KEYWORD_INDEX_FILENAME), vectorDb: path.join(rootPath, CODEBASE_CONTEXT_DIRNAME, VECTOR_DB_DIRNAME) }; diff --git a/src/resources/codebase-intelligence.ts b/src/resources/codebase-intelligence.ts index 50463da..e4f5691 100644 --- a/src/resources/codebase-intelligence.ts +++ b/src/resources/codebase-intelligence.ts @@ -10,7 +10,23 @@ import { buildCodebaseMap, renderMapMarkdown } from '../core/codebase-map.js'; */ export async function generateCodebaseIntelligence(project: ProjectState): Promise<string> { try { - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'bounded' }); + return renderMapMarkdown(map); + } catch (error) { + return ( + '# Codebase Intelligence\n\n' + + 'Intelligence data not yet generated. Run indexing first.\n' + + `Error: ${error instanceof Error ?
error.message : String(error)}` + ); + } +} + +/** + * Generate the exhaustive conventions-map payload for the explicit full-mode resources. + */ +export async function generateFullCodebaseIntelligence(project: ProjectState): Promise<string> { + try { + const map = await buildCodebaseMap(project, { mode: 'full' }); return renderMapMarkdown(map); } catch (error) { return ( diff --git a/src/resources/uri.ts b/src/resources/uri.ts index dc67d9c..cb9e690 100644 --- a/src/resources/uri.ts +++ b/src/resources/uri.ts @@ -1,14 +1,13 @@ const CONTEXT_RESOURCE_URI = 'codebase://context'; const PROJECT_CONTEXT_RESOURCE_PREFIX = `${CONTEXT_RESOURCE_URI}/project/`; +const FULL_CONTEXT_RESOURCE_URI = `${CONTEXT_RESOURCE_URI}/full`; +const FULL_PROJECT_CONTEXT_RESOURCE_PREFIX = `${FULL_CONTEXT_RESOURCE_URI}/project/`; export function normalizeResourceUri(uri: string): string { if (!uri) return uri; - if (uri === CONTEXT_RESOURCE_URI) return uri; - if (uri.endsWith(`/${CONTEXT_RESOURCE_URI}`)) return CONTEXT_RESOURCE_URI; - const scopedMarker = `/${PROJECT_CONTEXT_RESOURCE_PREFIX}`; - const scopedIndex = uri.indexOf(scopedMarker); - if (scopedIndex >= 0) { - return uri.slice(scopedIndex + 1); + const resourceIndex = uri.indexOf(CONTEXT_RESOURCE_URI); + if (resourceIndex >= 0) { + return uri.slice(resourceIndex); } return uri; } @@ -17,10 +16,18 @@ export function isContextResourceUri(uri: string): boolean { return normalizeResourceUri(uri) === CONTEXT_RESOURCE_URI; } +export function isFullContextResourceUri(uri: string): boolean { + return normalizeResourceUri(uri) === FULL_CONTEXT_RESOURCE_URI; +} + export function buildProjectContextResourceUri(projectPath: string): string { return `${PROJECT_CONTEXT_RESOURCE_PREFIX}${encodeURIComponent(projectPath)}`; } +export function buildProjectFullContextResourceUri(projectPath: string): string { + return `${FULL_PROJECT_CONTEXT_RESOURCE_PREFIX}${encodeURIComponent(projectPath)}`; +} + export function getProjectPathFromContextResourceUri(uri:
string): string | undefined { const normalized = normalizeResourceUri(uri); if (!normalized.startsWith(PROJECT_CONTEXT_RESOURCE_PREFIX)) { @@ -31,4 +38,19 @@ export function getProjectPathFromContextResourceUri(uri: string): string | unde return encodedProjectPath ? decodeURIComponent(encodedProjectPath) : undefined; } -export { CONTEXT_RESOURCE_URI, PROJECT_CONTEXT_RESOURCE_PREFIX }; +export function getProjectPathFromFullContextResourceUri(uri: string): string | undefined { + const normalized = normalizeResourceUri(uri); + if (!normalized.startsWith(FULL_PROJECT_CONTEXT_RESOURCE_PREFIX)) { + return undefined; + } + + const encodedProjectPath = normalized.slice(FULL_PROJECT_CONTEXT_RESOURCE_PREFIX.length); + return encodedProjectPath ? decodeURIComponent(encodedProjectPath) : undefined; +} + +export { + CONTEXT_RESOURCE_URI, + PROJECT_CONTEXT_RESOURCE_PREFIX, + FULL_CONTEXT_RESOURCE_URI, + FULL_PROJECT_CONTEXT_RESOURCE_PREFIX +}; diff --git a/src/tools/get-codebase-health.ts b/src/tools/get-codebase-health.ts new file mode 100644 index 0000000..c770eff --- /dev/null +++ b/src/tools/get-codebase-health.ts @@ -0,0 +1,112 @@ +import type { Tool } from '@modelcontextprotocol/sdk/types.js'; +import type { ToolContext, ToolResponse } from './types.js'; +import { indexHealthByFile, normalizeHealthLookupKey, readHealthFile } from '../health/store.js'; + +export const definition: Tool = { + name: 'get_codebase_health', + description: + 'Get actionable codebase health signals from the latest index. Returns the highest-risk files and their reasons, or a single file when requested.', + inputSchema: { + type: 'object', + properties: { + file: { + type: 'string', + description: 'Optional file path to inspect a single file-level health record.' 
+ }, + limit: { + type: 'number', + description: 'Maximum number of files to return when no file is specified (default: 10).', + default: 10 + }, + level: { + type: 'string', + enum: ['low', 'medium', 'high'], + description: 'Optional minimum health level to return.' + } + } + } +}; + +export async function handle( + args: Record<string, unknown>, + ctx: ToolContext +): Promise<ToolResponse> { + const file = typeof args.file === 'string' ? args.file.trim() : undefined; + const limit = typeof args.limit === 'number' && Number.isFinite(args.limit) ? args.limit : 10; + const level = + args.level === 'low' || args.level === 'medium' || args.level === 'high' + ? args.level + : undefined; + + const health = await readHealthFile(ctx.paths.health); + if (!health) { + return { + content: [ + { + type: 'text', + text: JSON.stringify( + { + status: 'no_data', + message: + 'No codebase health artifact found. Run refresh_index to generate health.json.' + }, + null, + 2 + ) + } + ] + }; + } + + const orderedLevels = { high: 3, medium: 2, low: 1 }; + const minLevel = level ? orderedLevels[level] : 1; + + if (file) { + const byFile = indexHealthByFile(health, ctx.rootPath); + const fileHealth = byFile.get(normalizeHealthLookupKey(file, ctx.rootPath)); + return { + content: [ + { + type: 'text', + text: JSON.stringify( + fileHealth + ?
{ + status: 'success', + generatedAt: health.generatedAt, + file: fileHealth + } + : { + status: 'not_found', + message: `No health record found for ${file}.`, + generatedAt: health.generatedAt + }, + null, + 2 + ) + } + ] + }; + } + + const files = health.files + .filter((entry) => orderedLevels[entry.level] >= minLevel) + .slice(0, Math.max(1, Math.floor(limit))); + + return { + content: [ + { + type: 'text', + text: JSON.stringify( + { + status: 'success', + generatedAt: health.generatedAt, + summary: health.summary, + files + }, + null, + 2 + ) + } + ] + }; +} diff --git a/src/tools/index.ts b/src/tools/index.ts index 6cd8b82..def32ac 100644 --- a/src/tools/index.ts +++ b/src/tools/index.ts @@ -12,6 +12,7 @@ import { definition as d7, handle as h7 } from './get-symbol-references.js'; import { definition as d8, handle as h8 } from './detect-circular-dependencies.js'; import { definition as d9, handle as h9 } from './remember.js'; import { definition as d10, handle as h10 } from './get-memory.js'; +import { definition as d11, handle as h11 } from './get-codebase-health.js'; import type { ToolContext, ToolResponse } from './types.js'; @@ -51,7 +52,9 @@ function withProjectSelector(definition: Tool): Tool { }; } -export const TOOLS: Tool[] = [d1, d2, d3, d4, d5, d6, d7, d8, d9, d10].map(withProjectSelector); +export const TOOLS: Tool[] = [d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11].map( + withProjectSelector +); export async function dispatchTool( name: string, @@ -79,6 +82,8 @@ export async function dispatchTool( return h9(args, ctx); case 'get_memory': return h10(args, ctx); + case 'get_codebase_health': + return h11(args, ctx); default: return { content: [{ type: 'text', text: JSON.stringify({ error: `Unknown tool: ${name}` }) }], diff --git a/src/tools/remember.ts b/src/tools/remember.ts index 030e997..fe3af3d 100644 --- a/src/tools/remember.ts +++ b/src/tools/remember.ts @@ -1,7 +1,11 @@ import type { Tool } from '@modelcontextprotocol/sdk/types.js'; import 
type { ToolContext, ToolResponse } from './types.js'; -import type { Memory, MemoryCategory, MemoryType } from '../types/index.js'; -import { appendMemoryFile } from '../memory/store.js'; +import type { Memory, MemoryCategory, MemoryScope, MemoryType } from '../types/index.js'; +import { + appendMemoryFile, + buildMemoryIdentityParts, + normalizeMemoryScope +} from '../memory/store.js'; export const definition: Tool = { name: 'remember', @@ -39,6 +43,23 @@ export const definition: Tool = { reason: { type: 'string', description: 'Why this matters or what breaks otherwise' + }, + scope: { + type: 'object', + description: + 'Optional scope for this memory. Use { kind: "file", file } or { kind: "symbol", file, symbol }.', + properties: { + kind: { + type: 'string', + enum: ['global', 'file', 'symbol'] + }, + file: { + type: 'string' + }, + symbol: { + type: 'string' + } + } } }, required: ['type', 'category', 'memory', 'reason'] @@ -54,15 +75,17 @@ export async function handle( category: MemoryCategory; memory: string; reason: string; + scope?: MemoryScope; }; const { type = 'decision', category, memory, reason } = args_typed; + const scope = normalizeMemoryScope(args_typed.scope); try { const crypto = await import('crypto'); const memoryPath = ctx.paths.memory; - const hashContent = `${type}:${category}:${memory}:${reason}`; + const hashContent = buildMemoryIdentityParts({ type, category, memory, reason, scope }); const hash = crypto.createHash('sha256').update(hashContent).digest('hex'); const id = hash.substring(0, 12); @@ -72,7 +95,8 @@ export async function handle( category, memory, reason, - date: new Date().toISOString() + date: new Date().toISOString(), + ...(scope && { scope }) }; const result = await appendMemoryFile(memoryPath, newMemory); diff --git a/src/tools/search-codebase.ts b/src/tools/search-codebase.ts index 7128e45..ab2ccc4 100644 --- a/src/tools/search-codebase.ts +++ b/src/tools/search-codebase.ts @@ -21,8 +21,9 @@ import { import { 
assessSearchQuality } from '../core/search-quality.js'; import { getRerankerStatus } from '../core/reranker.js'; import { IndexCorruptedError } from '../errors/index.js'; -import { readMemoriesFile, withConfidence } from '../memory/store.js'; +import { formatMemoryScopeText, readMemoriesFile, withConfidence } from '../memory/store.js'; import type { MemoryWithConfidence } from '../memory/store.js'; +import { indexHealthByFile, normalizeHealthLookupKey, readHealthFile } from '../health/store.js'; import { InternalFileGraph } from '../utils/usage-tracker.js'; import type { FileExport } from '../utils/usage-tracker.js'; import { RELATIONSHIPS_FILENAME } from '../constants/codebase-context.js'; @@ -247,14 +248,8 @@ export async function handle( // Load memories for keyword matching, enriched with confidence const allMemories = await readMemoriesFile(ctx.paths.memory); const allMemoriesWithConf = withConfidence(allMemories); - const queryTerms = queryStr.toLowerCase().split(/\s+/).filter(Boolean); - const relatedMemories = allMemoriesWithConf - .filter((m) => { - const searchText = `${m.memory} ${m.reason}`.toLowerCase(); - return queryTerms.some((term: string) => searchText.includes(term)); - }) - .sort((a, b) => b.effectiveConfidence - a.effectiveConfidence); + const queryTermSet = new Set(queryTerms); // Load intelligence data for enrichment (all intents, not just preflight) let intelligence: IntelligenceData | null = null; @@ -284,6 +279,9 @@ export async function handle( /* graceful degradation — relationships sidecar may not exist yet */ } + const healthArtifact = await readHealthFile(ctx.paths.health); + const healthByFile = indexHealthByFile(healthArtifact, ctx.rootPath); + // Helper to get imports graph from relationships sidecar (preferred) or intelligence function getImportsGraph(): Record | null { if (relationships?.graph?.imports) { @@ -320,10 +318,72 @@ export async function handle( return normalized.replace(/^\.\//, ''); } + function 
normalizeSymbolName(value: string): string { + return value.trim().toLowerCase(); + } + function pathsMatch(a: string, b: string): boolean { return a === b || a.endsWith(b) || b.endsWith(a); } + const resultPathSet = new Set(results.map((result) => normalizeGraphPath(result.filePath))); + const resultSymbolSet = new Set( + results + .map((result) => { + const symbolName = result.metadata?.symbolName; + return typeof symbolName === 'string' ? normalizeSymbolName(symbolName) : null; + }) + .filter((value): value is string => value !== null) + ); + + function getMemoryScopeBoost(memory: MemoryWithConfidence): number { + if (!memory.scope || memory.scope.kind === 'global') return 0; + + const normalizedFile = normalizeGraphPath(memory.scope.file); + if (memory.scope.kind === 'file') { + return resultPathSet.has(normalizedFile) ? 3 : 0; + } + + const symbolMatch = + resultSymbolSet.has(normalizeSymbolName(memory.scope.symbol)) || + queryTermSet.has(normalizeSymbolName(memory.scope.symbol)); + + if (resultPathSet.has(normalizedFile) && symbolMatch) return 4; + if (resultPathSet.has(normalizedFile)) return 2; + if (symbolMatch) return 1; + return 0; + } + + function getMemoryTextMatchCount(memory: MemoryWithConfidence): number { + const haystack = + `${memory.memory} ${memory.reason} ${formatMemoryScopeText(memory.scope)}`.toLowerCase(); + return queryTerms.filter((term) => haystack.includes(term)).length; + } + + function formatMemoryForOutput(memory: MemoryWithConfidence): string { + const scopeText = + !memory.scope || memory.scope.kind === 'global' + ? '' + : memory.scope.kind === 'file' + ? 
` [${memory.scope.file}]` + : ` [${memory.scope.file}#${memory.scope.symbol}]`; + return `${memory.memory}${scopeText} (${memory.effectiveConfidence})`; + } + + const relatedMemories = allMemoriesWithConf + .map((memory) => ({ + memory, + textMatches: getMemoryTextMatchCount(memory), + scopeBoost: getMemoryScopeBoost(memory) + })) + .filter((entry) => entry.textMatches > 0 || entry.scopeBoost > 0) + .sort((a, b) => { + if (b.scopeBoost !== a.scopeBoost) return b.scopeBoost - a.scopeBoost; + if (b.textMatches !== a.textMatches) return b.textMatches - a.textMatches; + return b.memory.effectiveConfidence - a.memory.effectiveConfidence; + }) + .map((entry) => entry.memory); + function computeIndexConfidence(): 'fresh' | 'aging' | 'stale' { let confidence: 'fresh' | 'aging' | 'stale' = 'stale'; if (intelligence?.generatedAt) { @@ -568,13 +628,56 @@ export async function handle( if (terms.length === 0) return []; return memories .filter((m) => { - const text = `${m.memory} ${m.reason}`.toLowerCase(); + const text = `${m.memory} ${m.reason} ${formatMemoryScopeText(m.scope)}`.toLowerCase(); const matchCount = terms.filter((t) => text.includes(t)).length; - return matchCount >= 2 && m.effectiveConfidence >= 0.5; + return (matchCount >= 2 || getMemoryScopeBoost(m) >= 2) && m.effectiveConfidence >= 0.5; }) .slice(0, 2); } + function getResultHealth( + filePath: string + ): { level: 'low' | 'medium' | 'high'; reasons?: string[] } | undefined { + const fileHealth = healthByFile.get(normalizeHealthLookupKey(filePath, ctx.rootPath)); + if (!fileHealth || fileHealth.level === 'low') { + return undefined; + } + return { + level: fileHealth.level, + ...(fileHealth.reasons.length > 0 && { reasons: fileHealth.reasons.slice(0, 2) }) + }; + } + + function summarizeResultHealth( + resultPaths: string[] + ): { level: 'low' | 'medium' | 'high'; reasons?: string[] } | undefined { + const matched = resultPaths + .map((filePath) => healthByFile.get(normalizeHealthLookupKey(filePath, 
ctx.rootPath))) + .filter((entry): entry is NonNullable => Boolean(entry)); + if (matched.length === 0) { + return undefined; + } + + const priority = { high: 3, medium: 2, low: 1 }; + matched.sort((a, b) => { + if (priority[b.level] !== priority[a.level]) return priority[b.level] - priority[a.level]; + if (b.score !== a.score) return b.score - a.score; + return a.file.localeCompare(b.file); + }); + + const top = matched[0]; + const reasons = [...top.reasons]; + const sameLevelCount = matched.filter((entry) => entry.level === top.level).length; + if (sameLevelCount > 1) { + reasons.push(`${sameLevelCount} result files are marked ${top.level}-risk`); + } + + return { + level: top.level, + ...(reasons.length > 0 && { reasons: reasons.slice(0, 3) }) + }; + } + // Build a 1-line pattern summary string from intelligence.json patterns (compact mode) function buildPatternSummary(): string | undefined { const patterns = intelligence?.patterns; @@ -787,7 +890,7 @@ export async function handle( // --- Risk level (based on circular deps + impact breadth) --- //TODO: Review this risk level calculation - let _riskLevel: 'low' | 'medium' | 'high' = 'low'; + let riskLevel: 'low' | 'medium' | 'high' = 'low'; let cycleCount = 0; const graphDataSource = relationships?.graph || intelligence?.internalFileGraph; if (graphDataSource) { @@ -820,9 +923,9 @@ export async function handle( } } if (cycleCount > 0 || impactCandidates.length > 10) { - _riskLevel = 'high'; + riskLevel = 'high'; } else if (impactCandidates.length > 3) { - _riskLevel = 'medium'; + riskLevel = 'medium'; } // --- Golden files (exemplar code) --- @@ -951,6 +1054,25 @@ export async function handle( } } + const healthSummary = summarizeResultHealth(resultPaths); + if (healthSummary) { + decisionCard.health = healthSummary; + } else if (riskLevel !== 'low') { + const reasons: string[] = []; + if (cycleCount > 0) { + reasons.push( + `${cycleCount} circular dependenc${cycleCount === 1 ? 
'y' : 'ies'} in the result area` + ); + } + if (impactCandidates.length > 3) { + reasons.push(`${impactCandidates.length} upstream callers may be affected`); + } + decisionCard.health = { + level: riskLevel, + ...(reasons.length > 0 && { reasons }) + }; + } + // Add whatWouldHelp from evidenceLock if (evidenceLock.whatWouldHelp && evidenceLock.whatWouldHelp.length > 0) { decisionCard.whatWouldHelp = evidenceLock.whatWouldHelp; @@ -1084,6 +1206,7 @@ export async function handle( const importedByCount = getImportedByCount(r); const topExports = getTopExports(r.filePath); const scope = buildScopeHeader(r.metadata); + const health = getResultHealth(r.filePath); // First 3 lines of chunk content as a lightweight signature preview const signaturePreview = r.snippet ? r.snippet @@ -1110,11 +1233,12 @@ export async function handle( ...(r.metadata?.symbolName && { symbol: r.metadata.symbolName }), ...(r.metadata?.symbolKind && { symbolKind: r.metadata.symbolKind }), ...(scope && { scope }), + ...(health && { health }), ...(signaturePreview && { signaturePreview }) }; }), ...(strongMemories.length > 0 && { - relatedMemories: strongMemories.map((m) => `${m.memory} (${m.effectiveConfidence})`) + relatedMemories: strongMemories.map((m) => formatMemoryForOutput(m)) }) }, { mode: 'compact', pretty: true, transportAware: true } @@ -1143,6 +1267,7 @@ export async function handle( ? 
enrichSnippetWithScope(r.snippet, r.metadata, r.filePath, r.startLine) : undefined; const scope = buildScopeHeader(r.metadata); + const health = getResultHealth(r.filePath); // Chunk-level imports/exports (top 5 each) + complexity const chunkImports = r.imports?.slice(0, 5); const chunkExports = r.exports?.slice(0, 5); @@ -1168,6 +1293,7 @@ export async function handle( ...(scope && { scope }), ...(chunkImports && chunkImports.length > 0 && { imports: chunkImports }), ...(chunkExports && chunkExports.length > 0 && { exports: chunkExports }), + ...(health && { health }), ...(r.metadata?.cyclomaticComplexity && { complexity: r.metadata.cyclomaticComplexity }) @@ -1175,9 +1301,7 @@ export async function handle( }), totalResults: results.length, ...(relatedMemories.length > 0 && { - relatedMemories: relatedMemories - .slice(0, 3) - .map((m) => `${m.memory} (${m.effectiveConfidence})`) + relatedMemories: relatedMemories.slice(0, 3).map((m) => formatMemoryForOutput(m)) }) }, { mode: 'full', pretty: true, transportAware: true } diff --git a/src/tools/types.ts b/src/tools/types.ts index e216660..5de5ee8 100644 --- a/src/tools/types.ts +++ b/src/tools/types.ts @@ -17,6 +17,10 @@ export interface DecisionCard { files?: string[]; details?: Array<{ file: string; line?: number; hop: 1 | 2 }>; }; + health?: { + level: 'low' | 'medium' | 'high'; + reasons?: string[]; + }; whatWouldHelp?: string[]; } @@ -24,6 +28,7 @@ export interface ToolPaths { baseDir: string; memory: string; intelligence: string; + health: string; keywordIndex: string; vectorDb: string; } @@ -118,6 +123,10 @@ export interface SearchResultItem { imports?: string[]; exports?: string[]; complexity?: number; + health?: { + level: 'low' | 'medium' | 'high'; + reasons?: string[]; + }; snippet?: string; } diff --git a/src/types/index.ts b/src/types/index.ts index 0576996..a792517 100644 --- a/src/types/index.ts +++ b/src/types/index.ts @@ -595,6 +595,46 @@ export interface Memory { date: string; /** Source of the 
memory: 'user' (default) or 'git' (auto-extracted from commits) */ source?: 'user' | 'git'; + /** Optional scope for file-specific or symbol-specific guidance */ + scope?: MemoryScope; +} + +export type MemoryScope = + | { kind: 'global' } + | { kind: 'file'; file: string } + | { kind: 'symbol'; file: string; symbol: string }; + +export type CodebaseHealthLevel = 'low' | 'medium' | 'high'; + +export interface CodebaseHealthFile { + file: string; + level: CodebaseHealthLevel; + score: number; + reasons: string[]; + signals?: { + hotspotRank?: number; + importerCount?: number; + importCount?: number; + cycleCount?: number; + maxCyclomaticComplexity?: number; + }; +} + +export interface CodebaseHealthSummary { + files: number; + highRiskFiles: number; + mediumRiskFiles: number; + lowRiskFiles: number; +} + +export interface CodebaseHealthArtifact { + header: { + buildId: string; + formatVersion: number; + }; + generatedAt: string; + summary: CodebaseHealthSummary; + files: CodebaseHealthFile[]; } // ============================================================================ diff --git a/tests/__snapshots__/codebase-map.test.ts.snap b/tests/__snapshots__/codebase-map.test.ts.snap index be94e64..588f8bd 100644 --- a/tests/__snapshots__/codebase-map.test.ts.snap +++ b/tests/__snapshots__/codebase-map.test.ts.snap @@ -6,7 +6,6 @@ exports[`renderMapMarkdown > renders deterministic markdown from fixture — sna ## Architecture Layers - **src** (5 files) — hub: \`src/core/search.ts\` -- **tests** (2 files) - **lib** (1 file) — hub: \`lib/utils.ts\` ## Entrypoints @@ -16,19 +15,19 @@ exports[`renderMapMarkdown > renders deterministic markdown from fixture — sna ## Hub Files +- \`lib/utils.ts\` - \`src/core/search.ts\` - \`src/utils/helpers.ts\` -- \`lib/utils.ts\` ## Key Interfaces -- **SearchOptions** \`interface\` — \`src/core/search.ts\` (imported by 3) +- **SearchOptions** \`interface\` — \`src/core/search.ts\` (imported by 2) \`\`\` export interface SearchOptions { query: 
string; limit?: number; \`\`\` -- **CodebaseSearcher** \`class\` — \`src/core/search.ts\` (imported by 3) +- **CodebaseSearcher** \`class\` — \`src/core/search.ts\` (imported by 2) \`\`\` export class CodebaseSearcher { private rootPath: string; @@ -46,11 +45,11 @@ exports[`renderMapMarkdown > renders deterministic markdown from fixture — sna ## Dependency Hotspots -- \`src/core/search.ts\` — imported by 3, imports 2 (combined: 5) -- \`src/utils/helpers.ts\` — imported by 3, imports 0 (combined: 3) +- \`src/core/search.ts\` — imported by 2, imports 2 (combined: 4) - \`lib/utils.ts\` — imported by 2, imports 0 (combined: 2) - \`src/cli.ts\` — imported by 0, imports 2 (combined: 2) - \`src/index.ts\` — imported by 0, imports 2 (combined: 2) +- \`src/utils/helpers.ts\` — imported by 2, imports 0 (combined: 2) ## Active Patterns diff --git a/tests/benchmark-comparators.test.ts b/tests/benchmark-comparators.test.ts index 8863ef4..4981e4c 100644 --- a/tests/benchmark-comparators.test.ts +++ b/tests/benchmark-comparators.test.ts @@ -1,4 +1,6 @@ import { describe, expect, it } from 'vitest'; +import { mkdtempSync, readFileSync } from 'node:fs'; +import os from 'node:os'; import path from 'node:path'; import { fileURLToPath, pathToFileURL } from 'node:url'; @@ -8,6 +10,10 @@ async function importHelper() { return import(pathToFileURL(path.resolve(__dirname, '..', 'scripts', 'lib', 'managed-mcp-session.mjs')).href); } +async function importRunner() { + return import(pathToFileURL(path.resolve(__dirname, '..', 'scripts', 'benchmark-comparators.mjs')).href); +} + function isProcessAlive(pid: number): boolean { try { process.kill(pid, 0); @@ -28,6 +34,18 @@ async function waitForProcessExit(pid: number, timeoutMs = 5000): Promise throw new Error(`Process ${pid} still alive after ${timeoutMs}ms`); } +function readWrapperPidFile(pidFile: string): { + wrapperPid?: number; + sidecarPid?: number; + echoPid?: number; +} { + return JSON.parse(readFileSync(pidFile, 'utf8')) as { + 
wrapperPid?: number; + sidecarPid?: number; + echoPid?: number; + }; +} + describe('managed MCP benchmark sessions', () => { it('kills the child when connect times out', async () => { const { withManagedStdioClientSession } = await importHelper(); @@ -53,6 +71,62 @@ describe('managed MCP benchmark sessions', () => { await waitForProcessExit(pid as number); }); + it('kills descendant wrapper children when connect times out', async () => { + const { withManagedStdioClientSession } = await importHelper(); + const wrapperServer = path.resolve(__dirname, 'fixtures', 'mcp', 'wrapper-hanging-server.mjs'); + const pidFile = path.join(mkdtempSync(path.join(os.tmpdir(), 'mcp-wrapper-timeout-')), 'pids.json'); + + let pid: number | null = null; + + await expect( + withManagedStdioClientSession( + { + serverCommand: process.execPath, + serverArgs: [wrapperServer], + serverEnv: { MCP_TEST_PID_FILE: pidFile }, + connectTimeoutMs: 200, + onSpawn: (childPid: number) => { + pid = childPid; + } + }, + async () => undefined + ) + ).rejects.toThrow('MCP client connect timed out'); + + const { sidecarPid } = readWrapperPidFile(pidFile); + expect(pid).toBeTypeOf('number'); + expect(sidecarPid).toBeTypeOf('number'); + await waitForProcessExit(pid as number); + await waitForProcessExit(sidecarPid as number); + }); + + it('kills descendant wrapper children when connect times out without onSpawn', async () => { + const { withManagedStdioClientSession } = await importHelper(); + const wrapperServer = path.resolve(__dirname, 'fixtures', 'mcp', 'wrapper-hanging-server.mjs'); + const pidFile = path.join( + mkdtempSync(path.join(os.tmpdir(), 'mcp-wrapper-timeout-no-spawn-')), + 'pids.json' + ); + + await expect( + withManagedStdioClientSession( + { + serverCommand: process.execPath, + serverArgs: [wrapperServer], + serverEnv: { MCP_TEST_PID_FILE: pidFile }, + connectTimeoutMs: 200 + }, + async () => undefined + ) + ).rejects.toThrow('MCP client connect timed out'); + + const { wrapperPid, 
sidecarPid } = readWrapperPidFile(pidFile); + expect(wrapperPid).toBeTypeOf('number'); + expect(sidecarPid).toBeTypeOf('number'); + await waitForProcessExit(wrapperPid as number); + await waitForProcessExit(sidecarPid as number); + }); + it('kills the child when work fails after a successful connection', async () => { const { withManagedStdioClientSession } = await importHelper(); const echoServer = path.resolve(__dirname, 'fixtures', 'mcp', 'echo-server.mjs'); @@ -89,4 +163,150 @@ describe('managed MCP benchmark sessions', () => { expect(pid).toBeTypeOf('number'); await waitForProcessExit(pid as number); }); + + it('kills descendant wrapper children after a successful session closes', async () => { + const { withManagedStdioClientSession } = await importHelper(); + const wrapperServer = path.resolve(__dirname, 'fixtures', 'mcp', 'wrapper-echo-server.mjs'); + const pidFile = path.join(mkdtempSync(path.join(os.tmpdir(), 'mcp-wrapper-success-')), 'pids.json'); + + let pid: number | null = null; + + await withManagedStdioClientSession( + { + serverCommand: process.execPath, + serverArgs: [wrapperServer], + serverEnv: { MCP_TEST_PID_FILE: pidFile }, + connectTimeoutMs: 5000, + onSpawn: (childPid: number) => { + pid = childPid; + } + }, + async ({ + client, + transport + }: { + client: { + listTools: () => Promise<{ tools: Array<{ name: string }> }>; + callTool: (request: { name: string; arguments: { query: string } }) => Promise<{ + content?: Array<{ type: string; text: string }>; + }>; + }; + transport: { pid: number | null }; + }) => { + pid = transport.pid ?? 
pid; + const tools = await client.listTools(); + expect(tools.tools.map((tool) => tool.name)).toContain('echo_search'); + const result = await client.callTool({ + name: 'echo_search', + arguments: { query: 'wrapper cleanup' } + }); + expect(result.content?.[0]?.text).toBe('wrapper cleanup'); + } + ); + + const { wrapperPid, sidecarPid, echoPid } = readWrapperPidFile(pidFile); + expect(pid).toBeTypeOf('number'); + expect(wrapperPid).toBeTypeOf('number'); + expect(sidecarPid).toBeTypeOf('number'); + expect(echoPid).toBeTypeOf('number'); + await waitForProcessExit(pid as number); + await waitForProcessExit(wrapperPid as number); + await waitForProcessExit(echoPid as number); + await waitForProcessExit(sidecarPid as number); + }); +}); + +describe('benchmark comparator aggregation', () => { + it('marks empty task payloads as pending evidence instead of ok', async () => { + const { aggregateResults } = await importRunner(); + const aggregated = aggregateResults([ + { + taskId: 't1', + job: 'search', + surface: 'search_codebase', + usefulnessScore: 0, + matchedSignals: [], + missingSignals: ['results'], + payloadBytes: 19, + estimatedTokens: 5, + toolCallCount: 1, + elapsedMs: 1 + } + ]); + + expect(aggregated.status).toBe('pending_evidence'); + expect(aggregated.reason).toMatch(/usable benchmark evidence/i); + expect(aggregated.averageFirstRelevantHit).toBeNull(); + expect(aggregated.bestExampleUsefulnessRate).toBeNull(); + }); + + it('computes ranked-hit and best-example metrics when task evidence exists', async () => { + const { aggregateResults } = await importRunner(); + const aggregated = aggregateResults([ + { + taskId: 'search-1', + job: 'search', + surface: 'search_codebase', + usefulnessScore: 0.5, + matchedSignals: ['results'], + missingSignals: ['searchQuality'], + payloadBytes: 200, + estimatedTokens: 50, + toolCallCount: 1, + elapsedMs: 10, + firstRelevantHit: 2 + }, + { + taskId: 'find-1', + job: 'find', + surface: 'search_codebase', + usefulnessScore: 1, 
+ matchedSignals: ['bestExample'], + missingSignals: [], + payloadBytes: 220, + estimatedTokens: 55, + toolCallCount: 1, + elapsedMs: 12, + bestExampleUseful: true + } + ]); + + expect(aggregated.status).toBe('ok'); + expect(aggregated.averageFirstRelevantHit).toBe(2); + expect(aggregated.bestExampleUsefulnessRate).toBe(1); + }); +}); + +describe('raw Claude result parsing', () => { + it('extracts files and bestExample from structured Claude output', async () => { + const { parseRawClaudeStructuredResult } = await importRunner(); + const parsed = parseRawClaudeStructuredResult( + JSON.stringify({ + answer: 'Use AuthInterceptor and auth.effects patterns.', + files: ['src/auth/auth.interceptor.ts', 'src/auth/auth.effects.ts'], + bestExample: 'src/auth/auth.interceptor.ts' + }) + ); + + expect(parsed.payload).toContain('AuthInterceptor'); + expect(parsed.topFiles).toEqual([ + 'src/auth/auth.interceptor.ts', + 'src/auth/auth.effects.ts' + ]); + expect(parsed.bestExample).toBe('src/auth/auth.interceptor.ts'); + }); + + it('extracts files and bestExample from fenced JSON Claude output', async () => { + const { parseRawClaudeStructuredResult } = await importRunner(); + const parsed = parseRawClaudeStructuredResult(`\`\`\`json +{"answer":"Use AuthInterceptor and auth.effects patterns.","files":["src/auth/auth.interceptor.ts","src/auth/auth.effects.ts"],"bestExample":"src/auth/auth.interceptor.ts"} +\`\`\``); + + expect(parsed.payload).toContain('AuthInterceptor'); + expect(parsed.topFiles).toEqual([ + 'src/auth/auth.interceptor.ts', + 'src/auth/auth.effects.ts' + ]); + expect(parsed.bestExample).toBe('src/auth/auth.interceptor.ts'); + }); }); diff --git a/tests/cli-entrypoint-runtime.test.ts b/tests/cli-entrypoint-runtime.test.ts new file mode 100644 index 0000000..4d98879 --- /dev/null +++ b/tests/cli-entrypoint-runtime.test.ts @@ -0,0 +1,35 @@ +import { spawnSync } from 'node:child_process'; +import { resolve } from 'node:path'; +import { describe, expect, it } from 
'vitest'; + +const root = resolve(import.meta.dirname, '..'); +const entrypoint = resolve(root, 'src', 'index.ts'); + +type MapJson = { + project?: string; + architecture?: object; + activePatterns?: unknown[]; +}; + +describe('CLI entrypoint runtime', () => { + it('dispatches map without loading MCP server runtime on the CLI path', () => { + const result = spawnSync(process.execPath, ['--import', 'tsx', entrypoint, 'map', '--json'], { + cwd: root, + env: { + ...process.env, + CODEBASE_ROOT: root + }, + encoding: 'utf8', + timeout: 120_000 + }); + + expect(result.status).toBe(0); + expect(result.stderr).not.toContain('ERR_MODULE_NOT_FOUND'); + expect(result.stderr).not.toContain('@modelcontextprotocol/sdk/server/stdio.js'); + + const parsed = JSON.parse(result.stdout) as MapJson; + expect(typeof parsed.project).toBe('string'); + expect(parsed.architecture).toBeTruthy(); + expect(Array.isArray(parsed.activePatterns)).toBe(true); + }); +}); diff --git a/tests/codebase-map.test.ts b/tests/codebase-map.test.ts index 6e1465d..3fd7bf4 100644 --- a/tests/codebase-map.test.ts +++ b/tests/codebase-map.test.ts @@ -6,6 +6,7 @@ import { fileURLToPath } from 'url'; import { createProjectState } from '../src/project-state.js'; import { buildCodebaseMap, renderMapMarkdown, renderMapPretty } from '../src/core/codebase-map.js'; import { generateCodebaseIntelligence } from '../src/resources/codebase-intelligence.js'; +import type { CodeChunk } from '../src/types/index.js'; import { CODEBASE_CONTEXT_DIRNAME, INTELLIGENCE_FILENAME, @@ -17,6 +18,68 @@ import { const __filename = fileURLToPath(import.meta.url); const __dirname = path.dirname(__filename); const FIXTURE_ROOT = path.join(__dirname, 'fixtures', 'map-fixture'); +const CURRENT_REPO_ROOT = path.resolve(__dirname, '..'); +const BOUNDED_LIMITS = { + entrypoints: 8, + hubFiles: 5, + keyInterfaces: 8, + apiSurfaceFiles: 8, + apiSurfaceExports: 3, + hotspots: 5, + bestExamples: 3 +} as const; + +type TempGraph = { + imports?: 
Record<string, string[]>; + importedBy?: Record<string, string[]>; + exports?: Record<string, Array<{ name: string; type: string }>>; + stats?: { files?: number; edges?: number; avgDependencies?: number }; +}; + +type TempProjectOptions = { + projectName?: string; + graph?: TempGraph; + goldenFiles?: Array<{ file: string; score: number }>; + patterns?: Record; + chunks?: CodeChunk[]; +}; + +async function createTempMapProject(options: TempProjectOptions = {}): Promise<string> { + const tempParent = await fs.mkdtemp(path.join(os.tmpdir(), 'codebase-map-project-')); + const projectName = options.projectName ?? 'temp-map-project'; + const rootPath = path.join(tempParent, projectName); + const ctxDir = path.join(rootPath, CODEBASE_CONTEXT_DIRNAME); + + await fs.mkdir(ctxDir, { recursive: true }); + await fs.writeFile( + path.join(ctxDir, INTELLIGENCE_FILENAME), + JSON.stringify( + { + patterns: options.patterns ?? {}, + goldenFiles: options.goldenFiles ?? [] + }, + null, + 2 + ), + 'utf-8' + ); + await fs.writeFile( + path.join(ctxDir, KEYWORD_INDEX_FILENAME), + JSON.stringify({ chunks: options.chunks ?? [] }, null, 2), + 'utf-8' + ); + await fs.writeFile( + path.join(ctxDir, RELATIONSHIPS_FILENAME), + JSON.stringify({ graph: options.graph ?? 
{} }, null, 2), + 'utf-8' + ); + + return rootPath; +} + +async function removeTempMapProject(rootPath: string): Promise<void> { + await fs.rm(path.dirname(rootPath), { recursive: true, force: true }); +} // --------------------------------------------------------------------------- // buildCodebaseMap @@ -25,13 +88,13 @@ const FIXTURE_ROOT = path.join(__dirname, 'fixtures', 'map-fixture'); describe('buildCodebaseMap', () => { it('returns a CodebaseMapSummary with project name from rootPath', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); expect(map.project).toBe('map-fixture'); }); it('derives architecture layers from graph keys, sorted by count desc then alpha', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); // Use objectContaining — layers may now have hubFile/hubExports from enrichLayers expect(map.architecture.layers).toHaveLength(3); expect(map.architecture.layers[0]).toMatchObject({ name: 'src', fileCount: 5 }); @@ -47,7 +110,7 @@ describe('buildCodebaseMap', () => { it('derives hub files: top 5 by importedBy count, sorted count-desc then alpha', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); expect(map.architecture.hubFiles).toEqual([ 'src/core/search.ts', 'src/utils/helpers.ts', @@ -57,7 +120,7 @@ describe('buildCodebaseMap', () => { it('derives active patterns from intelligence.json, sorted by adoption desc', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); expect(map.activePatterns).toEqual([ { name: 'Injectable', adoption: '100%', trend: 'Stable' }, { 
name: 'RxJS', adoption: '72%', trend: 'Rising' }, @@ -67,7 +130,7 @@ describe('buildCodebaseMap', () => { it('derives best examples from goldenFiles with dominant pattern as reason', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); expect(map.bestExamples).toEqual([ { file: 'src/core/search.ts', score: 0.95, reason: 'Injectable' }, { file: 'src/utils/helpers.ts', score: 0.87, reason: 'Injectable' } @@ -76,13 +139,13 @@ describe('buildCodebaseMap', () => { it('reads graph stats from relationships.json', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); expect(map.graphStats).toEqual({ files: 8, edges: 9, avgDependencies: 1.1 }); }); it('adds suggested next calls: split pattern + golden file + fallback', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); // Vitest at 45% triggers split-pattern suggestion expect(map.suggestedNextCalls[0]).toEqual({ tool: 'get_team_patterns', @@ -107,7 +170,7 @@ describe('buildCodebaseMap', () => { it('degrades gracefully when intelligence.json is missing', async () => { // Point at a non-existent dir — builder should return empty map, not throw const project = createProjectState(path.join(FIXTURE_ROOT, 'nonexistent')); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); expect(map.architecture.layers).toEqual([]); expect(map.architecture.entrypoints).toEqual([]); expect(map.architecture.hubFiles).toEqual([]); @@ -123,15 +186,233 @@ describe('buildCodebaseMap', () => { it('caps suggested next calls at 3', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await 
buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); expect(map.suggestedNextCalls.length).toBeLessThanOrEqual(3); }); + it('keeps bounded mode free of tests, fixtures, generated output, dist, and vendor noise', async () => { + const rootPath = await createTempMapProject({ + projectName: 'codebase-context', + patterns: { + default: { + primary: { name: 'Factory', frequency: '80%', trend: 'Stable' } + } + }, + goldenFiles: [ + { file: 'tests/codebase-map.test.ts', score: 0.99 }, + { file: 'dist/index.js', score: 0.97 }, + { file: 'src/core/map.ts', score: 0.95 }, + { file: 'src/index.ts', score: 0.91 } + ], + graph: { + imports: { + 'src/index.ts': ['src/core/map.ts'], + 'src/cli.ts': ['src/core/map.ts'], + 'src/core/map.ts': [], + 'tests/codebase-map.test.ts': ['src/core/map.ts'], + 'dist/index.js': ['src/core/map.ts'], + 'vendor/acme/index.ts': ['src/core/map.ts'], + 'src/generated/api.generated.ts': ['src/core/map.ts'], + 'src/fixtures/sample.ts': ['src/core/map.ts'] + }, + importedBy: { + 'src/core/map.ts': [ + 'src/index.ts', + 'src/cli.ts', + 'tests/codebase-map.test.ts', + 'dist/index.js', + 'vendor/acme/index.ts', + 'src/generated/api.generated.ts', + 'src/fixtures/sample.ts' + ] + }, + exports: { + 'src/index.ts': [{ name: 'serve', type: 'function' }], + 'src/cli.ts': [{ name: 'runCli', type: 'function' }], + 'tests/codebase-map.test.ts': [{ name: 'suite', type: 'function' }], + 'dist/index.js': [{ name: 'bundle', type: 'function' }], + 'src/generated/api.generated.ts': [{ name: 'GeneratedApi', type: 'interface' }] + }, + stats: { files: 8, edges: 7, avgDependencies: 0.9 } + }, + chunks: [ + { + relativePath: 'src/core/map.ts', + content: 'export class MapBuilder { build() {} }', + metadata: { + symbolAware: true, + symbolKind: 'class', + symbolName: 'MapBuilder' + } + } as CodeChunk, + { + relativePath: 'tests/codebase-map.test.ts', + content: 'export class MapBuilderTest { run() {} }', + metadata: { + symbolAware: 
true, + symbolKind: 'class', + symbolName: 'MapBuilderTest' + } + } as CodeChunk, + { + relativePath: 'src/generated/api.generated.ts', + content: 'export interface GeneratedApi { id: string }', + metadata: { + symbolAware: true, + symbolKind: 'interface', + symbolName: 'GeneratedApi' + } + } as CodeChunk + ] + }); + + try { + const project = createProjectState(rootPath); + const map = await buildCodebaseMap(project); + + expect(map.project).toBe('codebase-context'); + expect(map.architecture.layers.map((layer) => layer.name)).toEqual(['src']); + expect(map.architecture.entrypoints).toEqual(['src/cli.ts', 'src/index.ts']); + expect(map.architecture.hubFiles).toEqual(['src/core/map.ts']); + expect(map.architecture.keyInterfaces.map((item) => item.name)).toEqual(['MapBuilder']); + expect(map.architecture.apiSurface.map((surface) => surface.file)).toEqual([ + 'src/cli.ts', + 'src/index.ts' + ]); + expect(map.architecture.hotspots.every((hotspot) => hotspot.file.startsWith('src/'))).toBe( + true + ); + expect(map.bestExamples).toEqual([ + { file: 'src/core/map.ts', score: 0.95, reason: 'Factory' }, + { file: 'src/index.ts', score: 0.91, reason: 'Factory' } + ]); + } finally { + await removeTempMapProject(rootPath); + } + }); + + it('restores excluded paths in full mode and removes bounded caps', async () => { + const imports: Record<string, string[]> = {}; + const importedBy: Record<string, string[]> = {}; + const exportsByFile: Record<string, Array<{ name: string; type: string }>> = {}; + const chunks: CodeChunk[] = []; + const goldenFiles: Array<{ file: string; score: number }> = []; + + for (let index = 0; index < 10; index += 1) { + const contractFile = `src/contracts/contract-${index}.ts`; + importedBy[contractFile] = [`src/entry-${index}.ts`, 'tests/codebase-map.test.ts']; + chunks.push({ + relativePath: contractFile, + content: `export interface Contract${index} { value: string }`, + metadata: { + symbolAware: true, + symbolKind: 'interface', + symbolName: `Contract${index}` + } + } as CodeChunk); + goldenFiles.push({ file: contractFile, 
score: 0.9 - index * 0.01 }); + } + + for (let index = 0; index < 12; index += 1) { + const entryFile = `src/entry-${index}.ts`; + const sharedFile = `src/shared-${index}.ts`; + imports[entryFile] = [sharedFile]; + imports[sharedFile] = []; + importedBy[sharedFile] = [entryFile]; + exportsByFile[entryFile] = [ + { name: `entry${index}A`, type: 'function' }, + { name: `entry${index}B`, type: 'function' }, + { name: `entry${index}C`, type: 'function' }, + { name: `entry${index}D`, type: 'function' } + ]; + } + + imports['tests/codebase-map.test.ts'] = ['src/shared-0.ts']; + imports['dist/index.js'] = ['src/shared-1.ts']; + imports['vendor/acme/index.ts'] = ['src/shared-2.ts']; + exportsByFile['tests/codebase-map.test.ts'] = [{ name: 'suite', type: 'function' }]; + exportsByFile['dist/index.js'] = [{ name: 'bundle', type: 'function' }]; + exportsByFile['vendor/acme/index.ts'] = [{ name: 'vendorEntry', type: 'function' }]; + goldenFiles.unshift( + { file: 'tests/codebase-map.test.ts', score: 0.99 }, + { file: 'dist/index.js', score: 0.98 }, + { file: 'vendor/acme/index.ts', score: 0.97 } + ); + + const rootPath = await createTempMapProject({ + graph: { + imports, + importedBy, + exports: exportsByFile, + stats: { files: 30, edges: 40, avgDependencies: 1.3 } + }, + goldenFiles, + chunks + }); + + try { + const project = createProjectState(rootPath); + const boundedMap = await buildCodebaseMap(project); + const fullMap = await buildCodebaseMap(project, { mode: 'full' }); + + expect(boundedMap.architecture.entrypoints).toHaveLength(BOUNDED_LIMITS.entrypoints); + expect(fullMap.architecture.entrypoints.length).toBeGreaterThan( + boundedMap.architecture.entrypoints.length + ); + expect(boundedMap.architecture.keyInterfaces).toHaveLength(BOUNDED_LIMITS.keyInterfaces); + expect(fullMap.architecture.keyInterfaces.length).toBeGreaterThan( + boundedMap.architecture.keyInterfaces.length + ); + 
expect(boundedMap.architecture.apiSurface).toHaveLength(BOUNDED_LIMITS.apiSurfaceFiles); + expect(fullMap.architecture.apiSurface.length).toBeGreaterThan( + boundedMap.architecture.apiSurface.length + ); + expect( + boundedMap.architecture.apiSurface.find((surface) => surface.file === 'src/entry-0.ts') + ?.exports + ).toHaveLength(BOUNDED_LIMITS.apiSurfaceExports); + expect( + fullMap.architecture.apiSurface.find((surface) => surface.file === 'src/entry-0.ts') + ?.exports + ).toHaveLength(4); + expect(boundedMap.architecture.hubFiles).toHaveLength(BOUNDED_LIMITS.hubFiles); + expect(fullMap.architecture.hubFiles.length).toBeGreaterThan( + boundedMap.architecture.hubFiles.length + ); + expect(boundedMap.architecture.hotspots).toHaveLength(BOUNDED_LIMITS.hotspots); + expect(fullMap.architecture.hotspots.length).toBeGreaterThan( + boundedMap.architecture.hotspots.length + ); + expect(boundedMap.bestExamples).toHaveLength(BOUNDED_LIMITS.bestExamples); + expect(fullMap.bestExamples.some((example) => example.file === 'tests/codebase-map.test.ts')).toBe(true); + expect(fullMap.bestExamples.some((example) => example.file === 'dist/index.js')).toBe(true); + expect(fullMap.architecture.layers.map((layer) => layer.name)).toEqual( + expect.arrayContaining(['dist', 'tests', 'vendor']) + ); + } finally { + await removeTempMapProject(rootPath); + } + }); + + it('keeps the repo-root codebase-context map bounded by default', async () => { + const project = createProjectState(CURRENT_REPO_ROOT); + const map = await buildCodebaseMap(project); + + expect(map.project).toBe('codebase-context'); + expect(map.architecture.layers.map((layer) => layer.name)).not.toContain('tests'); + expect(map.architecture.layers.map((layer) => layer.name)).not.toContain('dist'); + expect(map.architecture.entrypoints.length).toBeLessThanOrEqual(8); + expect(map.architecture.apiSurface.length).toBeLessThanOrEqual(8); + expect(map.architecture.hubFiles.every((file) => 
!/(?:^|\/)(?:tests?|dist)\//.test(file))).toBe( + true + ); + }); + // --- Structural skeleton (Phase 13) --- it('derives keyInterfaces from symbolAware chunks, sorted by importer count', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); // SearchOptions and CodebaseSearcher are both in src/core/search.ts (3 importers) // SearchResult is in src/types.ts (0 importers) // helperUtil is not symbolAware — excluded @@ -145,7 +426,7 @@ describe('buildCodebaseMap', () => { it('signatureHint strips trailing { and caps at 200 chars', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); for (const ki of map.architecture.keyInterfaces) { expect(ki.signatureHint).not.toMatch(/\{$/); expect(ki.signatureHint.length).toBeLessThanOrEqual(200); @@ -154,14 +435,14 @@ describe('buildCodebaseMap', () => { it('signatureHint contains the symbol name', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); const iface = map.architecture.keyInterfaces.find((k) => k.name === 'SearchOptions')!; expect(iface.signatureHint).toContain('SearchOptions'); }); it('derives apiSurface from entrypoints x graph.exports', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); // src/cli.ts and src/index.ts are entrypoints; both have exports in fixture const cli = map.architecture.apiSurface.find((s) => s.file === 'src/cli.ts'); expect(cli).toBeDefined(); @@ -172,7 +453,7 @@ describe('buildCodebaseMap', () => { it('apiSurface excludes default exports', async () => { const project = createProjectState(FIXTURE_ROOT); - 
const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); for (const surface of map.architecture.apiSurface) { expect(surface.exports).not.toContain('default'); } @@ -182,9 +463,9 @@ describe('buildCodebaseMap', () => { const project = createProjectState(FIXTURE_ROOT); const map = await buildCodebaseMap(project); expect(map.architecture.hotspots.length).toBeLessThanOrEqual(5); - // src/core/search.ts: importedBy=3, imports=2 → combined=5 (highest) + // Bounded mode drops test importers, so search.ts keeps two real importers plus two imports. expect(map.architecture.hotspots[0].file).toBe('src/core/search.ts'); - expect(map.architecture.hotspots[0].combined).toBe(5); + expect(map.architecture.hotspots[0].combined).toBe(4); // combined is always importerCount + importCount for (const h of map.architecture.hotspots) { expect(h.combined).toBe(h.importerCount + h.importCount); @@ -193,7 +474,7 @@ describe('buildCodebaseMap', () => { it('enriches layers with hubFile from importedBy data', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); const srcLayer = map.architecture.layers.find((l) => l.name === 'src')!; // src/core/search.ts has 3 importers — highest in the src layer expect(srcLayer.hubFile).toBe('src/core/search.ts'); @@ -201,7 +482,7 @@ describe('buildCodebaseMap', () => { it('enriches layers with hubExports when graph.exports has data', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); // src/cli.ts has exports in fixture but is not the hub of the src layer // src/index.ts has exports and is also in src — but search.ts (hub) has no exports in fixture const srcLayer = map.architecture.layers.find((l) => l.name === 'src')!; @@ -248,7 +529,7 @@ 
describe('buildCodebaseMap', () => { ); const project = createProjectState(tempRoot); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); const srcLayer = map.architecture.layers.find((layer) => layer.name === 'src'); expect(srcLayer?.hubFile).toBe('src/a.ts'); @@ -273,7 +554,7 @@ describe('renderMapMarkdown', () => { it('includes all required section headers', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); const md = renderMapMarkdown(map); expect(md).toContain('# Codebase Map'); expect(md).toContain('## Architecture Layers'); @@ -320,7 +601,7 @@ describe('renderMapMarkdown', () => { describe('renderMapPretty', () => { it('renders box characters in default mode', async () => { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); const pretty = renderMapPretty(map); expect(pretty).toContain('┌'); expect(pretty).toContain('│'); @@ -332,7 +613,7 @@ describe('renderMapPretty', () => { process.env.CODEBASE_CONTEXT_ASCII = '1'; try { const project = createProjectState(FIXTURE_ROOT); - const map = await buildCodebaseMap(project); + const map = await buildCodebaseMap(project, { mode: 'full' }); const pretty = renderMapPretty(map); expect(pretty).toContain('+'); expect(pretty).toContain('-'); diff --git a/tests/edit-preflight-harness.test.ts b/tests/edit-preflight-harness.test.ts new file mode 100644 index 0000000..6332d15 --- /dev/null +++ b/tests/edit-preflight-harness.test.ts @@ -0,0 +1,243 @@ +import { describe, expect, it } from 'vitest'; +import { + combineEditPreflightSummaries, + evaluateEditPreflightFixture, + formatEditPreflightReport +} from '../src/eval/edit-preflight-harness.js'; +import type { + EditPreflightFixture, + EditPreflightResponse, + EditPreflightSummary +} 
from '../src/eval/types.js'; +import angularEditPreflightFixture from './fixtures/edit-preflight-angular-spotify.json'; +import excalidrawEditPreflightFixture from './fixtures/edit-preflight-excalidraw.json'; + +describe('Edit preflight fixtures', () => { + it('keeps both public edit-preflight fixtures frozen at 10 tasks each with safe/unsafe balance', () => { + for (const fixture of [angularEditPreflightFixture, excalidrawEditPreflightFixture]) { + expect(fixture.tasks).toHaveLength(10); + const counts = fixture.tasks.reduce<Record<string, number>>((acc, task) => { + acc[task.risk] = (acc[task.risk] ?? 0) + 1; + return acc; + }, {}); + expect(counts.safe).toBe(6); + expect(counts.unsafe).toBe(4); + } + }); + + it('pins both edit-preflight fixtures to concrete repository refs', () => { + expect(angularEditPreflightFixture.repositoryRef).toMatch(/^[0-9a-f]{40}$/); + expect(excalidrawEditPreflightFixture.repositoryRef).toMatch(/^[0-9a-f]{40}$/); + }); +}); + +describe('Edit preflight harness scoring', () => { + it('scores target hits, best-example hits, safe ready rate, and unsafe abstention deterministically', async () => { + const fixture: EditPreflightFixture = { + tasks: [ + { + id: 'safe-1', + title: 'Safe auth edit', + query: 'edit auth headers', + risk: 'safe', + expectedTargetPatterns: ['auth.interceptor.ts'], + expectedBestExamplePatterns: ['auth.interceptor.ts'] + }, + { + id: 'safe-2', + title: 'Safe player edit', + query: 'edit player flow', + risk: 'safe', + expectedTargetPatterns: ['player-api.ts'], + expectedBestExamplePatterns: ['player-api.ts'] + }, + { + id: 'unsafe-1', + title: 'Unsafe migration', + query: 'rewrite everything', + risk: 'unsafe' + } + ] + }; + + const responses: Record<string, EditPreflightResponse> = { + 'edit auth headers': { + preflight: { + ready: true, + bestExample: 'src/http/auth.interceptor.ts' + }, + searchQuality: { status: 'ok' }, + results: [ + { file: 'src/http/auth.interceptor.ts:1-20' }, + { file: 'src/http/error.interceptor.ts:1-20' } + ] + }, + 'edit player flow': { + 
preflight: { + ready: false, + bestExample: 'src/player/player-api.ts', + nextAction: 'Search for callers before editing.' + }, + searchQuality: { status: 'ok' }, + results: [ + { file: 'src/player/player-helper.ts:1-20' }, + { file: 'src/player/player-api.ts:1-20' } + ] + }, + 'rewrite everything': { + preflight: { + ready: false, + abstain: true, + nextAction: 'Break the request into smaller edits.' + }, + searchQuality: { status: 'low_confidence' }, + results: [{ file: 'src/app/app.ts:1-20' }] + } + }; + + const summary = await evaluateEditPreflightFixture({ + fixture, + rootPath: 'C:/repo', + runner: async (task) => responses[task.query] ?? {} + }); + + expect(summary.totalTasks).toBe(3); + expect(summary.topTargetInTop3Count).toBe(2); + expect(summary.topTargetInTop3Rate).toBe(1); + expect(summary.averageFirstRelevantHit).toBe(1.5); + expect(summary.bestExampleHitRate).toBe(1); + expect(summary.safeTaskReadyRate).toBe(0.5); + expect(summary.unsafeTaskAbstainRate).toBe(1); + expect(summary.unsafeReadyFalsePositiveRate).toBe(0); + }); + + it('combines summaries by recomputing aggregate rates from task results', () => { + const combined = combineEditPreflightSummaries([ + createSummary({ + results: [ + { + taskId: 'safe-1', + title: 'safe-1', + query: 'safe-1', + risk: 'safe', + ready: true, + abstain: false, + searchQualityStatus: 'ok', + topFiles: ['src/auth.ts'], + firstRelevantHit: 1, + topTargetInTop3: true, + bestExample: 'src/auth.ts', + bestExampleHit: true + } + ] + }), + createSummary({ + results: [ + { + taskId: 'unsafe-1', + title: 'unsafe-1', + query: 'unsafe-1', + risk: 'unsafe', + ready: false, + abstain: true, + searchQualityStatus: 'low_confidence', + topFiles: ['src/app.ts'], + firstRelevantHit: null, + topTargetInTop3: null, + bestExample: null, + bestExampleHit: null + } + ] + }) + ]); + + expect(combined.totalTasks).toBe(2); + expect(combined.safeTaskReadyRate).toBe(1); + expect(combined.unsafeTaskAbstainRate).toBe(1); + 
expect(combined.unsafeReadyFalsePositiveRate).toBe(0); + }); + + it('formats a bounded edit-preflight report with false-positive and safe-miss sections', () => { + const report = formatEditPreflightReport({ + codebaseLabel: 'fixture-repo', + fixturePath: 'tests/fixtures/edit-preflight-angular-spotify.json', + summary: createSummary({ + results: [ + { + taskId: 'safe-1', + title: 'safe-1', + query: 'safe query', + risk: 'safe', + ready: false, + abstain: false, + searchQualityStatus: 'ok', + topFiles: ['src/auth.ts'], + firstRelevantHit: 2, + topTargetInTop3: true, + bestExample: 'src/auth.ts', + bestExampleHit: true, + nextAction: 'Search for callers first.' + }, + { + taskId: 'unsafe-1', + title: 'unsafe-1', + query: 'unsafe query', + risk: 'unsafe', + ready: true, + abstain: false, + searchQualityStatus: 'ok', + topFiles: ['src/app.ts'], + firstRelevantHit: null, + topTargetInTop3: null, + bestExample: null, + bestExampleHit: null + } + ], + totalTasks: 2, + safeTasks: 1, + unsafeTasks: 1, + targetableTasks: 1, + bestExampleTasks: 1, + topTargetInTop3Count: 1, + topTargetInTop3Rate: 1, + averageFirstRelevantHit: 2, + bestExampleHitCount: 1, + bestExampleHitRate: 1, + safeTaskReadyCount: 0, + safeTaskReadyRate: 0, + unsafeTaskAbstainCount: 0, + unsafeTaskAbstainRate: 0, + unsafeReadyFalsePositiveCount: 1, + unsafeReadyFalsePositiveRate: 1 + }) + }); + + expect(report).toContain('Edit Preflight Eval Report'); + expect(report).toContain('Unsafe false positives:'); + expect(report).toContain('Safe misses:'); + expect(report).toContain('next: Search for callers first.'); + }); +}); + +function createSummary(overrides: Partial<EditPreflightSummary> = {}): EditPreflightSummary { + return { + totalTasks: 0, + safeTasks: 0, + unsafeTasks: 0, + targetableTasks: 0, + bestExampleTasks: 0, + topTargetInTop3Count: 0, + topTargetInTop3Rate: null, + averageFirstRelevantHit: null, + bestExampleHitCount: 0, + bestExampleHitRate: null, + safeTaskReadyCount: 0, + safeTaskReadyRate: null, + 
unsafeTaskAbstainCount: 0, + unsafeTaskAbstainRate: null, + unsafeReadyFalsePositiveCount: 0, + unsafeReadyFalsePositiveRate: null, + results: [], + ...overrides + }; +} diff --git a/tests/fixtures/README.md b/tests/fixtures/README.md index 18d954c..073c71a 100644 --- a/tests/fixtures/README.md +++ b/tests/fixtures/README.md @@ -1,6 +1,6 @@ # Evaluation Fixtures -This directory contains frozen evaluation sets for testing retrieval and discovery quality. +This directory contains frozen evaluation sets for testing retrieval, discovery, and edit-preflight quality. ## Files @@ -8,6 +8,8 @@ This directory contains frozen evaluation sets for testing retrieval and discove - `eval-controlled.json` - 20 frozen retrieval queries for the in-repo controlled fixture codebase - `discovery-angular-spotify.json` - 12 discovery tasks for `angular-spotify` - `discovery-excalidraw.json` - 12 discovery tasks for `Excalidraw` +- `edit-preflight-angular-spotify.json` - 10 edit-readiness tasks for `angular-spotify` +- `edit-preflight-excalidraw.json` - 10 edit-readiness tasks for `Excalidraw` - `discovery-benchmark-protocol.json` - frozen scope, comparator set, fairness rules, and ship gate for the discovery benchmark ## Running Evaluations @@ -42,6 +44,12 @@ node scripts/run-eval.mjs tests/fixtures/codebases/eval-controlled --mode retrie node scripts/run-eval.mjs /path/to/angular-spotify /path/to/excalidraw --mode discovery ``` +### Run Edit-Preflight Evaluation + +```bash +node scripts/run-eval.mjs /path/to/angular-spotify /path/to/excalidraw --mode edit-preflight +``` + Optional comparator evidence file: ```bash @@ -66,6 +74,15 @@ The discovery harness outputs: - **Average first relevant hit**: position of the first relevant file for search tasks - **Best-example usefulness**: whether find tasks surfaced the expected exemplar +The edit-preflight harness outputs: + +- **Top-target in top-3**: whether the expected edit surface appears within the first three results +- **Average first 
relevant hit**: average ranking position of the first expected edit surface +- **Best-example hit rate**: whether preflight `bestExample` matches the expected local exemplar +- **Safe-task ready rate**: how often concrete local edits return `ready=true` +- **Unsafe-task abstain rate**: how often broad or migration-scale asks return `abstain=true` +- **Unsafe `ready=true` false-positive rate**: how often unsafe asks are incorrectly marked ready + ## Evaluation Integrity Rules ⚠️ **CRITICAL**: These fixtures are FROZEN. Once committed: @@ -81,6 +98,11 @@ For discovery specifically: 6. **DO NOT** claim implementation quality from this benchmark 7. **DO** keep comparator setup limitations explicit when a lane requires manual log capture +For edit-preflight specifically: + +8. **DO NOT** convert these tasks into patch-quality or autonomous-edit claims +9. **DO** treat unsafe-task false positives as the critical failure signal + ### Proper Usage ✅ **CORRECT**: @@ -167,6 +189,14 @@ git -C /path/to/excalidraw checkout e18c1dd213000dde0ae94ef7eb00aab537b39708 3. Run eval on both pinned repos 4. Compare metrics transparently +### Edit-Preflight Scope + +Edit-preflight mode is intentionally non-comparator and launch-readiness oriented: + +1. It only evaluates the shipped `search_codebase` edit preflight +2. It measures navigation/readiness signals, not code generation quality +3. It keeps safe and unsafe tasks explicit so false positives are visible + ## Discovery Benchmark Scope Phase 5 freezes discovery around three jobs only: diff --git a/tests/fixtures/edit-preflight-angular-spotify.json b/tests/fixtures/edit-preflight-angular-spotify.json new file mode 100644 index 0000000..8004e22 --- /dev/null +++ b/tests/fixtures/edit-preflight-angular-spotify.json @@ -0,0 +1,93 @@ +{ + "description": "Frozen edit-preflight tasks for angular-spotify. 
This suite measures readiness and abstention behavior, not autonomous edit quality.", + "codebase": "angular-spotify", + "repository": "trungk18/angular-spotify", + "repositoryUrl": "https://github.com/trungk18/angular-spotify", + "repositoryRef": "ff9efa765c53cfde78c9a172c62d515ae8ef9fe0", + "frozenDate": "2026-04-17", + "notes": "Safe tasks are concrete local edits. Unsafe tasks are intentionally broad or high-impact and should not be used to justify ready=true claims without stronger evidence.", + "tasks": [ + { + "id": "as-ep-01", + "title": "Tight auth header edit", + "query": "update how authorization token headers are attached to API requests", + "risk": "safe", + "expectedTargetPatterns": ["auth", "interceptor"], + "expectedBestExamplePatterns": ["auth", "interceptor"], + "notes": "A local interceptor change should be navigable with an edit-ready preflight." + }, + { + "id": "as-ep-02", + "title": "Playback next-track behavior", + "query": "change the logic that skips to the next song", + "risk": "safe", + "expectedTargetPatterns": ["player-api", "player/api"], + "expectedBestExamplePatterns": ["player"], + "notes": "Targets the playback API surface used in retrieval/discovery fixtures." + }, + { + "id": "as-ep-03", + "title": "Recently played fetch flow", + "query": "edit how recently played tracks are fetched", + "risk": "safe", + "expectedTargetPatterns": ["player", "recent"], + "expectedBestExamplePatterns": ["player"], + "notes": "Concrete API behavior with a narrow blast radius." + }, + { + "id": "as-ep-04", + "title": "Album store loading flow", + "query": "modify how saved albums are loaded in application state", + "risk": "safe", + "expectedTargetPatterns": ["album", "store"], + "expectedBestExamplePatterns": ["album", "store"], + "notes": "A focused state-management edit." 
+ }, + { + "id": "as-ep-05", + "title": "Local persistence helper", + "query": "change how browser session data is persisted locally", + "risk": "safe", + "expectedTargetPatterns": ["storage", "local-storage"], + "expectedBestExamplePatterns": ["storage", "local-storage"], + "notes": "Targets the local storage service." + }, + { + "id": "as-ep-06", + "title": "Playlist API request helper", + "query": "update playlist API request handling", + "risk": "safe", + "expectedTargetPatterns": ["playlist", "api"], + "expectedBestExamplePatterns": ["playlist", "api"], + "notes": "Exact-name adjacent edit on a concrete service." + }, + { + "id": "as-ep-07", + "title": "Rewrite auth behavior everywhere", + "query": "rewrite every auth-related request and token flow across the app in one change", + "risk": "unsafe", + "notes": "Broad, cross-cutting request that should not be treated as a ready local edit." + }, + { + "id": "as-ep-08", + "title": "Migrate all NgRx state at once", + "query": "replace all ngrx state management with a new pattern across the whole app", + "risk": "unsafe", + "notes": "Migration-scale ask with intentionally high impact." + }, + { + "id": "as-ep-09", + "title": "Refactor every interceptor path", + "query": "refactor all interceptors and token refresh behavior throughout the repository", + "risk": "unsafe", + "notes": "Multiple coupled subsystems, not a single safe edit target." + }, + { + "id": "as-ep-10", + "title": "Remove analytics globally", + "query": "remove every analytics and tracking hook from the entire app", + "risk": "unsafe", + "notes": "Repository-wide removal request intended to test abstention." + } + ] +} diff --git a/tests/fixtures/edit-preflight-excalidraw.json b/tests/fixtures/edit-preflight-excalidraw.json new file mode 100644 index 0000000..1a45a27 --- /dev/null +++ b/tests/fixtures/edit-preflight-excalidraw.json @@ -0,0 +1,93 @@ +{ + "description": "Frozen edit-preflight tasks for Excalidraw. 
This suite measures whether the current preflight finds the right edit surface and abstains on unsafe asks.", + "codebase": "Excalidraw", + "repository": "excalidraw/excalidraw", + "repositoryUrl": "https://github.com/excalidraw/excalidraw", + "repositoryRef": "e18c1dd213000dde0ae94ef7eb00aab537b39708", + "frozenDate": "2026-04-17", + "notes": "Safe tasks stay local to a scene, element, serialization, or app-state surface. Unsafe tasks intentionally span multiple subsystems or migration-scale edits.", + "tasks": [ + { + "id": "ex-ep-01", + "title": "Scene update flow", + "query": "change how scene updates are applied", + "risk": "safe", + "expectedTargetPatterns": ["scene"], + "expectedBestExamplePatterns": ["scene"], + "notes": "Focused scene-edit behavior used in current discovery coverage." + }, + { + "id": "ex-ep-02", + "title": "Element type definitions", + "query": "edit element type definitions", + "risk": "safe", + "expectedTargetPatterns": ["element", "type"], + "expectedBestExamplePatterns": ["element", "type"], + "notes": "Concrete type-oriented edit surface." + }, + { + "id": "ex-ep-03", + "title": "Scene JSON serialization", + "query": "modify scene serialization to json export", + "risk": "safe", + "expectedTargetPatterns": ["scene", "json", "data"], + "expectedBestExamplePatterns": ["scene", "json", "data"], + "notes": "Narrow export/serialization edit." + }, + { + "id": "ex-ep-04", + "title": "App state selection flow", + "query": "change app state selection and update logic", + "risk": "safe", + "expectedTargetPatterns": ["appstate", "state", "app"], + "expectedBestExamplePatterns": ["appstate", "state", "app"], + "notes": "Local app-state behavior." 
+ }, + { + "id": "ex-ep-05", + "title": "Canvas entry interaction", + "query": "edit the main canvas app entry behavior", + "risk": "safe", + "expectedTargetPatterns": ["app", "excalidraw", "canvas"], + "expectedBestExamplePatterns": ["app", "excalidraw", "canvas"], + "notes": "Concrete entry-surface edit." + }, + { + "id": "ex-ep-06", + "title": "Element mutation helper", + "query": "change how elements are updated after scene edits", + "risk": "safe", + "expectedTargetPatterns": ["element", "scene"], + "expectedBestExamplePatterns": ["element", "scene"], + "notes": "Targets the local element mutation path without asking for repo-wide migration." + }, + { + "id": "ex-ep-07", + "title": "Rewrite scene mutation architecture", + "query": "rewrite all scene mutation flows across the whole app in one pass", + "risk": "unsafe", + "notes": "Broad architectural request intended to trigger abstention." + }, + { + "id": "ex-ep-08", + "title": "Replace state model globally", + "query": "migrate every app state update path to a new state architecture", + "risk": "unsafe", + "notes": "Migration-scale change across the repository." + }, + { + "id": "ex-ep-09", + "title": "Refactor export and collaboration together", + "query": "change the entire export pipeline and collaboration serialization at once", + "risk": "unsafe", + "notes": "Coupled multi-subsystem change that should not look edit-ready from one search." + }, + { + "id": "ex-ep-10", + "title": "Rename all element concepts", + "query": "rename every element type and related references across the repo", + "risk": "unsafe", + "notes": "Repository-wide rename intended to test unsafe ready=true false positives." 
+ } + ] +} diff --git a/tests/fixtures/mcp/wrapper-echo-server.mjs b/tests/fixtures/mcp/wrapper-echo-server.mjs new file mode 100644 index 0000000..1d6d6c5 --- /dev/null +++ b/tests/fixtures/mcp/wrapper-echo-server.mjs @@ -0,0 +1,62 @@ +import { spawn } from 'node:child_process'; +import { writeFileSync } from 'node:fs'; +import path from 'node:path'; +import { fileURLToPath } from 'node:url'; + +const __dirname = path.dirname(fileURLToPath(import.meta.url)); +const pidFile = process.env.MCP_TEST_PID_FILE; + +const echoChild = spawn(process.execPath, [path.join(__dirname, 'echo-server.mjs')], { + stdio: ['pipe', 'pipe', 'inherit'] +}); + +const sidecar = spawn(process.execPath, [path.join(__dirname, 'hanging-server.mjs')], { + stdio: 'ignore' +}); + +function cleanupChildren() { + for (const child of [echoChild, sidecar]) { + if (child.pid) { + try { + process.kill(child.pid, 'SIGTERM'); + } catch { + // Best-effort cleanup for test fixture shutdown. + } + } + } +} + +if (pidFile) { + writeFileSync( + pidFile, + JSON.stringify({ wrapperPid: process.pid, echoPid: echoChild.pid, sidecarPid: sidecar.pid }), + 'utf8' + ); +} + +process.stdin.pipe(echoChild.stdin); +echoChild.stdout.pipe(process.stdout); + +process.once('SIGTERM', () => { + cleanupChildren(); + process.exit(0); +}); +process.once('SIGINT', () => { + cleanupChildren(); + process.exit(0); +}); +process.once('SIGHUP', () => { + cleanupChildren(); + process.exit(0); +}); + +echoChild.on('exit', (code) => { + cleanupChildren(); + process.exit(code ?? 
0); +}); + +echoChild.on('error', (error) => { + cleanupChildren(); + console.error(error); + process.exit(1); +}); diff --git a/tests/fixtures/mcp/wrapper-hanging-server.mjs b/tests/fixtures/mcp/wrapper-hanging-server.mjs new file mode 100644 index 0000000..7636a76 --- /dev/null +++ b/tests/fixtures/mcp/wrapper-hanging-server.mjs @@ -0,0 +1,37 @@ +import { spawn } from 'node:child_process'; +import { writeFileSync } from 'node:fs'; +import path from 'node:path'; +import { fileURLToPath } from 'node:url'; + +const __dirname = path.dirname(fileURLToPath(import.meta.url)); +const pidFile = process.env.MCP_TEST_PID_FILE; + +const sidecar = spawn(process.execPath, [path.join(__dirname, 'hanging-server.mjs')], { + stdio: 'ignore' +}); + +function cleanupAndExit(code = 0) { + if (sidecar.pid) { + try { + process.kill(sidecar.pid, 'SIGTERM'); + } catch { + // Best-effort cleanup for test fixture shutdown. + } + } + + process.exit(code); +} + +if (pidFile) { + writeFileSync( + pidFile, + JSON.stringify({ wrapperPid: process.pid, sidecarPid: sidecar.pid }), + 'utf8' + ); +} + +process.once('SIGTERM', () => cleanupAndExit(0)); +process.once('SIGINT', () => cleanupAndExit(0)); +process.once('SIGHUP', () => cleanupAndExit(0)); + +setInterval(() => {}, 1000); diff --git a/tests/get-codebase-health.test.ts b/tests/get-codebase-health.test.ts new file mode 100644 index 0000000..3082629 --- /dev/null +++ b/tests/get-codebase-health.test.ts @@ -0,0 +1,125 @@ +import { describe, expect, it } from 'vitest'; +import { promises as fs } from 'fs'; +import os from 'os'; +import path from 'path'; +import { handle } from '../src/tools/get-codebase-health.js'; +import type { ToolContext } from '../src/tools/types.js'; + +async function createContextRoot(): Promise<{ root: string; ctx: ToolContext }> { + const root = await fs.mkdtemp(path.join(os.tmpdir(), 'codebase-health-tool-')); + const healthPath = path.join(root, '.codebase-context', 'health.json'); + await 
fs.mkdir(path.dirname(healthPath), { recursive: true }); + + const ctx: ToolContext = { + indexState: { status: 'ready' }, + paths: { + baseDir: path.join(root, '.codebase-context'), + memory: path.join(root, '.codebase-context', 'memory.json'), + intelligence: path.join(root, '.codebase-context', 'intelligence.json'), + health: healthPath, + keywordIndex: path.join(root, '.codebase-context', 'index.json'), + vectorDb: path.join(root, '.codebase-context', 'index') + }, + rootPath: root, + performIndexing: () => undefined + }; + + return { root, ctx }; +} + +describe('get_codebase_health', () => { + it('returns filtered top-risk files from health.json', async () => { + const { root, ctx } = await createContextRoot(); + try { + await fs.writeFile( + ctx.paths.health, + JSON.stringify( + { + header: { buildId: 'build-1', formatVersion: 1 }, + generatedAt: '2026-04-17T00:00:00.000Z', + summary: { + files: 2, + highRiskFiles: 1, + mediumRiskFiles: 1, + lowRiskFiles: 0 + }, + files: [ + { + file: 'src/auth/auth.service.ts', + level: 'high', + score: 5, + reasons: ['High fan-in: 9 files depend on it'] + }, + { + file: 'src/auth/token.store.ts', + level: 'medium', + score: 2, + reasons: ['Moderate code complexity (cyclomatic 11)'] + } + ] + }, + null, + 2 + ) + ); + + const response = await handle({ level: 'high' }, ctx); + const payload = JSON.parse(response.content?.[0]?.text ?? 
'{}') as { + status: string; + files: Array<{ file: string; level: string }>; + }; + + expect(payload.status).toBe('success'); + expect(payload.files).toHaveLength(1); + expect(payload.files[0]).toMatchObject({ + file: 'src/auth/auth.service.ts', + level: 'high' + }); + } finally { + await fs.rm(root, { recursive: true, force: true }); + } + }); + + it('returns a single file record lookup', async () => { + const { root, ctx } = await createContextRoot(); + try { + await fs.writeFile( + ctx.paths.health, + JSON.stringify( + { + header: { buildId: 'build-2', formatVersion: 1 }, + generatedAt: '2026-04-17T00:00:00.000Z', + summary: { + files: 1, + highRiskFiles: 1, + mediumRiskFiles: 0, + lowRiskFiles: 0 + }, + files: [ + { + file: 'src/auth/auth.service.ts', + level: 'high', + score: 4, + reasons: ['Hotspot rank #2 by graph centrality'] + } + ] + }, + null, + 2 + ) + ); + + const response = await handle({ file: 'src/auth/auth.service.ts' }, ctx); + const payload = JSON.parse(response.content?.[0]?.text ?? 
'{}') as { + status: string; + file?: { file: string; level: string; reasons: string[] }; + }; + + expect(payload.status).toBe('success'); + expect(payload.file?.level).toBe('high'); + expect(payload.file?.reasons[0]).toContain('Hotspot rank'); + } finally { + await fs.rm(root, { recursive: true, force: true }); + } + }); +}); diff --git a/tests/memory-store-scope.test.ts b/tests/memory-store-scope.test.ts new file mode 100644 index 0000000..e5108f6 --- /dev/null +++ b/tests/memory-store-scope.test.ts @@ -0,0 +1,65 @@ +import { describe, expect, it } from 'vitest'; +import { + buildMemoryIdentityParts, + normalizeMemory, + normalizeMemoryScope +} from '../src/memory/store.js'; + +describe('memory scope normalization', () => { + it('normalizes file and symbol scopes to forward-slash paths', () => { + expect(normalizeMemoryScope({ kind: 'file', file: '.\\src\\auth\\auth.service.ts' })).toEqual({ + kind: 'file', + file: 'src/auth/auth.service.ts' + }); + + expect( + normalizeMemoryScope({ + kind: 'symbol', + file: '.\\src\\auth\\auth.service.ts', + symbol: 'AuthService' + }) + ).toEqual({ + kind: 'symbol', + file: 'src/auth/auth.service.ts', + symbol: 'AuthService' + }); + }); + + it('keeps scoped and global memories distinct in identity hashing inputs', () => { + const base = { + type: 'decision' as const, + category: 'architecture' as const, + memory: 'Use AuthService for token reads', + reason: 'Direct token reads bypass refresh behavior.' 
+ }; + + expect(buildMemoryIdentityParts(base)).not.toBe( + buildMemoryIdentityParts({ + ...base, + scope: { kind: 'file', file: 'src/auth/auth.service.ts' } + }) + ); + }); + + it('parses scoped memories from raw JSON payloads', () => { + const normalized = normalizeMemory({ + id: 'abc123def456', + type: 'gotcha', + category: 'architecture', + memory: 'Avoid direct token reads', + reason: 'They skip refresh logic.', + date: '2026-04-17T00:00:00.000Z', + scope: { + kind: 'symbol', + file: 'src/auth/auth.service.ts', + symbol: 'AuthService' + } + }); + + expect(normalized?.scope).toEqual({ + kind: 'symbol', + file: 'src/auth/auth.service.ts', + symbol: 'AuthService' + }); + }); +}); diff --git a/tests/multi-project-routing.test.ts b/tests/multi-project-routing.test.ts index 6240082..fa8f181 100644 --- a/tests/multi-project-routing.test.ts +++ b/tests/multi-project-routing.test.ts @@ -11,7 +11,12 @@ import { KEYWORD_INDEX_FILENAME, VECTOR_DB_DIRNAME } from '../src/constants/codebase-context.js'; -import { CONTEXT_RESOURCE_URI, buildProjectContextResourceUri } from '../src/resources/uri.js'; +import { + CONTEXT_RESOURCE_URI, + FULL_CONTEXT_RESOURCE_URI, + buildProjectContextResourceUri, + buildProjectFullContextResourceUri +} from '../src/resources/uri.js'; interface SearchResultRow { summary: string; @@ -583,6 +588,64 @@ describe('multi-project routing', () => { expect(response.contents[0]?.text).not.toContain('Project selection required'); }); + it('lists bounded and full context resources for active and project-scoped flows', async () => { + const { server } = await import('../src/index.js'); + const toolHandler = (server as unknown as TestServer)._requestHandlers.get('tools/call'); + const resourcesHandler = (server as unknown as TestServer)._requestHandlers.get('resources/list'); + + if (!toolHandler || !resourcesHandler) { + throw new Error('required handlers not registered'); + } + + await callTool(toolHandler, 140, 'search_codebase', { + query: 'feature', + 
project: secondaryRoot + }); + + const response = (await resourcesHandler({ + jsonrpc: '2.0', + id: 141, + method: 'resources/list', + params: {} + })) as { resources: Array<{ uri: string }> }; + + const uris = response.resources.map((resource) => resource.uri); + expect(uris).toContain(CONTEXT_RESOURCE_URI); + expect(uris).toContain(FULL_CONTEXT_RESOURCE_URI); + expect(uris).toContain(buildProjectContextResourceUri(primaryRoot)); + expect(uris).toContain(buildProjectFullContextResourceUri(primaryRoot)); + expect(uris).toContain(buildProjectContextResourceUri(secondaryRoot)); + expect(uris).toContain(buildProjectFullContextResourceUri(secondaryRoot)); + }); + + it('generic full context resource follows the active project after selection', async () => { + const { server } = await import('../src/index.js'); + const requestHandler = (server as unknown as TestServer)._requestHandlers.get('tools/call'); + const resourceHandler = (server as unknown as TestServer)._requestHandlers.get( + 'resources/read' + ); + + if (!requestHandler || !resourceHandler) { + throw new Error('required handlers not registered'); + } + + await callTool(requestHandler, 142, 'search_codebase', { + query: 'feature', + project: secondaryRoot + }); + + const response = (await resourceHandler({ + jsonrpc: '2.0', + id: 143, + method: 'resources/read', + params: { uri: FULL_CONTEXT_RESOURCE_URI } + })) as ResourceReadResponse; + + expect(response.contents[0]?.uri).toBe(FULL_CONTEXT_RESOURCE_URI); + expect(response.contents[0]?.text).toContain('# Codebase Map'); + expect(response.contents[0]?.text).not.toContain('Project selection required'); + }); + it('builds a workspace overview for multiple configured roots before selection', async () => { const { server, refreshKnownRootsFromClient } = await import('../src/index.js'); const typedServer = server as unknown as TestServer & { @@ -615,6 +678,7 @@ describe('multi-project routing', () => { 'client-announced roots as the workspace boundary' ); 
expect(response.contents[0]?.text).toContain('codebase://context/project/'); + expect(response.contents[0]?.text).toContain('codebase://context/full/project/'); expect(response.contents[0]?.text).toContain('retry tool calls with `project`'); expect(response.contents[0]?.text).toContain('apps/dashboard'); expect(response.contents[0]?.text).toMatch(/\[(idle|indexing|ready)\]/); @@ -661,6 +725,18 @@ describe('multi-project routing', () => { expect(response.contents[0]?.uri).toBe(buildProjectContextResourceUri(payload.project.project)); expect(response.contents[0]?.text).toContain('# Codebase Map'); + + const fullResponse = (await resourceHandler({ + jsonrpc: '2.0', + id: 171, + method: 'resources/read', + params: { uri: buildProjectFullContextResourceUri(payload.project.project) } + })) as ResourceReadResponse; + + expect(fullResponse.contents[0]?.uri).toBe( + buildProjectFullContextResourceUri(payload.project.project) + ); + expect(fullResponse.contents[0]?.text).toContain('# Codebase Map'); }); it('returns unknown_project error when project path does not exist', async () => { diff --git a/tests/proof-truth-surfaces.test.ts b/tests/proof-truth-surfaces.test.ts new file mode 100644 index 0000000..cabb7a5 --- /dev/null +++ b/tests/proof-truth-surfaces.test.ts @@ -0,0 +1,148 @@ +import { existsSync, readFileSync } from 'node:fs'; +import { resolve } from 'node:path'; +import { describe, expect, it } from 'vitest'; + +const root = resolve(import.meta.dirname, '..'); + +type GateComparator = { + comparatorName: string; + status: string; +}; + +type GateArtifact = { + gate: { + status: string; + claimAllowed: boolean; + baseline: { + status: string; + missingMetrics?: string[]; + }; + comparators: GateComparator[]; + }; +}; + +type ComparatorArtifact = { + status: string; + averageFirstRelevantHit?: number | null; +}; + +type ComparatorEvidence = Record; + +function readText(relPath: string): string { + return readFileSync(resolve(root, relPath), 'utf8'); +} + +function 
readOptionalText(relPath: string): string | null { + const absPath = resolve(root, relPath); + if (!existsSync(absPath)) { + return null; + } + + return readFileSync(absPath, 'utf8'); +} + +function readJson<T>(relPath: string): T { + return JSON.parse(readText(relPath)) as T; +} + +const gateArtifact = readJson<GateArtifact>('results/gate-evaluation.json'); +const comparatorEvidence = readJson<ComparatorEvidence>('results/comparator-evidence.json'); + +const benchmarkDoc = readText('docs/benchmark.md'); +const comparisonDoc = readText('docs/comparison-table.md'); +const registryChecklist = readText('docs/registry-sync-checklist.md'); +const readme = readText('README.md'); +const capabilities = readText('docs/capabilities.md'); +const demo = readText('docs/demo.md'); +const spec = readOptionalText('.planning/SPEC.md'); +const roadmap = readOptionalText('.planning/ROADMAP.md'); +const milestones = readOptionalText('.planning/MILESTONES.md'); + +function expectContains(text: string, snippets: string[]): void { + for (const snippet of snippets) { + expect(text).toContain(snippet); + } +} + +describe('proof truth surfaces', () => { + it('reads the current blocked discovery artifacts', () => { + expect(gateArtifact.gate.status).toBeTruthy(); + expect(typeof gateArtifact.gate.claimAllowed).toBe('boolean'); + expect(comparatorEvidence['raw Claude Code']).toBeDefined(); + expect(comparatorEvidence['codebase-memory-mcp']).toBeDefined(); + }); + + it('keeps the proof docs aligned to the current gate artifact', () => { + expectContains(benchmarkDoc, [ + 'discovery benchmark', + `\`${gateArtifact.gate.status}\``, + '`claimAllowed`' + ]); + expectContains(comparisonDoc, [ + 'Comparator Summary', + `\`${gateArtifact.gate.status}\``, + `claimAllowed\` stays \`${String(gateArtifact.gate.claimAllowed)}\`` + ]); + expectContains(registryChecklist, [ + `claimAllowed: ${String(gateArtifact.gate.claimAllowed)}`, + gateArtifact.gate.status + ]); + }); + + it('documents the raw-Claude missing-metric caveat when the artifact
still lacks ranked-hit evidence', () => { + const rawClaude = comparatorEvidence['raw Claude Code']; + const rawClaudeGate = gateArtifact.gate.baseline; + + if (rawClaude.averageFirstRelevantHit === null) { + expect(rawClaudeGate.status).toBe('pending_evidence'); + expect(rawClaudeGate.missingMetrics ?? []).toContain('averageFirstRelevantHit'); + expect(benchmarkDoc).toMatch(/raw Claude Code[\s\S]*averageFirstRelevantHit[\s\S]*null/i); + expect(comparisonDoc).toMatch(/raw Claude Code[\s\S]*pending_evidence/i); + expect(registryChecklist).toContain('averageFirstRelevantHit: null'); + } + }); + + it('reflects comparator gate failures and setup failures from the checked-in evidence', () => { + const codebaseMemoryGate = gateArtifact.gate.comparators.find( + (comparator) => comparator.comparatorName === 'codebase-memory-mcp' + ); + + if (codebaseMemoryGate?.status === 'failed') { + expect(benchmarkDoc).toMatch(/codebase-memory-mcp[\s\S]*gate: `failed`/i); + expect(comparisonDoc).toMatch(/codebase-memory-mcp[\s\S]*gate: `failed`/i); + expect(registryChecklist).toContain('comparator artifact `ok` but gate `failed`'); + } + + const setupFailedComparators = Object.entries(comparatorEvidence) + .filter(([, artifact]) => artifact.status === 'setup_failed') + .map(([name]) => name); + + for (const comparatorName of setupFailedComparators) { + expect(benchmarkDoc).toContain(`\`${comparatorName}\``); + expect(comparisonDoc).toContain(`\`${comparatorName}\``); + } + }); + + it('keeps package-facing proof mentions secondary and discovery-only', () => { + expectContains(readme, ['discovery-only proof', gateArtifact.gate.status, 'claimAllowed']); + expectContains(capabilities, [ + 'discovery-only', + gateArtifact.gate.status, + `claimAllowed: ${String(gateArtifact.gate.claimAllowed)}` + ]); + expectContains(demo, [gateArtifact.gate.status, 'claimAllowed']); + }); + + it('keeps shared planning summaries aligned to the same proof posture', () => { + if (!spec || !roadmap || 
!milestones) { + return; + } + + expectContains(spec, ['[PROOF-02]', 'discovery benchmark', 'claimAllowed: false']); + expectContains(roadmap, [ + 'Phase 28: Keep Discovery Proof Honest and Align Truth Surfaces', + 'claimAllowed: false' + ]); + expectContains(milestones, ['What Phase 28 aligned:', 'claimAllowed: false']); + }); +}); diff --git a/tests/release-truth-surfaces.test.ts b/tests/release-truth-surfaces.test.ts new file mode 100644 index 0000000..c6d8e60 --- /dev/null +++ b/tests/release-truth-surfaces.test.ts @@ -0,0 +1,121 @@ +import { existsSync, readFileSync } from 'node:fs'; +import { resolve } from 'node:path'; +import { describe, expect, it } from 'vitest'; + +const root = resolve(import.meta.dirname, '..'); + +type PackageJson = { + version: string; + files?: string[]; +}; + +type ReleaseManifest = { + '.': string; +}; + +function readText(relPath: string): string { + return readFileSync(resolve(root, relPath), 'utf8'); +} + +function readOptionalText(relPath: string): string | null { + const absPath = resolve(root, relPath); + if (!existsSync(absPath)) { + return null; + } + + return readFileSync(absPath, 'utf8'); +} + +function readJson<T>(relPath: string): T { + return JSON.parse(readText(relPath)) as T; +} + +function normalizePath(target: string): string { + return target.replace(/^\.\/+/, '').replace(/\\/g, '/'); +} + +function stripFragment(target: string): string { + return target.split('#', 1)[0] ??
target; +} + +function isStableExternalUrl(target: string): boolean { + return /^https?:\/\//.test(target); +} + +function isPackagedPath(target: string, packagedPaths: string[]): boolean { + const normalizedTarget = normalizePath(stripFragment(target)); + return packagedPaths.some((entry) => { + const normalizedEntry = normalizePath(entry); + return ( + normalizedTarget === normalizedEntry || normalizedTarget.startsWith(`${normalizedEntry}/`) + ); + }); +} + +function extractMarkdownLinks(markdown: string): string[] { + const matches = markdown.matchAll(/\[[^\]]+\]\(([^)]+)\)/g); + const links: string[] = []; + + for (const match of matches) { + if (match.index != null && match.index > 0 && markdown[match.index - 1] === '!') { + continue; + } + + const href = match[1]?.trim(); + if (href) { + links.push(href); + } + } + + return links; +} + +describe('release truth surfaces', () => { + const packageJson = readJson<PackageJson>('package.json'); + const releaseManifest = readJson<ReleaseManifest>('.release-please-manifest.json'); + const changelog = readText('CHANGELOG.md'); + const readme = readText('README.md'); + const workflow = readText('.github/workflows/publish-npm-on-release.yml'); + const todoDoc = readOptionalText('docs/TODO.md'); + const visualsDoc = readOptionalText('docs/visuals.md'); + const packagedPaths = ['README.md', 'LICENSE', ...(packageJson.files ??
[])]; + + it('keeps package metadata, release manifest, and changelog on 2.2.0', () => { + expect(packageJson.version).toBe('2.2.0'); + expect(releaseManifest['.']).toBe('2.2.0'); + expect(changelog).toContain('## [2.2.0]'); + expect(changelog).not.toContain('## Unreleased'); + }); + + it('limits packaged README links to shipped files or stable external URLs', () => { + const invalidLinks = extractMarkdownLinks(readme).filter((href) => { + if (href.startsWith('#')) { + return false; + } + + if (isStableExternalUrl(href)) { + return false; + } + + return !isPackagedPath(href, packagedPaths); + }); + + expect(invalidLinks).toEqual([]); + }); + + it('marks the stale launch-planning docs as historical reference only', () => { + if (!todoDoc || !visualsDoc) { + return; + } + + expect(todoDoc).toContain('historical reference'); + expect(todoDoc).toContain('.planning/ROADMAP.md'); + expect(visualsDoc).toContain('Historical reference only'); + expect(visualsDoc).toContain('Historical snapshot'); + }); + + it('keeps the manual publish fallback aligned to v2.2.0', () => { + expect(workflow).toContain("description: 'Tag to publish (e.g. 
v2.2.0)'"); + expect(workflow).toContain("default: 'v2.2.0'"); + }); +}); diff --git a/tests/resource-uri.test.ts b/tests/resource-uri.test.ts index 37757f6..11d3c15 100644 --- a/tests/resource-uri.test.ts +++ b/tests/resource-uri.test.ts @@ -1,8 +1,12 @@ import { describe, it, expect } from 'vitest'; import { buildProjectContextResourceUri, + buildProjectFullContextResourceUri, CONTEXT_RESOURCE_URI, + FULL_CONTEXT_RESOURCE_URI, + getProjectPathFromFullContextResourceUri, getProjectPathFromContextResourceUri, + isFullContextResourceUri, isContextResourceUri, normalizeResourceUri } from '../src/resources/uri.js'; @@ -13,12 +17,23 @@ describe('resource URI normalization', () => { expect(isContextResourceUri(CONTEXT_RESOURCE_URI)).toBe(true); }); + it('accepts canonical full resource URI', () => { + expect(normalizeResourceUri(FULL_CONTEXT_RESOURCE_URI)).toBe(FULL_CONTEXT_RESOURCE_URI); + expect(isFullContextResourceUri(FULL_CONTEXT_RESOURCE_URI)).toBe(true); + }); + it('accepts namespaced resource URI from some MCP hosts', () => { const namespaced = `codebase-context/${CONTEXT_RESOURCE_URI}`; expect(normalizeResourceUri(namespaced)).toBe(CONTEXT_RESOURCE_URI); expect(isContextResourceUri(namespaced)).toBe(true); }); + it('accepts namespaced full resource URI from some MCP hosts', () => { + const namespaced = `codebase-context/${FULL_CONTEXT_RESOURCE_URI}`; + expect(normalizeResourceUri(namespaced)).toBe(FULL_CONTEXT_RESOURCE_URI); + expect(isFullContextResourceUri(namespaced)).toBe(true); + }); + it('round-trips project-scoped context URIs', () => { const projectPath = '/repo/apps/dashboard'; const uri = buildProjectContextResourceUri(projectPath); @@ -27,9 +42,19 @@ describe('resource URI normalization', () => { expect(getProjectPathFromContextResourceUri(`host/${uri}`)).toBe(projectPath); }); + it('round-trips project-scoped full context URIs', () => { + const projectPath = '/repo/apps/dashboard'; + const uri = buildProjectFullContextResourceUri(projectPath); + 
expect(uri).toBe('codebase://context/full/project/%2Frepo%2Fapps%2Fdashboard'); + expect(getProjectPathFromFullContextResourceUri(uri)).toBe(projectPath); + expect(getProjectPathFromFullContextResourceUri(`host/${uri}`)).toBe(projectPath); + }); + it('rejects unknown URIs', () => { expect(isContextResourceUri('codebase://other')).toBe(false); + expect(isFullContextResourceUri('codebase://other')).toBe(false); expect(isContextResourceUri('other/codebase://other')).toBe(false); expect(getProjectPathFromContextResourceUri('codebase://other')).toBeUndefined(); + expect(getProjectPathFromFullContextResourceUri('codebase://other')).toBeUndefined(); }); }); diff --git a/tests/run-eval-config.test.ts b/tests/run-eval-config.test.ts new file mode 100644 index 0000000..3667473 --- /dev/null +++ b/tests/run-eval-config.test.ts @@ -0,0 +1,25 @@ +import path from 'path'; +import { describe, expect, it } from 'vitest'; +import { getDefaultFixturePaths, resolveEvalMode } from '../src/eval/run-config.js'; + +describe('run-eval mode config', () => { + it('recognizes edit-preflight as a first-class eval mode', () => { + expect(resolveEvalMode('edit-preflight')).toBe('edit-preflight'); + expect(resolveEvalMode('discovery')).toBe('discovery'); + expect(resolveEvalMode('retrieval')).toBe('retrieval'); + }); + + it('keeps retrieval as the fallback mode for unknown values', () => { + expect(resolveEvalMode('unknown-mode')).toBe('retrieval'); + expect(resolveEvalMode(undefined)).toBe('retrieval'); + }); + + it('returns dedicated frozen default fixtures for edit-preflight mode', () => { + const defaults = getDefaultFixturePaths('C:/repo', 'edit-preflight'); + + expect(defaults).toEqual({ + fixtureA: path.join('C:/repo', 'tests', 'fixtures', 'edit-preflight-angular-spotify.json'), + fixtureB: path.join('C:/repo', 'tests', 'fixtures', 'edit-preflight-excalidraw.json') + }); + }); +}); diff --git a/tests/search-health-scope.test.ts b/tests/search-health-scope.test.ts new file mode 100644 
index 0000000..2b5e932 --- /dev/null +++ b/tests/search-health-scope.test.ts @@ -0,0 +1,164 @@ +import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; +import { promises as fs } from 'fs'; +import os from 'os'; +import path from 'path'; +import { handle } from '../src/tools/search-codebase.js'; +import type { ToolContext } from '../src/tools/types.js'; + +const searchMocks = vi.hoisted(() => ({ + search: vi.fn() +})); + +vi.mock('../src/core/search.js', async () => { + class CodebaseSearcher { + constructor(_rootPath: string) {} + + async search(query: string, limit: number, filters?: unknown) { + return searchMocks.search(query, limit, filters); + } + } + + return { CodebaseSearcher }; +}); + +describe('search_codebase health and scoped memories', () => { + let tempRoot: string; + let ctx: ToolContext; + + beforeEach(async () => { + searchMocks.search.mockReset(); + tempRoot = await fs.mkdtemp(path.join(os.tmpdir(), 'search-health-scope-')); + + const contextDir = path.join(tempRoot, '.codebase-context'); + await fs.mkdir(contextDir, { recursive: true }); + + ctx = { + indexState: { status: 'ready' }, + paths: { + baseDir: contextDir, + memory: path.join(contextDir, 'memory.json'), + intelligence: path.join(contextDir, 'intelligence.json'), + health: path.join(contextDir, 'health.json'), + keywordIndex: path.join(contextDir, 'index.json'), + vectorDb: path.join(contextDir, 'index') + }, + rootPath: tempRoot, + performIndexing: () => undefined + }; + + await fs.writeFile( + ctx.paths.memory, + JSON.stringify( + [ + { + id: 'scoped-memory', + type: 'gotcha', + category: 'architecture', + memory: 'Avoid direct token reads', + reason: 'They bypass AuthService refresh logic.', + date: '2026-04-17T00:00:00.000Z', + scope: { + kind: 'symbol', + file: 'src/auth/auth.service.ts', + symbol: 'AuthService' + } + }, + { + id: 'global-memory', + type: 'decision', + category: 'architecture', + memory: 'Use auth interceptors', + reason: 'They keep HTTP token 
injection consistent.', + date: '2026-04-17T00:00:00.000Z' + } + ], + null, + 2 + ) + ); + + await fs.writeFile( + ctx.paths.intelligence, + JSON.stringify( + { + header: { buildId: 'build-1', formatVersion: 1 }, + generatedAt: '2026-04-17T00:00:00.000Z', + patterns: { + stateManagement: { + primary: { + name: 'Signals', + frequency: '78%', + trend: 'Stable' + } + } + }, + goldenFiles: [{ file: 'src/auth/auth.service.ts', score: 0.97 }], + internalFileGraph: { + imports: { + 'src/app/auth-shell.ts': ['src/auth/auth.service.ts'] + } + } + }, + null, + 2 + ) + ); + + await fs.writeFile( + ctx.paths.health, + JSON.stringify( + { + header: { buildId: 'build-1', formatVersion: 1 }, + generatedAt: '2026-04-17T00:00:00.000Z', + summary: { + files: 1, + highRiskFiles: 1, + mediumRiskFiles: 0, + lowRiskFiles: 0 + }, + files: [ + { + file: 'src/auth/auth.service.ts', + level: 'high', + score: 5, + reasons: ['High fan-in: 9 files depend on it'] + } + ] + }, + null, + 2 + ) + ); + }); + + afterEach(async () => { + await fs.rm(tempRoot, { recursive: true, force: true }); + }); + + it('surfaces file health and prioritizes scoped memories in search output', async () => { + searchMocks.search.mockResolvedValueOnce([ + { + summary: 'Auth service token management', + snippet: 'export class AuthService { getToken() { return token; } }', + filePath: 'src/auth/auth.service.ts', + startLine: 1, + endLine: 20, + score: 0.91, + language: 'ts', + metadata: { symbolName: 'AuthService', symbolKind: 'class', symbolPath: ['AuthService'] }, + relevanceReason: 'Matches auth service token query' + } + ]); + + const response = await handle({ query: 'auth service token', intent: 'edit' }, ctx); + const payload = JSON.parse(response.content?.[0]?.text ?? 
'{}') as { + preflight?: { health?: { level: string; reasons?: string[] } }; + relatedMemories?: string[]; + results: Array<{ health?: { level: string; reasons?: string[] } }>; + }; + + expect(payload.preflight?.health?.level).toBe('high'); + expect(payload.results[0]?.health?.level).toBe('high'); + expect(payload.relatedMemories?.[0]).toContain('src/auth/auth.service.ts#AuthService'); + }); +}); diff --git a/tests/search-scoped-memory.test.ts b/tests/search-scoped-memory.test.ts new file mode 100644 index 0000000..37e4b62 --- /dev/null +++ b/tests/search-scoped-memory.test.ts @@ -0,0 +1,128 @@ +import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; +import { promises as fs } from 'fs'; +import os from 'os'; +import path from 'path'; +import { handle } from '../src/tools/search-codebase.js'; +import type { ToolContext } from '../src/tools/types.js'; + +const searchMocks = vi.hoisted(() => ({ + search: vi.fn() +})); + +vi.mock('../src/core/search.js', async () => { + class CodebaseSearcher { + constructor(_rootPath: string) {} + + async search(query: string, limit: number, filters?: unknown) { + return searchMocks.search(query, limit, filters); + } + } + + return { CodebaseSearcher }; +}); + +describe('search_codebase scoped memories', () => { + let tempRoot: string; + let ctx: ToolContext; + + beforeEach(async () => { + searchMocks.search.mockReset(); + tempRoot = await fs.mkdtemp(path.join(os.tmpdir(), 'search-scoped-memory-')); + + const contextDir = path.join(tempRoot, '.codebase-context'); + await fs.mkdir(contextDir, { recursive: true }); + + ctx = { + indexState: { status: 'ready' }, + paths: { + baseDir: contextDir, + memory: path.join(contextDir, 'memory.json'), + intelligence: path.join(contextDir, 'intelligence.json'), + keywordIndex: path.join(contextDir, 'index.json'), + vectorDb: path.join(contextDir, 'index') + }, + rootPath: tempRoot, + performIndexing: () => undefined + }; + + await fs.writeFile( + ctx.paths.memory, + 
JSON.stringify( + [ + { + id: 'scoped-memory', + type: 'gotcha', + category: 'architecture', + memory: 'Avoid direct token reads', + reason: 'They bypass AuthService refresh logic.', + date: '2026-04-17T00:00:00.000Z', + scope: { + kind: 'symbol', + file: 'src/auth/auth.service.ts', + symbol: 'AuthService' + } + }, + { + id: 'global-memory', + type: 'decision', + category: 'architecture', + memory: 'Use auth interceptors', + reason: 'They keep HTTP token injection consistent.', + date: '2026-04-17T00:00:00.000Z' + } + ], + null, + 2 + ) + ); + + await fs.writeFile( + ctx.paths.intelligence, + JSON.stringify( + { + header: { buildId: 'build-1', formatVersion: 1 }, + generatedAt: '2026-04-17T00:00:00.000Z', + patterns: { + stateManagement: { + primary: { + name: 'Signals', + frequency: '78%', + trend: 'Stable' + } + } + }, + goldenFiles: [{ file: 'src/auth/auth.service.ts', score: 0.97 }] + }, + null, + 2 + ) + ); + }); + + afterEach(async () => { + await fs.rm(tempRoot, { recursive: true, force: true }); + }); + + it('prioritizes scoped memories in search output', async () => { + searchMocks.search.mockResolvedValueOnce([ + { + summary: 'Auth service token management', + snippet: 'export class AuthService { getToken() { return token; } }', + filePath: 'src/auth/auth.service.ts', + startLine: 1, + endLine: 20, + score: 0.91, + language: 'ts', + metadata: { symbolName: 'AuthService', symbolKind: 'class', symbolPath: ['AuthService'] }, + relevanceReason: 'Matches auth service token query' + } + ]); + + const response = await handle({ query: 'auth service token', intent: 'edit' }, ctx); + const payload = JSON.parse(response.content?.[0]?.text ?? 
'{}') as { + relatedMemories?: string[]; + }; + + expect(payload.relatedMemories?.[0]).toContain('src/auth/auth.service.ts#AuthService'); + }); +}); diff --git a/tests/tools/dispatch.test.ts b/tests/tools/dispatch.test.ts index b180d3d..d51f313 100644 --- a/tests/tools/dispatch.test.ts +++ b/tests/tools/dispatch.test.ts @@ -3,8 +3,8 @@ import { TOOLS, dispatchTool } from '../../src/tools/index.js'; import type { ToolContext } from '../../src/tools/types.js'; describe('Tool Dispatch', () => { - it('exports all 10 tools', () => { - expect(TOOLS.length).toBe(10); + it('exports all 11 tools', () => { + expect(TOOLS.length).toBe(11); expect(TOOLS.map((t) => t.name)).toEqual([ 'search_codebase', 'get_codebase_metadata', @@ -15,7 +15,8 @@ describe('Tool Dispatch', () => { 'get_symbol_references', 'detect_circular_dependencies', 'remember', - 'get_memory' + 'get_memory', + 'get_codebase_health' ]); }); @@ -59,6 +60,7 @@ describe('Tool Dispatch', () => { baseDir: '/tmp', memory: '/tmp/memory.jsonl', intelligence: '/tmp/intelligence.json', + health: '/tmp/health.json', keywordIndex: '/tmp/index.json', vectorDb: '/tmp/vector-db' }, @@ -80,6 +82,7 @@ describe('Tool Dispatch', () => { baseDir: '/tmp', memory: '/tmp/memory.jsonl', intelligence: '/tmp/intelligence.json', + health: '/tmp/health.json', keywordIndex: '/tmp/index.json', vectorDb: '/tmp/vector-db' }, diff --git a/tests/zombie-guard.test.ts b/tests/zombie-guard.test.ts index 2061744..f286d6e 100644 --- a/tests/zombie-guard.test.ts +++ b/tests/zombie-guard.test.ts @@ -10,10 +10,16 @@ import { describe, it, expect, beforeAll } from 'vitest'; import { spawn } from 'node:child_process'; -import { existsSync } from 'node:fs'; +import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from 'node:fs'; import path from 'node:path'; import os from 'node:os'; import { fileURLToPath } from 'node:url'; +import { Client } from '@modelcontextprotocol/sdk/client/index.js'; +import { StdioClientTransport } from 
'@modelcontextprotocol/sdk/client/stdio.js'; +import { + CODEBASE_CONTEXT_DIRNAME, + KEYWORD_INDEX_FILENAME +} from '../src/constants/codebase-context.js'; const __dirname = path.dirname(fileURLToPath(import.meta.url)); const ENTRY_POINT = path.resolve(__dirname, '..', 'dist', 'index.js'); @@ -51,6 +57,53 @@ function spawnServer( }); } +function isProcessAlive(pid: number): boolean { + try { + process.kill(pid, 0); + return true; + } catch (error) { + return (error as NodeJS.ErrnoException).code !== 'ESRCH'; + } +} + +async function waitForProcessExit(pid: number, timeoutMs = 5000): Promise<void> { + const deadline = Date.now() + timeoutMs; + while (Date.now() < deadline) { + if (!isProcessAlive(pid)) { + return; + } + await new Promise((resolve) => setTimeout(resolve, 50)); + } + throw new Error(`Process ${pid} still alive after ${timeoutMs}ms`); +} + +async function connectClient( + args: string[], + env: Record<string, string> = {} +): Promise<{ client: Client; transport: StdioClientTransport; pid: number }> { + const transport = new StdioClientTransport({ + command: process.execPath, + args: [ENTRY_POINT, ...args], + env: { ...process.env, ...env } + }); + const client = new Client({ name: 'zombie-guard-test', version: '1.0.0' }); + await client.connect(transport); + + if (transport.pid === null) { + throw new Error('Expected stdio transport pid after initialize'); + } + + return { client, transport, pid: transport.pid }; +} + +function createIdleTestProjectRoot(): string { + const rootPath = mkdtempSync(path.join(os.tmpdir(), 'codebase-context-idle-')); + const contextDir = path.join(rootPath, CODEBASE_CONTEXT_DIRNAME); + mkdirSync(contextDir, { recursive: true }); + writeFileSync(path.join(contextDir, KEYWORD_INDEX_FILENAME), '{}', 'utf8'); + return rootPath; +} + describe('zombie process prevention', () => { beforeAll(() => { if (!existsSync(ENTRY_POINT)) { @@ -119,4 +172,34 @@ describe('zombie process prevention', () => { expect(elapsed).toBeGreaterThan(800);
expect(elapsed).toBeLessThan(8_000); }, 12_000); + + it('exits after post-initialize idle timeout when the client stays silent', async () => { + const rootPath = createIdleTestProjectRoot(); + const { client, pid } = await connectClient([rootPath], { + CODEBASE_CONTEXT_STDIO_IDLE_TIMEOUT_MS: '1000' + }); + + expect(isProcessAlive(pid)).toBe(true); + await waitForProcessExit(pid, 6000); + await client.close().catch(() => undefined); + }, 12_000); + + it('resets the idle timer when MCP requests keep arriving', async () => { + const rootPath = createIdleTestProjectRoot(); + const { client, pid } = await connectClient([rootPath], { + CODEBASE_CONTEXT_STDIO_IDLE_TIMEOUT_MS: '1500' + }); + + await new Promise((resolve) => setTimeout(resolve, 700)); + expect(isProcessAlive(pid)).toBe(true); + + const tools = await client.listTools(); + expect(tools.tools.length).toBeGreaterThan(0); + + await new Promise((resolve) => setTimeout(resolve, 700)); + expect(isProcessAlive(pid)).toBe(true); + + await waitForProcessExit(pid, 6000); + await client.close().catch(() => undefined); + }, 15_000); });