fix(architecture): "overview" compact subset + aspects schema enum + server-side validation#560
Open
Kirchlive wants to merge 3 commits into
Open
Conversation
`aspects=["overview"]` silently returned only `{total_nodes, total_edges}`
because "overview" was never registered in either guard:
- `want_aspect` (src/store/store.c) skipped all DB sub-queries
- `aspect_wanted` (src/mcp/mcp.c) skipped all JSON serialization
The JSON schema (mcp.c) accepts `aspects` as free-form strings without
an enum, so "overview" was silently accepted and produced an empty
result instead of a validation error.
This 2-line patch treats "overview" as a synonym for "all" in both
guards, restoring useful output for the most natural exploratory aspect
name.
Follow-up (not in this PR): add an enum constraint to the `aspects`
JSON schema so unknown aspect strings raise a validation error instead
of returning empty.
…pects schema enum Following review feedback on the initial alias approach: making "overview" identical to "all" still produces the same ~30k-char overflow payload (file_tree alone is ~275 entries). A useful "overview" needs to be a true compact subset. Changes: 1. "overview" is now a compact subset = every aspect EXCEPT file_tree. A small static helper `aspect_in_overview(name)` (in both store.c and mcp.c) gates which aspects "overview" expands to. Replacing it with a per-aspect block-list (e.g. also dropping clusters) is trivial later. 2. The aspects parameter now has a JSON Schema enum, so unknown aspect strings raise a validation error instead of being silently accepted (the previous footgun that masked the alias-not-registered bug for so long). Behavior: aspects=["all"] → unchanged, includes file_tree aspects=["overview"] → all aspects EXCEPT file_tree, stays under cap aspects=["bogus"] → validation error (was: silent empty response) aspects omitted → unchanged (= all)
The JSON-Schema enum added in the previous commit is advisory: it only fires when the MCP client validates client-side before sending. Many clients (including Claude Code) do not, so an unknown aspect like ["bogus"] still produced a silent-empty response — exactly the footgun that hid the original alias-not-registered bug. This commit adds the missing authoritative layer: server-side validation in handle_get_architecture(). A single SSOT array VALID_ASPECTS[] backs both the enum (advisory, client-side) and aspect_is_valid() (authoritative, server-side). Any future aspect addition stays in sync by editing one array. On an unknown aspect, the server returns isError:true with a model-friendly message listing the valid values, so an agent can self-correct without re-prompting: Unknown aspect 'bogus'. Valid: all, overview, structure, dependencies, routes, languages, packages, entry_points, hotspots, boundaries, layers, file_tree, clusters. Defense in depth — clients that DO validate still get a clean client-side error; clients that don't are now caught by the server. Cleanup: properly frees project and aspects_doc on the early-error path.
Author
|
v3 summary — the PR went through three iterations; the final state to review is the combination of both commits:
All four behaviors hand-verified after binary reload: |
Owner
|
Thanks @Kirchlive — the
A small unit test for the bogus-aspect → error path would also be welcome. 🙏 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
get_architecture(aspects=["overview"])previously returned only{total_nodes, total_edges}—"overview"was never registered in either aspect guard, and the JSON schema accepted any string (no enum), so the typo / unregistered token was silently accepted and produced an empty payload.This PR turns that footgun into a useful, validated tool.
What this PR does (3 layers)
1.
"overview"is a useful compact subset, not an alias to"all"The naive fix (treat
"overview"as a synonym for"all") still produces the ~30k-char overflow payload —file_treealone is ~275 entries on a mid-size repo, pushing the response pastMAX_MCP_OUTPUT_TOKENS."overview"now expands to every aspect exceptfile_tree, gated by a small static helper present in both files:src/store/store.c(gates DB sub-queries) andsrc/mcp/mcp.c(gates JSON serialization) both use it. Extending the block-list (e.g. also dropclusters) is one line.2. JSON-Schema enum on
aspects(advisory, client-side)Self-documenting and gives well-behaved MCP clients a clean client-side error on typos.
3. Server-side validation (authoritative)
The enum alone is advisory — many MCP clients (Claude Code included) do not validate against tool schemas before sending. Verified empirically: with only the enum in place,
aspects=["bogus"]still produced silent-empty.Server-side validation closes the loop:
In
handle_get_architecture, after parsing the aspects array, each token is checked againstVALID_ASPECTS. On the first unknown token the handler returnsisError: truewith a model-friendly message listing the valid values, so an agent can self-correct without re-prompting:Defense in depth: client-side enum catches well-behaved clients early; server-side
aspect_is_validcatches everyone else.Behavior matrix
aspectsaspects=["all"]file_treeaspects=["overview"]{total_nodes,total_edges}file_tree, stays under capaspects=["hotspots"]aspects=["bogus"]{total_nodes,total_edges}isError: truewith valid-values listLocally verified
Built without libgit2 (
make -f Makefile.cbm cbm LIBGIT2_FLAGS= LIBGIT2_LIBS=), installed to~/.local/bin/codebase-memory-mcp. Hand-tests on an ~8k-node project after binary reload:["overview"]→ 13 keys, nofile_tree, comfortably under output cap["all"]→ full payload incl.file_tree(unchanged)["hotspots"]→ only hotspots (unchanged)["bogus"]→ server returns the validation error string aboveCommits
refactor(get_architecture): make "overview" a compact aspect subset + add aspects schema enumfeat(get_architecture): server-side aspect validation (defense in depth)(Both are kept separate so the subset semantics and the validation layer can be reviewed independently.)
Test plan
get_architecture(aspects=["overview"])→ rich payload withoutfile_tree, fits under the MCP token capget_architecture(aspects=["all"])→ unchanged, still includesfile_treeget_architecture(aspects=["bogus"])→isError: truewith valid-values listget_architecture(aspects=["hotspots"])→ onlyhotspotsget_architecture()(noaspects) → unchanged, returns all