Skip to content

fix(architecture): "overview" compact subset + aspects schema enum + server-side validation#560

Open
Kirchlive wants to merge 3 commits into
DeusData:mainfrom
Kirchlive:fix/get-architecture-overview-alias
Open

fix(architecture): "overview" compact subset + aspects schema enum + server-side validation#560
Kirchlive wants to merge 3 commits into
DeusData:mainfrom
Kirchlive:fix/get-architecture-overview-alias

Conversation

@Kirchlive

@Kirchlive Kirchlive commented Jun 22, 2026

Copy link
Copy Markdown

Summary

get_architecture(aspects=["overview"]) previously returned only {total_nodes, total_edges}"overview" was never registered in either aspect guard, and the JSON schema accepted any string (no enum), so the typo / unregistered token was silently accepted and produced an empty payload.

This PR turns that footgun into a useful, validated tool.

What this PR does (3 layers)

1. "overview" is a useful compact subset, not an alias to "all"

The naive fix (treat "overview" as a synonym for "all") still produces the ~30k-char overflow payload — file_tree alone is ~275 entries on a mid-size repo, pushing the response past MAX_MCP_OUTPUT_TOKENS.

"overview" now expands to every aspect except file_tree, gated by a small static helper present in both files:

/* "overview" = compact architecture summary: every aspect EXCEPT the large
   per-file listing (file_tree), which alone dominates the payload (~275 entries)
   and pushes the response past MAX_MCP_OUTPUT_TOKENS. */
static bool aspect_in_overview(const char *name) {
    return strcmp(name, "file_tree") != 0;
}

src/store/store.c (gates DB sub-queries) and src/mcp/mcp.c (gates JSON serialization) both use it. Extending the block-list (e.g. also drop clusters) is one line.

2. JSON-Schema enum on aspects (advisory, client-side)

"aspects": {
  "type": "array",
  "items": {
    "type": "string",
    "enum": ["all","overview","structure","dependencies","routes","languages",
             "packages","entry_points","hotspots","boundaries","layers",
             "file_tree","clusters"]
  },
  "description": "Aspects to include. 'all' = everything; 'overview' = compact summary (all except file_tree); omit = all."
}

Self-documenting and gives well-behaved MCP clients a clean client-side error on typos.

3. Server-side validation (authoritative)

The enum alone is advisory — many MCP clients (Claude Code included) do not validate against tool schemas before sending. Verified empirically: with only the enum in place, aspects=["bogus"] still produced silent-empty.

Server-side validation closes the loop:

/* Canonical list of valid aspect tokens for get_architecture.
 * SSOT consumed by both the JSON-Schema enum (advisory, client-side)
 * and the server-side validation (authoritative). Update both together
 * if the aspect set changes. */
static const char *VALID_ASPECTS[] = {
    "all", "overview",
    "structure", "dependencies", "routes",
    "languages", "packages", "entry_points",
    "hotspots", "boundaries", "layers",
    "file_tree", "clusters",
    NULL
};

In handle_get_architecture, after parsing the aspects array, each token is checked against VALID_ASPECTS. On the first unknown token the handler returns isError: true with a model-friendly message listing the valid values, so an agent can self-correct without re-prompting:

Unknown aspect 'bogus'. Valid: all, overview, structure, dependencies, routes, languages, packages, entry_points, hotspots, boundaries, layers, file_tree, clusters.

Defense in depth: client-side enum catches well-behaved clients early; server-side aspect_is_valid catches everyone else.

Behavior matrix

Call Before PR After PR
omit aspects full payload (=all) unchanged
aspects=["all"] full payload incl. file_tree unchanged
aspects=["overview"] silent {total_nodes,total_edges} full payload minus file_tree, stays under cap
aspects=["hotspots"] only hotspots unchanged
aspects=["bogus"] silent {total_nodes,total_edges} isError: true with valid-values list

Locally verified

Built without libgit2 (make -f Makefile.cbm cbm LIBGIT2_FLAGS= LIBGIT2_LIBS=), installed to ~/.local/bin/codebase-memory-mcp. Hand-tests on an ~8k-node project after binary reload:

  • ["overview"] → 13 keys, no file_tree, comfortably under output cap
  • ["all"] → full payload incl. file_tree (unchanged)
  • ["hotspots"] → only hotspots (unchanged)
  • ["bogus"] → server returns the validation error string above

Commits

  1. refactor(get_architecture): make "overview" a compact aspect subset + add aspects schema enum
  2. feat(get_architecture): server-side aspect validation (defense in depth)

(Both are kept separate so the subset semantics and the validation layer can be reviewed independently.)

Test plan

  • get_architecture(aspects=["overview"]) → rich payload without file_tree, fits under the MCP token cap
  • get_architecture(aspects=["all"]) → unchanged, still includes file_tree
  • get_architecture(aspects=["bogus"])isError: true with valid-values list
  • get_architecture(aspects=["hotspots"]) → only hotspots
  • get_architecture() (no aspects) → unchanged, returns all

`aspects=["overview"]` silently returned only `{total_nodes, total_edges}`
because "overview" was never registered in either guard:
- `want_aspect` (src/store/store.c) skipped all DB sub-queries
- `aspect_wanted` (src/mcp/mcp.c) skipped all JSON serialization

The JSON schema (mcp.c) accepts `aspects` as free-form strings without
an enum, so "overview" was silently accepted and produced an empty
result instead of a validation error.

This 2-line patch treats "overview" as a synonym for "all" in both
guards, restoring useful output for the most natural exploratory aspect
name.

Follow-up (not in this PR): add an enum constraint to the `aspects`
JSON schema so unknown aspect strings raise a validation error instead
of returning empty.
…pects schema enum

Following review feedback on the initial alias approach: making "overview"
identical to "all" still produces the same ~30k-char overflow payload
(file_tree alone is ~275 entries). A useful "overview" needs to be a true
compact subset.

Changes:

1. "overview" is now a compact subset = every aspect EXCEPT file_tree.
   A small static helper `aspect_in_overview(name)` (in both store.c and
   mcp.c) gates which aspects "overview" expands to. Replacing it with a
   per-aspect block-list (e.g. also dropping clusters) is trivial later.

2. The aspects parameter now has a JSON Schema enum, so unknown aspect
   strings raise a validation error instead of being silently accepted
   (the previous footgun that masked the alias-not-registered bug for so
   long).

Behavior:
  aspects=["all"]      → unchanged, includes file_tree
  aspects=["overview"] → all aspects EXCEPT file_tree, stays under cap
  aspects=["bogus"]    → validation error (was: silent empty response)
  aspects omitted      → unchanged (= all)
@Kirchlive Kirchlive changed the title fix(get_architecture): treat "overview" as alias for "all" fix(architecture): make "overview" a compact aspect subset + validate aspects enum Jun 22, 2026
The JSON-Schema enum added in the previous commit is advisory: it only
fires when the MCP client validates client-side before sending. Many
clients (including Claude Code) do not, so an unknown aspect like
["bogus"] still produced a silent-empty response — exactly the footgun
that hid the original alias-not-registered bug.

This commit adds the missing authoritative layer: server-side validation
in handle_get_architecture(). A single SSOT array VALID_ASPECTS[]
backs both the enum (advisory, client-side) and aspect_is_valid()
(authoritative, server-side). Any future aspect addition stays in sync
by editing one array.

On an unknown aspect, the server returns isError:true with a
model-friendly message listing the valid values, so an agent can
self-correct without re-prompting:

  Unknown aspect 'bogus'. Valid: all, overview, structure,
  dependencies, routes, languages, packages, entry_points,
  hotspots, boundaries, layers, file_tree, clusters.

Defense in depth — clients that DO validate still get a clean
client-side error; clients that don't are now caught by the server.

Cleanup: properly frees project and aspects_doc on the early-error
path.
@Kirchlive Kirchlive changed the title fix(architecture): make "overview" a compact aspect subset + validate aspects enum fix(architecture): "overview" compact subset + aspects schema enum + server-side validation Jun 22, 2026
@Kirchlive

Copy link
Copy Markdown
Author

v3 summary — the PR went through three iterations; the final state to review is the combination of both commits:

  1. aspects=["overview"] is a compact subset (every aspect except file_tree, which alone dominates the payload). Not an alias to ["all"]. ["all"] is unchanged.
  2. JSON-Schema enum on aspects (advisory, client-side).
  3. Server-side validation in handle_get_architecture: unknown aspects return isError: true with the valid-values list — the authoritative layer, since many MCP clients (Claude Code included) don't validate against tool schemas before sending. Both the enum and the validator are backed by a single VALID_ASPECTS[] SSOT array, so future additions stay in sync.

All four behaviors hand-verified after binary reload: ["overview"] 13 keys minus file_tree, ["all"] unchanged with file_tree, ["hotspots"] only hotspots, ["bogus"] → server validation error.

@DeusData

Copy link
Copy Markdown
Owner

Thanks @Kirchlive — the overview alias plus the aspect validation look good. Two CI items to clear before merge:

  1. lint is failing — please run make -f Makefile.cbm lint-ci locally and fix what it reports.
  2. DCO — the commit author email must match the Signed-off-by email (git commit --amend -s, or git rebase --signoff).

A small unit test for the bogus-aspect → error path would also be welcome. 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants