M1 surpass by win4r · Pull Request #569 · DeusData/codebase-memory-mcp

win4r · 2026-06-23T09:27:51Z

What does this PR do?

Checklist

Every commit is signed off (git commit -s) — required, CI rejects
unsigned commits (DCO, see CONTRIBUTING.md)
Tests pass locally (make -f Makefile.cbm test)
Lint passes (make -f Makefile.cbm lint-ci)
New behavior is covered by a test (reproduce-first for bug fixes)

A node group variable carried through a WITH aggregation (e.g. `WITH g, count(*) AS c RETURN g.file_path`) returned blank for every property except its name: the carried virtual binding held only the group key (the node's name) and lacked a store handle, so node_prop() could neither read other fields nor compute degrees.

Fix: capture the node id of a bare node group-var in with_agg_find_or_create and tag the carried virtual binding with it; in node_prop(), when such a stub (id set, string fields unpopulated) is asked for a missing property, re-fetch the full node via cbm_store_find_node_by_id and project it. Also propagate the store onto virtual bindings so node_prop can re-fetch and compute degrees. The stub gate is heuristic but never yields a wrong value: the worst case is one redundant indexed lookup.

Adds regression test cypher_exec_with_node_groupvar_prop.

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 8b03974)

…esults

MATCH (c:Class)-[:DEFINES_METHOD]->(m:Method) returned at most 10 results for any class, regardless of how many methods it actually has.

Root cause: bind_cap was set to scan_count (the number of nodes matched in the initial pattern, typically 1 when querying a single class by name). max_new = bind_cap * 10 = 10, so the edge expansion loop exited after collecting 10 results. No error, no warning, no truncation indicator. This is language-agnostic: any class with more than 10 methods in any language was silently truncated.

The fix is two characters: bind_cap = scan_count > max_rows ? scan_count : max_rows

Regression test: a Python class with 15 methods must return all 15 via MATCH (c:Class)-[:DEFINES_METHOD]->(m:Method) with label filtering.

Signed-off-by: Thomas Dyar <tdyar@intersystems.com>
(cherry picked from commit c43fc8d)
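The arithmetic described in this commit message can be sketched as follows. This is a minimal illustration, not the project's code: the helper names `max_new_before_fix` and `max_new_after_fix` are hypothetical, and the sketch assumes the `* 10` multiplier still applies after the fix, which the message does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expansion budget as described before the fix,
 * where bind_cap = scan_count and max_new = bind_cap * 10. */
static size_t max_new_before_fix(size_t scan_count) {
    size_t bind_cap = scan_count;
    return bind_cap * 10;
}

/* Hypothetical helper: budget with the ternary quoted in the commit message.
 * Assumption: max_new is still derived as bind_cap * 10 afterwards. */
static size_t max_new_after_fix(size_t scan_count, size_t max_rows) {
    assert(max_rows > 0); /* a zero row limit would make the budget meaningless */
    size_t bind_cap = scan_count > max_rows ? scan_count : max_rows;
    return bind_cap * 10;
}
```

With a single matched class (scan_count = 1), the old budget is 10 regardless of the row limit, which matches the reported truncation; with the ternary, the budget scales with whichever of the two values is larger.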

A call carrying enough long arguments drove append_args_json()'s running position past the fixed CBM_SZ_2K `props` stack buffer in emit_normal_calls_edge(): format_call_arg() returns snprintf's *untruncated* length, so `pos += (size_t)n` could exceed `bufsize`, after which the trailing `buf[pos] = '\0'` (and `buf[pos++] = ']'`) wrote out of bounds.

The stack canary caught it as SIGABRT, so full-repo indexing of large TypeScript codebases crashed the server in the parallel resolve pass (emit_service_edge -> emit_normal_calls_edge -> finalize_and_emit -> append_args_json). Confirmed with AddressSanitizer: stack-buffer-overflow WRITE at pass_parallel.c:1124, 'props' (2048 B).

Fix: when an argument does not fully fit, roll back to before its separator and stop appending (atomic field, matching append_json_string's behaviour), so `pos` can never advance past the buffer.

Add regression test parallel_args_json_no_overflow: indexes a fixture whose single call carries 60 long string args (args JSON well past 2 KB); under the ASan test build it aborts without this fix and passes with it.

Signed-off-by: Andrius Skerla <1492322+rainder@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 74d15a6)
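The rollback technique described in this commit message can be sketched as below. This is an illustrative stand-in, not the actual append_args_json(): the function name `append_args_sketch` and its signature are hypothetical, and JSON escaping is omitted for brevity. The point it shows is that snprintf returns the length it wanted to write, so that value must be checked against the remaining space before it is added to `pos`.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for append_args_json(): writes args as a JSON array
 * of strings into buf and returns the number of bytes written (excluding the
 * NUL). An argument that does not fit is dropped whole, separator included. */
static size_t append_args_sketch(char *buf, size_t bufsize,
                                 const char *const *args, size_t nargs) {
    assert(buf != NULL && bufsize >= 3); /* room for "[]" plus the NUL */
    size_t pos = 0;
    buf[pos++] = '[';
    for (size_t i = 0; i < nargs; i++) {
        size_t start = pos;                /* rollback point, before the separator */
        size_t avail = bufsize - pos - 2;  /* keep 2 bytes for ']' and the NUL */
        int n = snprintf(buf + pos, avail + 1, "%s\"%s\"",
                         i ? "," : "", args[i]);
        /* snprintf reports the untruncated length, which may exceed avail. */
        if (n < 0 || (size_t)n > avail) {
            pos = start;                   /* discard the partial argument */
            break;                         /* and stop appending */
        }
        pos += (size_t)n;
    }
    buf[pos++] = ']';                      /* always in bounds: pos <= bufsize - 2 */
    buf[pos] = '\0';
    return pos;
}
```

Because `pos` only advances after the length check, the closing bracket and terminator always land inside the buffer, whatever the arguments look like.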

Signed-off-by: Saurav Kumar <sauravsk2507@gmail.com> (cherry picked from commit c3a1a79)

git_allocator moved out of the top-level git2.h into git2/sys/alloc.h in libgit2 1.8.0. Add an explicit include so the mimalloc binding compiles against libgit2 >= 1.8 (e.g. MacPorts libgit2 1.9.4). (cherry picked from commit 586fc8a)

manage_adr stores ADRs in project_summaries, but a full re-index (triggered by file changes or new files) deletes the DB in try_incremental_or_delete_db and rebuilds it from the graph buffer, which writes an empty project_summaries table. file_hashes were re-persisted after the rebuild but project_summaries were not, so the ADR was silently lost.

Fix: capture the ADR before the DB is unlinked, stash it on the pipeline struct, and restore it after the rebuilt DB is reopened in dump_and_persist_hashes. The incremental path is unaffected (it never rewrites the DB).

Verified: ADR now survives a full re-index.

Signed-off-by: RithvikReddy0-0 <rithvikreddymukkara@gmail.com>
(cherry picked from commit 7b6c063)

detect_changes advertised a `since` parameter in its inputSchema but the handler never read it: it always diffed against base_branch (default "main"), so detect_changes(since="HEAD~10") silently returned the wrong or empty result when HEAD was on the default branch.

Fix: read `since` and, when present, route it through base_branch so the existing shell-arg validation (cbm_validate_shell_arg) and the `<base>...HEAD` diff apply unchanged; `since` takes precedence over base_branch. Also narrows the schema description (the prior "date" form, e.g. 2026-01-01, is not a revision and never worked through this path) and documents the inherited three-dot semantics.

Adds regression tests tool_detect_changes_since and tool_detect_changes_since_precedence.

Refs DeusData#371

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 53501b0)
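The precedence rule and the three-dot range from this commit message might look roughly like the sketch below. The helper names `pick_diff_base` and `build_diff_range` are hypothetical, and the sketch leaves out the shell-argument validation that the real handler performs through cbm_validate_shell_arg.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: choose the revision to diff against.
 * `since` wins when present and non-empty, then base_branch, then "main". */
static const char *pick_diff_base(const char *since, const char *base_branch) {
    if (since != NULL && since[0] != '\0') return since;
    if (base_branch != NULL && base_branch[0] != '\0') return base_branch;
    return "main";
}

/* Hypothetical helper: format the three-dot range "<base>...HEAD".
 * Returns 0 on success, -1 if the range does not fit in out.
 * NOTE: no validation here; the real code validates the argument first. */
static int build_diff_range(char *out, size_t outsize, const char *base) {
    assert(out != NULL && base != NULL);
    int n = snprintf(out, outsize, "%s...HEAD", base);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

In git, `A...HEAD` passed to `git diff` compares HEAD against the merge base of A and HEAD, which is why a date string such as 2026-01-01 cannot work here: it is not a revision.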

trace_path resolved a function_name from the first row of an unordered name query with no ambiguity check, so a same-named entity (e.g. a shell script's main()) could silently shadow the intended C main(). get_code_snippet reported "ambiguous" for a short name even when one match was the obvious definition (the .c body vs a .h declaration).

Fix: add a deterministic resolution ranking (a callable label outranks a module, then the larger definition by line span wins, preferring a real definition without hardcoding file extensions) and a picker that flags a genuine tie. trace_path now traces the preferred node and returns the existing ambiguous-suggestions response on a true tie instead of silently taking nodes[0]; get_code_snippet resolves directly to the preferred match, reporting ambiguity only for real ties.

Adds regression tests tool_trace_call_path_ambiguous and tool_trace_call_path_prefers_definition.

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 382dc24)
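A ranking of the kind this commit message describes could be sketched as below. The struct `candidate_t` and the functions `rank_compare` and `pick_preferred` are hypothetical names for illustration; the only rules taken from the message are that a callable label outranks a module, that the larger line span wins next, and that a genuine tie is flagged rather than resolved arbitrarily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate record for one same-named node. */
typedef struct {
    bool callable;   /* true for a callable label, false for a module */
    int start_line;
    int end_line;
} candidate_t;

static int line_span(const candidate_t *c) {
    return c->end_line - c->start_line;
}

/* Returns >0 if a outranks b, <0 if b outranks a, 0 on a tie. */
static int rank_compare(const candidate_t *a, const candidate_t *b) {
    if (a->callable != b->callable) return a->callable ? 1 : -1;
    int sa = line_span(a), sb = line_span(b);
    if (sa != sb) return sa > sb ? 1 : -1;
    return 0;
}

/* Returns the index of the preferred candidate, or -1 when the list is
 * empty or the top rank is shared (a genuine tie the caller must report). */
static int pick_preferred(const candidate_t *c, size_t n) {
    assert(c != NULL || n == 0);
    if (n == 0) return -1;
    size_t best = 0;
    bool tied = false;
    for (size_t i = 1; i < n; i++) {
        int cmp = rank_compare(&c[i], &c[best]);
        if (cmp > 0) { best = i; tied = false; } /* new leader clears old ties */
        else if (cmp == 0) tied = true;          /* shares the current top rank */
    }
    return tied ? -1 : (int)best;
}
```

Ranking by line span rather than by file extension means a one-line declaration loses to a multi-line body without the picker knowing anything about `.h` or `.c` files.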

Signed-off-by: King Star <mcxin.y@gmail.com> (cherry picked from commit 935027a)

Mark this as a community fork of DeusData/codebase-memory-mcp (MIT, © 2025 DeusData) and list the integrated incremental-reindex fix (DeusData#528) plus the 9 cherry-picked upstream PRs (DeusData#465 DeusData#412 DeusData#475 DeusData#527 DeusData#512 DeusData#539 DeusData#464 DeusData#466 DeusData#526).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: win4r <win4r@outlook.com>

Re-runnable deterministic comparison (dup-nodes, kinds, call-graph parity). Baseline on LingoLearn: cbm dup_nodes=38, Swift-type-kinds=1 vs codegraph 0/5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…unction

push_class_body_children's body-container list had drifted from extract_class_def's and lacked enum_class_body/protocol_body, so Swift enum/protocol members were re-walked and emitted as spurious top-level Functions (38 dup-nodes on LingoLearn). Route those bodies through the nested-class path. dup_nodes 38->0; real Methods + their CALLS edges unchanged (review keeps 7 callers). Adds regression test.

WS2a of the M1 'surpass codegraph' track.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…own)

WS1 of the M1 'surpass codegraph' track. New `explore` MCP tool composes the existing resolve / cbm_store_bfs / resolve_snippet_source / batch_count_degrees internals into ONE agent-ergonomic call returning markdown: blast-radius (attributed callers) + verbatim line-numbered source grouped by file, with inline fan-in hotspot flags and a query_graph (cypher) escape-hatch footer. Matches codegraph's explore and exceeds it (precise caller attribution + hotspots + cypher, which codegraph's explore lacks).

Adversarially reviewed (5 lenses, each finding refuted against the code); memory-safety clean. Fixed all 3 confirmed honesty/silent-truncation findings: clamp depth>=1 + honest 'within N hops' label for depth>1; elision marker when a body exceeds 160 lines; cap notice when >16 query terms.

Adds tests: explore-in-tools/list (schema validity) + 2 error-path guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…odegraph 79) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DeusData · 2026-06-23T21:10:41Z

Thanks @win4r — building on my note on #567: the genuinely new and useful parts here are the read-only explore MCP tool and the Swift duplicate-node fix — both look good and safe.

The blockers are the same as #567: the README rebrand to a personal fork, the bench/headtohead.sh harness (it hardcodes /Users/charlesqin/... and an external codegraph dependency, so it can't live upstream), the empty description, and unsigned commits.

Could you resubmit just the useful code — e.g. the explore tool as its own focused PR and the Swift fix as another — without the README rebrand or bench/, with signed-off commits (git commit -s) and a short description? Happy to review and merge those — the code is genuinely good; it just needs to come in clean (no rebrand/bench) and with CI green (signed-off commits, lint and tests passing). 🙏

KerseyFabrications and others added 14 commits June 20, 2026 23:09

fix(foundation): properly escape JSON control characters as \u00XX

e9f1628

Signed-off-by: Saurav Kumar <sauravsk2507@gmail.com> (cherry picked from commit c3a1a79)

fix(mcp): return valid UTF-8 snippets

eceeb40

Signed-off-by: King Star <mcxin.y@gmail.com> (cherry picked from commit 935027a)

bench(phase0): head-to-head harness + baseline vs codegraph

8df7218

Re-runnable deterministic comparison (dup-nodes, kinds, call-graph parity). Baseline on LingoLearn: cbm dup_nodes=38, Swift-type-kinds=1 vs codegraph 0/5. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bench: record M1 results (dup 38->0, explore 1-call; cbm-pro ~85 vs c…

55ef188

…odegraph 79) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DeusData mentioned this pull request Jun 23, 2026

M2 idiomatic kinds #568

Open

4 tasks

win4r wants to merge 14 commits into DeusData:main from win4r:m1-surpass

9 participants
