
M1 surpass #569

Open
win4r wants to merge 14 commits into DeusData:main from win4r:m1-surpass

Conversation

win4r (Contributor) commented Jun 23, 2026

What does this PR do?

Checklist

  • Every commit is signed off (git commit -s) — required, CI rejects
    unsigned commits (DCO, see CONTRIBUTING.md)
  • Tests pass locally (make -f Makefile.cbm test)
  • Lint passes (make -f Makefile.cbm lint-ci)
  • New behavior is covered by a test (reproduce-first for bug fixes)

KerseyFabrications and others added 14 commits June 20, 2026 23:09
A node group variable carried through a WITH aggregation
(e.g. `WITH g, count(*) AS c RETURN g.file_path`) returned blank for every
property except its name: the carried virtual binding held only the group
key (the node's name) and lacked a store handle, so node_prop() could
neither read other fields nor compute degrees.
Fix: capture the node id of a bare node group-var in with_agg_find_or_create
and tag the carried virtual binding with it; in node_prop(), when such a stub
(id set, string fields unpopulated) is asked for a missing property, re-fetch
the full node via cbm_store_find_node_by_id and project it. Also propagate
the store onto virtual bindings so node_prop can re-fetch and compute
degrees. The stub gate is heuristic but never yields a wrong value — worst
case is one redundant indexed lookup. Adds regression test
cypher_exec_with_node_groupvar_prop.

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 8b03974)
…esults

MATCH (c:Class)-[:DEFINES_METHOD]->(m:Method) returned at most 10 results
for any class, regardless of how many methods it actually has.

Root cause: bind_cap was set to scan_count (the number of nodes matched in
the initial pattern — typically 1 when querying a single class by name).
max_new = bind_cap * 10 = 10, so the edge expansion loop exited after
collecting 10 results. No error, no warning, no truncation indicator.

This is language-agnostic: any class with more than 10 methods in any
language was silently truncated. The fix is two characters:
  bind_cap = scan_count > max_rows ? scan_count : max_rows

Regression test: a Python class with 15 methods must return all 15 via
MATCH (c:Class)-[:DEFINES_METHOD]->(m:Method) with label filtering.

Signed-off-by: Thomas Dyar <tdyar@intersystems.com>
(cherry picked from commit c43fc8d)
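
The arithmetic behind the silent truncation can be sketched as follows. This is a minimal illustration, not the project's code: the function names `old_max_new` and `new_max_new` are hypothetical, and only `bind_cap`, `scan_count`, `max_rows` and the factor of 10 are taken from the commit message above.

```c
#include <stddef.h>

/* Hypothetical helper: expansion limit before the fix.
 * bind_cap follows scan_count alone, so a single matched class
 * yields a limit of 1 * 10 = 10 expanded rows. */
static size_t old_max_new(size_t scan_count) {
    size_t bind_cap = scan_count;
    return bind_cap * 10;
}

/* Hypothetical helper: expansion limit after the fix.
 * bind_cap is the larger of scan_count and max_rows, so the
 * requested row budget is no longer undercut by a small scan. */
static size_t new_max_new(size_t scan_count, size_t max_rows) {
    size_t bind_cap = scan_count > max_rows ? scan_count : max_rows;
    return bind_cap * 10;
}
```

With `scan_count = 1` the old limit is 10, which matches the reported ceiling; with the fix the limit scales with `max_rows` instead.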
A call carrying enough long arguments drove append_args_json()'s running
position past the fixed CBM_SZ_2K `props` stack buffer in
emit_normal_calls_edge(): format_call_arg() returns snprintf's *untruncated*
length, so `pos += (size_t)n` could exceed `bufsize`, after which the
trailing `buf[pos] = '\0'` (and `buf[pos++] = ']'`) wrote out of bounds. The
stack canary caught it as SIGABRT, so full-repo indexing of large TypeScript
codebases crashed the server in the parallel resolve pass
(emit_service_edge -> emit_normal_calls_edge -> finalize_and_emit ->
append_args_json). Confirmed with AddressSanitizer:
stack-buffer-overflow WRITE at pass_parallel.c:1124, 'props' (2048 B).

Fix: when an argument does not fully fit, roll back to before its separator
and stop appending (atomic field, matching append_json_string's behaviour),
so `pos` can never advance past the buffer.

Add regression test parallel_args_json_no_overflow: indexes a fixture whose
single call carries 60 long string args (args JSON well past 2 KB); under the
ASan test build it aborts without this fix and passes with it.

Signed-off-by: Andrius Skerla <1492322+rainder@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 74d15a6)
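
The roll-back approach described above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the code in pass_parallel.c: `append_arg` and `build_args_json` are hypothetical names, arguments are assumed to need no JSON escaping, and the two-byte reserve for the closing `]` and the terminator is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: append one quoted argument at buf[pos].
 * Returns the new position. If the item plus its separator does not
 * fit, the buffer is rolled back to pos and *full is set so that the
 * caller stops appending (the item is treated as atomic). */
static size_t append_arg(char *buf, size_t bufsize, size_t pos,
                         const char *arg, int first, int *full) {
    const size_t reserve = 2; /* closing ']' plus terminating NUL */
    if (*full || pos + reserve >= bufsize) {
        *full = 1;
        return pos;
    }
    size_t avail = bufsize - pos - reserve; /* bytes usable by this item */
    /* snprintf returns the length it WOULD have written, not what fit,
     * so the result must be checked before advancing pos. */
    int n = snprintf(buf + pos, avail + 1, "%s\"%s\"", first ? "" : ",", arg);
    if (n < 0 || (size_t)n > avail) {
        buf[pos] = '\0'; /* roll back the partial item and its separator */
        *full = 1;
        return pos;
    }
    return pos + (size_t)n;
}

/* Hypothetical sketch: build "[...]" from args without ever letting
 * pos pass the end of buf. Returns the length of the result. */
static size_t build_args_json(char *buf, size_t bufsize,
                              const char *const *args, size_t argc) {
    if (bufsize < 3) {
        if (bufsize > 0) buf[0] = '\0';
        return 0;
    }
    size_t pos = 0;
    int full = 0;
    buf[pos++] = '[';
    buf[pos] = '\0';
    for (size_t i = 0; i < argc && !full; i++) {
        pos = append_arg(buf, bufsize, pos, args[i], i == 0, &full);
    }
    buf[pos++] = ']'; /* always in bounds thanks to the reserve */
    buf[pos] = '\0';
    return pos;
}
```

The key point is that `pos` only advances after the return value of `snprintf` has been compared against the space that was actually available.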
Signed-off-by: Saurav Kumar <sauravsk2507@gmail.com>
(cherry picked from commit c3a1a79)
git_allocator moved out of the top-level git2.h into git2/sys/alloc.h
in libgit2 1.8.0. Add an explicit include so the mimalloc binding
compiles against libgit2 >= 1.8 (e.g. MacPorts libgit2 1.9.4).

(cherry picked from commit 586fc8a)
manage_adr stores ADRs in project_summaries, but a full re-index
(triggered by file changes or new files) deletes the DB in
try_incremental_or_delete_db and rebuilds it from the graph buffer,
which writes an empty project_summaries table. file_hashes were
re-persisted after the rebuild but project_summaries were not, so the
ADR was silently lost.

Fix: capture the ADR before the DB is unlinked, stash it on the
pipeline struct, and restore it after the rebuilt DB is reopened in
dump_and_persist_hashes. The incremental path is unaffected (it never
rewrites the DB). Verified: ADR now survives a full re-index.

Signed-off-by: RithvikReddy0-0 <rithvikreddymukkara@gmail.com>
(cherry picked from commit 7b6c063)
detect_changes advertised a `since` parameter in its inputSchema but the
handler never read it — it always diffed against base_branch (default
"main"), so detect_changes(since="HEAD~10") silently returned the wrong or
empty result when HEAD was on the default branch.

Fix: read `since` and, when present, route it through base_branch so the
existing shell-arg validation (cbm_validate_shell_arg) and the
`<base>...HEAD` diff apply unchanged; `since` takes precedence over
base_branch. Also narrows the schema description — the prior "date" form
(e.g. 2026-01-01) is not a revision and never worked through this path — and
documents the inherited three-dot semantics. Adds regression tests
tool_detect_changes_since and tool_detect_changes_since_precedence.

Refs DeusData#371

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 53501b0)
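
The precedence rule and the three-dot range can be sketched as below. This is an illustration only: `pick_base` and `format_diff_range` are hypothetical names, treating an empty string as "not provided" is an assumption, and the shell-argument validation that the real handler applies via cbm_validate_shell_arg is omitted.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: `since` wins over base_branch when present;
 * "main" is the default when neither is given. */
static const char *pick_base(const char *since, const char *base_branch) {
    if (since != NULL && since[0] != '\0') return since;
    if (base_branch != NULL && base_branch[0] != '\0') return base_branch;
    return "main";
}

/* Hypothetical helper: write "<base>...HEAD" into out.
 * Returns 0 on success, -1 if the range did not fit. */
static int format_diff_range(char *out, size_t outsize,
                             const char *since, const char *base_branch) {
    int n = snprintf(out, outsize, "%s...HEAD", pick_base(since, base_branch));
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

Because the three-dot form diffs from the merge base of the two revisions, `HEAD~10...HEAD` on a linear history covers the last ten commits.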
trace_path resolved a function_name from the first row of an unordered name
query with no ambiguity check, so a same-named entity (e.g. a shell script's
main()) could silently shadow the intended C main(). get_code_snippet
reported "ambiguous" for a short name even when one match was the obvious
definition (the .c body vs a .h declaration).

Fix: add a deterministic resolution ranking — a callable label outranks a
module, then the larger definition by line span wins, preferring a real
definition without hardcoding file extensions — and a picker that flags a
genuine tie. trace_path now traces the preferred node and returns the
existing ambiguous-suggestions response on a true tie instead of silently
taking nodes[0]; get_code_snippet resolves directly to the preferred match,
reporting ambiguity only for real ties. Adds regression tests
tool_trace_call_path_ambiguous and tool_trace_call_path_prefers_definition.

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 382dc24)
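
A ranking of this shape could look roughly like the sketch below. It is not the project's implementation: the `candidate_t` struct, its fields, and both function names are hypothetical, and only the ordering (callable over module, then larger line span, with a true tie reported) follows the commit message.

```c
#include <stddef.h>

/* Hypothetical candidate record for a name lookup. */
typedef struct {
    int is_callable; /* 1 for a function/method-like label, 0 for a module */
    int start_line;
    int end_line;
} candidate_t;

/* Returns <0 if a ranks better, >0 if b ranks better, 0 on a tie. */
static int compare_candidates(const candidate_t *a, const candidate_t *b) {
    if (a->is_callable != b->is_callable) {
        return b->is_callable - a->is_callable; /* callable outranks module */
    }
    int span_a = a->end_line - a->start_line;
    int span_b = b->end_line - b->start_line;
    if (span_a != span_b) {
        return span_a > span_b ? -1 : 1; /* larger definition wins */
    }
    return 0;
}

/* Returns the index of the preferred candidate, or -1 when the list is
 * empty or the best rank is shared by more than one candidate. */
static int pick_preferred(const candidate_t *c, size_t n) {
    if (n == 0) return -1;
    size_t best = 0;
    int tie = 0;
    for (size_t i = 1; i < n; i++) {
        int cmp = compare_candidates(&c[i], &c[best]);
        if (cmp < 0) {
            best = i; /* strictly better: earlier ties no longer matter */
            tie = 0;
        } else if (cmp == 0) {
            tie = 1;
        }
    }
    return tie ? -1 : (int)best;
}
```

Ranking by line span rather than by file extension means a one-line declaration loses to the full body whichever file each lives in, and the caller can fall back to the ambiguous-suggestions response when `-1` is returned.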
Signed-off-by: King Star <mcxin.y@gmail.com>
(cherry picked from commit 935027a)
Mark this as a community fork of DeusData/codebase-memory-mcp (MIT, © 2025
DeusData) and list the integrated incremental-reindex fix (DeusData#528) plus the
9 cherry-picked upstream PRs (DeusData#465 DeusData#412 DeusData#475 DeusData#527 DeusData#512 DeusData#539 DeusData#464 DeusData#466 DeusData#526).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: win4r <win4r@outlook.com>
Re-runnable deterministic comparison (dup-nodes, kinds, call-graph parity).
Baseline on LingoLearn: cbm dup_nodes=38, Swift-type-kinds=1 vs codegraph 0/5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…unction

push_class_body_children's body-container list had drifted from extract_class_def's
and lacked enum_class_body/protocol_body, so Swift enum/protocol members were
re-walked and emitted as spurious top-level Functions (38 dup-nodes on LingoLearn).
Route those bodies through the nested-class path. dup_nodes 38->0; real Methods +
their CALLS edges unchanged (review keeps 7 callers). Adds regression test.

WS2a of the M1 'surpass codegraph' track.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…own)

WS1 of the M1 'surpass codegraph' track. New `explore` MCP tool composes the
existing resolve / cbm_store_bfs / resolve_snippet_source / batch_count_degrees
internals into ONE agent-ergonomic call returning markdown: blast-radius
(attributed callers) + verbatim line-numbered source grouped by file, with
inline fan-in hotspot flags and a query_graph (cypher) escape-hatch footer.
Matches codegraph's explore and exceeds it (precise caller attribution +
hotspots + cypher, which codegraph's explore lacks).

Adversarially reviewed (5 lenses, each finding refuted against the code);
memory-safety clean. Fixed all 3 confirmed honesty/silent-truncation findings:
clamp depth>=1 + honest 'within N hops' label for depth>1; elision marker when
a body exceeds 160 lines; cap notice when >16 query terms. Adds tests:
explore-in-tools/list (schema validity) + 2 error-path guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…odegraph 79)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@DeusData

Owner

Thanks @win4r — building on my note on #567: the genuinely new and useful parts here are the read-only explore MCP tool and the Swift duplicate-node fix — both look good and safe.

The blockers are the same as #567: the README rebrand to a personal fork, the bench/headtohead.sh harness (it hardcodes /Users/charlesqin/... and an external codegraph dependency, so it can't live upstream), the empty description, and unsigned commits.

Could you resubmit just the useful code — e.g. the explore tool as its own focused PR and the Swift fix as another — without the README rebrand or bench/, with signed-off commits (git commit -s) and a short description? Happy to review and merge those — the code is genuinely good; it just needs to come in clean (no rebrand/bench) and with CI green (signed-off commits, lint and tests passing). 🙏

@DeusData DeusData mentioned this pull request Jun 23, 2026
4 tasks
9 participants