
M2 idiomatic kinds #568

Open
win4r wants to merge 15 commits into DeusData:main from win4r:m2-idiomatic-kinds

win4r (Contributor) commented Jun 23, 2026

What does this PR do?

Checklist

  • Every commit is signed off (git commit -s) — required, CI rejects
    unsigned commits (DCO, see CONTRIBUTING.md)
  • Tests pass locally (make -f Makefile.cbm test)
  • Lint passes (make -f Makefile.cbm lint-ci)
  • New behavior is covered by a test (reproduce-first for bug fixes)

KerseyFabrications and others added 15 commits June 20, 2026 23:09
A node group variable carried through a WITH aggregation
(e.g. `WITH g, count(*) AS c RETURN g.file_path`) returned blank for every
property except its name: the carried virtual binding held only the group
key (the node's name) and lacked a store handle, so node_prop() could
neither read other fields nor compute degrees.
Fix: capture the node id of a bare node group-var in with_agg_find_or_create
and tag the carried virtual binding with it; in node_prop(), when such a stub
(id set, string fields unpopulated) is asked for a missing property, re-fetch
the full node via cbm_store_find_node_by_id and project it. Also propagate
the store onto virtual bindings so node_prop can re-fetch and compute
degrees. The stub gate is heuristic but never yields a wrong value — worst
case is one redundant indexed lookup. Adds regression test
cypher_exec_with_node_groupvar_prop.
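
A minimal C sketch of the stub re-fetch, under loud assumptions: `binding_t`, `is_stub`, `prop_file_path`, `fetch_fn` and `demo_fetch` are hypothetical names invented here, and `fetch_fn` merely stands in for a lookup like `cbm_store_find_node_by_id`, whose real signature is not shown in this PR. It illustrates why the heuristic gate is safe: a failed or unneeded lookup leaves the property blank rather than wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified binding; not cbm's real type. */
typedef struct {
    int64_t id;            /* 0 = no node id captured */
    const char *name;      /* group key carried through WITH */
    const char *file_path; /* NULL until the full node is projected */
} binding_t;

/* Hypothetical stand-in for a store lookup such as cbm_store_find_node_by_id. */
typedef bool (*fetch_fn)(int64_t id, binding_t *out);

/* Stub heuristic: an id is set but the string fields were never populated. */
static bool is_stub(const binding_t *b) {
    return b->id != 0 && b->file_path == NULL;
}

/* Read file_path, re-fetching the full node once if b is a stub. */
static const char *prop_file_path(binding_t *b, fetch_fn fetch) {
    assert(b != NULL && fetch != NULL);
    if (is_stub(b)) {
        binding_t full = {0};
        if (fetch(b->id, &full)) {
            *b = full; /* project the full node onto the binding */
        }
    }
    return b->file_path; /* still NULL on a failed lookup: blank, not wrong */
}

/* Demo fetcher for this sketch only: node 42 exists, in src/main.c. */
static bool demo_fetch(int64_t id, binding_t *out) {
    if (id != 42) return false;
    out->id = id;
    out->name = "main";
    out->file_path = "src/main.c";
    return true;
}
```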

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 8b03974)
…esults

MATCH (c:Class)-[:DEFINES_METHOD]->(m:Method) returned at most 10 results
for any class, regardless of how many methods it actually has.

Root cause: bind_cap was set to scan_count (the number of nodes matched in
the initial pattern — typically 1 when querying a single class by name).
max_new = bind_cap * 10 = 10, so the edge expansion loop exited after
collecting 10 results. No error, no warning, no truncation indicator.

This is language-agnostic: any class with more than 10 methods in any
language was silently truncated. The fix is two characters:
  bind_cap = scan_count > max_rows ? scan_count : max_rows
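
To make the arithmetic concrete, here is a small C sketch comparing the old and new caps. The helper names are hypothetical, the x10 expansion factor is taken from the commit text, and the `max_rows` value used in the checks is illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Old behavior: the cap was derived from scan_count alone. */
static size_t max_new_bindings_old(size_t scan_count) {
    size_t bind_cap = scan_count;
    return bind_cap * 10;
}

/* Fixed behavior: never cap below max_rows. */
static size_t max_new_bindings(size_t scan_count, size_t max_rows) {
    size_t bind_cap = scan_count > max_rows ? scan_count : max_rows;
    return bind_cap * 10;
}
```

With one matched class and a hypothetical `max_rows` of 200, the old cap allows 10 new bindings while the fixed cap allows 2000.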

Regression test: a Python class with 15 methods must return all 15 via
MATCH (c:Class)-[:DEFINES_METHOD]->(m:Method) with label filtering.

Signed-off-by: Thomas Dyar <tdyar@intersystems.com>
(cherry picked from commit c43fc8d)
A call carrying enough long arguments drove append_args_json()'s running
position past the fixed CBM_SZ_2K `props` stack buffer in
emit_normal_calls_edge(): format_call_arg() returns snprintf's *untruncated*
length, so `pos += (size_t)n` could exceed `bufsize`, after which the
trailing `buf[pos] = '\0'` (and `buf[pos++] = ']'`) wrote out of bounds. The
stack canary caught it as SIGABRT, so full-repo indexing of large TypeScript
codebases crashed the server in the parallel resolve pass
(emit_service_edge -> emit_normal_calls_edge -> finalize_and_emit ->
append_args_json). Confirmed with AddressSanitizer:
stack-buffer-overflow WRITE at pass_parallel.c:1124, 'props' (2048 B).

Fix: when an argument does not fully fit, roll back to before its separator
and stop appending (atomic field, matching append_json_string's behaviour),
so `pos` can never advance past the buffer.
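
The rollback rule can be sketched in C roughly as follows. This is an illustration, not the code in pass_parallel.c: `append_arg` and `build_args_json` are hypothetical names, arguments are assumed to need no JSON escaping, and two bytes are reserved for the closing `]` and NUL. The key point is comparing snprintf's untruncated return value against the remaining space before advancing `pos`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append one quoted arg (with a leading comma unless first). Atomic: if the
   arg plus room for "]" and NUL does not fit, roll back to before the
   separator and return 0. Returns 1 on success. */
static int append_arg(char *buf, size_t bufsize, size_t *pos, const char *arg, int first) {
    assert(*pos < bufsize);
    size_t start = *pos;
    size_t p = start;
    if (!first) {
        if (p + 1 >= bufsize) return 0;
        buf[p++] = ',';
    }
    /* snprintf returns the length it WANTED to write, not what it wrote. */
    int n = snprintf(buf + p, bufsize - p, "\"%s\"", arg);
    if (n < 0 || (size_t)n + 2 > bufsize - p) {
        *pos = start;      /* roll back, separator included */
        buf[start] = '\0';
        return 0;
    }
    *pos = p + (size_t)n;  /* safe: pos + 2 <= bufsize still holds */
    return 1;
}

/* Build ["a","b",...] into buf, stopping at the first arg that doesn't fit. */
static size_t build_args_json(char *buf, size_t bufsize, const char **args, size_t nargs) {
    assert(bufsize >= 3);  /* room for "[]" and NUL */
    size_t pos = 0;
    buf[pos++] = '[';
    buf[pos] = '\0';
    for (size_t i = 0; i < nargs; i++) {
        if (!append_arg(buf, bufsize, &pos, args[i], i == 0)) break;
    }
    buf[pos++] = ']';
    buf[pos] = '\0';
    return pos;
}
```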

Add regression test parallel_args_json_no_overflow: indexes a fixture whose
single call carries 60 long string args (args JSON well past 2 KB); under the
ASan test build it aborts without this fix and passes with it.

Signed-off-by: Andrius Skerla <1492322+rainder@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 74d15a6)
Signed-off-by: Saurav Kumar <sauravsk2507@gmail.com>
(cherry picked from commit c3a1a79)
git_allocator moved out of the top-level git2.h into git2/sys/alloc.h
in libgit2 1.8.0. Add an explicit include so the mimalloc binding
compiles against libgit2 >= 1.8 (e.g. MacPorts libgit2 1.9.4).

(cherry picked from commit 586fc8a)
manage_adr stores ADRs in project_summaries, but a full re-index
(triggered by file changes or new files) deletes the DB in
try_incremental_or_delete_db and rebuilds it from the graph buffer,
which writes an empty project_summaries table. file_hashes were
re-persisted after the rebuild but project_summaries were not, so the
ADR was silently lost.

Fix: capture the ADR before the DB is unlinked, stash it on the
pipeline struct, and restore it after the rebuilt DB is reopened in
dump_and_persist_hashes. The incremental path is unaffected (it never
rewrites the DB). Verified: ADR now survives a full re-index.
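
The stash-and-restore ordering can be sketched like this. `db_t`, `pipeline_t`, `dup_or_null`, `stash_adr` and `restore_adr` are hypothetical stand-ins (the real code works against SQLite and the `project_summaries` table); the sketch only shows that the copy must be taken before the unlink and written back after the reopen, with ownership moving to the new DB.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: a "DB" holding one ADR value, and the pipeline
   struct that outlives the DB being deleted and rebuilt. */
typedef struct { char *adr; } db_t;
typedef struct { char *saved_adr; } pipeline_t;

static char *dup_or_null(const char *s) {
    if (s == NULL) return NULL;
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL) memcpy(d, s, n);
    return d;
}

/* Step 1: before the DB file is unlinked, copy the ADR onto the pipeline. */
static void stash_adr(pipeline_t *p, const db_t *old_db) {
    assert(p != NULL && old_db != NULL);
    free(p->saved_adr);
    p->saved_adr = dup_or_null(old_db->adr);
}

/* Step 2: after the rebuilt DB is reopened, write the ADR back (if any). */
static void restore_adr(pipeline_t *p, db_t *new_db) {
    assert(p != NULL && new_db != NULL);
    if (p->saved_adr == NULL) return; /* nothing was stashed */
    free(new_db->adr);
    new_db->adr = p->saved_adr;       /* ownership moves to the DB */
    p->saved_adr = NULL;
}
```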

Signed-off-by: RithvikReddy0-0 <rithvikreddymukkara@gmail.com>
(cherry picked from commit 7b6c063)
detect_changes advertised a `since` parameter in its inputSchema but the
handler never read it — it always diffed against base_branch (default
"main"), so detect_changes(since="HEAD~10") silently returned the wrong or
empty result when HEAD was on the default branch.

Fix: read `since` and, when present, route it through base_branch so the
existing shell-arg validation (cbm_validate_shell_arg) and the
`<base>...HEAD` diff apply unchanged; `since` takes precedence over
base_branch. Also narrows the schema description — the prior "date" form
(e.g. 2026-01-01) is not a revision and never worked through this path — and
documents the inherited three-dot semantics. Adds regression tests
tool_detect_changes_since and tool_detect_changes_since_precedence.
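
A minimal sketch of the precedence rule, assuming (as a guess, not from the source) that an empty `since` counts as absent. `effective_base` and `diff_range` are hypothetical helpers; the real handler additionally runs cbm_validate_shell_arg on the chosen value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* `since` takes precedence over base_branch, which defaults to "main".
   Assumption: an empty string is treated the same as a missing argument. */
static const char *effective_base(const char *since, const char *base_branch) {
    if (since != NULL && since[0] != '\0') return since;
    if (base_branch != NULL && base_branch[0] != '\0') return base_branch;
    return "main";
}

/* Build the "<base>...HEAD" range; returns 1 if it fit in out, else 0. */
static int diff_range(char *out, size_t outsz, const char *since, const char *base_branch) {
    assert(out != NULL && outsz > 0);
    int n = snprintf(out, outsz, "%s...HEAD", effective_base(since, base_branch));
    return n >= 0 && (size_t)n < outsz;
}
```

With three-dot syntax, `git diff A...HEAD` compares HEAD against the merge base of A and HEAD. For `since=HEAD~10` that merge base is HEAD~10 itself, so the diff covers the last ten commits even when HEAD is on main.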

Refs DeusData#371

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 53501b0)
trace_path resolved a function_name from the first row of an unordered name
query with no ambiguity check, so a same-named entity (e.g. a shell script's
main()) could silently shadow the intended C main(). get_code_snippet
reported "ambiguous" for a short name even when one match was the obvious
definition (the .c body vs a .h declaration).

Fix: add a deterministic resolution ranking — a callable label outranks a
module, then the larger definition by line span wins, preferring a real
definition without hardcoding file extensions — and a picker that flags a
genuine tie. trace_path now traces the preferred node and returns the
existing ambiguous-suggestions response on a true tie instead of silently
taking nodes[0]; get_code_snippet resolves directly to the preferred match,
reporting ambiguity only for real ties. Adds regression tests
tool_trace_call_path_ambiguous and tool_trace_call_path_prefers_definition.
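
One way to express the ranking and tie detection in C, as a sketch only: `candidate_t`, `rank_cmp` and `pick_preferred` are hypothetical, treating "Function" and "Method" as the callable labels is an assumption, and line span is computed as end minus start.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *label; /* e.g. "Function", "Method", "Module" (assumed) */
    int start_line, end_line;
} candidate_t;

static int is_callable(const char *label) {
    return strcmp(label, "Function") == 0 || strcmp(label, "Method") == 0;
}

/* >0 if a outranks b, <0 if b outranks a, 0 on a genuine tie. */
static int rank_cmp(const candidate_t *a, const candidate_t *b) {
    int ca = is_callable(a->label), cb = is_callable(b->label);
    if (ca != cb) return ca - cb;          /* callable beats non-callable */
    int sa = a->end_line - a->start_line;
    int sb = b->end_line - b->start_line;
    return (sa > sb) - (sa < sb);          /* larger definition wins */
}

/* Index of the preferred candidate, or -1 if the best rank is shared. */
static int pick_preferred(const candidate_t *c, size_t n) {
    assert(c != NULL && n > 0);
    size_t best = 0;
    int tied = 0;
    for (size_t i = 1; i < n; i++) {
        int r = rank_cmp(&c[i], &c[best]);
        if (r > 0) { best = i; tied = 0; }
        else if (r == 0) tied = 1;
    }
    return tied ? -1 : (int)best;
}
```

No file extension appears anywhere in the comparison: a .c body beats a .h declaration only because its line span is larger.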

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
(cherry picked from commit 382dc24)
Signed-off-by: King Star <mcxin.y@gmail.com>
(cherry picked from commit 935027a)
Mark this as a community fork of DeusData/codebase-memory-mcp (MIT, © 2025
DeusData) and list the integrated incremental-reindex fix (DeusData#528) plus the
9 cherry-picked upstream PRs (DeusData#465 DeusData#412 DeusData#475 DeusData#527 DeusData#512 DeusData#539 DeusData#464 DeusData#466 DeusData#526).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: win4r <win4r@outlook.com>
Re-runnable deterministic comparison (dup-nodes, kinds, call-graph parity).
Baseline on LingoLearn: cbm dup_nodes=38, Swift-type-kinds=1 vs codegraph 0/5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…unction

push_class_body_children's body-container list had drifted from extract_class_def's
and lacked enum_class_body/protocol_body, so Swift enum/protocol members were
re-walked and emitted as spurious top-level Functions (38 dup-nodes on LingoLearn).
Route those bodies through the nested-class path. dup_nodes 38->0; real Methods +
their CALLS edges unchanged (review keeps 7 callers). Adds regression test.
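
A sketch of how a single shared list keeps the two walkers from drifting apart. The three node-type names come from the commit text; `CLASS_BODY_TYPES` and `is_class_body` are hypothetical names, not the project's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One list of class-like body node types, consulted by both the type
   extractor and the child walker so the two cannot diverge. */
static const char *const CLASS_BODY_TYPES[] = {
    "class_body",
    "enum_class_body",
    "protocol_body",
};

static int is_class_body(const char *node_type) {
    assert(node_type != NULL);
    size_t n = sizeof CLASS_BODY_TYPES / sizeof CLASS_BODY_TYPES[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(node_type, CLASS_BODY_TYPES[i]) == 0) return 1;
    }
    return 0;
}
```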

WS2a of the M1 'surpass codegraph' track.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…own)

WS1 of the M1 'surpass codegraph' track. New `explore` MCP tool composes the
existing resolve / cbm_store_bfs / resolve_snippet_source / batch_count_degrees
internals into ONE agent-ergonomic call returning markdown: blast-radius
(attributed callers) + verbatim line-numbered source grouped by file, with
inline fan-in hotspot flags and a query_graph (cypher) escape-hatch footer.
Matches codegraph's explore and exceeds it (precise caller attribution +
hotspots + cypher, which codegraph's explore lacks).

Adversarially reviewed (5 lenses, each finding checked against the code);
memory-safety clean. Fixed all 3 confirmed honesty/silent-truncation findings:
clamp depth>=1 + honest 'within N hops' label for depth>1; elision marker when
a body exceeds 160 lines; cap notice when >16 query terms. Adds tests:
explore-in-tools/list (schema validity) + 2 error-path guards.
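
The three honesty guards can be sketched as small C helpers. The limits 160 and 16 come from the commit text; the function names and the exact heading strings are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const size_t MAX_BODY_LINES = 160;
static const size_t MAX_QUERY_TERMS = 16;

static int clamp_depth(int depth) { return depth < 1 ? 1 : depth; }

/* Only depth 1 can honestly claim "direct" callers. */
static void callers_heading(char *out, size_t outsz, int depth) {
    assert(out != NULL && outsz > 0);
    depth = clamp_depth(depth);
    if (depth == 1) snprintf(out, outsz, "Direct callers");
    else snprintf(out, outsz, "Callers within %d hops", depth);
}

/* Lines to print; *elided receives how many were cut (for the marker). */
static size_t shown_lines(size_t body_lines, size_t *elided) {
    size_t shown = body_lines > MAX_BODY_LINES ? MAX_BODY_LINES : body_lines;
    *elided = body_lines - shown;
    return shown;
}

/* Terms to use; *capped is set so a cap notice can be emitted. */
static size_t used_terms(size_t nterms, int *capped) {
    *capped = nterms > MAX_QUERY_TERMS;
    return *capped ? MAX_QUERY_TERMS : nterms;
}
```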

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…odegraph 79)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t from class

WS2b of the M2 track. tree-sitter-swift emits one class_declaration node for
class/struct/enum/actor (distinguished by the declaration_kind keyword field);
relabel to Struct/Enum/Actor (class stays Class, protocol already Interface) so
the graph distinguishes Swift type kinds, closing the modeling advantage codegraph
had here (LingoLearn: 1 lumped kind -> Struct:38/Enum:20/Class:6, Swift-kind-fidelity 1->3).
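
A minimal sketch of the relabeling, assuming the `declaration_kind` keyword is already available as a C string. `swift_type_label` is a hypothetical name; returning NULL for `extension` reflects the later fix described in this same commit (members extracted, no type def emitted).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a tree-sitter-swift class_declaration's declaration_kind keyword to a
   graph label. NULL means "emit no type def" (used here for extensions). */
static const char *swift_type_label(const char *declaration_kind) {
    if (declaration_kind == NULL) return "Class";
    if (strcmp(declaration_kind, "struct") == 0) return "Struct";
    if (strcmp(declaration_kind, "enum") == 0) return "Enum";
    if (strcmp(declaration_kind, "actor") == 0) return "Actor";
    if (strcmp(declaration_kind, "extension") == 0) return NULL;
    return "Class"; /* "class" and anything unrecognised */
}
```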

Label is load-bearing: added Struct/Enum/Actor to every resolver allowlist
(registry x3, resolve_as_class x2, store.c arch/semantic SQL x4, pass_configlink
CONFIGURES, pass_enrichment decorators + nlabels, search_code ranking) so
CALLS/INHERITS/USES_TYPE/CONFIGURES edges + architecture/search are unaffected
for real user code (review keeps 7 callers, extension-method callers intact:
addingDays 11, tap 12; dup_nodes 0).

Adversarially reviewed (4 lenses, 12 agents). Fixed a HIGH bug: a same-file
`extension` shares the extended type's FQN, so its (Class) type def clobbered the
real type's idiomatic label via the last-write-wins upsert (struct X -> Class).
Extensions now extract members but emit NO type def — which also removes the
phantom 'Class' nodes previously created for stdlib types the code only extends
(Date/Color/View). Net effect: total edges 1813->1689 (dropped edges are spurious
stdlib-constructor CALLS to those phantom nodes, NOT user-code relationships;
brings cbm closer to codegraph, which also doesn't create nodes for merely-extended
stdlib types). Adds 2 regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DeusData (Owner)

Thanks @win4r — this is the cumulative branch (it includes #567 + #569 plus the Swift idiomatic type kinds Struct/Enum/Actor, which are a nice addition).

Same blockers as the other two: the README rebrand to a personal fork, the bench/ harness with a hardcoded personal path, the empty description, and DCO not signed.

Rather than three stacked, fork-branded PRs, could you send the useful pieces as focused, signed-off PRs — the explore tool, the Swift fixes/kinds — dropping the README rebrand and bench/? We'd be glad to land those — the engineering is genuinely good; it just needs to arrive clean (no rebrand/bench) and with CI green (signed-off commits, lint and tests passing). 🙏

9 participants