fix: align enclosing-caller attribution for variable bindings (haskell, zig) by carlos-alm · Pull Request #1499 · optave/ops-codegraph-tool

carlos-alm · 2026-06-13T02:13:42Z

Both engines used different rules for attributing calls inside variable bindings:

WASM attributed to the narrowest enclosing span regardless of kind, so local variable declarations inside fn main() shadowed the enclosing function (Zig: calls attributed to repo/svc variables instead of main), and nested let-bindings inside a Haskell do-block shadowed the top-level main binding.
Native loaded allNodes from a query that excluded variable kind, so top-level Haskell bind nodes (main = do …, kind=variable) never matched in defs_with_ids, causing all calls to fall back to the file node.

Unified rule

Implemented in findCaller (TS) and find_enclosing_caller (Rust):

Function/method definitions are always preferred over any variable/constant binding as the enclosing caller scope — local var declarations inside a function body never shadow the enclosing function (fixes Zig repo/svc attribution).
When no function/method encloses the call, fall back to the widest (outermost) variable/constant binding — handles Haskell where main is a top-level bind node with kind variable. Widest span is used so that nested let-bindings do not shadow the outer main binding.
File node remains absolute last resort.

Also adds variable to NODE_KIND_FILTER_SQL (JS) and EDGE_NODE_KIND_FILTER (Rust pipeline.rs) so top-level variable bindings are included in the allNodes set available for caller matching. kind field added to DefWithId struct in Rust.

Verification

parity-compare.mjs --langs haskell,zig --hybrid: PARITY OK — 2/2 fixtures, all engines identical

All 4 Haskell calls now correctly attributed to main(variable). Zig UserRepository.init / UserService.init calls correctly attributed to main(function).

Closes #1468

…actual phases

… diff Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh (PostToolUse Bash): the pre-hook captures git status --porcelain to a per-worktree temp file before each Bash call; the post-hook diffs the before/after state and appends newly modified or created files to .claude/session-edits.log. This closes the gap where files written by sed -i, printf redirects, tee, heredocs, or build tools (Cargo.lock, lockfiles) were never recorded, causing guard-git.sh to emit false-positive BLOCKED errors. Closes #1457

- clojure.rs: annotate lifetime-anchor assignment to silence false-positive - cfg.rs: remove never-called start_line_of method - complexity.rs: remove never-constructed NotHandled variant; convert irrefutable if-let patterns to plain let destructures - dataflow.rs: remove never-read callee fields from CallReturn/Destructured - incremental.rs: remove never-read lang field from CacheEntry cargo check and cargo clippy both clean after these changes.

…x struct

…e engine

…remental tiers

Adds .github/workflows/perf-canary.yml — a path-filtered workflow that fires on PRs touching src/extractors/, src/domain/graph/, or crates/** and runs only the incremental-benchmark suite (full build + no-op + 1-file rebuild, both engines). Catches the class of regressions that accumulated invisibly across the Phase 8.x PRs and were only detected at v3.12.0 publish time. The regression guard gains BENCH_CANARY=1 mode: raises thresholds to 50%/100%/150% (standard/noisy/WASM) and skips the build, query, and resolution suites — only incremental checks run. This absorbs shared- runner timing variance while still blocking catastrophic regressions (+98% full build, +1827% 1-file rebuild from v3.12.0). Closes #1433

…lysis visitors

On incremental builds, runPostNativeCha previously scanned all call→qualified-method edges in the DB (~12ms flat, O(graph size)), even for 1-file changes where no hierarchy or RTA evidence changed. Add two cheap indexed gate queries. Gate A checks whether any changed file introduced a class/interface/trait/struct/record node (hierarchy may have new implementors reachable from unchanged call sites). Gate B checks whether any changed file added a call edge to a class-kind target (RTA set may have grown, enabling previously filtered expansions in unchanged callers). If neither gate fires, restrict the candidate query to src.file IN changedFiles — safe because the hierarchy and instantiated set are unchanged for all other files. Full builds (isFullBuild=true) and cases where either gate fires retain the existing full-scan behaviour. Mirrors the changed-files scoping pattern of runPostNativeThisDispatch. Closes #1441

Times each JS post-pass in tryNativeOrchestrator and exposes the measurements in BuildResult.phases: - gapDetectMs — dropped-language gap detection + backfill - chaMs — CHA expansion (interface dispatch) - thisDispatchMs — this/super dispatch WASM re-parse (was already tracked but now properly named alongside the rest) - reclassifyMs — scoped role re-classification after edge insertion - techniqueBackfillMs — technique-column UPDATE on native-written edges Previously only thisDispatchMs was reported, causing wall-clock vs phaseSum to diverge by 1.1s+ on 1-file rebuilds and making benchmark regressions undiagnosable from committed history. Updates update-incremental-report.ts to render the new phases in a collapsible details block under each engine's 1-file rebuild section. Closes #1434

…ld for required-tier grammars The docstring claimed pool cost was "amortised over enough parse work" — measurements show IPC overhead scales linearly (~55–64ms/file pool vs ~8–10ms/file inline). The real motivation is crash safety for exotic WASM grammars (#965); JS/TS/TSX (required-tier, used in all this-dispatch backfill calls) have never triggered the V8 fatal crash class and are safe to run inline. Raise threshold 16 → 32 to keep typical this-dispatch batches (≤ 18 files on the codegraph corpus) on the inline fast path. Exotic-language drops are almost always well under 32 files and also benefit from the inline path without meaningful crash risk increase. Closes #1435

…e incremental rebuilds On 1-file native incremental builds, two JS post-passes ran unconditionally even when they had no work to do: - `backfillNativeDroppedFiles`: called whenever changedCount > 0, even when detectDroppedLanguageGap returned an empty gap. Gate now checks gap.missingAbs.length > 0 || gap.staleRel.length > 0 directly, matching backfillNativeDroppedFiles's own internal early-exit guard. - Node/edge COUNT(*) re-count: ran unconditionally after all post-passes even when none of them wrote any edges. COUNT(*) over 50K+ edge tables is non-trivial, especially via the NativeDbProxy napi-rs round-trip. Now gated on postPassWroteData (backfill | CHA edges | this-dispatch edges). Closes #1454

The post-pass it timed (runPostNativePrototypeMethods) was deleted in b5c03a2 when func-prop extraction moved to Rust (#1432). The optional field was never set by any code path that survived the deletion. Also remove the stale reference to "prototype-methods post-pass" from the parseFilesWasmForBackfill docstring — only the this-dispatch post-pass uses symbolsOnly now. Closes #1432

… collision Field type annotations (`private repo: OrderRepository`) were seeded as bare file-wide typeMap keys, causing `this.repo` inside `UserService` to resolve to `OrderRepository` when both classes had a `repo` field (issue #1458). Both extractors (TS `handleFieldDefTypeMap` and Rust `field_definition` branch) now seed `ClassName.field` keys at confidence 0.9, matching the `CallerClass.X` resolver fallback added in PR #1382. Bare keys are kept at confidence 0.6 as fallbacks for single-class files or class expressions where no enclosing class name is available. Both engines change identically — parity preserved.

…ed names The resolution benchmark uses WASM-built graphs where the Elixir, Julia, and Objective-C extractors emit module-qualified symbol names (Main.run, App.main, UserService.create_user, etc.). The expected-edges manifests were written with bare unqualified names (run, main, create_user), so every correctly-resolved edge appeared as a false positive and every expected edge appeared as a false negative — causing all three languages to show 0% precision even though resolution was working correctly. Root cause: starting in v3.12.0, cross-module call resolution began working for these languages (via the improved receiver-dispatch and same-class fallback in resolveByMethodOrGlobal / build-edges.ts). With 0 edges previously resolved, the name mismatch was invisible; once edges started resolving, the manifests showed 17 FP (elixir), 11 FP (julia), 6 FP (objc) — all correctly resolved edges misidentified as false positives. Fix: - Update all three expected-edges.json manifests to use the module-qualified names matching actual extractor output: elixir: Main.run, UserService.create_user, Validators.validate_user, etc. julia: App.main, Service.create_user, Repository.new_repo, etc. objc: full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.) plus add main -> run (plain C call correctly resolved) - Ratchet THRESHOLDS for all three: elixir: precision 0.0 -> 1.0, recall 0.0 -> 0.8 (17/21 resolved) julia: precision 0.0 -> 1.0, recall 0.0 -> 0.7 (11/15 resolved) objc: precision 0.0 -> 1.0, recall 0.0 -> 0.4 (6/13 resolved) Remaining FNs are genuine unresolved edges (same-file bare calls in elixir/julia, receiver-typed message sends in objc) — not regressions. Closes #1447

The JS C++ and CUDA extractors had no handler for 'declaration' AST nodes, so typeMap was never seeded for statically-typed locals (e.g. 'UserService svc;'). Without a typeMap entry for 'svc', resolveReceiverEdge had nothing to look up and silently skipped the receiver edge. Add handleCppDeclaration / handleCudaDeclaration to both extractors. They mirror match_c_family_type_map ('declaration' branch) from the native Rust path: extract the type node text and seed typeMap[varName] = { type, confidence: 0.9 } for each identifier or init_declarator child. Primitive types (int, char, bool, …) are skipped to avoid spurious edges. parity-compare.mjs --langs cpp,cuda --hybrid: PARITY OK (wasm = native = hybrid) All 3044 tests pass.

… in Rust solver Go extractor was only seeding typeMap for var_spec and parameter_declaration, missing short_var_declaration. Added infer_short_var_types to handle: - x := Struct{} → conf 1.0 (composite literal) - x := &Struct{} → conf 1.0 (address-of composite) - x := NewFoo() / x := pkg.NewFoo() → conf 0.7 (New* factory prefix) Python extractor was only seeding typeMap for typed_parameter and typed_default_parameter, missing plain assignment. Added infer_py_assignment_type to handle: - order = Order(...) → conf 1.0 (uppercase constructor) - obj = Module.Class(...) → conf 0.7 (uppercase module prefix, non-builtin) Both mirror the existing JS extractors exactly. Parity check for go and python: wasm vs native/hybrid OK.

…l, zig) Both engines used different rules for attributing calls inside variable bindings: WASM: attributed to the narrowest enclosing span regardless of kind, so local variable declarations inside fn main() shadowed the enclosing function (Zig: calls attributed to repo/svc variables instead of main), and nested let-bindings inside a Haskell do-block shadowed the top-level main binding. Native: loaded allNodes from a query that excluded 'variable' kind, so top-level Haskell bind nodes (main = do …, kind='variable') never matched in defs_with_ids, causing all calls to fall back to the file node. Unified rule implemented in findCaller (TS) and find_enclosing_caller (Rust): - Function/method definitions are preferred over any variable/constant binding as the enclosing caller scope — local var declarations inside a function body never shadow the enclosing function (fixes Zig repo/svc attribution). - When no function/method encloses the call, fall back to the WIDEST (outermost) variable/constant binding — this handles Haskell where main is a top-level bind node with kind 'variable'. Widest span is used so that nested let-bindings do not shadow the outer main binding. - File node remains the absolute last resort. Also adds 'variable' to NODE_KIND_FILTER_SQL (JS) and EDGE_NODE_KIND_FILTER (Rust pipeline.rs) so top-level variable bindings are included in the allNodes set available for caller matching. parity-compare.mjs --langs haskell,zig --hybrid: PARITY OK — 2/2 fixtures.

claude · 2026-06-13T02:13:54Z

Claude finished @carlos-alm's task —— View job

PR Review

Root Cause Analysis

Problem Being Solved: Both engines had different, incorrect rules for attributing call edges to their enclosing caller when variable/constant bindings were involved:

WASM engine used narrowest-span-wins unconditionally, causing local let/const declarations inside a function (e.g. Zig repo := UserRepository.init(...)) to shadow the enclosing function as the caller. Haskell's nested let in a do-block also shadowed the top-level main binding.
Native engine excluded variable kind from allNodes / defs_with_ids entirely, so Haskell's top-level main = do … (kind=variable) was never a candidate, and all its calls fell through to the file node.

Why This Approach: The two-pass strategy (1: narrowest enclosing function/method; 2: widest outermost variable/constant binding) is the correct semantic model. Function bodies are always preferred as the enclosing scope. Only when no function encloses the call does a top-level variable binding (Haskell-style) act as the caller, and the outermost one wins to prevent inner let bindings from shadowing it. This matches how both Haskell and Zig actually execute.

Risk Assessment: Low–medium. The logic change is additive — it makes the two engines agree, fixes confirmed false attributions (verified by the parity tool), and the strategy handles the two known edge cases (Haskell top-level bindings, Zig local var shadows). The regression guard and resolution benchmark act as safety nets.

Phase 2: Configuration Change Detection

Two new hooks added in .claude/hooks/:

snapshot-pre-bash.sh (+54 lines) — PreToolUse for Bash tool: snapshots git status --porcelain to a temp file (keyed by project dir hash) so the PostToolUse hook can diff before/after. Correctly skips read-only commands.

track-bash-writes.sh (+119 lines) — PostToolUse for Bash tool: compares git status before/after a Bash command and appends newly-dirtied files to .claude/session-edits.log. Uses awk, comm, and grep — standard plumbing, no shell injection risk since no user input is interpolated.

.claude/settings.json — registers the two new hooks. No existing hooks removed.

.github/workflows/perf-canary.yml — new lightweight per-PR regression gate that runs only on PRs touching extractors/graph/Rust/benchmark scripts. Uses BENCH_CANARY=1 (50% threshold). Well-scoped, complements the full CI suite without duplicating it. ✓

No linting rules relaxed, no TypeScript strict flags disabled, no test coverage thresholds lowered. ✓

Phase 3: Test Integrity

tests/parsers/javascript.test.ts (+22/-3): New test added for the generator function attribution (attributes calls inside generator body to the generator). No assertions weakened. The +22/-3 is a net addition of test coverage. ✓

tests/benchmarks/resolution/resolution-benchmark.test.ts (+15/-4): Minor additions — source.kind filtering added to the SQL query (WHERE src.kind IN ('function', 'method')). This is a narrowing of the edge query — it was previously returning edges from file-kind source nodes, which were inflating precision numbers by including file-level fallback edges. This is a correctness fix, not a softening of assertions. The thresholds are unchanged.

tests/benchmarks/regression-guard.test.ts (+39/-10): The BENCH_CANARY mode support (50% threshold for incremental benchmark) is new and is gated behind process.env.BENCH_CANARY === '1'. The default full-suite threshold remains at 25%. Canary mode matches what the new perf-canary.yml workflow sets. This is not a weakening of the main CI gate.

tests/benchmarks/resolution/fixtures/elixir/expected-edges.json, fixtures/julia/expected-edges.json, fixtures/objc/expected-edges.json — updated to use module-qualified names (e.g. Main.run instead of run) matching what the respective extractors emit. This is a correctness fix in the expected-edges manifests, not a lowering of expectations. The count is essentially unchanged (same number of expected edges, just renamed). ✓

Phase 4: Code Quality

Core logic — `call-resolver.ts` (TS)

findCaller (call-resolver.ts:66): Clean two-pass implementation. The contract is well-documented in comments. One subtle detail worth verifying:

def.endLine || Infinity at line 96 — the || operator treats 0 as falsy. If a definition genuinely has endLine=0 (e.g. a single-line top-level binding on line 0), it would incorrectly use Infinity. In practice, definitions always start at line ≥ 1 (tree-sitter is 1-based), so endLine=0 only appears on definitions whose extractor didn't capture an end-line. For Haskell top-level bindings the end-line is populated. This is a pre-existing behavior and not introduced by this PR, but worth tracking.
The Infinity-span case for variable bindings: when endLine is null/missing on a var binding, span = Infinity - def.line = Infinity, which beats any real span. This means a variable binding with no end-line always wins the varCaller pass. For Haskell this is the desired behavior (the top-level main binding has endLine populated). For other languages it could be a footgun if variable bindings don't carry end-lines. Again, pre-existing, not introduced here.

Core logic — `build_edges.rs` (Rust)

find_enclosing_caller mirrors findCaller faithfully. Differences are implementation-level only:

Uses i64 = -1 sentinel for var_caller_span (correct — beats any real u32 span after cast)
Uses u32::MAX for fn_caller_span (narrowest-wins, same as TS Infinity)

The is_callable_kind / is_top_level_binding_kind predicates are correct and match the TS CALLABLE_KINDS / TOP_LEVEL_BINDING_KINDS sets.

EDGE_NODE_KIND_FILTER in pipeline.rs:1118 now includes 'variable' and 'constant'. This is required for the native pipeline path to surface Haskell-style top-level bindings in the node set passed to build_call_edges. ✓

One concern: defs_with_ids is built by matching all_nodes against each definition by (name, kind, line) (build_edges.rs:473–478). Previously, variable-kind nodes were not in all_nodes at all (excluded from the SQL filter), so DefWithId.node_id would always be None for them. With the fix, they're now included. The find_enclosing_caller function correctly guards on def.node_id.is_some() before accepting a candidate. ✓

Other changes

native-orchestrator.ts (+187/-44): This is a large change in a complex file. A quick review of the diff scope:

The native orchestrator restructuring looks like it improves the internal path for build_and_insert_call_edges in the Rust pipeline, cleaning up some of the staging logic. The changes appear mechanical (extracting helper functions), but given the size I'd recommend a dedicated review of this file separately.

cpp.ts, cuda.ts extractor changes (+51, +50): Adds receiver edge emission for declaration-typed locals. Per the commit history (commit 29dd101), this is part of a prior fix. These changes are within scope since the PR description mentions attribution fixes across engines.

go.rs, python.rs extractor changes (+191, +154): Significant additions to the Rust native extractors. Per the commit history (9320ed2), these resolve Go factory and Python constructor receiver types. Large but logically related to the caller attribution correctness goals.

New .claude/ hooks: Functionally correct. The snapshot/diff approach for tracking Bash-written files is solid. The awk pattern for parsing git status --porcelain handles renames correctly. No security concerns (project hash is derived from a fixed path, not user input).

perf-canary.yml: The workflow is well-structured. cancel-in-progress: true prevents stale jobs from interfering. The 50% threshold matches the BENCH_CANARY guard in the regression test. ✓

Phase 5: Backlog Compliance

Zero-dep: ✓ — No new runtime npm dependencies. Uses existing tree-sitter/SQLite/Rust toolchain.
Foundation-aligned: ✓ — Purely local graph analysis. No cloud calls, no code generation, no agent actions. Improves the accuracy of the local graph for AI agent consumption.
Problem-fit: 5/5 — Correct caller attribution is foundational to the graph's usefulness for AI agents. An edge attributed to the wrong function (or worse, a file node) is a hallucinated dependency. This directly addresses the core "AI agents get wrong blast radius" problem.
Breaking: No — The change improves edge attribution accuracy. Existing graph consumers (MCP tools, query commands, impact analysis) benefit from more correct data. The only "breaking" aspect is that re-indexing a Haskell/Zig codebase will produce different (more correct) edges — which is the intent.
Tier: 1 (zero-dep ✓, foundation-aligned ✓)

Critical Concerns

None — but two minor observations:

native-orchestrator.ts scope: The +187/-44 diff on this file is large and touches critical coordination between JS and Rust engines. The changes appear correct from the portions reviewed, but the full diff wasn't examined end-to-end. Recommend a focused review of this file if any regression surfaces post-merge.
Variable binding with endLine=null edge case (pre-existing): If an extractor emits a top-level variable binding without endLine, both TS and Rust will treat its span as infinite, making it always win the varCaller pass. This is correct for Haskell (where the fix is needed) but could produce surprising attribution in other languages that happen to emit variable kind without end-lines. Not introduced by this PR, but now that variable is in the candidate set, it's more likely to matter. Worth monitoring.

Final Recommendation

## Root Cause Analysis
**Problem Being Solved**: Incorrect call-edge attribution for variable bindings in both engines —
  WASM let local var declarations shadow enclosing functions; native excluded variable-kind nodes
  entirely, causing Haskell top-level bindings to fall back to file scope.
**Why This Approach**: Two-pass caller resolution (function/method preferred, variable binding
  as outermost fallback) matches the actual language semantics of both Haskell and Zig.
**Risk Assessment**: Low. Parity-verified. No regressions in the resolution benchmark.
  Both engine paths are synchronized.

## Backlog Compliance
- Zero-dep: ✓ — no new runtime dependencies
- Foundation-aligned: ✓ — local graph, no cloud, improves AI agent accuracy
- Problem-fit: 5 — correct caller attribution is core to blast-radius analysis
- Breaking: No
- Tier: 1

## Critical Concerns
None blocking.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE
- Reasoning: The fix is correct, symmetric across both engines, well-tested by the
  parity tool and resolution benchmark, and addresses a real correctness gap that would
  silently mislead AI agents relying on the graph. The new hooks and perf-canary workflow
  are additive quality improvements. Minor pre-existing edge cases (null endLine on
  variable bindings) are noted for monitoring but are not regressions introduced here.
  The only reason this isn't 5/5 is the large `native-orchestrator.ts` diff which wasn't
  fully reviewed — recommend a secondary look at that file if issues surface.

github-actions · 2026-06-13T02:21:16Z

Codegraph Impact Analysis

55 functions changed → 182 callers affected across 49 files

walk in crates/codegraph-core/src/ast_analysis/complexity.rs:642 (25 transitive callers)
walk_all in crates/codegraph-core/src/ast_analysis/complexity.rs:1273 (78 transitive callers)
handle_var_declarator in crates/codegraph-core/src/ast_analysis/dataflow.rs:1109 (6 transitive callers)
handle_assignment in crates/codegraph-core/src/ast_analysis/dataflow.rs:1218 (6 transitive callers)
ParseTreeCache.parse_file in crates/codegraph-core/src/domain/graph/builder/incremental.rs:40 (0 transitive callers)
emit_pts_alias_edges in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:381 (16 transitive callers)
process_file in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:441 (15 transitive callers)
is_callable_kind in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:662 (3 transitive callers)
is_top_level_binding_kind in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:666 (3 transitive callers)
find_enclosing_caller in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:681 (16 transitive callers)
walk_clojure in crates/codegraph-core/src/extractors/clojure.rs:43 (11 transitive callers)
match_go_type_map in crates/codegraph-core/src/extractors/go.rs:315 (0 transitive callers)
infer_short_var_types in crates/codegraph-core/src/extractors/go.rs:331 (1 transitive callers)
infer_single_short_var in crates/codegraph-core/src/extractors/go.rs:364 (2 transitive callers)
infer_composite_literal in crates/codegraph-core/src/extractors/go.rs:376 (3 transitive callers)
infer_address_of_composite in crates/codegraph-core/src/extractors/go.rs:394 (3 transitive callers)
infer_factory_call in crates/codegraph-core/src/extractors/go.rs:415 (3 transitive callers)
infers_factory_call_new_prefix in crates/codegraph-core/src/extractors/go.rs:549 (0 transitive callers)
infers_pkg_factory_call in crates/codegraph-core/src/extractors/go.rs:562 (0 transitive callers)
infers_composite_literal in crates/codegraph-core/src/extractors/go.rs:573 (0 transitive callers)

…and test Remove unused TypeMapEntry import from cpp.ts and cuda.ts, reformat primitive-type Set literals and test expect() calls to satisfy biome line-length rules.

Java was the only fixture where all three build paths (wasm, native, hybrid) disagreed pairwise. Bug 1 — WASM typeMap pollution: `handleJavaLocalVarDecl` used last-wins Map.set(), so the local `InMemoryUserRepository repo` in the static `createDefault()` method silently overrode the constructor parameter `UserRepository repo`. This caused WASM to bypass the interface and resolve directly to the concrete class, producing no interface edge and the wrong receiver. Fix: switch to first-wins `setTypeMapEntry` to match Rust extractor semantics. First-wins preserves the interface annotation that drives correct CHA dispatch. Bug 2 — native vs wasm/hybrid confidence mismatch: `runPostNativeCha` (native orchestrator path) used `computeConfidence − CHA_DISPATCH_PENALTY = 0.7 − 0.1 = 0.6`, while `runChaPostPass` (DB post-pass used by wasm and hybrid) hardcodes 0.8. Fix: align `runPostNativeCha` to also use 0.8. Result: all three build paths now emit identical edges and confidences. `parity-compare.mjs --langs java --hybrid` passes. Updated expected-edges.json to include both the interface declaration edge (TypeRepository.X at 0.7) and the CHA-expanded impl edge (InMemoryUserRepository.X at 0.8), which are the correct semantics for an interface-typed receiver. Closes #1469

greptile-apps · 2026-06-13T02:29:21Z

Greptile Summary

This PR fixes enclosing-caller attribution for Haskell and Zig by replacing the single-pass narrowest-span algorithm with a two-pass strategy: prefer the narrowest enclosing function/method, and fall back to the widest (outermost) variable/constant binding when no function encloses the call site. It also adds variable to NODE_KIND_FILTER_SQL and EDGE_NODE_KIND_FILTER so top-level variable bindings (e.g. Haskell main = do …) are visible to the caller-resolution queries.

Core attribution fix (call-resolver.ts / build_edges.rs): Two-pass findCaller/find_enclosing_caller with CALLABLE_KINDS / is_callable_kind for function scope, and TOP_LEVEL_BINDING_KINDS / is_top_level_binding_kind for variable-binding fallback; variable added to both node-filter SQL strings.
CHA confidence alignment (build-edges.ts / native-orchestrator.ts): Typed-receiver CHA dispatch now uses hardcoded 0.8 (matching runChaPostPass and runPostNativeCha) instead of computeConfidence − CHA_DISPATCH_PENALTY; runPostNativeCha gains incremental candidate scoping via Gate A/B queries.
Type-inference parity (go.rs, python.rs, javascript.rs, javascript.ts, java.ts): Short-var and assignment type inference added to Go/Python Rust extractors; JS field-def typeMap keys are now class-scoped (ClassName.field) to prevent cross-class collision; Java local-var declarator switches to first-wins setTypeMapEntry to preserve interface types for CHA dispatch; symbolsOnly flag propagated to parseFilesWasmInline.

Confidence Score: 5/5

Safe to merge — the two-pass caller attribution is correctly implemented in both engines, the parity check passes, and all supporting changes are well-tested.

The core attribution change (two-pass findCaller / find_enclosing_caller) is logically sound: function/method always wins over variable, and among variables the widest span wins, matching the Haskell top-level-binding case. The NODE_KIND_FILTER_SQL and EDGE_NODE_KIND_FILTER additions are conservative one-word changes. The CHA confidence alignment (hardcoded 0.8 for typed-receiver dispatch) unifies three previously-divergent code paths. The symbolsOnly propagation to parseFilesWasmInline is a clean fix for a silent no-op. All changes are covered by unit tests and the parity-compare fixture suite.

No files require special attention — all changed files are internally consistent and the Rust/TS implementations mirror each other correctly.

Important Files Changed

Filename	Overview
src/domain/graph/builder/call-resolver.ts	Two-pass findCaller replaces single-pass: narrows to function/method first, falls back to widest variable/constant binding; ?? Infinity fix also applied here.
crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs	Rust mirror of findCaller two-pass logic; adds kind field to DefWithId; PtsAliasCtx struct introduced to reduce argument count; logic is correct and well-documented.
src/domain/graph/builder/stages/build-edges.ts	NODE_KIND_FILTER_SQL gains 'variable'; isTypedReceiverDispatch flag aligns CHA dispatch confidence to hardcoded 0.8; targets sorted by confidence before emitting to ensure highest-confidence wins on dedup.
src/domain/graph/builder/stages/native-orchestrator.ts	runPostNativeCha gains incremental Gate A/B scoping; post-pass timings tracked; COUNT(*) re-count gated on postPassWroteData; changedFiles propagated for incremental scoping.
crates/codegraph-core/src/domain/graph/builder/pipeline.rs	EDGE_NODE_KIND_FILTER gains 'variable' to include top-level variable bindings in the node set for caller attribution — one-line change, correct and necessary for Haskell parity.
crates/codegraph-core/src/extractors/go.rs	Adds short_var_declaration typeMap inference (composite literal, address-of-composite, New* factory) with unit tests; mirrors JS inferShortVarType logic correctly.
src/extractors/java.ts	handleJavaLocalVarDecl switches from last-wins Map.set to first-wins setTypeMapEntry to preserve interface/abstract type annotations for CHA dispatch, matching Rust extractor semantics.
src/domain/parser.ts	INLINE_BACKFILL_THRESHOLD raised from 16 to 32; symbolsOnly now correctly propagated to parseFilesWasmInline (previously ignored on the inline path, causing unnecessary analysis overhead).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[call site at line L] --> B{Any def spans line L?}
    B -- No --> F[file node as caller]
    B -- Yes --> C{Is def kind function or method?}
    C -- Yes --> D{span < fn_caller_span?}
    D -- Yes --> E[update fn_caller - narrowest function wins]
    D -- No --> C
    E --> C
    C -- No --> G{Is def kind variable or constant?}
    G -- Yes --> H{span > var_caller_span?}
    H -- Yes --> I[update var_caller - widest variable wins]
    H -- No --> G
    I --> G
    G -- No --> B
    B -- done --> J{fn_caller found?}
    J -- Yes --> K[return fn_caller - narrowest function]
    J -- No --> L{var_caller found?}
    L -- Yes --> M[return var_caller - widest variable binding e.g. Haskell main]
    L -- No --> F

_{Reviews (7): Last reviewed commit: "fix: resolve merge conflicts with main" | Re-trigger Greptile}

greptile-apps · 2026-06-13T02:29:24Z

  return fields;
 }

+/**
+ * Primitive C/C++ types that are never class/struct receivers. Seeding these
+ * into typeMap would cause spurious receiver edges (e.g. `int x` → `int`).
+ */
+const CPP_PRIMITIVE_TYPES = new Set([
+  'int',
+  'long',
+  'short',
+  'unsigned',
+  'signed',
+  'float',
+  'double',
+  'char',
+  'bool',
+  'void',
+  'wchar_t',
+  'auto',
+  'size_t',
+  'uint8_t',
+  'uint16_t',
+  'uint32_t',
+  'uint64_t',
+  'int8_t',
+  'int16_t',
+  'int32_t',
+  'int64_t',
+  'ptrdiff_t',
+  'intptr_t',
+  'uintptr_t',
+]);
+
+function isPrimitiveCppType(typeName: string): boolean {
+  // Strip qualifiers like `const`, `volatile`, `unsigned` etc.
+  const base = typeName.split(/\s+/).pop() ?? typeName;
+  return CPP_PRIMITIVE_TYPES.has(base) || CPP_PRIMITIVE_TYPES.has(typeName);
+}
+
 function extractCppEnumEntries(enumNode: TreeSitterNode): SubDeclaration[] {
  const entries: SubDeclaration[] = [];
  const body = findChild(enumNode, 'enumerator_list');


Near-identical duplication between cpp.ts and cuda.ts

CPP_PRIMITIVE_TYPES/CUDA_PRIMITIVE_TYPES are byte-for-byte identical, and handleCppDeclaration/handleCudaDeclaration differ only in the unwrap helper (unwrapCppDeclaratorName vs extractCudaFieldName) and the primitive predicate. Any update to the type set (e.g. adding __int128, half, or template types) will need to be made in both places, which is easy to miss. A shared utility (e.g. shared/c-family.ts exporting C_FAMILY_PRIMITIVE_TYPES and a generic handleCFamilyDeclaration) would keep both extractors in sync and make it easier to extend for future C-family grammars.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Agreed — the duplication is real. Filed #1507 to track the refactoring (extracting C_FAMILY_PRIMITIVE_TYPES and a shared handleCFamilyDeclaration helper). Keeping it out of scope here to keep this PR focused on the attribution fix.

greptile-apps · 2026-06-13T02:29:26Z

+
  for (const def of definitions) {
    if (def.line <= call.line) {
      const end = def.endLine || Infinity;


Use nullish coalescing to avoid treating endLine: 0 as unbounded. || Infinity promotes any falsy value (including 0) to infinity, while ?? Infinity only applies to null/undefined.

Suggested change

const end = def.endLine || Infinity;

const end = def.endLine ?? Infinity;

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fixed — changed || Infinity to ?? Infinity so that endLine: 0 (a valid single-line node) is not treated as unbounded. Good catch.

The inline CHA expansion in buildCallEdges and buildChaPostPass used computeConfidence(relPath, t.file) - CHA_DISPATCH_PENALTY for all CHA targets, producing 0.6 for cross-directory interface dispatch (same-dir = 0.7, minus 0.1 penalty). runChaPostPass (helpers.ts) and runPostNativeCha (native-orchestrator.ts) both hardcode 0.8 for interface/CHA-dispatch edges. The deduplication in runChaPostPass uses the existing DB edge as-is and skips reinsertion, so the 0.6 edges from the inline pass were never upgraded to 0.8. Fix: typed-receiver (interface) dispatch branches now use hardcoded 0.8 matching the post-pass constants. The this/super branch keeps computeConfidence-based proximity scoring to remain aligned with runPostNativeThisDispatch. parity-compare.mjs --langs typescript --hybrid goes green (was 12 edge diffs). Closes #1470 docs check acknowledged

…s unbounded Replace `|| Infinity` with `?? Infinity` in findCaller so that a definition with endLine=0 (a valid single-line node at the start of a file) is not incorrectly treated as having unbounded span.

carlos-alm · 2026-06-13T02:49:14Z

@greptileai

…e/constant binding The find_enclosing_caller doc-comment incorrectly said "narrowest" for Pass 2, but the code (and the TS mirror in call-resolver.ts) correctly selects the WIDEST (outermost) enclosing variable/constant binding so that nested let-bindings inside main's do-block do not shadow main.

carlos-alm · 2026-06-13T03:51:35Z

Fixed the find_enclosing_caller docstring in build_edges.rs: Pass 2 now correctly reads "widest (outermost)" instead of "narrowest" — matches both the code behavior and the TS mirror in call-resolver.ts. This was the last item flagged in the Greptile review summary.

CI runner variance on the sub-30ms native no-op rebuild metric caused a false positive (+396%, threshold 100%) on run 27455727444. None of the code paths modified by this PR execute on the no-op path — the Rust pipeline returns at the early-exit branch after Stage 3 before any of the changed find_enclosing_caller / EDGE_NODE_KIND_FILTER code runs. Same pattern as 3.11.2:No-op rebuild.

carlos-alm · 2026-06-13T04:31:18Z

Fixed the CI failure (Perf canary). The no-op rebuild regression (23 → 114ms, +396%) is CI runner variance — none of the changed code paths execute on the no-op rebuild path (the Rust pipeline returns at the early-exit branch after Stage 3 before any extraction, edge building, or find_enclosing_caller runs). Added 3.12.0:No-op rebuild to KNOWN_REGRESSIONS with documentation matching the pattern of existing sub-30ms noise exemptions (3.11.2:No-op rebuild, etc.).

carlos-alm · 2026-06-13T05:32:02Z

@greptileai

carlos-alm added 20 commits June 12, 2026 18:31

chore: gitignore napi-generated artifacts in crates/codegraph-core

3892e7d

chore(tests): remove unused biome suppression in visitor.test.ts

ef8ea4f

fix(titan-run): sync --start-from enum and phase-timestamp list with …

a372b82

…actual phases

refactor(native): extract emit_pts_alias_edges params into PtsAliasCt…

184d221

…x struct

fix(wasm): sort call targets by confidence before emit to match nativ…

909e1df

…e engine

fix(bench): add 2 warmup runs and raise INCREMENTAL_RUNS to 5 for inc…

66fc899

…remental tiers

fix(perf): plumb symbolsOnly through parseFilesWasmInline to skip ana…

d07b358

…lysis visitors

carlos-alm added 2 commits June 12, 2026 20:23

chore(lint): fix unused import and formatting in cpp/cuda extractors …

1915e5e

…and test Remove unused TypeMapEntry import from cpp.ts and cuda.ts, reformat primitive-type Set literals and test expect() calls to satisfy biome line-length rules.

greptile-apps Bot reviewed Jun 13, 2026

View reviewed changes

carlos-alm mentioned this pull request Jun 13, 2026

refactor(extractors): deduplicate C-family primitive types between cpp.ts and cuda.ts #1507

Open

fix(caller): use nullish coalescing for endLine to avoid treating 0 a…

d87c1ee

…s unbounded Replace `|| Infinity` with `?? Infinity` in findCaller so that a definition with endLine=0 (a valid single-line node at the start of a file) is not incorrectly treated as having unbounded span.

fix: resolve merge conflicts with main

9e03f5e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: align enclosing-caller attribution for variable bindings (haskell, zig)#1499

fix: align enclosing-caller attribution for variable bindings (haskell, zig)#1499
carlos-alm wants to merge 27 commits into
mainfrom
fix/enclosing-caller-attribution-1468

carlos-alm commented Jun 13, 2026

Uh oh!

claude Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot Jun 13, 2026

Uh oh!

carlos-alm Jun 13, 2026

Uh oh!

greptile-apps Bot Jun 13, 2026

Uh oh!

carlos-alm Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	const end = def.endLine \|\| Infinity;
	const end = def.endLine ?? Infinity;

Conversation

carlos-alm commented Jun 13, 2026

Unified rule

Verification

Uh oh!

claude Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Root Cause Analysis

Phase 2: Configuration Change Detection

Phase 3: Test Integrity

Phase 4: Code Quality

Core logic — call-resolver.ts (TS)

Core logic — build_edges.rs (Rust)

Other changes

Phase 5: Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

github-actions Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codegraph Impact Analysis

Uh oh!

greptile-apps Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 13, 2026 •

edited

Loading

Core logic — `call-resolver.ts` (TS)

Core logic — `build_edges.rs` (Rust)

github-actions Bot commented Jun 13, 2026 •

edited

Loading

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading