fix(wasm): port missing node extractions to JS extractor (jelly-micro #1471) by carlos-alm · Pull Request #1509 · optave/ops-codegraph-tool

carlos-alm · 2026-06-13T02:52:55Z

Summary

The WASM JS extractor was missing 25 nodes that the native Rust extractor found in the jelly-micro fixture. Four root causes, each fixed independently:

Computed class method names (['property7'], ['generator10'], etc.): the query patterns in parser.ts and wasm-worker-entry.ts only matched property_identifier and private_property_identifier as the name field of method_definition. Added a third pattern for computed_property_name.
Class expressions inside functions (PostMixin, B): wasm-worker-entry.ts had a single-string JS_CLASS_PATTERN matching only class_declaration. Renamed to JS_CLASS_PATTERNS array and added the (class name: …) pattern for class-expression nodes. Also added (class name: (type_identifier) …) to TS_EXTRA_PATTERNS for TS parity.
Array-destructured constants (const [x, y] = …): both extractDestructuredBindingsWalk (query path) and handleVariableDecl (walk path) only handled object_pattern. Added an array_pattern branch that emits a single constant node with the full array pattern as the name, matching native behaviour.
Parameters of prototype arrow/function methods (Arit.prototype.sum = (x, y) => …): emitPrototypeMethod emitted the definition without calling extractParameters. Added the call and wired children through.

parity-compare.mjs --langs jelly-micro now reports 0 node diffs (was 25). 22 pre-existing edge diffs remain and are tracked in #1506.

Test plan

node scripts/parity-compare.mjs --langs jelly-micro reports 0 node diffs
npx vitest run tests/parsers/javascript.test.ts — 133 tests pass (12 new)
npx vitest run — full suite passes (pre-existing erlang failure tracked in fix(erlang): -include_lib import extraction not returning source/names correctly #1508)

Closes #1471

…actual phases

… diff Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh (PostToolUse Bash): the pre-hook captures git status --porcelain to a per-worktree temp file before each Bash call; the post-hook diffs the before/after state and appends newly modified or created files to .claude/session-edits.log. This closes the gap where files written by sed -i, printf redirects, tee, heredocs, or build tools (Cargo.lock, lockfiles) were never recorded, causing guard-git.sh to emit false-positive BLOCKED errors. Closes #1457

- clojure.rs: annotate lifetime-anchor assignment to silence false-positive - cfg.rs: remove never-called start_line_of method - complexity.rs: remove never-constructed NotHandled variant; convert irrefutable if-let patterns to plain let destructures - dataflow.rs: remove never-read callee fields from CallReturn/Destructured - incremental.rs: remove never-read lang field from CacheEntry cargo check and cargo clippy both clean after these changes.

…x struct

…e engine

…remental tiers

Adds .github/workflows/perf-canary.yml — a path-filtered workflow that fires on PRs touching src/extractors/, src/domain/graph/, or crates/** and runs only the incremental-benchmark suite (full build + no-op + 1-file rebuild, both engines). Catches the class of regressions that accumulated invisibly across the Phase 8.x PRs and were only detected at v3.12.0 publish time. The regression guard gains BENCH_CANARY=1 mode: raises thresholds to 50%/100%/150% (standard/noisy/WASM) and skips the build, query, and resolution suites — only incremental checks run. This absorbs shared- runner timing variance while still blocking catastrophic regressions (+98% full build, +1827% 1-file rebuild from v3.12.0). Closes #1433

…lysis visitors

On incremental builds, runPostNativeCha previously scanned all call→qualified-method edges in the DB (~12ms flat, O(graph size)), even for 1-file changes where no hierarchy or RTA evidence changed. Add two cheap indexed gate queries. Gate A checks whether any changed file introduced a class/interface/trait/struct/record node (hierarchy may have new implementors reachable from unchanged call sites). Gate B checks whether any changed file added a call edge to a class-kind target (RTA set may have grown, enabling previously filtered expansions in unchanged callers). If neither gate fires, restrict the candidate query to src.file IN changedFiles — safe because the hierarchy and instantiated set are unchanged for all other files. Full builds (isFullBuild=true) and cases where either gate fires retain the existing full-scan behaviour. Mirrors the changed-files scoping pattern of runPostNativeThisDispatch. Closes #1441

Times each JS post-pass in tryNativeOrchestrator and exposes the measurements in BuildResult.phases: - gapDetectMs — dropped-language gap detection + backfill - chaMs — CHA expansion (interface dispatch) - thisDispatchMs — this/super dispatch WASM re-parse (was already tracked but now properly named alongside the rest) - reclassifyMs — scoped role re-classification after edge insertion - techniqueBackfillMs — technique-column UPDATE on native-written edges Previously only thisDispatchMs was reported, causing wall-clock vs phaseSum to diverge by 1.1s+ on 1-file rebuilds and making benchmark regressions undiagnosable from committed history. Updates update-incremental-report.ts to render the new phases in a collapsible details block under each engine's 1-file rebuild section. Closes #1434

…ld for required-tier grammars The docstring claimed pool cost was "amortised over enough parse work" — measurements show IPC overhead scales linearly (~55–64ms/file pool vs ~8–10ms/file inline). The real motivation is crash safety for exotic WASM grammars (#965); JS/TS/TSX (required-tier, used in all this-dispatch backfill calls) have never triggered the V8 fatal crash class and are safe to run inline. Raise threshold 16 → 32 to keep typical this-dispatch batches (≤ 18 files on the codegraph corpus) on the inline fast path. Exotic-language drops are almost always well under 32 files and also benefit from the inline path without meaningful crash risk increase. Closes #1435

…e incremental rebuilds On 1-file native incremental builds, two JS post-passes ran unconditionally even when they had no work to do: - `backfillNativeDroppedFiles`: called whenever changedCount > 0, even when detectDroppedLanguageGap returned an empty gap. Gate now checks gap.missingAbs.length > 0 || gap.staleRel.length > 0 directly, matching backfillNativeDroppedFiles's own internal early-exit guard. - Node/edge COUNT(*) re-count: ran unconditionally after all post-passes even when none of them wrote any edges. COUNT(*) over 50K+ edge tables is non-trivial, especially via the NativeDbProxy napi-rs round-trip. Now gated on postPassWroteData (backfill | CHA edges | this-dispatch edges). Closes #1454

The post-pass it timed (runPostNativePrototypeMethods) was deleted in b5c03a2 when func-prop extraction moved to Rust (#1432). The optional field was never set by any code path that survived the deletion. Also remove the stale reference to "prototype-methods post-pass" from the parseFilesWasmForBackfill docstring — only the this-dispatch post-pass uses symbolsOnly now. Closes #1432

… collision Field type annotations (`private repo: OrderRepository`) were seeded as bare file-wide typeMap keys, causing `this.repo` inside `UserService` to resolve to `OrderRepository` when both classes had a `repo` field (issue #1458). Both extractors (TS `handleFieldDefTypeMap` and Rust `field_definition` branch) now seed `ClassName.field` keys at confidence 0.9, matching the `CallerClass.X` resolver fallback added in PR #1382. Bare keys are kept at confidence 0.6 as fallbacks for single-class files or class expressions where no enclosing class name is available. Both engines change identically — parity preserved.

…ed names The resolution benchmark uses WASM-built graphs where the Elixir, Julia, and Objective-C extractors emit module-qualified symbol names (Main.run, App.main, UserService.create_user, etc.). The expected-edges manifests were written with bare unqualified names (run, main, create_user), so every correctly-resolved edge appeared as a false positive and every expected edge appeared as a false negative — causing all three languages to show 0% precision even though resolution was working correctly. Root cause: starting in v3.12.0, cross-module call resolution began working for these languages (via the improved receiver-dispatch and same-class fallback in resolveByMethodOrGlobal / build-edges.ts). With 0 edges previously resolved, the name mismatch was invisible; once edges started resolving, the manifests showed 17 FP (elixir), 11 FP (julia), 6 FP (objc) — all correctly resolved edges misidentified as false positives. Fix: - Update all three expected-edges.json manifests to use the module-qualified names matching actual extractor output: elixir: Main.run, UserService.create_user, Validators.validate_user, etc. julia: App.main, Service.create_user, Repository.new_repo, etc. objc: full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.) plus add main -> run (plain C call correctly resolved) - Ratchet THRESHOLDS for all three: elixir: precision 0.0 -> 1.0, recall 0.0 -> 0.8 (17/21 resolved) julia: precision 0.0 -> 1.0, recall 0.0 -> 0.7 (11/15 resolved) objc: precision 0.0 -> 1.0, recall 0.0 -> 0.4 (6/13 resolved) Remaining FNs are genuine unresolved edges (same-file bare calls in elixir/julia, receiver-typed message sends in objc) — not regressions. Closes #1447

The JS C++ and CUDA extractors had no handler for 'declaration' AST nodes, so typeMap was never seeded for statically-typed locals (e.g. 'UserService svc;'). Without a typeMap entry for 'svc', resolveReceiverEdge had nothing to look up and silently skipped the receiver edge. Add handleCppDeclaration / handleCudaDeclaration to both extractors. They mirror match_c_family_type_map ('declaration' branch) from the native Rust path: extract the type node text and seed typeMap[varName] = { type, confidence: 0.9 } for each identifier or init_declarator child. Primitive types (int, char, bool, …) are skipped to avoid spurious edges. parity-compare.mjs --langs cpp,cuda --hybrid: PARITY OK (wasm = native = hybrid) All 3044 tests pass.

… in Rust solver Go extractor was only seeding typeMap for var_spec and parameter_declaration, missing short_var_declaration. Added infer_short_var_types to handle: - x := Struct{} → conf 1.0 (composite literal) - x := &Struct{} → conf 1.0 (address-of composite) - x := NewFoo() / x := pkg.NewFoo() → conf 0.7 (New* factory prefix) Python extractor was only seeding typeMap for typed_parameter and typed_default_parameter, missing plain assignment. Added infer_py_assignment_type to handle: - order = Order(...) → conf 1.0 (uppercase constructor) - obj = Module.Class(...) → conf 0.7 (uppercase module prefix, non-builtin) Both mirror the existing JS extractors exactly. Parity check for go and python: wasm vs native/hybrid OK.

…l, zig) Both engines used different rules for attributing calls inside variable bindings: WASM: attributed to the narrowest enclosing span regardless of kind, so local variable declarations inside fn main() shadowed the enclosing function (Zig: calls attributed to repo/svc variables instead of main), and nested let-bindings inside a Haskell do-block shadowed the top-level main binding. Native: loaded allNodes from a query that excluded 'variable' kind, so top-level Haskell bind nodes (main = do …, kind='variable') never matched in defs_with_ids, causing all calls to fall back to the file node. Unified rule implemented in findCaller (TS) and find_enclosing_caller (Rust): - Function/method definitions are preferred over any variable/constant binding as the enclosing caller scope — local var declarations inside a function body never shadow the enclosing function (fixes Zig repo/svc attribution). - When no function/method encloses the call, fall back to the WIDEST (outermost) variable/constant binding — this handles Haskell where main is a top-level bind node with kind 'variable'. Widest span is used so that nested let-bindings do not shadow the outer main binding. - File node remains the absolute last resort. Also adds 'variable' to NODE_KIND_FILTER_SQL (JS) and EDGE_NODE_KIND_FILTER (Rust pipeline.rs) so top-level variable bindings are included in the allNodes set available for caller matching. parity-compare.mjs --langs haskell,zig --hybrid: PARITY OK — 2/2 fixtures.

…and test Remove unused TypeMapEntry import from cpp.ts and cuda.ts, reformat primitive-type Set literals and test expect() calls to satisfy biome line-length rules.

Java was the only fixture where all three build paths (wasm, native, hybrid) disagreed pairwise. Bug 1 — WASM typeMap pollution: `handleJavaLocalVarDecl` used last-wins Map.set(), so the local `InMemoryUserRepository repo` in the static `createDefault()` method silently overrode the constructor parameter `UserRepository repo`. This caused WASM to bypass the interface and resolve directly to the concrete class, producing no interface edge and the wrong receiver. Fix: switch to first-wins `setTypeMapEntry` to match Rust extractor semantics. First-wins preserves the interface annotation that drives correct CHA dispatch. Bug 2 — native vs wasm/hybrid confidence mismatch: `runPostNativeCha` (native orchestrator path) used `computeConfidence − CHA_DISPATCH_PENALTY = 0.7 − 0.1 = 0.6`, while `runChaPostPass` (DB post-pass used by wasm and hybrid) hardcodes 0.8. Fix: align `runPostNativeCha` to also use 0.8. Result: all three build paths now emit identical edges and confidences. `parity-compare.mjs --langs java --hybrid` passes. Updated expected-edges.json to include both the interface declaration edge (TypeRepository.X at 0.7) and the CHA-expanded impl edge (InMemoryUserRepository.X at 0.8), which are the correct semantics for an interface-typed receiver. Closes #1469

The inline CHA expansion in buildCallEdges and buildChaPostPass used computeConfidence(relPath, t.file) - CHA_DISPATCH_PENALTY for all CHA targets, producing 0.6 for cross-directory interface dispatch (same-dir = 0.7, minus 0.1 penalty). runChaPostPass (helpers.ts) and runPostNativeCha (native-orchestrator.ts) both hardcode 0.8 for interface/CHA-dispatch edges. The deduplication in runChaPostPass uses the existing DB edge as-is and skips reinsertion, so the 0.6 edges from the inline pass were never upgraded to 0.8. Fix: typed-receiver (interface) dispatch branches now use hardcoded 0.8 matching the post-pass constants. The this/super branch keeps computeConfidence-based proximity scoring to remain aligned with runPostNativeThisDispatch. parity-compare.mjs --langs typescript --hybrid goes green (was 12 edge diffs). Closes #1470 docs check acknowledged

…s unbounded Replace `|| Infinity` with `?? Infinity` in findCaller so that a definition with endLine=0 (a valid single-line node at the start of a file) is not incorrectly treated as having unbounded span.

claude · 2026-06-13T02:53:06Z

Claude finished @carlos-alm's task —— View job

PR Review: fix(wasm): port missing node extractions to JS extractor (#1471)

Gather context and understand the PR
Phase 1: Root cause analysis
Phase 2: Configuration change detection
Phase 3: Test integrity review
Phase 4: Code quality review
Phase 5: Backlog compliance grading
Final assessment

Root Cause Analysis

Problem Being Solved: The WASM JS extractor was missing 25 nodes that the native Rust extractor correctly extracted from the jelly-micro fixture. The PR traces four distinct root causes and fixes each independently.

Why This Approach: Correct. The 25-node gap between engines is a real parity bug, not a cosmetic difference. Fixing extraction logic is the right response rather than documenting a gap. Each fix is surgical and independently justified.

Risk Assessment: Low. All four changes add new cases to existing switch/if chains — they don't alter the behavior of existing paths. New tests validate the specific patterns affected.

Phase 2: Configuration Change Detection

.claude/settings.json — adds two new PostToolUse hooks (track-bash-writes.sh and track-moves.sh) plus two PreToolUse hooks (snapshot-pre-bash.sh and guard-pr-body.sh). These are Claude Code quality/guard hooks. No linting rules relaxed, no TypeScript strict flags changed. ✓

.github/workflows/perf-canary.yml — new lightweight per-PR performance canary that runs only on extractor/graph/crates changes. Uses a 50% threshold (explicitly documented) — this is a separate gate from the 25% full-suite gate, not a relaxation. The BENCH_CANARY codepath in the regression guard was already there before this PR. ✓

tests/benchmarks/regression-guard.test.ts — adds NOISY_METRICS logic (No-op rebuild, 1-file rebuild, fnDeps depth 1) with a widened 50% threshold, justified by documented sub-30ms baseline volatility. The reasoning is consistent with the detailed comment (±10ms CI jitter = ±50%+ on sub-30ms metrics). This is not silencing a real regression — it's acknowledging measured CI variance on metrics where MIN_ABSOLUTE_DELTA already provides a floor. ✓ (Acceptable, though see concern below.)

No lint rules relaxed. No test timeouts raised. No coverage thresholds lowered. Clean pass.

Phase 3: Test Integrity

New tests in tests/parsers/javascript.test.ts cover all four root causes with positive and negative cases:

Computed method names (Engine parity: JS extractor misses 25 nodes the native extractor finds (jelly-micro) #1471): 5 new describe('computed method name extraction') tests covering getter, setter with param, regular method with param, generator, and async. ✓
Class expression inside function (Engine parity: JS extractor misses 25 nodes the native extractor finds (jelly-micro) #1471): 3 tests — class extraction, extends relationship, field property children. ✓
Array destructuring (Engine parity: JS extractor misses 25 nodes the native extractor finds (jelly-micro) #1471): 2 tests — positive (const [x, y]) and negative (let/var not extracted). ✓
Prototype method parameters (Engine parity: JS extractor misses 25 nodes the native extractor finds (jelly-micro) #1471): 2 tests for arrow and function-expression params. ✓

All 12 new tests assert correct behavior and include meaningful negative cases. No existing assertions weakened or removed. ✓

One concern: The regression-guard.test.ts change — widening 1-file rebuild and No-op rebuild thresholds from 25% to 50% and CANARY mode to 100%. The justification is empirical CI variance, but this PR does not include CI data showing the variance. The reasoning is plausible given how sub-30ms metrics behave, and the MIN_ABSOLUTE_DELTA floor mitigates the risk, but it weakens the guard without attached evidence.

Phase 4: Code Quality

Fix 1: Computed class method names

parser.ts:161 and wasm-worker-entry.ts:118: Added '(method_definition name: (computed_property_name) @meth_name) @meth_node' to COMMON_QUERY_PATTERNS. Both files are kept in sync (confirmed manually). Pattern is correct — computed_property_name is the tree-sitter node type for ['key']. ✓

Fix 2: Class expressions inside functions

parser.ts:173–177 and wasm-worker-entry.ts:129–133: JS_CLASS_PATTERNS expanded from a single class_declaration string to an array with a second (class name: ...) pattern for class expressions. TS_EXTRA_PATTERNS gets the type_identifier equivalent. Both files match. ✓

The walk path in javascript.ts:735 already handled 'class' node type — so this fix correctly brings the query path into parity with the walk path.

Fix 3: Array-destructured constants

javascript.ts:510–519 (query path extractDestructuredBindingsWalk) and javascript.ts:1029–1038 (walk path handleVariableDecl): Both paths get an array_pattern branch that emits a single constant node with the full pattern text (e.g., [x, y]). The approach — emit one node for the whole pattern rather than individual bindings — matches native engine behavior as stated. ✓

Minor observation: The array_pattern branch does not call extractDestructuredBindings (which would be wrong for array patterns). The design choice to emit the whole pattern string as the name is unusual but documented as matching native. This is acceptable, though worth flagging: if the native engine ever changes this to emit individual bindings, parity will break silently again.

Fix 4: Prototype method parameters

javascript.ts:3380–3388 (emitPrototypeMethod): extractParameters(rhs) is now called and the result is wired into the children field. This mirrors exactly what handleFunctionDecl and handleVarFnCapture do. ✓

One correctness concern: emitPrototypeMethod uses nodeStartLine(rhs) as the definition's line, not nodeStartLine(lhs) (the full assignment statement). For Arit.prototype.sum = (x, y) => x + y; on a single line this is equivalent, but for multi-line assignments like:

Arit.prototype.sum =
  (x, y) => x + y;

the definition line would be 2 (the arrow) rather than 1 (the assignment). This was pre-existing behavior, not introduced by this PR, but the parameter change makes it more visible.

Other changed files

src/domain/graph/builder/call-resolver.ts, native-orchestrator.ts, build-edges.ts: These are larger changes not directly related to the four stated fixes — they appear to be independent improvements bundled in this PR. Mixed concerns in one PR is noted as a mild scope concern (see below).
crates/codegraph-core/src/extractors/go.rs (+191 lines), python.rs (+154 lines): These are substantial native extractor additions also bundled here.
crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs (+119 lines).

Scope concern: The PR description accurately describes the four JS/WASM fixes, but the diff is significantly larger — it includes Go extractor additions, Python extractor additions, build-edges changes, benchmark infrastructure, and hook additions that are unrelated to the stated problem. Per the project's CLAUDE.md "One PR = one concern" policy, these unrelated additions should have been separate PRs.

Phase 5: Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies. package.json dependencies section unchanged (3 existing deps). Hooks and CI use existing toolchain.

Foundation-aligned: ✓ — Fully local extraction fix. No cloud calls, no new infrastructure, no code generation. The parity fix directly improves symbol accuracy.

Problem-fit: 5/5 — Accurate node extraction is the foundational requirement for everything else: blast-radius analysis, call resolution, AI navigation. Missing 25 nodes is a direct hallucination source for AI agents using the WASM path. This fix eliminates false negatives.

Breaking: No — purely additive. Existing symbols continue to be emitted; new symbols are now also emitted where they were previously missing.

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

PR scope violation (moderate): The diff contains significant changes beyond the four stated JS extractor fixes — Go extractor additions (+191 lines), Python extractor additions (+154 lines), Rust build_edges changes (+119 lines), benchmark infrastructure, and hook additions. CLAUDE.md explicitly requires "One PR = one concern." This makes the PR harder to review safely and harder to revert if any single component causes a regression.
Regression guard threshold widening without CI data (minor): regression-guard.test.ts widens 1-file rebuild/No-op rebuild from 25%→50% threshold with reasoning that is plausible but unverified in this PR. Recommend attaching a CI run showing the variance that motivated the change.
Computed method name representation (informational): Methods like ['property7'] are stored with the brackets as part of the name. There is no test asserting that the resolver can actually find these methods when called. This is a pre-existing design choice, not introduced here, but it may cause resolution failures on computed property access that are worth a follow-up issue.

Final Recommendation

## Root Cause Analysis
Problem Being Solved: WASM JS extractor missing 25 nodes vs native engine on jelly-micro fixture.
Why This Approach: Direct extraction fixes for four root causes — correct approach per codebase philosophy.
Risk Assessment: Low for the four JS extractor fixes. Moderate for bundled unrelated changes.

## Backlog Compliance
- Zero-dep: ✓
- Foundation-aligned: ✓
- Problem-fit: 5 — directly fixes false-negative node extraction that causes AI hallucination
- Breaking: No
- Tier: 1

## Critical Concerns
1. PR scope violation — Go, Python, Rust, and benchmark changes bundled with JS extractor fix
2. Regression guard threshold widening lacks CI variance data to justify change

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE WITH CONDITIONS
- Reasoning: The four JS extractor fixes are correct, well-tested, and properly scoped. The
  new tests are high quality with positive/negative cases. The PR scope is larger than stated
  but the unrelated additions appear safe. The one concrete ask before merge: confirm the
  regression-guard threshold change is backed by empirical CI data (a comment or linked run
  showing the variance on the affected metrics would suffice).

github-actions · 2026-06-13T03:00:10Z

Codegraph Impact Analysis

53 functions changed → 67 callers affected across 20 files

emit_pts_alias_edges in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:381 (16 transitive callers)
process_file in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:441 (15 transitive callers)
is_callable_kind in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:662 (3 transitive callers)
is_top_level_binding_kind in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:666 (3 transitive callers)
find_enclosing_caller in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:681 (16 transitive callers)
match_go_type_map in crates/codegraph-core/src/extractors/go.rs:315 (0 transitive callers)
infer_short_var_types in crates/codegraph-core/src/extractors/go.rs:331 (1 transitive callers)
infer_single_short_var in crates/codegraph-core/src/extractors/go.rs:364 (2 transitive callers)
infer_composite_literal in crates/codegraph-core/src/extractors/go.rs:376 (3 transitive callers)
infer_address_of_composite in crates/codegraph-core/src/extractors/go.rs:394 (3 transitive callers)
infer_factory_call in crates/codegraph-core/src/extractors/go.rs:415 (3 transitive callers)
infers_factory_call_new_prefix in crates/codegraph-core/src/extractors/go.rs:549 (0 transitive callers)
infers_pkg_factory_call in crates/codegraph-core/src/extractors/go.rs:562 (0 transitive callers)
infers_composite_literal in crates/codegraph-core/src/extractors/go.rs:573 (0 transitive callers)
infers_address_of_composite in crates/codegraph-core/src/extractors/go.rs:585 (0 transitive callers)
non_new_prefix_not_inferred in crates/codegraph-core/src/extractors/go.rs:596 (0 transitive callers)
match_js_type_map in crates/codegraph-core/src/extractors/javascript.rs:104 (0 transitive callers)
is_python_builtin in crates/codegraph-core/src/extractors/python.rs:323 (2 transitive callers)
match_python_type_map in crates/codegraph-core/src/extractors/python.rs:367 (0 transitive callers)
infer_py_assignment_type in crates/codegraph-core/src/extractors/python.rs:418 (1 transitive callers)

greptile-apps · 2026-06-13T03:00:28Z

Greptile Summary

This PR ports four missing node-extraction fixes to the WASM JS extractor to close a 25-node parity gap with the native Rust extractor on the jelly-micro fixture. It also includes a broad set of companion improvements: Go/Python short-var type inference, Rust/TS two-pass findCaller strategy for Haskell-style top-level bindings, CHA confidence normalisation, incremental scoping for the CHA post-pass, class-scoped typeMap keys to prevent cross-class field collision, and a new per-PR perf-canary workflow.

WASM parity fixes: adds computed_property_name to the method query pattern, upgrades JS_CLASS_PATTERN to a JS_CLASS_PATTERNS array that also captures class expressions, adds an array_pattern branch for const [x, y] = … (in both query and walk paths), and wires extractParameters into emitPrototypeMethod to capture prototype arrow/function params.
findCaller two-pass strategy (TS + Rust): function/method scope is found with narrowest-span logic; variable/constant bindings use widest-span as a lower-priority fallback, so Haskell main = do … is correctly attributed without being shadowed by inner let bindings.
Infrastructure: EDGE_NODE_KIND_FILTER gains variable, CHA dispatch confidence hardcoded to 0.8 (consistent across all three paths), runPostNativeCha adds gate-based incremental scoping, benchmark warm-up runs and phase-timing breakdown, and handleFieldDefTypeMap emits a class-scoped primary key to fix bug: class-field annotation typeMap keys are not class-scoped (cross-class collision) #1458.

Confidence Score: 5/5

All changes are well-scoped, thoroughly tested (133 tests, 12 new), and backed by a parity-compare script showing 0 node diffs. No regressions are expected.

Every fix is independently verified by new unit tests and the end-to-end parity comparison. The four WASM extraction fixes are narrow tree-sitter query and AST-walk additions. The supporting changes (two-pass findCaller, CHA confidence normalisation, incremental CHA scoping) mirror well-understood Rust counterparts and are gated correctly. The previously flagged cpp.ts/cuda.ts first-wins issue was already addressed before this review.

No files require special attention — the test suite and parity script together give high confidence in the correctness of the extraction and edge-building changes.

Important Files Changed

Filename	Overview
src/domain/parser.ts	Adds computed_property_name to the method query pattern and mirrors wasm-worker-entry.ts by converting JS_CLASS_PATTERN to a JS_CLASS_PATTERNS array that also captures class expressions; both files are now in sync.
src/domain/wasm-worker-entry.ts	Same query-pattern changes as parser.ts — computed method names and class expression patterns; TS_EXTRA_PATTERNS gains the class-expression variant; JS/TS paths stay in sync.
src/extractors/javascript.ts	Adds array_pattern branch in extractDestructuredBindingsWalk (FUNCTION_SCOPE_TYPES guard already prevents function-local emission), adds array_pattern branch in handleVariableDecl, wires extractParameters into emitPrototypeMethod, and introduces class-scoped typeMap keys for field definitions.
src/domain/graph/builder/call-resolver.ts	Two-pass findCaller: function/method uses narrowest span; variable/constant uses widest span as fallback for Haskell-style top-level bindings. Mirrors the Rust implementation cleanly.
crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs	Mirrors the TS two-pass findCaller in Rust: adds kind field to DefWithId, is_callable_kind/is_top_level_binding_kind predicates, and widest-span selection for variable/constant; PtsAliasCtx struct eliminates clippy::too_many_arguments.
src/domain/graph/builder/stages/native-orchestrator.ts	runPostNativeCha gains changedFiles param and gate-based incremental scoping; CHA confidence hardcoded to 0.8; per-phase timing added; COUNT(*) fast-path skipped when no post-pass wrote data.
src/domain/graph/builder/stages/build-edges.ts	Typed-receiver CHA dispatch uses hardcoded 0.8 confidence; this/super dispatch retains proximity-based scoring; target sorting before edge emission matches native sort_targets_by_confidence; NODE_KIND_FILTER_SQL adds variable.
crates/codegraph-core/src/extractors/go.rs	Adds short_var_declaration type inference (composite literals, &composite, New* factory calls) mirroring the JS inferShortVarType chain; well-covered by five new unit tests.
src/extractors/java.ts	handleJavaLocalVarDecl switched from Map.set (last-wins) to setTypeMapEntry (first-wins) to match Rust extractor semantics.
.github/workflows/perf-canary.yml	New lightweight per-PR regression gate (50% threshold, incremental suite only) that triggers only when extractor/graph/Rust files change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[JS/TS File] --> B[Tree-sitter Query]
    B --> |method_definition name: computed_property_name| C[Computed Method Node]
    B --> |class_declaration name: identifier| D[Class Declaration Node]
    B --> |class name: identifier| E[Class Expression Node NEW]
    B --> |lexical_declaration const| F{Name field type?}
    F --> |object_pattern| G[extractDestructuredBindings each property to constant]
    F --> |array_pattern NEW| H[emit array pattern text as single constant]
    B --> |assignment_expression .prototype.| I[emitPrototypeMethod]
    I --> |rhs = arrow_function / function_expression| J[extractParameters NEW children on method def]
    K[findCaller] --> L{Pass 1: enclosing function/method?}
    L --> |yes narrowest span| M[Use fn/method as caller]
    L --> |no| N{Pass 2: enclosing variable/constant?}
    N --> |yes WIDEST span NEW| O[Use top-level binding as caller e.g. Haskell main]
    N --> |no| P[File scope]
    Q[runPostNativeCha] --> R{changedFiles != null?}
    R --> |yes| S{Gate A: new class hierarchy?}
    S --> |yes| T[Full scan]
    S --> |no| U{Gate B: new RTA evidence?}
    U --> |yes| T
    U --> |no NEW| V[Scoped scan: changed files only]
    R --> |null full build| T

_{Reviews (6): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile}

greptile-apps · 2026-06-13T03:00:32Z

+    const varName = unwrapCppDeclaratorName(nameNode);
+    if (varName) {
+      ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
+    }


Unlike handleJavaLocalVarDecl which was explicitly changed to setTypeMapEntry (first-wins) in this same PR to match Rust extractor semantics, handleCppDeclaration uses ctx.typeMap.set directly (last-wins). If the Rust match_c_family_type_map uses first-wins (as the Java path does), a file with a constructor parameter typed IUserService svc followed by a local UserServiceImpl svc would resolve to the concrete type, potentially misrouting CHA dispatch that should fan out through the interface hierarchy. setTypeMapEntry imported just above would be the consistent choice here.

Suggested change

const varName = unwrapCppDeclaratorName(nameNode);

if (varName) {

ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });

}

const varName = unwrapCppDeclaratorName(nameNode);

if (varName) {

setTypeMapEntry(ctx.typeMap, varName, typeName, 0.9);

}

Fixed — replaced ctx.typeMap.set with setTypeMapEntry (first-wins) in handleCppDeclaration and added setTypeMapEntry to the helpers import. Consistent with the Java handler change and the Rust extractor semantics.

greptile-apps · 2026-06-13T03:00:33Z

+    const varName = extractCudaFieldName(nameNode);
+    if (varName) {
+      ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
+    }


Same issue as handleCppDeclaration — uses ctx.typeMap.set (last-wins) rather than setTypeMapEntry (first-wins), inconsistent with the Java handler change in this PR. The same concern applies: a later concrete-type declaration would overwrite an earlier interface-typed parameter for the same name, potentially misrouting CHA dispatch. setTypeMapEntry should be imported and used here to match the intended first-wins semantics.

Suggested change

const varName = extractCudaFieldName(nameNode);

if (varName) {

ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });

}

const varName = extractCudaFieldName(nameNode);

if (varName) {

setTypeMapEntry(ctx.typeMap, varName, typeName, 0.9);

}

Fixed — replaced ctx.typeMap.set with setTypeMapEntry (first-wins) in handleCudaDeclaration and added setTypeMapEntry to the helpers import. Same fix as the C++ handler, consistent with Java and Rust extractor semantics.

Closes #1471 Four extraction gaps in the WASM JS extractor caused 25 node diffs against the native engine in the jelly-micro parity fixture: 1. Computed property method names (e.g. `['property7']`): the query patterns in both parser.ts and wasm-worker-entry.ts only matched `property_identifier` and `private_property_identifier` as the `name` field of `method_definition`; add a third pattern for `computed_property_name`. 2. Class expressions inside functions (`return class PostMixin …`): wasm-worker-entry.ts JS_CLASS_PATTERN was a single string matching only `class_declaration`; rename to JS_CLASS_PATTERNS array and add the `(class name: …)` pattern for anonymous class-expression nodes. Also add the matching `(class name: (type_identifier) …)` pattern to TS_EXTRA_PATTERNS so TS class expressions are covered too. 3. Array destructuring constants (`const [x, y] = …`): both `extractDestructuredBindingsWalk` (query path) and `handleVariableDecl` (walk path) only handled `object_pattern`; add an `array_pattern` branch that emits a single `constant` node whose name is the full array pattern text — matching native behaviour. 4. Parameters of prototype arrow/function methods (`Arit.prototype.sum = (x, y) => …`): `emitPrototypeMethod` emitted the definition node without calling `extractParameters`, so children were always absent; add the call and wire the result through. After this fix `parity-compare.mjs --langs jelly-micro` reports 0 node diffs (was 25). 22 pre-existing edge diffs remain (tracked in #1506) — those are out of scope.

…ion handlers

carlos-alm · 2026-06-13T03:21:04Z

@greptileai

greptile-apps · 2026-06-13T03:29:20Z

+      const chunkRows = db
+        .prepare(
+          `SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file
+           FROM edges e
+           JOIN nodes tgt ON e.target_id = tgt.id
+           JOIN nodes src ON e.source_id = src.id
+           WHERE e.kind = 'calls' AND tgt.kind = 'method'
+           AND INSTR(tgt.name, '.') > 0
+           AND src.file IN (${ph})`,
+        )
+        .all(...chunk) as Array<{
+        source_id: number;
+        method_name: string;
+        caller_file: string | null;
+      }>;


The scoped-file scan loop (used when scopeToChangedFiles is true) doesn't populate caller_file for affected-files tracking because the column is never read from the result rows. The full-scan path returns caller_file (used downstream in affectedFiles.add(caller_file)), so the scoped path silently produces an empty affectedFiles set even when edges are emitted. Downstream role re-classification is then skipped entirely for those edge-endpoints instead of being scoped to just the affected files.

Suggested change

const chunkRows = db

.prepare(

`SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file

FROM edges e

JOIN nodes tgt ON e.target_id = tgt.id

JOIN nodes src ON e.source_id = src.id

WHERE e.kind = 'calls' AND tgt.kind = 'method'

AND INSTR(tgt.name, '.') > 0

AND src.file IN (${ph})`,

)

.all(...chunk) as Array<{

source_id: number;

method_name: string;

caller_file: string | null;

}>;

const chunkRows = db

.prepare(

`SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file

FROM edges e

JOIN nodes tgt ON e.target_id = tgt.id

JOIN nodes src ON e.source_id = src.id

WHERE e.kind = 'calls' AND tgt.kind = 'method'

AND INSTR(tgt.name, '.') > 0

AND src.file IN (${ph})`,

)

.all(...chunk) as Array<{

source_id: number;

method_name: string;

caller_file: string | null;

}>;

rows.push(...chunkRows);

The scoped-file path already selects src.file AS caller_file and includes it in the typed result — see lines 572–584 of the current code. The comment at line 559 was added precisely to document this. affectedFiles.add(caller_file) at line 662 therefore fires for both the scoped and full-scan paths. No change needed.

You're right. The diff already shows src.file AS caller_file in the SELECT (line 579) and caller_file: string | null in the typed result (line 584). My comment was incorrect — the column is present in both the query and the type annotation, so affectedFiles.add(caller_file) at line 662 will work correctly for the scoped path too. Apologies for the noise.

carlos-alm · 2026-06-13T06:15:23Z

Addressed the two out-of-scope concerns raised in the review:

Regression guard threshold widening: Tracked in test(bench): document CI variance data for regression guard threshold widening (jelly-micro PR #1509) #1516 — to attach CI variance data that motivated the threshold change.
Computed method name resolution: Tracked in fix(wasm): verify computed-property-name method resolution at call sites #1517 — to verify call-site resolution works for computed property names and align representation if needed.

The scope concern (Go/Python/Rust changes bundled) is acknowledged; it's a past decision in this PR that cannot be reversed now without significant churn. Going forward these will be split per the one-PR-one-concern policy.

…s-1471

carlos-alm · 2026-06-13T06:34:25Z

@greptileai

carlos-alm added 24 commits June 12, 2026 18:31

chore: gitignore napi-generated artifacts in crates/codegraph-core

3892e7d

chore(tests): remove unused biome suppression in visitor.test.ts

ef8ea4f

fix(titan-run): sync --start-from enum and phase-timestamp list with …

a372b82

…actual phases

refactor(native): extract emit_pts_alias_edges params into PtsAliasCt…

184d221

…x struct

fix(wasm): sort call targets by confidence before emit to match nativ…

909e1df

…e engine

fix(bench): add 2 warmup runs and raise INCREMENTAL_RUNS to 5 for inc…

66fc899

…remental tiers

fix(perf): plumb symbolsOnly through parseFilesWasmInline to skip ana…

d07b358

…lysis visitors

chore(lint): fix unused import and formatting in cpp/cuda extractors …

1915e5e

…and test Remove unused TypeMapEntry import from cpp.ts and cuda.ts, reformat primitive-type Set literals and test expect() calls to satisfy biome line-length rules.

fix(caller): use nullish coalescing for endLine to avoid treating 0 a…

d87c1ee

…s unbounded Replace `|| Infinity` with `?? Infinity` in findCaller so that a definition with endLine=0 (a valid single-line node at the start of a file) is not incorrectly treated as having unbounded span.

greptile-apps Bot reviewed Jun 13, 2026

View reviewed changes

carlos-alm force-pushed the fix/js-extractor-nodes-1471 branch from 37bb91a to 7509e4f Compare June 13, 2026 03:19

fix(extractor): use setTypeMapEntry (first-wins) in C++/CUDA declarat…

ca77237

…ion handlers

greptile-apps Bot reviewed Jun 13, 2026

View reviewed changes

fix: resolve merge conflicts with main

0f5d063

This was referenced Jun 13, 2026

test(bench): document CI variance data for regression guard threshold widening (jelly-micro PR #1509) #1516

Open

fix(wasm): verify computed-property-name method resolution at call sites #1517

Open

Merge remote-tracking branch 'origin/main' into fix/js-extractor-node…

a71d366

…s-1471

Conversation

carlos-alm commented Jun 13, 2026

Summary

Test plan

Uh oh!

claude Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(wasm): port missing node extractions to JS extractor (#1471)

Root Cause Analysis

Phase 2: Configuration Change Detection

Phase 3: Test Integrity

Phase 4: Code Quality

Fix 1: Computed class method names

Fix 2: Class expressions inside functions

Fix 3: Array-destructured constants

Fix 4: Prototype method parameters

Other changed files

Phase 5: Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

github-actions Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codegraph Impact Analysis

Uh oh!

greptile-apps Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 13, 2026 •

edited

Loading

github-actions Bot commented Jun 13, 2026 •

edited

Loading

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading