Skip to content

fix(wasm): port missing node extractions to JS extractor (jelly-micro #1471)#1509

Open
carlos-alm wants to merge 28 commits into
mainfrom
fix/js-extractor-nodes-1471
Open

fix(wasm): port missing node extractions to JS extractor (jelly-micro #1471)#1509
carlos-alm wants to merge 28 commits into
mainfrom
fix/js-extractor-nodes-1471

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Summary

The WASM JS extractor was missing 25 nodes that the native Rust extractor found in the jelly-micro fixture. Four root causes, each fixed independently:

  • Computed class method names (['property7'], ['generator10'], etc.): the query patterns in parser.ts and wasm-worker-entry.ts only matched property_identifier and private_property_identifier as the name field of method_definition. Added a third pattern for computed_property_name.

  • Class expressions inside functions (PostMixin, B): wasm-worker-entry.ts had a single-string JS_CLASS_PATTERN matching only class_declaration. Renamed to JS_CLASS_PATTERNS array and added the (class name: …) pattern for class-expression nodes. Also added (class name: (type_identifier) …) to TS_EXTRA_PATTERNS for TS parity.

  • Array-destructured constants (const [x, y] = …): both extractDestructuredBindingsWalk (query path) and handleVariableDecl (walk path) only handled object_pattern. Added an array_pattern branch that emits a single constant node with the full array pattern as the name, matching native behaviour.

  • Parameters of prototype arrow/function methods (Arit.prototype.sum = (x, y) => …): emitPrototypeMethod emitted the definition without calling extractParameters. Added the call and wired children through.

parity-compare.mjs --langs jelly-micro now reports 0 node diffs (was 25). 22 pre-existing edge diffs remain and are tracked in #1506.

Test plan

Closes #1471

… diff

Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh
(PostToolUse Bash): the pre-hook captures git status --porcelain to a
per-worktree temp file before each Bash call; the post-hook diffs the
before/after state and appends newly modified or created files to
.claude/session-edits.log.

This closes the gap where files written by sed -i, printf redirects,
tee, heredocs, or build tools (Cargo.lock, lockfiles) were never
recorded, causing guard-git.sh to emit false-positive BLOCKED errors.

Closes #1457
- clojure.rs: annotate lifetime-anchor assignment to silence false-positive
- cfg.rs: remove never-called start_line_of method
- complexity.rs: remove never-constructed NotHandled variant; convert
  irrefutable if-let patterns to plain let destructures
- dataflow.rs: remove never-read callee fields from CallReturn/Destructured
- incremental.rs: remove never-read lang field from CacheEntry

cargo check and cargo clippy both clean after these changes.
Adds .github/workflows/perf-canary.yml — a path-filtered workflow that
fires on PRs touching src/extractors/, src/domain/graph/, or crates/**
and runs only the incremental-benchmark suite (full build + no-op +
1-file rebuild, both engines). Catches the class of regressions that
accumulated invisibly across the Phase 8.x PRs and were only detected
at v3.12.0 publish time.

The regression guard gains BENCH_CANARY=1 mode: raises thresholds to
50%/100%/150% (standard/noisy/WASM) and skips the build, query, and
resolution suites — only incremental checks run. This absorbs shared-
runner timing variance while still blocking catastrophic regressions
(+98% full build, +1827% 1-file rebuild from v3.12.0).

Closes #1433
On incremental builds, runPostNativeCha previously scanned all
call→qualified-method edges in the DB (~12ms flat, O(graph size)),
even for 1-file changes where no hierarchy or RTA evidence changed.

Add two cheap indexed gate queries. Gate A checks whether any changed
file introduced a class/interface/trait/struct/record node (hierarchy
may have new implementors reachable from unchanged call sites). Gate B
checks whether any changed file added a call edge to a class-kind target
(RTA set may have grown, enabling previously filtered expansions in
unchanged callers). If neither gate fires, restrict the candidate query
to src.file IN changedFiles — safe because the hierarchy and instantiated
set are unchanged for all other files.

Full builds (isFullBuild=true) and cases where either gate fires retain
the existing full-scan behaviour. Mirrors the changed-files scoping
pattern of runPostNativeThisDispatch.

Closes #1441
Times each JS post-pass in tryNativeOrchestrator and exposes the
measurements in BuildResult.phases:

- gapDetectMs  — dropped-language gap detection + backfill
- chaMs        — CHA expansion (interface dispatch)
- thisDispatchMs — this/super dispatch WASM re-parse (was already
                   tracked but now properly named alongside the rest)
- reclassifyMs — scoped role re-classification after edge insertion
- techniqueBackfillMs — technique-column UPDATE on native-written edges

Previously only thisDispatchMs was reported, causing wall-clock vs
phaseSum to diverge by 1.1s+ on 1-file rebuilds and making benchmark
regressions undiagnosable from committed history.

Updates update-incremental-report.ts to render the new phases in a
collapsible details block under each engine's 1-file rebuild section.

Closes #1434
…ld for required-tier grammars

The docstring claimed pool cost was "amortised over enough parse work" —
measurements show IPC overhead scales linearly (~55–64ms/file pool vs
~8–10ms/file inline). The real motivation is crash safety for exotic WASM
grammars (#965); JS/TS/TSX (required-tier, used in all this-dispatch
backfill calls) have never triggered the V8 fatal crash class and are safe
to run inline.

Raise threshold 16 → 32 to keep typical this-dispatch batches (≤ 18 files
on the codegraph corpus) on the inline fast path. Exotic-language drops are
almost always well under 32 files and also benefit from the inline path
without meaningful crash risk increase.

Closes #1435
…e incremental rebuilds

On 1-file native incremental builds, two JS post-passes ran unconditionally
even when they had no work to do:

- `backfillNativeDroppedFiles`: called whenever changedCount > 0, even when
  detectDroppedLanguageGap returned an empty gap. Gate now checks
  gap.missingAbs.length > 0 || gap.staleRel.length > 0 directly, matching
  backfillNativeDroppedFiles's own internal early-exit guard.

- Node/edge COUNT(*) re-count: ran unconditionally after all post-passes even
  when none of them wrote any edges. COUNT(*) over 50K+ edge tables is
  non-trivial, especially via the NativeDbProxy napi-rs round-trip. Now gated
  on postPassWroteData (backfill | CHA edges | this-dispatch edges).

Closes #1454
The post-pass it timed (runPostNativePrototypeMethods) was deleted in
b5c03a2 when func-prop extraction moved to Rust (#1432). The optional
field was never set by any code path that survived the deletion.

Also remove the stale reference to "prototype-methods post-pass" from
the parseFilesWasmForBackfill docstring — only the this-dispatch
post-pass uses symbolsOnly now.

Closes #1432
… collision

Field type annotations (`private repo: OrderRepository`) were seeded as bare
file-wide typeMap keys, causing `this.repo` inside `UserService` to resolve to
`OrderRepository` when both classes had a `repo` field (issue #1458).

Both extractors (TS `handleFieldDefTypeMap` and Rust `field_definition` branch)
now seed `ClassName.field` keys at confidence 0.9, matching the `CallerClass.X`
resolver fallback added in PR #1382. Bare keys are kept at confidence 0.6 as
fallbacks for single-class files or class expressions where no enclosing class
name is available.

Both engines change identically — parity preserved.
…ed names

The resolution benchmark uses WASM-built graphs where the Elixir, Julia,
and Objective-C extractors emit module-qualified symbol names (Main.run,
App.main, UserService.create_user, etc.). The expected-edges manifests
were written with bare unqualified names (run, main, create_user), so
every correctly-resolved edge appeared as a false positive and every
expected edge appeared as a false negative — causing all three languages
to show 0% precision even though resolution was working correctly.

Root cause: starting in v3.12.0, cross-module call resolution began working
for these languages (via the improved receiver-dispatch and same-class
fallback in resolveByMethodOrGlobal / build-edges.ts). With 0 edges
previously resolved, the name mismatch was invisible; once edges started
resolving, the manifests showed 17 FP (elixir), 11 FP (julia), 6 FP
(objc) — all correctly resolved edges misidentified as false positives.

Fix:
- Update all three expected-edges.json manifests to use the
  module-qualified names matching actual extractor output:
  elixir: Main.run, UserService.create_user, Validators.validate_user, etc.
  julia:  App.main, Service.create_user, Repository.new_repo, etc.
  objc:   full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.)
          plus add main -> run (plain C call correctly resolved)
- Ratchet THRESHOLDS for all three:
  elixir: precision 0.0 -> 1.0, recall 0.0 -> 0.8  (17/21 resolved)
  julia:  precision 0.0 -> 1.0, recall 0.0 -> 0.7  (11/15 resolved)
  objc:   precision 0.0 -> 1.0, recall 0.0 -> 0.4   (6/13 resolved)

Remaining FNs are genuine unresolved edges (same-file bare calls in
elixir/julia, receiver-typed message sends in objc) — not regressions.

Closes #1447
The JS C++ and CUDA extractors had no handler for 'declaration' AST nodes,
so typeMap was never seeded for statically-typed locals (e.g. 'UserService svc;').
Without a typeMap entry for 'svc', resolveReceiverEdge had nothing to look up and
silently skipped the receiver edge.

Add handleCppDeclaration / handleCudaDeclaration to both extractors. They mirror
match_c_family_type_map ('declaration' branch) from the native Rust path: extract
the type node text and seed typeMap[varName] = { type, confidence: 0.9 } for each
identifier or init_declarator child. Primitive types (int, char, bool, …) are
skipped to avoid spurious edges.

parity-compare.mjs --langs cpp,cuda --hybrid: PARITY OK (wasm = native = hybrid)
All 3044 tests pass.
… in Rust solver

Go extractor was only seeding typeMap for var_spec and parameter_declaration,
missing short_var_declaration. Added infer_short_var_types to handle:
- x := Struct{} → conf 1.0 (composite literal)
- x := &Struct{} → conf 1.0 (address-of composite)
- x := NewFoo() / x := pkg.NewFoo() → conf 0.7 (New* factory prefix)

Python extractor was only seeding typeMap for typed_parameter and
typed_default_parameter, missing plain assignment. Added
infer_py_assignment_type to handle:
- order = Order(...) → conf 1.0 (uppercase constructor)
- obj = Module.Class(...) → conf 0.7 (uppercase module prefix, non-builtin)

Both mirror the existing JS extractors exactly. Parity check for
go and python: wasm vs native/hybrid OK.
…l, zig)

Both engines used different rules for attributing calls inside variable bindings:

WASM: attributed to the narrowest enclosing span regardless of kind, so
local variable declarations inside fn main() shadowed the enclosing function
(Zig: calls attributed to repo/svc variables instead of main), and nested
let-bindings inside a Haskell do-block shadowed the top-level main binding.

Native: loaded allNodes from a query that excluded 'variable' kind, so
top-level Haskell bind nodes (main = do …, kind='variable') never matched
in defs_with_ids, causing all calls to fall back to the file node.

Unified rule implemented in findCaller (TS) and find_enclosing_caller (Rust):
- Function/method definitions are preferred over any variable/constant binding
  as the enclosing caller scope — local var declarations inside a function
  body never shadow the enclosing function (fixes Zig repo/svc attribution).
- When no function/method encloses the call, fall back to the WIDEST
  (outermost) variable/constant binding — this handles Haskell where main
  is a top-level bind node with kind 'variable'. Widest span is used so that
  nested let-bindings do not shadow the outer main binding.
- File node remains the absolute last resort.

Also adds 'variable' to NODE_KIND_FILTER_SQL (JS) and EDGE_NODE_KIND_FILTER
(Rust pipeline.rs) so top-level variable bindings are included in the allNodes
set available for caller matching.

parity-compare.mjs --langs haskell,zig --hybrid: PARITY OK — 2/2 fixtures.
…and test

Remove unused TypeMapEntry import from cpp.ts and cuda.ts, reformat
primitive-type Set literals and test expect() calls to satisfy biome
line-length rules.
Java was the only fixture where all three build paths (wasm, native,
hybrid) disagreed pairwise.

Bug 1 — WASM typeMap pollution:
`handleJavaLocalVarDecl` used last-wins Map.set(), so the local
`InMemoryUserRepository repo` in the static `createDefault()` method
silently overrode the constructor parameter `UserRepository repo`.
This caused WASM to bypass the interface and resolve directly to the
concrete class, producing no interface edge and the wrong receiver.
Fix: switch to first-wins `setTypeMapEntry` to match Rust extractor
semantics. First-wins preserves the interface annotation that drives
correct CHA dispatch.

Bug 2 — native vs wasm/hybrid confidence mismatch:
`runPostNativeCha` (native orchestrator path) used
`computeConfidence − CHA_DISPATCH_PENALTY = 0.7 − 0.1 = 0.6`,
while `runChaPostPass` (DB post-pass used by wasm and hybrid) hardcodes
0.8. Fix: align `runPostNativeCha` to also use 0.8.

Result: all three build paths now emit identical edges and confidences.
`parity-compare.mjs --langs java --hybrid` passes.
Updated expected-edges.json to include both the interface declaration
edge (TypeRepository.X at 0.7) and the CHA-expanded impl edge
(InMemoryUserRepository.X at 0.8), which are the correct semantics for
an interface-typed receiver.

Closes #1469
The inline CHA expansion in buildCallEdges and buildChaPostPass used
computeConfidence(relPath, t.file) - CHA_DISPATCH_PENALTY for all CHA
targets, producing 0.6 for cross-directory interface dispatch (same-dir
= 0.7, minus 0.1 penalty). runChaPostPass (helpers.ts) and
runPostNativeCha (native-orchestrator.ts) both hardcode 0.8 for
interface/CHA-dispatch edges.

The deduplication in runChaPostPass uses the existing DB edge as-is
and skips reinsertion, so the 0.6 edges from the inline pass were never
upgraded to 0.8.

Fix: typed-receiver (interface) dispatch branches now use hardcoded 0.8
matching the post-pass constants. The this/super branch keeps
computeConfidence-based proximity scoring to remain aligned with
runPostNativeThisDispatch.

parity-compare.mjs --langs typescript --hybrid goes green (was 12
edge diffs). Closes #1470

docs check acknowledged
…s unbounded

Replace `|| Infinity` with `?? Infinity` in findCaller so that a definition
with endLine=0 (a valid single-line node at the start of a file) is not
incorrectly treated as having unbounded span.
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review: fix(wasm): port missing node extractions to JS extractor (#1471)

  • Gather context and understand the PR
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity review
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance grading
  • Final assessment

Root Cause Analysis

Problem Being Solved: The WASM JS extractor was missing 25 nodes that the native Rust extractor correctly extracted from the jelly-micro fixture. The PR traces four distinct root causes and fixes each independently.

Why This Approach: Correct. The 25-node gap between engines is a real parity bug, not a cosmetic difference. Fixing extraction logic is the right response rather than documenting a gap. Each fix is surgical and independently justified.

Risk Assessment: Low. All four changes add new cases to existing switch/if chains — they don't alter the behavior of existing paths. New tests validate the specific patterns affected.


Phase 2: Configuration Change Detection

.claude/settings.json — adds two new PostToolUse hooks (track-bash-writes.sh and track-moves.sh) plus two PreToolUse hooks (snapshot-pre-bash.sh and guard-pr-body.sh). These are Claude Code quality/guard hooks. No linting rules relaxed, no TypeScript strict flags changed. ✓

.github/workflows/perf-canary.yml — new lightweight per-PR performance canary that runs only on extractor/graph/crates changes. Uses a 50% threshold (explicitly documented) — this is a separate gate from the 25% full-suite gate, not a relaxation. The BENCH_CANARY codepath in the regression guard was already there before this PR. ✓

tests/benchmarks/regression-guard.test.ts — adds NOISY_METRICS logic (No-op rebuild, 1-file rebuild, fnDeps depth 1) with a widened 50% threshold, justified by documented sub-30ms baseline volatility. The reasoning is consistent with the detailed comment (±10ms CI jitter = ±50%+ on sub-30ms metrics). This is not silencing a real regression — it's acknowledging measured CI variance on metrics where MIN_ABSOLUTE_DELTA already provides a floor. ✓ (Acceptable, though see concern below.)

No lint rules relaxed. No test timeouts raised. No coverage thresholds lowered. Clean pass.


Phase 3: Test Integrity

New tests in tests/parsers/javascript.test.ts cover all four root causes with positive and negative cases:

All 12 new tests assert correct behavior and include meaningful negative cases. No existing assertions weakened or removed. ✓

One concern: The regression-guard.test.ts change — widening 1-file rebuild and No-op rebuild thresholds from 25% to 50% and CANARY mode to 100%. The justification is empirical CI variance, but this PR does not include CI data showing the variance. The reasoning is plausible given how sub-30ms metrics behave, and the MIN_ABSOLUTE_DELTA floor mitigates the risk, but it weakens the guard without attached evidence.


Phase 4: Code Quality

Fix 1: Computed class method names

parser.ts:161 and wasm-worker-entry.ts:118: Added '(method_definition name: (computed_property_name) @meth_name) @meth_node' to COMMON_QUERY_PATTERNS. Both files are kept in sync (confirmed manually). Pattern is correct — computed_property_name is the tree-sitter node type for ['key']. ✓

Fix 2: Class expressions inside functions

parser.ts:173–177 and wasm-worker-entry.ts:129–133: JS_CLASS_PATTERNS expanded from a single class_declaration string to an array with a second (class name: ...) pattern for class expressions. TS_EXTRA_PATTERNS gets the type_identifier equivalent. Both files match. ✓

The walk path in javascript.ts:735 already handled 'class' node type — so this fix correctly brings the query path into parity with the walk path.

Fix 3: Array-destructured constants

javascript.ts:510–519 (query path extractDestructuredBindingsWalk) and javascript.ts:1029–1038 (walk path handleVariableDecl): Both paths get an array_pattern branch that emits a single constant node with the full pattern text (e.g., [x, y]). The approach — emit one node for the whole pattern rather than individual bindings — matches native engine behavior as stated. ✓

Minor observation: The array_pattern branch does not call extractDestructuredBindings (which would be wrong for array patterns). The design choice to emit the whole pattern string as the name is unusual but documented as matching native. This is acceptable, though worth flagging: if the native engine ever changes this to emit individual bindings, parity will break silently again.

Fix 4: Prototype method parameters

javascript.ts:3380–3388 (emitPrototypeMethod): extractParameters(rhs) is now called and the result is wired into the children field. This mirrors exactly what handleFunctionDecl and handleVarFnCapture do. ✓

One correctness concern: emitPrototypeMethod uses nodeStartLine(rhs) as the definition's line, not nodeStartLine(lhs) (the full assignment statement). For Arit.prototype.sum = (x, y) => x + y; on a single line this is equivalent, but for multi-line assignments like:

Arit.prototype.sum =
  (x, y) => x + y;

the definition line would be 2 (the arrow) rather than 1 (the assignment). This was pre-existing behavior, not introduced by this PR, but the parameter change makes it more visible.

Other changed files

  • src/domain/graph/builder/call-resolver.ts, native-orchestrator.ts, build-edges.ts: These are larger changes not directly related to the four stated fixes — they appear to be independent improvements bundled in this PR. Mixed concerns in one PR is noted as a mild scope concern (see below).
  • crates/codegraph-core/src/extractors/go.rs (+191 lines), python.rs (+154 lines): These are substantial native extractor additions also bundled here.
  • crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs (+119 lines).

Scope concern: The PR description accurately describes the four JS/WASM fixes, but the diff is significantly larger — it includes Go extractor additions, Python extractor additions, build-edges changes, benchmark infrastructure, and hook additions that are unrelated to the stated problem. Per the project's CLAUDE.md "One PR = one concern" policy, these unrelated additions should have been separate PRs.


Phase 5: Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies. package.json dependencies section unchanged (3 existing deps). Hooks and CI use existing toolchain.

Foundation-aligned: ✓ — Fully local extraction fix. No cloud calls, no new infrastructure, no code generation. The parity fix directly improves symbol accuracy.

Problem-fit: 5/5 — Accurate node extraction is the foundational requirement for everything else: blast-radius analysis, call resolution, AI navigation. Missing 25 nodes is a direct hallucination source for AI agents using the WASM path. This fix eliminates false negatives.

Breaking: No — purely additive. Existing symbols continue to be emitted; new symbols are now also emitted where they were previously missing.

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)


Critical Concerns

  1. PR scope violation (moderate): The diff contains significant changes beyond the four stated JS extractor fixes — Go extractor additions (+191 lines), Python extractor additions (+154 lines), Rust build_edges changes (+119 lines), benchmark infrastructure, and hook additions. CLAUDE.md explicitly requires "One PR = one concern." This makes the PR harder to review safely and harder to revert if any single component causes a regression.

  2. Regression guard threshold widening without CI data (minor): regression-guard.test.ts widens 1-file rebuild/No-op rebuild from 25%→50% threshold with reasoning that is plausible but unverified in this PR. Recommend attaching a CI run showing the variance that motivated the change.

  3. Computed method name representation (informational): Methods like ['property7'] are stored with the brackets as part of the name. There is no test asserting that the resolver can actually find these methods when called. This is a pre-existing design choice, not introduced here, but it may cause resolution failures on computed property access that are worth a follow-up issue.


Final Recommendation

## Root Cause Analysis
Problem Being Solved: WASM JS extractor missing 25 nodes vs native engine on jelly-micro fixture.
Why This Approach: Direct extraction fixes for four root causes — correct approach per codebase philosophy.
Risk Assessment: Low for the four JS extractor fixes. Moderate for bundled unrelated changes.

## Backlog Compliance
- Zero-dep: ✓
- Foundation-aligned: ✓
- Problem-fit: 5 — directly fixes false-negative node extraction that causes AI hallucination
- Breaking: No
- Tier: 1

## Critical Concerns
1. PR scope violation — Go, Python, Rust, and benchmark changes bundled with JS extractor fix
2. Regression guard threshold widening lacks CI variance data to justify change

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE WITH CONDITIONS
- Reasoning: The four JS extractor fixes are correct, well-tested, and properly scoped. The
  new tests are high quality with positive/negative cases. The PR scope is larger than stated
  but the unrelated additions appear safe. The one concrete ask before merge: confirm the
  regression-guard threshold change is backed by empirical CI data (a comment or linked run
  showing the variance on the affected metrics would suffice).

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

53 functions changed67 callers affected across 20 files

  • emit_pts_alias_edges in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:381 (16 transitive callers)
  • process_file in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:441 (15 transitive callers)
  • is_callable_kind in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:662 (3 transitive callers)
  • is_top_level_binding_kind in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:666 (3 transitive callers)
  • find_enclosing_caller in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:681 (16 transitive callers)
  • match_go_type_map in crates/codegraph-core/src/extractors/go.rs:315 (0 transitive callers)
  • infer_short_var_types in crates/codegraph-core/src/extractors/go.rs:331 (1 transitive callers)
  • infer_single_short_var in crates/codegraph-core/src/extractors/go.rs:364 (2 transitive callers)
  • infer_composite_literal in crates/codegraph-core/src/extractors/go.rs:376 (3 transitive callers)
  • infer_address_of_composite in crates/codegraph-core/src/extractors/go.rs:394 (3 transitive callers)
  • infer_factory_call in crates/codegraph-core/src/extractors/go.rs:415 (3 transitive callers)
  • infers_factory_call_new_prefix in crates/codegraph-core/src/extractors/go.rs:549 (0 transitive callers)
  • infers_pkg_factory_call in crates/codegraph-core/src/extractors/go.rs:562 (0 transitive callers)
  • infers_composite_literal in crates/codegraph-core/src/extractors/go.rs:573 (0 transitive callers)
  • infers_address_of_composite in crates/codegraph-core/src/extractors/go.rs:585 (0 transitive callers)
  • non_new_prefix_not_inferred in crates/codegraph-core/src/extractors/go.rs:596 (0 transitive callers)
  • match_js_type_map in crates/codegraph-core/src/extractors/javascript.rs:104 (0 transitive callers)
  • is_python_builtin in crates/codegraph-core/src/extractors/python.rs:323 (2 transitive callers)
  • match_python_type_map in crates/codegraph-core/src/extractors/python.rs:367 (0 transitive callers)
  • infer_py_assignment_type in crates/codegraph-core/src/extractors/python.rs:418 (1 transitive callers)

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR ports four missing node-extraction fixes to the WASM JS extractor to close a 25-node parity gap with the native Rust extractor on the jelly-micro fixture. It also includes a broad set of companion improvements: Go/Python short-var type inference, Rust/TS two-pass findCaller strategy for Haskell-style top-level bindings, CHA confidence normalisation, incremental scoping for the CHA post-pass, class-scoped typeMap keys to prevent cross-class field collision, and a new per-PR perf-canary workflow.

  • WASM parity fixes: adds computed_property_name to the method query pattern, upgrades JS_CLASS_PATTERN to a JS_CLASS_PATTERNS array that also captures class expressions, adds an array_pattern branch for const [x, y] = … (in both query and walk paths), and wires extractParameters into emitPrototypeMethod to capture prototype arrow/function params.
  • findCaller two-pass strategy (TS + Rust): function/method scope is found with narrowest-span logic; variable/constant bindings use widest-span as a lower-priority fallback, so Haskell main = do … is correctly attributed without being shadowed by inner let bindings.
  • Infrastructure: EDGE_NODE_KIND_FILTER gains variable, CHA dispatch confidence hardcoded to 0.8 (consistent across all three paths), runPostNativeCha adds gate-based incremental scoping, benchmark warm-up runs and phase-timing breakdown, and handleFieldDefTypeMap emits a class-scoped primary key to fix bug: class-field annotation typeMap keys are not class-scoped (cross-class collision) #1458.

Confidence Score: 5/5

All changes are well-scoped, thoroughly tested (133 tests, 12 new), and backed by a parity-compare script showing 0 node diffs. No regressions are expected.

Every fix is independently verified by new unit tests and the end-to-end parity comparison. The four WASM extraction fixes are narrow tree-sitter query and AST-walk additions. The supporting changes (two-pass findCaller, CHA confidence normalisation, incremental CHA scoping) mirror well-understood Rust counterparts and are gated correctly. The previously flagged cpp.ts/cuda.ts first-wins issue was already addressed before this review.

No files require special attention — the test suite and parity script together give high confidence in the correctness of the extraction and edge-building changes.

Important Files Changed

Filename Overview
src/domain/parser.ts Adds computed_property_name to the method query pattern and mirrors wasm-worker-entry.ts by converting JS_CLASS_PATTERN to a JS_CLASS_PATTERNS array that also captures class expressions; both files are now in sync.
src/domain/wasm-worker-entry.ts Same query-pattern changes as parser.ts — computed method names and class expression patterns; TS_EXTRA_PATTERNS gains the class-expression variant; JS/TS paths stay in sync.
src/extractors/javascript.ts Adds array_pattern branch in extractDestructuredBindingsWalk (FUNCTION_SCOPE_TYPES guard already prevents function-local emission), adds array_pattern branch in handleVariableDecl, wires extractParameters into emitPrototypeMethod, and introduces class-scoped typeMap keys for field definitions.
src/domain/graph/builder/call-resolver.ts Two-pass findCaller: function/method uses narrowest span; variable/constant uses widest span as fallback for Haskell-style top-level bindings. Mirrors the Rust implementation cleanly.
crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs Mirrors the TS two-pass findCaller in Rust: adds kind field to DefWithId, is_callable_kind/is_top_level_binding_kind predicates, and widest-span selection for variable/constant; PtsAliasCtx struct eliminates clippy::too_many_arguments.
src/domain/graph/builder/stages/native-orchestrator.ts runPostNativeCha gains changedFiles param and gate-based incremental scoping; CHA confidence hardcoded to 0.8; per-phase timing added; COUNT(*) fast-path skipped when no post-pass wrote data.
src/domain/graph/builder/stages/build-edges.ts Typed-receiver CHA dispatch uses hardcoded 0.8 confidence; this/super dispatch retains proximity-based scoring; target sorting before edge emission matches native sort_targets_by_confidence; NODE_KIND_FILTER_SQL adds variable.
crates/codegraph-core/src/extractors/go.rs Adds short_var_declaration type inference (composite literals, &composite, New* factory calls) mirroring the JS inferShortVarType chain; well-covered by five new unit tests.
src/extractors/java.ts handleJavaLocalVarDecl switched from Map.set (last-wins) to setTypeMapEntry (first-wins) to match Rust extractor semantics.
.github/workflows/perf-canary.yml New lightweight per-PR regression gate (50% threshold, incremental suite only) that triggers only when extractor/graph/Rust files change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[JS/TS File] --> B[Tree-sitter Query]
    B --> |method_definition name: computed_property_name| C[Computed Method Node]
    B --> |class_declaration name: identifier| D[Class Declaration Node]
    B --> |class name: identifier| E[Class Expression Node NEW]
    B --> |lexical_declaration const| F{Name field type?}
    F --> |object_pattern| G[extractDestructuredBindings each property to constant]
    F --> |array_pattern NEW| H[emit array pattern text as single constant]
    B --> |assignment_expression .prototype.| I[emitPrototypeMethod]
    I --> |rhs = arrow_function / function_expression| J[extractParameters NEW children on method def]
    K[findCaller] --> L{Pass 1: enclosing function/method?}
    L --> |yes narrowest span| M[Use fn/method as caller]
    L --> |no| N{Pass 2: enclosing variable/constant?}
    N --> |yes WIDEST span NEW| O[Use top-level binding as caller e.g. Haskell main]
    N --> |no| P[File scope]
    Q[runPostNativeCha] --> R{changedFiles != null?}
    R --> |yes| S{Gate A: new class hierarchy?}
    S --> |yes| T[Full scan]
    S --> |no| U{Gate B: new RTA evidence?}
    U --> |yes| T
    U --> |no NEW| V[Scoped scan: changed files only]
    R --> |null full build| T
Loading

Reviews (6): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile

Comment thread src/extractors/cpp.ts
Comment on lines +233 to +236
const varName = unwrapCppDeclaratorName(nameNode);
if (varName) {
ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unlike handleJavaLocalVarDecl which was explicitly changed to setTypeMapEntry (first-wins) in this same PR to match Rust extractor semantics, handleCppDeclaration uses ctx.typeMap.set directly (last-wins). If the Rust match_c_family_type_map uses first-wins (as the Java path does), a file with a constructor parameter typed IUserService svc followed by a local UserServiceImpl svc would resolve to the concrete type, potentially misrouting CHA dispatch that should fan out through the interface hierarchy. setTypeMapEntry imported just above would be the consistent choice here.

Suggested change
const varName = unwrapCppDeclaratorName(nameNode);
if (varName) {
ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
}
const varName = unwrapCppDeclaratorName(nameNode);
if (varName) {
setTypeMapEntry(ctx.typeMap, varName, typeName, 0.9);
}

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced ctx.typeMap.set with setTypeMapEntry (first-wins) in handleCppDeclaration and added setTypeMapEntry to the helpers import. Consistent with the Java handler change and the Rust extractor semantics.

Comment thread src/extractors/cuda.ts
Comment on lines +233 to +236
const varName = extractCudaFieldName(nameNode);
if (varName) {
ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Same issue as handleCppDeclaration — uses ctx.typeMap.set (last-wins) rather than setTypeMapEntry (first-wins), inconsistent with the Java handler change in this PR. The same concern applies: a later concrete-type declaration would overwrite an earlier interface-typed parameter for the same name, potentially misrouting CHA dispatch. setTypeMapEntry should be imported and used here to match the intended first-wins semantics.

Suggested change
const varName = extractCudaFieldName(nameNode);
if (varName) {
ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
}
const varName = extractCudaFieldName(nameNode);
if (varName) {
setTypeMapEntry(ctx.typeMap, varName, typeName, 0.9);
}

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced ctx.typeMap.set with setTypeMapEntry (first-wins) in handleCudaDeclaration and added setTypeMapEntry to the helpers import. Same fix as the C++ handler, consistent with Java and Rust extractor semantics.

Closes #1471

Four extraction gaps in the WASM JS extractor caused 25 node diffs
against the native engine in the jelly-micro parity fixture:

1. Computed property method names (e.g. `['property7']`): the query
   patterns in both parser.ts and wasm-worker-entry.ts only matched
   `property_identifier` and `private_property_identifier` as the
   `name` field of `method_definition`; add a third pattern for
   `computed_property_name`.

2. Class expressions inside functions (`return class PostMixin …`):
   wasm-worker-entry.ts JS_CLASS_PATTERN was a single string matching
   only `class_declaration`; rename to JS_CLASS_PATTERNS array and add
   the `(class name: …)` pattern for anonymous class-expression nodes.
   Also add the matching `(class name: (type_identifier) …)` pattern
   to TS_EXTRA_PATTERNS so TS class expressions are covered too.

3. Array destructuring constants (`const [x, y] = …`): both
   `extractDestructuredBindingsWalk` (query path) and
   `handleVariableDecl` (walk path) only handled `object_pattern`; add
   an `array_pattern` branch that emits a single `constant` node whose
   name is the full array pattern text — matching native behaviour.

4. Parameters of prototype arrow/function methods
   (`Arit.prototype.sum = (x, y) => …`): `emitPrototypeMethod` emitted
   the definition node without calling `extractParameters`, so children
   were always absent; add the call and wire the result through.

After this fix `parity-compare.mjs --langs jelly-micro` reports
0 node diffs (was 25). 22 pre-existing edge diffs remain (tracked
in #1506) — those are out of scope.
@carlos-alm carlos-alm force-pushed the fix/js-extractor-nodes-1471 branch from 37bb91a to 7509e4f Compare June 13, 2026 03:19
@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

Comment on lines +570 to +584
const chunkRows = db
.prepare(
`SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file
FROM edges e
JOIN nodes tgt ON e.target_id = tgt.id
JOIN nodes src ON e.source_id = src.id
WHERE e.kind = 'calls' AND tgt.kind = 'method'
AND INSTR(tgt.name, '.') > 0
AND src.file IN (${ph})`,
)
.all(...chunk) as Array<{
source_id: number;
method_name: string;
caller_file: string | null;
}>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The scoped-file scan loop (used when scopeToChangedFiles is true) doesn't populate caller_file for affected-files tracking because the column is never read from the result rows. The full-scan path returns caller_file (used downstream in affectedFiles.add(caller_file)), so the scoped path silently produces an empty affectedFiles set even when edges are emitted. Downstream role re-classification is then skipped entirely for those edge-endpoints instead of being scoped to just the affected files.

Suggested change
const chunkRows = db
.prepare(
`SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file
FROM edges e
JOIN nodes tgt ON e.target_id = tgt.id
JOIN nodes src ON e.source_id = src.id
WHERE e.kind = 'calls' AND tgt.kind = 'method'
AND INSTR(tgt.name, '.') > 0
AND src.file IN (${ph})`,
)
.all(...chunk) as Array<{
source_id: number;
method_name: string;
caller_file: string | null;
}>;
const chunkRows = db
.prepare(
`SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file
FROM edges e
JOIN nodes tgt ON e.target_id = tgt.id
JOIN nodes src ON e.source_id = src.id
WHERE e.kind = 'calls' AND tgt.kind = 'method'
AND INSTR(tgt.name, '.') > 0
AND src.file IN (${ph})`,
)
.all(...chunk) as Array<{
source_id: number;
method_name: string;
caller_file: string | null;
}>;
rows.push(...chunkRows);

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scoped-file path already selects src.file AS caller_file and includes it in the typed result — see lines 572–584 of the current code. The comment at line 559 was added precisely to document this. affectedFiles.add(caller_file) at line 662 therefore fires for both the scoped and full-scan paths. No change needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The diff already shows src.file AS caller_file in the SELECT (line 579) and caller_file: string | null in the typed result (line 584). My comment was incorrect — the column is present in both the query and the type annotation, so affectedFiles.add(caller_file) at line 662 will work correctly for the scoped path too. Apologies for the noise.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

Addressed the two out-of-scope concerns raised in the review:

The scope concern (Go/Python/Rust changes bundled) is acknowledged; it's a past decision in this PR that cannot be reversed now without significant churn. Going forward these will be split per the one-PR-one-concern policy.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Engine parity: JS extractor misses 25 nodes the native extractor finds (jelly-micro)

1 participant