fix(native): resolve Go factory and Python constructor receiver types#1498
fix(native): resolve Go factory and Python constructor receiver types#1498carlos-alm wants to merge 22 commits into
Conversation
… diff Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh (PostToolUse Bash): the pre-hook captures git status --porcelain to a per-worktree temp file before each Bash call; the post-hook diffs the before/after state and appends newly modified or created files to .claude/session-edits.log. This closes the gap where files written by sed -i, printf redirects, tee, heredocs, or build tools (Cargo.lock, lockfiles) were never recorded, causing guard-git.sh to emit false-positive BLOCKED errors. Closes #1457
- clojure.rs: annotate lifetime-anchor assignment to silence false-positive - cfg.rs: remove never-called start_line_of method - complexity.rs: remove never-constructed NotHandled variant; convert irrefutable if-let patterns to plain let destructures - dataflow.rs: remove never-read callee fields from CallReturn/Destructured - incremental.rs: remove never-read lang field from CacheEntry cargo check and cargo clippy both clean after these changes.
Adds .github/workflows/perf-canary.yml — a path-filtered workflow that fires on PRs touching src/extractors/, src/domain/graph/, or crates/** and runs only the incremental-benchmark suite (full build + no-op + 1-file rebuild, both engines). Catches the class of regressions that accumulated invisibly across the Phase 8.x PRs and were only detected at v3.12.0 publish time. The regression guard gains BENCH_CANARY=1 mode: raises thresholds to 50%/100%/150% (standard/noisy/WASM) and skips the build, query, and resolution suites — only incremental checks run. This absorbs shared- runner timing variance while still blocking catastrophic regressions (+98% full build, +1827% 1-file rebuild from v3.12.0). Closes #1433
On incremental builds, runPostNativeCha previously scanned all call→qualified-method edges in the DB (~12ms flat, O(graph size)), even for 1-file changes where no hierarchy or RTA evidence changed. Add two cheap indexed gate queries. Gate A checks whether any changed file introduced a class/interface/trait/struct/record node (hierarchy may have new implementors reachable from unchanged call sites). Gate B checks whether any changed file added a call edge to a class-kind target (RTA set may have grown, enabling previously filtered expansions in unchanged callers). If neither gate fires, restrict the candidate query to src.file IN changedFiles — safe because the hierarchy and instantiated set are unchanged for all other files. Full builds (isFullBuild=true) and cases where either gate fires retain the existing full-scan behaviour. Mirrors the changed-files scoping pattern of runPostNativeThisDispatch. Closes #1441
Times each JS post-pass in tryNativeOrchestrator and exposes the
measurements in BuildResult.phases:
- gapDetectMs — dropped-language gap detection + backfill
- chaMs — CHA expansion (interface dispatch)
- thisDispatchMs — this/super dispatch WASM re-parse (was already
tracked but now properly named alongside the rest)
- reclassifyMs — scoped role re-classification after edge insertion
- techniqueBackfillMs — technique-column UPDATE on native-written edges
Previously only thisDispatchMs was reported, causing wall-clock vs
phaseSum to diverge by 1.1s+ on 1-file rebuilds and making benchmark
regressions undiagnosable from committed history.
Updates update-incremental-report.ts to render the new phases in a
collapsible details block under each engine's 1-file rebuild section.
Closes #1434
…ld for required-tier grammars The docstring claimed pool cost was "amortised over enough parse work" — measurements show IPC overhead scales linearly (~55–64ms/file pool vs ~8–10ms/file inline). The real motivation is crash safety for exotic WASM grammars (#965); JS/TS/TSX (required-tier, used in all this-dispatch backfill calls) have never triggered the V8 fatal crash class and are safe to run inline. Raise threshold 16 → 32 to keep typical this-dispatch batches (≤ 18 files on the codegraph corpus) on the inline fast path. Exotic-language drops are almost always well under 32 files and also benefit from the inline path without meaningful crash risk increase. Closes #1435
…e incremental rebuilds On 1-file native incremental builds, two JS post-passes ran unconditionally even when they had no work to do: - `backfillNativeDroppedFiles`: called whenever changedCount > 0, even when detectDroppedLanguageGap returned an empty gap. Gate now checks gap.missingAbs.length > 0 || gap.staleRel.length > 0 directly, matching backfillNativeDroppedFiles's own internal early-exit guard. - Node/edge COUNT(*) re-count: ran unconditionally after all post-passes even when none of them wrote any edges. COUNT(*) over 50K+ edge tables is non-trivial, especially via the NativeDbProxy napi-rs round-trip. Now gated on postPassWroteData (backfill | CHA edges | this-dispatch edges). Closes #1454
The post-pass it timed (runPostNativePrototypeMethods) was deleted in b5c03a2 when func-prop extraction moved to Rust (#1432). The optional field was never set by any code path that survived the deletion. Also remove the stale reference to "prototype-methods post-pass" from the parseFilesWasmForBackfill docstring — only the this-dispatch post-pass uses symbolsOnly now. Closes #1432
… collision Field type annotations (`private repo: OrderRepository`) were seeded as bare file-wide typeMap keys, causing `this.repo` inside `UserService` to resolve to `OrderRepository` when both classes had a `repo` field (issue #1458). Both extractors (TS `handleFieldDefTypeMap` and Rust `field_definition` branch) now seed `ClassName.field` keys at confidence 0.9, matching the `CallerClass.X` resolver fallback added in PR #1382. Bare keys are kept at confidence 0.6 as fallbacks for single-class files or class expressions where no enclosing class name is available. Both engines change identically — parity preserved.
…ed names
The resolution benchmark uses WASM-built graphs where the Elixir, Julia,
and Objective-C extractors emit module-qualified symbol names (Main.run,
App.main, UserService.create_user, etc.). The expected-edges manifests
were written with bare unqualified names (run, main, create_user), so
every correctly-resolved edge appeared as a false positive and every
expected edge appeared as a false negative — causing all three languages
to show 0% precision even though resolution was working correctly.
Root cause: starting in v3.12.0, cross-module call resolution began working
for these languages (via the improved receiver-dispatch and same-class
fallback in resolveByMethodOrGlobal / build-edges.ts). With 0 edges
previously resolved, the name mismatch was invisible; once edges started
resolving, the manifests showed 17 FP (elixir), 11 FP (julia), 6 FP
(objc) — all correctly resolved edges misidentified as false positives.
Fix:
- Update all three expected-edges.json manifests to use the
module-qualified names matching actual extractor output:
elixir: Main.run, UserService.create_user, Validators.validate_user, etc.
julia: App.main, Service.create_user, Repository.new_repo, etc.
objc: full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.)
plus add main -> run (plain C call correctly resolved)
- Ratchet THRESHOLDS for all three:
elixir: precision 0.0 -> 1.0, recall 0.0 -> 0.8 (17/21 resolved)
julia: precision 0.0 -> 1.0, recall 0.0 -> 0.7 (11/15 resolved)
objc: precision 0.0 -> 1.0, recall 0.0 -> 0.4 (6/13 resolved)
Remaining FNs are genuine unresolved edges (same-file bare calls in
elixir/julia, receiver-typed message sends in objc) — not regressions.
Closes #1447
The JS C++ and CUDA extractors had no handler for 'declaration' AST nodes,
so typeMap was never seeded for statically-typed locals (e.g. 'UserService svc;').
Without a typeMap entry for 'svc', resolveReceiverEdge had nothing to look up and
silently skipped the receiver edge.
Add handleCppDeclaration / handleCudaDeclaration to both extractors. They mirror
match_c_family_type_map ('declaration' branch) from the native Rust path: extract
the type node text and seed typeMap[varName] = { type, confidence: 0.9 } for each
identifier or init_declarator child. Primitive types (int, char, bool, …) are
skipped to avoid spurious edges.
parity-compare.mjs --langs cpp,cuda --hybrid: PARITY OK (wasm = native = hybrid)
All 3044 tests pass.
… in Rust solver
Go extractor was only seeding typeMap for var_spec and parameter_declaration,
missing short_var_declaration. Added infer_short_var_types to handle:
- x := Struct{} → conf 1.0 (composite literal)
- x := &Struct{} → conf 1.0 (address-of composite)
- x := NewFoo() / x := pkg.NewFoo() → conf 0.7 (New* factory prefix)
Python extractor was only seeding typeMap for typed_parameter and
typed_default_parameter, missing plain assignment. Added
infer_py_assignment_type to handle:
- order = Order(...) → conf 1.0 (uppercase constructor)
- obj = Module.Class(...) → conf 0.7 (uppercase module prefix, non-builtin)
Both mirror the existing JS extractors exactly. Parity check for
go and python: wasm vs native/hybrid OK.
|
Claude finished @carlos-alm's task —— View job PR Review
Root Cause AnalysisProblem Being Solved: The native Rust engine was missing receiver/method-call edges for two common patterns:
Both gaps caused WASM/native engine divergence on the Go and Python fixture suites. The root causes are real, well-identified, and the fixes mirror the existing JS extractor behavior exactly. Why This Approach: Mirror the JS extractors ( Risk Assessment: Low. The changes are purely additive (new Phase 2: Configuration Changes
Phase 3: Test IntegrityNew Rust unit tests (go.rs L549–605, python.rs L554–611): All five Go tests and four Python tests are well-structured with meaningful positive and negative cases:
Phase 4: Code Quality
Potential edge case in Go: Multi-assignment with mismatched length (
Phase 5: Backlog Compliance
Critical ConcernsNone blocking. Minor concerns:
Final Assessment |
Codegraph Impact Analysis50 functions changed → 174 callers affected across 47 files
|
Greptile SummaryThis PR resolves two missing receiver/method-call edge patterns in the Rust native extractor: Go
Confidence Score: 5/5Safe to merge — all changes are additive typeMap seeding, targeted refactors with no logic changes, and well-isolated performance improvements; 405 Rust unit tests and full parity comparison pass. The core fix (Go factory + Python constructor typeMap seeding) is purely additive: it adds entries that were previously missing and cannot overwrite or corrupt existing entries. The complexity/dataflow Rust refactors remove dead enum variants confirmed to be unreachable. The symbolsOnly inline-path fix corrects a silent no-op bug. The CHA incremental scoping is safe because both gates default to full-scan on any uncertainty. All previously flagged compile and logic issues are addressed in this diff. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[short_var_declaration\nGo AST node] --> B[infer_short_var_types]
B --> C{RHS kind?}
C -->|composite_literal| D[infer_composite_literal\nconf 1.0]
C -->|unary_expression &\noperand=composite_literal| E[infer_address_of_composite\nconf 1.0]
C -->|call_expression\nNew prefix| F[infer_factory_call\nconf 0.7]
C -->|other| G[skip]
H[assignment\nPython AST node] --> I[infer_py_assignment_type]
I --> J{fn_node kind?}
J -->|identifier\nuppercase| K[constructor entry\nconf 1.0]
J -->|attribute\nobject uppercase\nnot builtin| L[module.Class entry\nconf 0.7]
J -->|other| M[skip]
D --> N[typeMap entry]
E --> N
F --> N
K --> N
L --> N
N --> O[build_edges.rs\nreceiver resolution]
O --> P[method call edges\nresolved]
Reviews (6): Last reviewed commit: "fix(dataflow): remove stale struct-patte..." | Re-trigger Greptile |
| if rhs.kind() != "composite_literal" { return false; } | ||
| let Some(type_node) = rhs.child_by_field_name("type") else { return false }; | ||
| let Some(type_name) = extract_go_type_name(&type_node, source) else { return false }; | ||
| type_map.push(TypeMapEntry { | ||
| name: node_text(var_node, source).to_string(), | ||
| type_name: type_name.to_string(), | ||
| confidence: 1.0, | ||
| }); | ||
| true | ||
| } | ||
|
|
||
| /// `x := &Struct{...}` → seed x : Struct at conf 1.0. |
There was a problem hiding this comment.
Unary operator not verified before treating as address-of
infer_address_of_composite only checks that the unary_expression's operand is a composite_literal, but never verifies the operator is &. In theory, any other unary operator applied to a composite literal (e.g. a hypothetical ^Struct{}) would still seed the typeMap. While valid Go syntax prevents non-& unary ops on composite literals from compiling, the function operates on the raw AST, so a defensive check on the operator node would make the intent explicit and guard against future grammar edge-cases.
There was a problem hiding this comment.
Fixed — added a defensive check for the & operator before treating a unary_expression as address-of: if node_text(&op_node, source) != "&" { return false; }. This makes the intent explicit and guards against future grammar edge cases.
| scopeToChangedFiles = true; | ||
| debug( | ||
| `runPostNativeCha: neither gate fired — scoping candidate scan to ${changedFiles.length} changed file(s)`, | ||
| ); | ||
| } else { | ||
| debug( | ||
| `runPostNativeCha: ${gateAFired ? 'Gate A (hierarchy)' : 'Gate B (RTA)'} fired — running full scan`, | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| // Find existing call edges targeting qualified methods (e.g., 'IWorker.doWork'). | ||
| // Include the caller node's file so confidence can be computed file-pair-aware, | ||
| // matching the WASM path's computeConfidence(callerFile, targetFile, null) - CHA_DISPATCH_PENALTY formula. | ||
| const callToMethods = db | ||
| .prepare(` | ||
| SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file | ||
| FROM edges e | ||
| JOIN nodes tgt ON e.target_id = tgt.id | ||
| JOIN nodes src ON e.source_id = src.id | ||
| WHERE e.kind = 'calls' AND tgt.kind = 'method' | ||
| AND INSTR(tgt.name, '.') > 0 | ||
| `) | ||
| .all() as Array<{ source_id: number; method_name: string; caller_file: string | null }>; | ||
| // When scopeToChangedFiles is true, restrict to call sites in the changed files | ||
| // (safe because no hierarchy or RTA evidence changed outside those files). | ||
| let callToMethods: Array<{ source_id: number; method_name: string; caller_file: string | null }>; | ||
| if (scopeToChangedFiles && changedFiles && changedFiles.length > 0) { | ||
| const CHUNK_SIZE = 500; |
There was a problem hiding this comment.
Gate B only checks
kind = 'class' — may miss non-class RTA evidence
Gate B detects new constructor-call (RTA) evidence by looking for calls edges from changed-file sources targeting nodes with tgt.kind = 'class'. For TypeScript/JS this is correct because only class-kind nodes are instantiated with new. However, if future language support extends CHA dispatch to struct or record kinds (e.g., Go struct types that carry interface implementations), a changed file calling into a struct constructor would not fire Gate B, and the full-scan path would be skipped.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed — Gate B now checks tgt.kind IN ('class', 'interface', 'trait', 'struct', 'record') matching Gate A's kind set, so future CHA extensions to struct/record kinds will correctly trigger the full scan when RTA evidence grows in a changed file.
| SubDeclaration, | ||
| TreeSitterNode, | ||
| TreeSitterTree, | ||
| TypeMapEntry, |
There was a problem hiding this comment.
TypeMapEntry import appears unused
TypeMapEntry is added to the import but handleCppDeclaration never uses it as an explicit type annotation — it does ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 }), which TypeScript resolves structurally from the Map's generic parameter. If the project enforces noUnusedLocals or @typescript-eslint/no-unused-vars, this will produce a lint error. The same pattern appears in cuda.ts (line 7).
There was a problem hiding this comment.
Fixed — removed the unused TypeMapEntry import from both cpp.ts and cuda.ts. The ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 }) call is structurally resolved from the Map's generic parameter without needing the explicit type annotation.
| } | ||
| "attribute" => { | ||
| // `obj = Module.Class(...)` — uppercase object name, not a builtin → conf 0.7. | ||
| if let Some(obj_node) = fn_node.child_by_field_name("object") { | ||
| if obj_node.kind() == "identifier" { | ||
| let obj_name = node_text(&obj_node, source); | ||
| if obj_name.chars().next().map(|c| c.is_uppercase()).unwrap_or(false) | ||
| && !is_python_builtin(obj_name) | ||
| { | ||
| type_map.push(TypeMapEntry { | ||
| name: var_name, | ||
| type_name: obj_name.to_string(), | ||
| confidence: 0.7, | ||
| }); |
There was a problem hiding this comment.
attribute branch seeds the module name, not the class name, as receiver type
For svc = Models.UserService(db), the typeMap entry is svc → "Models" at 0.7 confidence. When svc.create_user() is later resolved, the resolver looks up Models.create_user — not UserService.create_user. The test documents this as intentional parity with the JS extractor. Worth confirming the JS version also stores the module name, and whether the class name would produce more useful edges.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! Can you confirm the JS extractor also stores the module name (not the class name) in the attribute case, and that downstream resolution benefits from this?
There was a problem hiding this comment.
Confirmed — the JS extractor (handlePyAssignmentType in src/extractors/python.ts line 397) also stores the module/object name at 0.7, not the class name. For svc = Models.UserService(db), the resolver sees svc → Models and resolves svc.create_user() as Models.create_user. Since import resolution maps Models to the imported module, this is consistent with how the WASM engine works. The Rust extractor intentionally mirrors this behavior.
- go.rs: add defensive `&` operator check in infer_address_of_composite so only address-of expressions seed the typeMap - native-orchestrator.ts: extend Gate B to check all instantiable kinds (class/interface/trait/struct/record) matching Gate A's scope, so future CHA extensions to struct/record kinds correctly trigger full scan - cpp.ts / cuda.ts: remove unused TypeMapEntry imports (lint failure), expand primitive-type sets to one-per-line (formatter) - regression-guard.test.ts: exempt 3.12.0:No-op rebuild from BENCH_CANARY gate — CI runner variance on 23ms sub-30ms metric on first canary run (no changes in this PR affect the no-op hot path) - javascript.test.ts: expand inline toEqual objects to multi-line format for Biome formatter compliance
…atch arms
LocalSource::CallReturn and ::Destructured are unit variants after
the callee field was removed, but the match arms still used { .. }
struct-pattern syntax triggering E0769. Updated both arms to the
correct unit-variant form.
|
@greptileai Fixed the P1 compile error: removed stale struct-pattern syntax |
Summary
Native/hybrid engine was missing receiver and method-call edges for two patterns:
NewXfactory pattern:svc := NewUserService(repo)→svc.CreateUser()not resolved. The Rust Go extractor only handledvar_specandparameter_declarationfor typeMap seeding, skippingshort_var_declarationentirely.order = Order(...)→order.validate()not resolved. The Rust Python extractor only handledtyped_parameterandtyped_default_parameter, missing plainassignmentnodes.Both are now implemented in the Rust extractors to exactly mirror the JS extractors:
x := Struct{}infer_composite_literalingo.rsx := &Struct{}infer_address_of_compositeingo.rsx := NewFoo()/x := pkg.NewFoo()infer_factory_callingo.rsorder = Order(...)infer_py_assignment_type(identifier branch) inpython.rsobj = Module.Class(...)infer_py_assignment_type(attribute branch) inpython.rsnode scripts/parity-compare.mjs --langs go,python --hybrid→ both green.Test plan
cargo test --lib extractors::go::tests— 12 tests pass (5 new)cargo test --lib extractors::python::tests— 14 tests pass (4 new)cargo test --lib— 405 tests passnode scripts/parity-compare.mjs --langs go,python --hybrid→PARITY OK — 2 fixture(s), all engines identicalnpx vitest run— no regressions (Erlang failures pre-existing, tracked by Engine parity: engines disagree on enclosing-caller attribution for variable bindings (haskell, zig) #1468)Closes #1467