fix(bench): update elixir/julia/objc expected-edges to module-qualified names by carlos-alm · Pull Request #1496 · optave/ops-codegraph-tool

carlos-alm · 2026-06-13T01:36:11Z

Summary

Root cause of perf: investigate elixir/julia/objc false positive regression in v3.12.0 #1447: The three expected-edges manifests (elixir, julia, objc) were written with bare unqualified function names (run, main, create_user), but the extractors emit module-qualified names (Main.run, App.main, UserService.create_user, full ObjC selectors). When v3.12.0's improved cross-module resolution started successfully resolving edges for these languages, every correct edge appeared as a false positive (name mismatch with manifest) and every expected edge appeared as false negative — driving global precision from 89.9% to 84.4%.
Not a regression in PR fix(native): extract func-prop methods in Rust and cut native build post-pass costs #1442: The hypothesis that func-prop extraction was running on non-JS files was incorrect. The dispatch in mod.rs is clean (each language has its own extractor). The real cause was stale expected-edges manifests that never matched actual extractor output.
Perf regression (perf: investigate native build speed regression in v3.12.0 (+22% build, +41% 1-file incremental) #1446): Already exempted in KNOWN_REGRESSIONS as 3.12.0:Full build and 3.12.0:1-file rebuild. No code change needed for that issue — the benchmark guard already accounts for it.

Changes:

Updated all three expected-edges.json to use actual extractor output names
Ratcheted THRESHOLDS: elixir 0%→100% precision / 80% recall; julia 0%→100% precision / 70% recall; objc 0%→100% precision / 40% recall
Resolution benchmark: 176/176 tests pass, full suite 3044/3074 pass

…ed names The resolution benchmark uses WASM-built graphs where the Elixir, Julia, and Objective-C extractors emit module-qualified symbol names (Main.run, App.main, UserService.create_user, etc.). The expected-edges manifests were written with bare unqualified names (run, main, create_user), so every correctly-resolved edge appeared as a false positive and every expected edge appeared as a false negative — causing all three languages to show 0% precision even though resolution was working correctly. Root cause: starting in v3.12.0, cross-module call resolution began working for these languages (via the improved receiver-dispatch and same-class fallback in resolveByMethodOrGlobal / build-edges.ts). With 0 edges previously resolved, the name mismatch was invisible; once edges started resolving, the manifests showed 17 FP (elixir), 11 FP (julia), 6 FP (objc) — all correctly resolved edges misidentified as false positives. Fix: - Update all three expected-edges.json manifests to use the module-qualified names matching actual extractor output: elixir: Main.run, UserService.create_user, Validators.validate_user, etc. julia: App.main, Service.create_user, Repository.new_repo, etc. objc: full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.) plus add main -> run (plain C call correctly resolved) - Ratchet THRESHOLDS for all three: elixir: precision 0.0 -> 1.0, recall 0.0 -> 0.8 (17/21 resolved) julia: precision 0.0 -> 1.0, recall 0.0 -> 0.7 (11/15 resolved) objc: precision 0.0 -> 1.0, recall 0.0 -> 0.4 (6/13 resolved) Remaining FNs are genuine unresolved edges (same-file bare calls in elixir/julia, receiver-typed message sends in objc) — not regressions. Closes #1447

claude · 2026-06-13T01:36:23Z

Claude finished @carlos-alm's task —— View job

PR Review — fix(bench): update elixir/julia/objc expected-edges to module-qualified names

Root Cause Analysis

Problem Being Solved: The expected-edges manifests for elixir, julia, and objc contained bare unqualified function names (run, main, create_user), while the tree-sitter extractors have always emitted module-qualified names (Main.run, App.main, UserService.create_user, full ObjC selectors). Prior to v3.12.0, zero edges resolved for these languages, so the name mismatch was invisible (0 FP, 0 TP — trivially passing at 0.0/0.0). Once v3.12.0's improved cross-module resolution started successfully resolving edges, every correct edge appeared as a false positive and every expected edge as a false negative. This drove apparent precision to 0% while the extractor was working correctly.

Why This Approach: The fix is surgical and correct: update the manifests to match what the extractors actually emit, then ratchet the thresholds to reflect real performance. This is the right direction — the manifests are authoritative ground truth, so they must match actual extractor behavior. The alternative (changing the extractor to emit bare names) would make Elixir/Julia/ObjC inconsistent with every other module-aware language the tool supports.

Risk Assessment: Low. This is a test-data correction, not a production code change. The extractors are unchanged. The thresholds are ratcheted up (from 0.0 to 1.0 precision), which is strictly more restrictive. The PR cannot introduce runtime regressions.

Phase 2: Configuration Change Detection

No Biome rules, TypeScript flags, or build configs touched. The only resolution-benchmark.test.ts change is threshold values for three languages — all moving from { precision: 0.0, recall: 0.0 } to { precision: 1.0, recall: 0.8/0.7/0.4 }. These are ratchets upward, not relaxations. ✓

Phase 3: Test Integrity

No assertions weakened. The change makes tests more strict:

elixir: 0% precision floor → 100% precision floor (any future FP will fail CI immediately)
julia: same — 0% → 100%
objc: same — 0% → 100%

One subtlety worth verifying: the PR description says "81% recall" for elixir but sets the threshold to 0.8 (80%). That's a small but intentional buffer — defensible and consistent with how other languages are handled. No concern.

The notes field added to all manifest entries is genuinely useful: it documents why module-qualified names are used rather than leaving future maintainers confused. ✓

Phase 4: Code Quality

objc/expected-edges.json — one edge added (main → run, plain C call): this new edge is correctly grounded. main.m shows:

void run(void) { ... }
int main(int argc, const char *argv[]) {
    @autoreleasepool { run(); }
}

Both main and run are plain C functions (not ObjC methods), so they correctly use bare names — consistent with how the extractor would see them. The new main → run edge was missing from the original manifest; adding it is correct. ✓

ObjC selector names verified against Service.m and Validators.m:

UserService.initWithRepository: ✓ (trailing colon matches ObjC convention)
UserService.createUserWithId:name:email: ✓
UserService.getUserWithId: ✓
UserService.removeUserWithId: ✓
Validators.isValidEmail: ✓
Validators.isValidName: ✓
UserRepository.saveWithId:name:, findById:, deleteWithId: ✓ (match [_repo ...] calls in Service.m)
Validators.validateUser:email: + its two same-class callees ✓

Elixir manifest verified against main.ex: defmodule Main do / def run do → extractor emits Main.run. All cross-module calls in the file match the updated manifest. ✓

Julia manifest follows the same module-qualification pattern consistently. ✓

Minor nit (resolution-benchmark.test.ts:174): The comment says "81% recall" but the threshold is set to 0.8:

// elixir: cross-module qualified calls resolve at 100% precision, 81% recall.
elixir: { precision: 1.0, recall: 0.8 },

The comment slightly overstates actual precision ("81%") while the threshold is set to 80%. This is intentional headroom, but the discrepancy in the comment could confuse future readers into thinking the manifest was designed to hit exactly 81% when CI only requires 80%. Not blocking, but worth a follow-up fix to say "≥80% recall" or "11/14 recall" to match the actual fraction. Same applies to julia ("73%" comment, 0.7 threshold) and objc ("46%" comment, 0.4 threshold).

Phase 5: Backlog Compliance

Axis	Score	Reason
Zero-dep	✓	No new dependencies
Foundation-aligned	✓	Pure test-data correction; no production code, no cloud calls, no schema changes
Problem-fit	3/5	Fixes CI accuracy metrics. Doesn't directly improve agent navigation or blast-radius detection, but ensures the benchmark correctly reflects real extraction quality — which matters for honest capability tracking
Breaking	No	Test-only change; no API, DB schema, CLI output, or MCP contract touched
Tier	1	Zero-dep + foundation-aligned

Critical Concerns

None. The changes are well-scoped, the root cause is correctly identified, and the fix is the right approach. The thresholds are ratcheted upward (more strict), not downward.

The only genuine question would be: Was the extractor actually correct before v3.12.0? The PR says v3.12.0 improved cross-module resolution, which revealed the manifest mismatch. If zero edges resolved before, the manifests at 0.0/0.0 were vacuously correct — this PR converts them to genuinely informative thresholds. That's a net improvement to test quality.

Final Recommendation

Rating: ⭐⭐⭐⭐ (4/5)
Action: APPROVE
Reasoning: Correct diagnosis, minimal scope, moves tests strictly more restrictive, well-documented. The only reason it's not 5 stars is the minor comment imprecision (81%/73%/46% stated vs 80%/70%/40% enforced) — not worth blocking for, but worth a follow-up tightening.

greptile-apps · 2026-06-13T01:40:42Z

Greptile Summary

This PR fixes stale benchmark manifests for Elixir, Julia, and Objective-C that used bare unqualified function names (run, main, create_user) while the extractors emit module-qualified names (Main.run, App.main, UserService.create_user). The mismatch caused every resolved edge to appear as a false positive and every expected edge as a false negative, collapsing precision to 0% for all three languages and artificially dragging global precision from 89.9% to 84.4%.

Elixir / Julia manifests: all 21 / 15 edge names rewritten to module-qualified form; thresholds raised from 0% to 100% precision / 80% and 70% recall respectively, matching measured extractor output with a conservative floor.
ObjC manifest: selectors updated to include full trailing colons and multi-part labels (createUserWithId:name:email:); one previously-missing plain-C edge (main → run) added; thresholds raised to 100% precision / 40% recall, consistent with the 6/13 TPs produced by static and same-file resolution modes.

Confidence Score: 5/5

Pure benchmark manifest and threshold update — no production code paths are touched, and all 176 resolution tests pass.

The changes are limited to fixture JSON files and threshold constants. The name-qualification math is internally consistent across all three languages (Elixir 17/21 = 80.95%, Julia 11/15 = 73.3%, ObjC 6/13 = 46.2%), and every threshold is set as a conservative floor below the measured value. The new ObjC plain-C edge (main → run) is a basic static resolution that the extractor handles reliably. No harness logic was modified.

No files require special attention.

Important Files Changed

Filename	Overview
tests/benchmarks/resolution/fixtures/elixir/expected-edges.json	All 21 edge names updated to module-qualified form (e.g. validate_user → Validators.validate_user, run → Main.run); notes updated to explain extractor naming convention. Math checks out: 17 cross-module TPs / 21 total = 80.95% ≈ 81% recall, threshold conservatively set to 0.8.
tests/benchmarks/resolution/fixtures/julia/expected-edges.json	All 15 edge names updated to module-qualified form (e.g. main → App.main, create_user → Service.create_user); notes updated. Math: 11 cross-module TPs / 15 total = 73.3%, threshold set conservatively to 0.7.
tests/benchmarks/resolution/fixtures/objc/expected-edges.json	Selector names updated with full trailing colons (e.g. isValidEmail → isValidEmail:, createUserWithId → createUserWithId:name:email:); one new plain-C edge (main → run, static mode) added; multi-part saveWithId: selector corrected to saveWithId:name:. Math: 4 static + 2 same-file = 6 TPs / 13 total = 46.2%, threshold set conservatively to 0.4.
tests/benchmarks/resolution/resolution-benchmark.test.ts	Thresholds for elixir (0→1.0/0.8), julia (0→1.0/0.7), and objc (0→1.0/0.4) raised to reflect actual extractor output; detailed comments explain resolved vs unresolved modes and precision ratchet. No logic changes to the test harness itself.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Extractor emits\nmodule-qualified names\ne.g. Main.run] -->|before fix| B{Manifest had\nbare names\ne.g. run}
    B --> C[Every resolved edge\n→ False Positive\nprecision = 0%]
    B --> D[Every expected edge\n→ False Negative\nrecall = 0%]
    A -->|after fix| E{Manifest now uses\nmodule-qualified names\ne.g. Main.run}
    E --> F[Cross-module calls\n→ True Positives]
    E --> G[Same-file bare calls\n→ False Negatives\nnot yet resolved]
    F --> H[Elixir: 17/21 TP\n81% recall, 100% precision]
    F --> I[Julia: 11/15 TP\n73% recall, 100% precision]
    F --> J[ObjC: 6/13 TP\n46% recall, 100% precision]

_{Reviews (5): Last reviewed commit: "Merge remote-tracking branch 'origin/fix..." | Re-trigger Greptile}

greptile-apps · 2026-06-13T01:40:46Z

+  // objc: class-method static calls and same-class calls resolve at 100% precision, 46% recall.
+  //   Expected-edges now use full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.)
+  //   matching what the ObjC extractor emits. Receiver-typed instance message sends
+  //   are not yet resolved — tracked as FNs. Precision 1.0 acts as ratchet against future FPs.


The ObjC threshold comment says only "Receiver-typed instance message sends are not yet resolved", but the constructor mode edge (run → UserService.initWithRepository:) is also unresolved. With 13 total edges, 6 are resolved (4 static + 2 same-file), giving 6/13 = 46.2% recall. If the constructor were a true positive, recall would be 7/13 = 53.8% — inconsistent with the "46% recall" stated in the comment. The missing mention of the constructor FN could mislead future contributors auditing which modes are covered.

Suggested change

// objc: class-method static calls and same-class calls resolve at 100% precision, 46% recall.

// Expected-edges now use full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.)

// matching what the ObjC extractor emits. Receiver-typed instance message sends

// are not yet resolved — tracked as FNs. Precision 1.0 acts as ratchet against future FPs.

// objc: class-method static calls and same-class calls resolve at 100% precision, 46% recall.

// Expected-edges now use full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.)

// matching what the ObjC extractor emits. Constructor message sends (initWithRepository:) and

// receiver-typed instance message sends are not yet resolved — tracked as FNs.

// Precision 1.0 acts as ratchet against future FPs.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fixed — the comment now lists both constructor message sends (initWithRepository:) and receiver-typed instance message sends as unresolved FNs, so future contributors see the complete picture of what is and isn't covered.

…bjc threshold comment The previous comment only listed receiver-typed instance message sends as unresolved false negatives, omitting the constructor edge (run → UserService.initWithRepository:). Both modes are FNs; the comment now lists them together so future contributors see the complete picture.

carlos-alm · 2026-06-13T01:45:43Z

@greptileai

…58' into fix/v3120-regressions-1446-1447

greptile-apps Bot reviewed Jun 13, 2026

View reviewed changes

carlos-alm added 2 commits June 12, 2026 20:17

style(tests): expand toEqual object literals to pass Biome format check

aa25c44

Merge remote-tracking branch 'origin/fix/field-typemap-class-scope-14…

263c39b

…58' into fix/v3120-regressions-1446-1447

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(bench): update elixir/julia/objc expected-edges to module-qualified names#1496

fix(bench): update elixir/julia/objc expected-edges to module-qualified names#1496
carlos-alm wants to merge 4 commits into
fix/field-typemap-class-scope-1458from
fix/v3120-regressions-1446-1447

carlos-alm commented Jun 13, 2026

Uh oh!

claude Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot Jun 13, 2026

Uh oh!

carlos-alm Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Jun 13, 2026

Summary

Uh oh!

claude Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review — fix(bench): update elixir/julia/objc expected-edges to module-qualified names

Root Cause Analysis

Phase 2: Configuration Change Detection

Phase 3: Test Integrity

Phase 4: Code Quality

Phase 5: Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 13, 2026 •

edited

Loading

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading