fix(bench): update elixir/julia/objc expected-edges to module-qualified names#1496
fix(bench): update elixir/julia/objc expected-edges to module-qualified names#1496carlos-alm wants to merge 4 commits into
Conversation
…ed names
The resolution benchmark uses WASM-built graphs where the Elixir, Julia,
and Objective-C extractors emit module-qualified symbol names (Main.run,
App.main, UserService.create_user, etc.). The expected-edges manifests
were written with bare unqualified names (run, main, create_user), so
every correctly-resolved edge appeared as a false positive and every
expected edge appeared as a false negative — causing all three languages
to show 0% precision even though resolution was working correctly.
Root cause: starting in v3.12.0, cross-module call resolution began working
for these languages (via the improved receiver-dispatch and same-class
fallback in resolveByMethodOrGlobal / build-edges.ts). With 0 edges
previously resolved, the name mismatch was invisible; once edges started
resolving, the manifests showed 17 FP (elixir), 11 FP (julia), 6 FP
(objc) — all correctly resolved edges misidentified as false positives.
Fix:
- Update all three expected-edges.json manifests to use the
module-qualified names matching actual extractor output:
elixir: Main.run, UserService.create_user, Validators.validate_user, etc.
julia: App.main, Service.create_user, Repository.new_repo, etc.
objc: full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.)
plus add main -> run (plain C call correctly resolved)
- Ratchet THRESHOLDS for all three:
elixir: precision 0.0 -> 1.0, recall 0.0 -> 0.8 (17/21 resolved)
julia: precision 0.0 -> 1.0, recall 0.0 -> 0.7 (11/15 resolved)
objc: precision 0.0 -> 1.0, recall 0.0 -> 0.4 (6/13 resolved)
Remaining FNs are genuine unresolved edges (same-file bare calls in
elixir/julia, receiver-typed message sends in objc) — not regressions.
Closes #1447
|
Claude finished @carlos-alm's task —— View job PR Review — fix(bench): update elixir/julia/objc expected-edges to module-qualified names
Root Cause AnalysisProblem Being Solved: The expected-edges manifests for elixir, julia, and objc contained bare unqualified function names ( Why This Approach: The fix is surgical and correct: update the manifests to match what the extractors actually emit, then ratchet the thresholds to reflect real performance. This is the right direction — the manifests are authoritative ground truth, so they must match actual extractor behavior. The alternative (changing the extractor to emit bare names) would make Elixir/Julia/ObjC inconsistent with every other module-aware language the tool supports. Risk Assessment: Low. This is a test-data correction, not a production code change. The extractors are unchanged. The thresholds are ratcheted up (from 0.0 to 1.0 precision), which is strictly more restrictive. The PR cannot introduce runtime regressions. Phase 2: Configuration Change DetectionNo Biome rules, TypeScript flags, or build configs touched. The only Phase 3: Test IntegrityNo assertions weakened. The change makes tests more strict:
One subtlety worth verifying: the PR description says "81% recall" for elixir but sets the threshold to 0.8 (80%). That's a small but intentional buffer — defensible and consistent with how other languages are handled. No concern. The Phase 4: Code Quality
void run(void) { ... }
int main(int argc, const char *argv[]) {
@autoreleasepool { run(); }
}Both ObjC selector names verified against
Elixir manifest verified against Julia manifest follows the same module-qualification pattern consistently. ✓ Minor nit ( // elixir: cross-module qualified calls resolve at 100% precision, 81% recall.
elixir: { precision: 1.0, recall: 0.8 },The comment slightly overstates actual precision ("81%") while the threshold is set to 80%. This is intentional headroom, but the discrepancy in the comment could confuse future readers into thinking the manifest was designed to hit exactly 81% when CI only requires 80%. Not blocking, but worth a follow-up fix to say "≥80% recall" or "11/14 recall" to match the actual fraction. Same applies to julia ("73%" comment, Phase 5: Backlog Compliance
Critical ConcernsNone. The changes are well-scoped, the root cause is correctly identified, and the fix is the right approach. The thresholds are ratcheted upward (more strict), not downward. The only genuine question would be: Was the extractor actually correct before v3.12.0? The PR says v3.12.0 improved cross-module resolution, which revealed the manifest mismatch. If zero edges resolved before, the manifests at 0.0/0.0 were vacuously correct — this PR converts them to genuinely informative thresholds. That's a net improvement to test quality. Final Recommendation
|
Greptile SummaryThis PR fixes stale benchmark manifests for Elixir, Julia, and Objective-C that used bare unqualified function names (
Confidence Score: 5/5Pure benchmark manifest and threshold update — no production code paths are touched, and all 176 resolution tests pass. The changes are limited to fixture JSON files and threshold constants. The name-qualification math is internally consistent across all three languages (Elixir 17/21 = 80.95%, Julia 11/15 = 73.3%, ObjC 6/13 = 46.2%), and every threshold is set as a conservative floor below the measured value. The new ObjC plain-C edge (main → run) is a basic static resolution that the extractor handles reliably. No harness logic was modified. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Extractor emits\nmodule-qualified names\ne.g. Main.run] -->|before fix| B{Manifest had\nbare names\ne.g. run}
B --> C[Every resolved edge\n→ False Positive\nprecision = 0%]
B --> D[Every expected edge\n→ False Negative\nrecall = 0%]
A -->|after fix| E{Manifest now uses\nmodule-qualified names\ne.g. Main.run}
E --> F[Cross-module calls\n→ True Positives]
E --> G[Same-file bare calls\n→ False Negatives\nnot yet resolved]
F --> H[Elixir: 17/21 TP\n81% recall, 100% precision]
F --> I[Julia: 11/15 TP\n73% recall, 100% precision]
F --> J[ObjC: 6/13 TP\n46% recall, 100% precision]
Reviews (5): Last reviewed commit: "Merge remote-tracking branch 'origin/fix..." | Re-trigger Greptile |
| // objc: class-method static calls and same-class calls resolve at 100% precision, 46% recall. | ||
| // Expected-edges now use full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.) | ||
| // matching what the ObjC extractor emits. Receiver-typed instance message sends | ||
| // are not yet resolved — tracked as FNs. Precision 1.0 acts as ratchet against future FPs. |
There was a problem hiding this comment.
The ObjC threshold comment says only "Receiver-typed instance message sends are not yet resolved", but the
constructor mode edge (run → UserService.initWithRepository:) is also unresolved. With 13 total edges, 6 are resolved (4 static + 2 same-file), giving 6/13 = 46.2% recall. If the constructor were a true positive, recall would be 7/13 = 53.8% — inconsistent with the "46% recall" stated in the comment. The missing mention of the constructor FN could mislead future contributors auditing which modes are covered.
| // objc: class-method static calls and same-class calls resolve at 100% precision, 46% recall. | |
| // Expected-edges now use full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.) | |
| // matching what the ObjC extractor emits. Receiver-typed instance message sends | |
| // are not yet resolved — tracked as FNs. Precision 1.0 acts as ratchet against future FPs. | |
| // objc: class-method static calls and same-class calls resolve at 100% precision, 46% recall. | |
| // Expected-edges now use full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.) | |
| // matching what the ObjC extractor emits. Constructor message sends (initWithRepository:) and | |
| // receiver-typed instance message sends are not yet resolved — tracked as FNs. | |
| // Precision 1.0 acts as ratchet against future FPs. |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed — the comment now lists both constructor message sends (initWithRepository:) and receiver-typed instance message sends as unresolved FNs, so future contributors see the complete picture of what is and isn't covered.
…bjc threshold comment The previous comment only listed receiver-typed instance message sends as unresolved false negatives, omitting the constructor edge (run → UserService.initWithRepository:). Both modes are FNs; the comment now lists them together so future contributors see the complete picture.
…58' into fix/v3120-regressions-1446-1447
Summary
run,main,create_user), but the extractors emit module-qualified names (Main.run,App.main,UserService.create_user, full ObjC selectors). When v3.12.0's improved cross-module resolution started successfully resolving edges for these languages, every correct edge appeared as a false positive (name mismatch with manifest) and every expected edge appeared as false negative — driving global precision from 89.9% to 84.4%.mod.rsis clean (each language has its own extractor). The real cause was stale expected-edges manifests that never matched actual extractor output.KNOWN_REGRESSIONSas3.12.0:Full buildand3.12.0:1-file rebuild. No code change needed for that issue — the benchmark guard already accounts for it.Changes:
Closes #1447