Skip to content

fix(perf): pass symbolsOnly through parseFilesWasmInline#1489

Open
carlos-alm wants to merge 13 commits into
mainfrom
fix/symbols-only-inline-1437
Open

fix(perf): pass symbolsOnly through parseFilesWasmInline#1489
carlos-alm wants to merge 13 commits into
mainfrom
fix/symbols-only-inline-1437

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

parseFilesWasmForBackfill's symbolsOnly flag was silently ignored for batches ≤16 files because parseFilesWasmInline (the inline path) didn't accept or propagate the option. The optimization from PR #1436 only applied to the worker-pool path (>16 files).

Plumbs symbolsOnly through parseFilesWasmInline → extractor entry points, skipping the _tree assignment that triggers CFG/complexity/dataflow/AST visitor walks when only symbol+call extraction is needed. This makes the optimization effective for typical 1-file incremental rebuilds.

Closes #1437

… diff

Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh
(PostToolUse Bash): the pre-hook captures git status --porcelain to a
per-worktree temp file before each Bash call; the post-hook diffs the
before/after state and appends newly modified or created files to
.claude/session-edits.log.

This closes the gap where files written by sed -i, printf redirects,
tee, heredocs, or build tools (Cargo.lock, lockfiles) were never
recorded, causing guard-git.sh to emit false-positive BLOCKED errors.

Closes #1457
- clojure.rs: annotate lifetime-anchor assignment to silence false-positive
- cfg.rs: remove never-called start_line_of method
- complexity.rs: remove never-constructed NotHandled variant; convert
  irrefutable if-let patterns to plain let destructures
- dataflow.rs: remove never-read callee fields from CallReturn/Destructured
- incremental.rs: remove never-read lang field from CacheEntry

cargo check and cargo clippy both clean after these changes.
Adds .github/workflows/perf-canary.yml — a path-filtered workflow that
fires on PRs touching src/extractors/, src/domain/graph/, or crates/**
and runs only the incremental-benchmark suite (full build + no-op +
1-file rebuild, both engines). Catches the class of regressions that
accumulated invisibly across the Phase 8.x PRs and were only detected
at v3.12.0 publish time.

The regression guard gains BENCH_CANARY=1 mode: raises thresholds to
50%/100%/150% (standard/noisy/WASM) and skips the build, query, and
resolution suites — only incremental checks run. This absorbs shared-
runner timing variance while still blocking catastrophic regressions
(+98% full build, +1827% 1-file rebuild from v3.12.0).

Closes #1433
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review — fix(perf): pass symbolsOnly through parseFilesWasmInline

  • Gather context and read changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity
  • Phase 4: Code quality
  • Phase 5: Backlog compliance
  • Final assessment

Root Cause Analysis

Problem Being Solved: parseFilesWasmForBackfill accepted a symbolsOnly flag to skip CFG/complexity/dataflow/AST visitors, but this only applied to the worker-pool path (>16 files). The inline path (parseFilesWasmInline, triggered for ≤16 files) didn't accept the parameter at all — it unconditionally set _tree on extracted symbols, which causes runAnalyses to fire all AST visitor walks. Since single-file incremental rebuilds always use the inline path, the perf optimization from PR #1436 was effectively a no-op for the most common rebuild case.

Why This Approach: The fix is minimal and correct — add symbolsOnly = false as a parameter to parseFilesWasmInline, then conditionally skip the _tree assignment. The parseFilesWasmForBackfill router already had the right interface; it just wasn't forwarding the value into the inline path. One-line caller change + ~12 lines in the callee.

Risk Assessment: Low. The change path is parseFilesWasmForBackfillparseFilesWasmInline → conditional _tree skip. The symbolsOnly callers (native-orchestrator.ts) are already passing { symbolsOnly: true } for the this/super dispatch post-pass. The only new behavior is that small-batch inline calls now skip _tree when the caller wants them to. The dropped-language backfill path correctly leaves symbolsOnly unset (calling without opts), so that path retains full analysis data as required.


Phase 2: Configuration Change Detection

.claude/settings.json (+10/-0): Adds two new PostToolUse hooks (track-bash-writes.sh, track-moves.sh, post-git-ops.sh) and wires snapshot-pre-bash.sh as a PreToolUse Bash hook. These are developer tooling enhancements to the session tracking/audit infrastructure, not linting rules relaxed or thresholds lowered. ✓

.claude/hooks/snapshot-pre-bash.sh (ADDED), .claude/hooks/track-bash-writes.sh (ADDED): New hooks that snapshot git status --porcelain before/after Bash tool calls to detect files written by Bash commands. Both exit 0 (informational only). ✓

.github/workflows/perf-canary.yml (ADDED): A new, lightweight per-PR CI workflow that runs incremental benchmarks with BENCH_CANARY=1 (50% regression threshold). Scoped to paths that caused the v3.12.0 perf regression (src/extractors/**, src/domain/graph/**, crates/**). Appropriate and additive. ✓

tests/benchmarks/regression-guard.test.ts: Adds BENCH_CANARY mode, WASM_TIMING_THRESHOLD widening, and SIZE_METRICS — structural improvements to the guard's tolerance model. No test assertions are weakened; WASM timing tolerances are widened based on documented empirical variance, while native timings keep their strict thresholds. ✓


Phase 3: Test Integrity

tests/unit/visitor.test.ts (+0/-1): Removes // biome-ignore lint/... suppression. The suppressed comment was dead weight after a prior refactor — removing it is a clean-up, not a weakening. ✓

tests/benchmarks/regression-guard.test.ts: Adding BENCH_CANARY mode is gating-additive (new CI coverage) not softening. Adding WASM_TIMING_THRESHOLD = 0.75 and SIZE_METRICS is well-reasoned with referenced empirical data (run IDs provided inline). The existing KNOWN_REGRESSIONS entries are pruned appropriately; new entries for 3.12.0:Full build and 3.12.0:1-file rebuild include detailed root-cause comments with local A/B measurements. ✓

Coverage gap: There are no new unit tests specifically for the parseFilesWasmInline(symbolsOnly=true) code path. The fix is behavioral (skipping _tree assignment) and integration-tested indirectly via the incremental benchmark canary, but a dedicated unit/integration test asserting that symbolsOnly=true produces symbols without triggering visitor walks would be valuable for regression prevention.


Phase 4: Code Quality

src/domain/parser.tsparseFilesWasmInline (lines 1207–1238):
The implementation is clean and correct. The symbolsOnly default-false parameter is typed and documented. The conditional:

if (!symbolsOnly) {
  symbols._tree = extracted.tree;
}

correctly gates the _tree assignment. The WASM tree object is still held in extracted.tree regardless and will be GC'd; no WASM memory leak introduced since _tree was only a reference and not the sole retention path.

⚠️ Minor concern: When symbolsOnly=true, the WASM tree object (extracted.tree) is parsed but immediately becomes unreachable without an explicit .delete() call. web-tree-sitter trees are backed by WASM linear memory and require explicit disposal to avoid leaks (the existing code at line 1078 in wasmExtractSymbols already handles the extractor-error case). For the symbolsOnly=true path, the tree is parsed but never assigned — it will leak until GC fires. At incremental rebuild frequency (1–5 files/rebuild) this is likely negligible, but it's inconsistent with the codebase's existing tree-disposal discipline.

crates/codegraph-core/src/ast_analysis/complexity.rs: The BranchAction::NotHandled variant removal is a clean refactor — classify_branch is now guaranteed to always return Handled, making the enum a newtype wrapper. The if let → irrefutable let change is idiomatic Rust. Behavior is unchanged. ✓

crates/codegraph-core/src/ast_analysis/cfg.rs: Removes a dead private method start_line_of. Clean-up. ✓

crates/codegraph-core/src/ast_analysis/dataflow.rs: Removes the callee field from LocalSource::CallReturn and LocalSource::Destructured variants. These appear to have been stale data (the callee was stored but the enum variants were only matched structurally LocalSource::Destructured { .. }). The removal eliminates the dead field. ✓

crates/codegraph-core/src/extractors/clojure.rs: Adds #[allow(unused_assignments)] to suppress the Rust lint for the next_ns_owned pattern used to extend a borrow lifetime within the match. The comment is accurate and the allow is scoped tightly. ✓

scripts/benchmark.ts: Adds --warmup flag handling to stabilize timing measurements. ✓

.claude/skills/titan-run/SKILL.md: Minor documentation update. ✓

.gitignore: Adds crates/codegraph-core/index.d.ts, .js, .node napi build artifacts. ✓


Phase 5: Backlog Compliance

  • Zero-dep: ✓ No new runtime dependencies added.
  • Foundation-aligned: ✓ Pure local optimization — no cloud calls, no new infrastructure, no agent behavior.
  • Problem-fit: 4/5 — Directly reduces incremental rebuild latency for AI-assisted workflows. Codegraph is its own primary user; faster incremental rebuilds mean hooks complete faster, reducing friction in the development loop. The optimization was already declared in fix(perf): consolidate extractor walks and scope native post-passes (v3.12.0 publish gate) #1436 but silently not applied to the most common path.
  • Breaking: No — Purely internal optimization; no API/CLI/DB schema change. Callers of parseFilesWasmForBackfill are unaffected; the symbolsOnly semantics are unchanged.
  • Tier: 1

Critical Concerns

1. WASM tree leak on symbolsOnly=true inline path (minor, not blocking):
In parseFilesWasmInline at src/domain/parser.ts:1224-1232, when symbolsOnly=true, wasmExtractSymbols parses a full WASM tree (via parser.parse(code)) and returns { symbols, tree, langId }. The tree object is then discarded (not assigned to symbols._tree). Since web-tree-sitter trees are WASM-backed, this is a minor leak. At 1-file-incremental scale the impact is negligible, but it's worth a if (symbolsOnly && typeof extracted.tree?.delete === 'function') extracted.tree.delete() cleanup.

2. Scope broadening of this PR:
The core fix is src/domain/parser.ts:1207-1238 (~12 lines + doc). The PR also bundles: a new CI workflow (perf-canary.yml), benchmark warmup improvements, a regression guard overhaul, two new Claude hooks, .claude/settings.json changes, Rust dead-code cleanup, Clojure lint suppression, and a gitignore update. These are all legitimate and coherent, but CLAUDE.md recommends "One PR = one concern." In practice all changes are additive/cleanup and don't muddy the correctness review; the core fix is easily isolable.


Final Recommendation

## Root Cause Analysis
Problem Being Solved: symbolsOnly flag was ignored for ≤16-file batches using the inline WASM path
Why This Approach: Minimal, correct fix — forward the flag into parseFilesWasmInline
Risk Assessment: Low — additive change, no API/behavior change for existing callers

## Backlog Compliance
- Zero-dep: ✓
- Foundation-aligned: ✓
- Problem-fit: 4/5 — directly speeds up common incremental rebuilds
- Breaking: No
- Tier: 1

## Critical Concerns
1. Minor: Parsed WASM tree is leaked when symbolsOnly=true in inline path (no .delete() call)
2. Style: PR bundles several unrelated improvements; CLAUDE.md prefers one concern per PR
   (not blocking — all changes are coherent and defensible)

## Final Recommendation
Rating: ⭐⭐⭐⭐ (4/5)
Action: APPROVE WITH CONDITIONS
Reasoning: The core fix is correct, well-scoped, and addresses a real perf gap.
The WASM tree leak is minor but worth fixing before merge. Everything else is
additive quality improvement. Consider extracting the hook/settings changes into
a separate PR on the next opportunity to stay aligned with CLAUDE.md scope discipline.

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes the symbolsOnly flag being silently ignored for batches ≤16 files by plumbing it through parseFilesWasmInline, making the parse-only optimization effective for the common 1-file incremental rebuild case. It also adds supporting changes to improve benchmark reliability and parity between the TypeScript and Rust engines.

  • parser.ts: parseFilesWasmInline gains a symbolsOnly parameter (default false); when set, _tree is not stored and the WASM tree is explicitly disposed to prevent WASM heap leaks on every incremental rebuild.
  • build-edges.ts: Targets are sorted by confidence descending before the dedup loop, matching the Rust engine's sort_targets_by_confidence and guaranteeing highest-confidence edges win.
  • build_edges.rs: Pure refactor — groups emit_pts_alias_edges arguments into a PtsAliasCtx struct, eliminating the clippy::too_many_arguments suppression.
  • Benchmarking: Adds 2 warmup runs, increases measurement runs from 3→5, adds BENCH_CANARY mode with 50%/150% thresholds, and introduces a new per-PR perf canary workflow.

Confidence Score: 5/5

Safe to merge — the change is a targeted one-line propagation of an existing flag, with correct WASM tree disposal and a clean Rust refactor alongside it.

The core change in parser.ts is minimal and correct: a default-false parameter is threaded through, the tree is explicitly freed on the new code path, and the worker-pool path is unchanged. The build-edges.ts sort is logically sound (descending + first-wins = highest confidence wins). All surrounding benchmark and CI changes are additive and guarded behind environment flags.

No files require special attention.

Important Files Changed

Filename Overview
src/domain/parser.ts Core fix: adds symbolsOnly parameter to parseFilesWasmInline and propagates it from parseFilesWasmForBackfill; includes correct WASM tree disposal when symbolsOnly=true
src/domain/graph/builder/stages/build-edges.ts Adds sort-by-confidence before edge dedup loop to guarantee highest-confidence target wins; misleading comment about 'last processed' but implementation is correct
crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs Pure refactor: groups emit_pts_alias_edges arguments into PtsAliasCtx struct, removing clippy::too_many_arguments suppression; no logic changes
scripts/benchmark.ts Adds 2 warmup runs before no-op and 1-file rebuild measurements; increases measurement runs from 3 to 5; original file is safely restored in finally block
tests/benchmarks/regression-guard.test.ts Adds BENCH_CANARY mode with looser thresholds (50%/100%/150%) and skips build/query/resolution suites; only runs incremental checks in canary CI
.github/workflows/perf-canary.yml New lightweight per-PR performance canary workflow; now includes src/domain/parser.ts in path filter (previously flagged thread is resolved)

Sequence Diagram

sequenceDiagram
    participant Caller as Orchestrator (post-pass)
    participant BF as parseFilesWasmForBackfill
    participant IL as parseFilesWasmInline
    participant WP as parseFilesWasm (worker pool)
    participant WE as wasmExtractSymbols

    Caller->>BF: "filePaths, rootDir, { symbolsOnly: true }"
    alt "filePaths.length <= 16 (INLINE_BACKFILL_THRESHOLD)"
        BF->>IL: "filePaths, rootDir, symbolsOnly=true"
        loop each file
            IL->>WE: parsers, filePath, code
            WE-->>IL: "{ symbols, tree, langId }"
            alt "symbolsOnly=true"
                IL->>IL: tree.delete() — free WASM heap
                note over IL: _tree NOT set on symbols
            else "symbolsOnly=false"
                IL->>IL: "symbols._tree = tree"
            end
        end
        IL-->>BF: "Map<relPath, ExtractorOutput>"
    else "filePaths.length > 16"
        BF->>WP: filePaths, rootDir, EXTRACT_ONLY
        WP-->>BF: "Map<relPath, ExtractorOutput>"
    end
    BF-->>Caller: "Map<relPath, ExtractorOutput>"
Loading

Reviews (5): Last reviewed commit: "chore: merge origin/main into fix/symbol..." | Re-trigger Greptile

Comment on lines +19 to +26
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Path filter misses src/domain/parser.ts

parseFilesWasmForBackfill and parseFilesWasmInline live in src/domain/parser.ts, but that path is not in the paths: filter. If a future PR touches only parser.ts (exactly as this one does), the canary will silently not run, leaving the very optimization this fix plumbs — skipping _tree assignment on the inline path — unguarded against performance regressions. The PR description explicitly calls out that 1-file incremental rebuilds are the target scenario, so measuring them without guarding parser.ts changes is an incomplete safety net.

Suggested change
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "src/domain/parser.ts"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added src/domain/parser.ts to the paths: filter in .github/workflows/perf-canary.yml. The inline-path optimization introduced by this PR is now guarded against future regressions in parseFilesWasmInline/parseFilesWasmForBackfill.

Also took the opportunity to fix the minor WASM tree leak raised by the Claude review: parseFilesWasmInline now calls .delete() on the parsed tree when symbolsOnly=true, matching the disposal pattern already used by the extractor-error path.

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

5 functions changed29 callers affected across 8 files

  • emit_pts_alias_edges in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:380 (16 transitive callers)
  • process_file in crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs:440 (15 transitive callers)
  • buildFileCallEdges in src/domain/graph/builder/stages/build-edges.ts:979 (3 transitive callers)
  • parseFilesWasmInline in src/domain/parser.ts:1207 (6 transitive callers)
  • parseFilesWasmForBackfill in src/domain/parser.ts:1259 (9 transitive callers)

…sOnly inline path

Guard the inline-path optimization against regressions by adding
src/domain/parser.ts to the perf-canary paths filter — without it the
canary silently skips exactly the file this PR modifies.

Also free the WASM-backed tree when symbolsOnly=true in parseFilesWasmInline.
The tree was being parsed and then discarded without an explicit .delete() call,
leaking WASM linear memory on every incremental rebuild that triggers the inline
path.  The deletion is guarded by a typeof check, matching the pattern used by
the extractor-error path at line 1078.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Addressed both concerns from the review:

  1. WASM tree leak on symbolsOnly=true inline path: Added .delete() call in parseFilesWasmInline when symbolsOnly=true and the tree object exposes a delete method. Matches the existing disposal pattern used by the extractor-error path at line 1078.

  2. Perf canary missing src/domain/parser.ts: Added the path to the canary paths: filter in .github/workflows/perf-canary.yml so the inline-path optimization is guarded by the canary on future parser-only changes.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

Resolves add/add conflicts in .claude/hooks/snapshot-pre-bash.sh and
.claude/hooks/track-bash-writes.sh — taking the origin/main version
which adds CMD_HASH to the snapshot filename to prevent parallel Bash
calls from overwriting each other's baseline.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Resolved merge conflicts with origin/main — two add/add conflicts in .claude/hooks/snapshot-pre-bash.sh and .claude/hooks/track-bash-writes.sh. Both resolved by taking the origin/main version, which adds CMD_HASH to the snapshot filename to prevent concurrent Bash calls from overwriting each other's baseline. PR is now mergeable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(perf): pass symbolsOnly to parseFilesWasmInline for small-batch incremental builds

1 participant