Skip to content

fix(cbm): replace O(N²) ts_node_child loop with O(N) TSTreeCursor in parse_generic_imports#107

Open
halindrome wants to merge 1 commit intoDeusData:mainfrom
halindrome:fix/parse-imports-quadratic
Open

fix(cbm): replace O(N²) ts_node_child loop with O(N) TSTreeCursor in parse_generic_imports#107
halindrome wants to merge 1 commit intoDeusData:mainfrom
halindrome:fix/parse-imports-quadratic

Conversation

@halindrome
Copy link

@halindrome halindrome commented Mar 21, 2026

Fixes #106.

Problem

parse_generic_imports iterated top-level AST nodes with indexed ts_node_child(root, i). This is O(i) per call internally (walks the child linked list from 0 to i via ts_node_child_with_descendant), making the loop O(N²) total. ts_node_next_sibling() has the same problem — it also calls ts_node_child_with_descendant internally.

On repos with generated TypeScript .d.ts files or large Perl files with 5,000–50,000 top-level AST nodes, this causes the indexer to spin at 100% CPU indefinitely.

Fix

Replace with TSTreeCursor. ts_tree_cursor_goto_next_sibling() maintains traversal state and is O(1) per step. Only the iteration mechanism changes — the loop body logic is identical.

// Before (O(N²))
uint32_t count = ts_node_child_count(ctx->root);
for (uint32_t i = 0; i < count; i++) {
    TSNode node = ts_node_child(ctx->root, i);
    ...
}

// After (O(N))
TSTreeCursor cursor = ts_tree_cursor_new(ctx->root);
if (ts_tree_cursor_goto_first_child(&cursor)) {
    do {
        TSNode node = ts_tree_cursor_current_node(&cursor);
        ...
    } while (ts_tree_cursor_goto_next_sibling(&cursor));
}
ts_tree_cursor_delete(&cursor);

Test Results

  • Unit tests: 2042/2042 passed
  • Live test: a monorepo with 6 submodules (TypeScript, Perl, Java, HTML, CSS) that previously hung indefinitely now indexes in 71 seconds — 142,908 nodes, 218,685 edges

…c_imports

ts_node_child(root, i) and ts_node_next_sibling() both call
ts_node_child_with_descendant() internally, making sibling iteration
O(N²) total on roots with many children (e.g. generated TypeScript
.d.ts files or large Perl files with 5,000–50,000 top-level nodes).

Replace with TSTreeCursor: ts_tree_cursor_goto_next_sibling() maintains
traversal state across calls and is O(1) per step, reducing total
complexity from O(N²) to O(N).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@halindrome halindrome changed the title fix(cbm): replace O(N²) indexed loop with O(N) sibling-chain in parse_generic_imports fix(cbm): replace O(N²) ts_node_child loop with O(N) TSTreeCursor in parse_generic_imports Mar 21, 2026
@halindrome
Copy link
Author

QA Round 1

Reviewer: claude-opus-4-6
Date: 2026-03-21
Verdict: ✅ PASS (one MINOR pre-existing note, not a regression)

Findings

[PASS] Cursor traversal correctness
The TSTreeCursor-based loop visits exactly the same set of nodes as the old indexed loop. ts_tree_cursor_new(ctx->root) positions at root, goto_first_child() moves to the first child, and the do { ... } while (goto_next_sibling) pattern iterates all siblings exhaustively. No nodes are skipped or double-visited. The continue on line 591 correctly invokes the while condition, advancing to the next sibling.

[PASS] Resource management
ts_tree_cursor_delete(&cursor) is called on both exit paths: early return when root has no children (lines 585–586), and loop exhaustion (line 629). No leak.

[PASS] Thread safety
TSTreeCursor cursor is stack-allocated within parse_generic_imports(). No shared state.

[PASS] Cursor root vs. child semantics
Confirmed against tree-sitter API header: ts_tree_cursor_new(node) positions on that node; goto_first_child() moves to the first child of root — matching the old ts_node_child(root, 0) behavior. Correct.

[MINOR] Pre-existing fallback logic (not a regression)
Line 609: if (ctx->result->imports.count == 0) checks the cumulative count across all nodes, not whether the current node produced an import. Once any node succeeds via field-path extraction, the text-stripping fallback is permanently disabled for all subsequent nodes. This bug existed in the old indexed-loop code and is not introduced by this PR. Recommend tracking count_before per node in a follow-up.

Summary

The cursor conversion is correct and complete. The pre-existing fallback logic issue is out of scope for this fix. No changes needed for Round 1.

@halindrome
Copy link
Author

QA Round 2

Reviewer: claude-opus-4-6
Date: 2026-03-21
Verdict: ✅ PASS

Findings

[PASS] All call sites accounted for
parse_generic_imports is static with 21 call sites, all within cbm_extract_imports() in the same file. No signature change needed. No external callers.

[PASS] Edge cases correct

  • Empty root: goto_first_child returns false → cursor deleted → early return. Correct.
  • Single child: do-while processes one child, goto_next_sibling returns false → exits cleanly. Correct.
  • do-while is semantically equivalent to the old for loop — same node set, same order, same continue behavior.

[PASS] Test coverage
Tests exist for 13+ languages dispatched through parse_generic_imports (Kotlin, Scala, C#, Dart, Groovy, Perl, Elixir, Haskell, OCaml, Erlang, Zig, Lean, Magma). No large-file performance test exists, but that is a pre-existing gap.

[INFO] Other O(N²) extractors (out of scope)
Seven language-specific extractors (parse_go_imports, parse_python_imports, parse_java_imports, parse_rust_imports, parse_c_imports, parse_ruby_imports, parse_lua_imports) still use the indexed ts_node_child(ctx->root, i) pattern. These are out of scope for this PR but could be converted in a follow-up.

Summary

No issues found. The cursor conversion is correct on all edge cases and all call sites are accounted for. The seven remaining O(N²) extractors are noted for a follow-up issue.

@halindrome
Copy link
Author

QA Complete

Both rounds returned PASS with no confirmed critical or major findings. The PR is ready for review.

Summary of QA rounds:

  • Round 1: ✅ PASS — cursor traversal correct, resource management clean, one pre-existing minor note (fallback count check is per-file not per-node; existed before this PR, out of scope)
  • Round 2: ✅ PASS — all call sites accounted for, edge cases correct, 7 other O(N²) extractors noted as follow-up candidates

No fix commits were needed. The implementation is correct as submitted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stability/performance Server crashes, OOM, hangs, high CPU/memory

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parse_generic_imports: O(N²) ts_node_child() loop hangs on large AST roots

3 participants