fix: Improve file walker indexing speed#105

Merged
Pringled merged 9 commits into main from fix-file-walker-ignore on May 18, 2026

@Pringled Pringled commented May 18, 2026

`walk_files` was encoding extension filters as `!*.ext` negation patterns in `GitIgnoreSpec`, which caused 64k+ regex `match_file` calls per repo walk instead of a simple frozenset lookup. This PR moves extension filtering back to an `item.suffix.lower() in extensions_set` check in `_walk`, leaving only directory patterns in the spec.
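
A minimal sketch of the idea, not the actual `file_walker.py` code: the function name `walk_by_suffix` and the `ignored_dirs` set are hypothetical, and the set of directory names is only a stand-in for the directory-only ignore spec. It shows the per-file check reduced to a single set membership test on the lowercased suffix.

```python
from pathlib import Path
from typing import Iterator


def walk_by_suffix(
    root: Path,
    extensions: frozenset[str],
    ignored_dirs: frozenset[str],
) -> Iterator[Path]:
    """Yield files under root whose lowercased suffix is in extensions.

    Hypothetical helper: ignored_dirs stands in for the directory patterns
    that remain in the ignore spec.
    """
    for item in sorted(root.iterdir()):
        if item.is_dir():
            # Directories are pruned up front, so their files are never visited.
            if item.name not in ignored_dirs:
                yield from walk_by_suffix(item, extensions, ignored_dirs)
        elif item.suffix.lower() in extensions:
            # Constant-time set lookup per file, no pattern matching involved.
            yield item
```

With this shape, pattern matching cost scales with the number of directories rather than the number of files.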

This also restores correct handling of explicit file inclusions: a file re-included via a negation pattern that ends in a file extension (e.g. `!special.kjs`, `!.yaml`) now bypasses the extension filter as intended, while directory patterns (`!vendor/`) and suffix-less globs (`!.github/`) do not.
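
One way such a distinction could be drawn is sketched below. The helper `is_explicit_file_inclusion` is hypothetical and the heuristic (negated, not a directory pattern, has a suffix according to `pathlib`) is an assumption for illustration; the rule actually used in `file_walker.py` may differ, particularly for dot-leading names.

```python
from pathlib import PurePosixPath


def is_explicit_file_inclusion(pattern: str) -> bool:
    """Return True if a pattern looks like a negated, file-level inclusion.

    Hypothetical heuristic, not the project's implementation.
    """
    if not pattern.startswith("!"):
        # Only negation patterns can re-include anything.
        return False
    body = pattern[1:]
    if body.endswith("/"):
        # A trailing slash marks a directory pattern such as "!vendor/".
        return False
    # pathlib reports an empty suffix for names with no extension.
    return PurePosixPath(body).suffix != ""
```

Under this sketch, `!special.kjs` counts as an explicit file inclusion, while `!vendor/` and `!.github/` do not.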

| Metric | v0.1.7 | main | this branch |
|---|---|---|---|
| NDCG@10 | 0.854 | 0.853 | 0.853 |
| p50 query | 1.20ms | 1.21ms | 1.16ms |
| p90 query | 4.83ms | 5.02ms | 4.81ms |
| p95 query | 5.46ms | 5.61ms | 5.39ms |
| p99 query | 6.35ms | 6.50ms | 6.28ms |
| index_ms | 839ms | 1435ms | 483ms |
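
For reference, the indexing speedup implied by the `index_ms` row can be worked out directly. The snippet below only does arithmetic on the three reported values; the `speedup` helper is illustrative.

```python
# index_ms values copied from the benchmark rows above (milliseconds).
index_ms = {"v0.1.7": 839, "main": 1435, "this branch": 483}


def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster candidate indexes than baseline."""
    return index_ms[baseline] / index_ms[candidate]


print(f"vs main:   {speedup('main', 'this branch'):.2f}x")
print(f"vs v0.1.7: {speedup('v0.1.7', 'this branch'):.2f}x")
```

That is roughly a 3x improvement over main and about 1.7x over v0.1.7, with NDCG@10 unchanged from main.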

@Pringled Pringled requested a review from stephantul May 18, 2026 10:09
codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
|---|---|
| src/semble/index/file_walker.py | 100.00% <100.00%> (ø) |
| src/semble/version.py | 100.00% <100.00%> (ø) |

@Pringled Pringled merged commit 1a998b5 into main May 18, 2026
16 checks passed
@Pringled Pringled deleted the fix-file-walker-ignore branch May 18, 2026 12:06