perf(copy): limit find depth for simple directory patterns #130

Merged
helizaga merged 2 commits into coderabbitai:main from akirasosa:perf/limit-find-depth-for-copy-dirs
Feb 18, 2026

Conversation

@akirasosa akirasosa commented Feb 18, 2026

Summary

  • Add -maxdepth 1 to find for simple basename patterns in copy_directories() to avoid scanning the entire repository tree

Problem

When gtr.copy.includeDirs contains simple basenames like .serena or .osgrep, the find . -type d -name "$pattern" command recursively scans the entire repository. In repos with large directories (e.g., node_modules at 2GB+, 3770 packages), this adds ~5-6 seconds per pattern — totaling ~11 seconds of unnecessary I/O just for directory discovery.

This is particularly impactful for worktree creation workflows where the total time budget is tight (e.g., 30-second timeouts in CI/MCP tool integrations).

Solution

For patterns without slashes (simple basenames), use find . -maxdepth 1 since gtr.copy.includeDirs entries are typically top-level directories. Patterns with slashes (e.g., vendor/bundle) retain full recursive behavior via -path.
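
The branching described above could look roughly like the following. This is a minimal sketch rather than the actual code in lib/copy.sh: the function name `find_dirs_for_pattern` is hypothetical, and only the two `find` invocations are taken from the commands quoted in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from lib/copy.sh): prints the directories that
# match one include pattern, relative to the current directory.
find_dirs_for_pattern() {
  local pattern="$1"
  case "$pattern" in
    */*)
      # Pattern contains a slash: match against the full relative path,
      # searching the whole tree as before.
      find . -type d -path "./$pattern" 2>/dev/null
      ;;
    *)
      # Simple basename: only look at immediate children of the repo root.
      find . -maxdepth 1 -type d -name "$pattern" 2>/dev/null
      ;;
  esac
}
```

Called from the repository root, `find_dirs_for_pattern .serena` would only inspect top-level entries, while `find_dirs_for_pattern vendor/bundle` still walks the tree.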

Benchmark

Tested on a monorepo with 2GB node_modules:

| Metric | Before | After |
|---|---|---|
| find per pattern | ~5.8s | ~0.01s |
| Total copy phase (2 patterns) | ~10.9s | ~0.03s |
| Full git gtr new | ~34s | ~24s |

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved directory pattern matching to prefer shallow (immediate-child) searches first, reducing unnecessary deep scans and improving performance.
    • Added a fallback to full recursion when no shallow matches are found; note this can change which directories are matched when multiple nested levels share the same name.
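
To make the caveat in the last bullet concrete, the following sketch builds a throwaway tree in which the same directory name exists at two levels and compares a shallow search with a recursive one. The directory names and the temporary layout are invented for illustration.

```shell
#!/usr/bin/env bash
# Throwaway tree: ".cache" exists at the top level and nested under "pkg/".
demo_root=$(mktemp -d)
mkdir -p "$demo_root/.cache" "$demo_root/pkg/.cache"
cd "$demo_root" || exit 1

# Shallow search: only immediate children of the current directory.
shallow=$(find . -maxdepth 1 -type d -name ".cache" 2>/dev/null | LC_ALL=C sort)

# Recursive search: every directory named ".cache" at any depth.
recursive=$(find . -type d -name ".cache" 2>/dev/null | LC_ALL=C sort)

printf 'shallow:\n%s\n' "$shallow"
printf 'recursive:\n%s\n' "$recursive"
```

Because the shallow search already finds a match here, a fallback that only triggers on an empty result would never reach `pkg/.cache`, whereas the previous fully recursive search would have matched both directories.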

When copying directories with simple basename patterns (e.g., `.serena`,
`node_modules`), `find` was scanning the entire repository tree recursively.
In repos with large directories like `node_modules` (2GB+), this caused
~5-6 seconds of unnecessary I/O per pattern (~11 seconds across two patterns).

Add `-maxdepth 1` for non-slash patterns since `gtr.copy.includeDirs`
entries are typically top-level directories. Patterns with slashes
(e.g., `vendor/bundle`) retain recursive behavior via `-path`.

Before: ~5.8s per pattern (full recursive find)
After:  ~0.01s per pattern (top-level only)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@akirasosa akirasosa requested a review from NatoBoram as a code owner February 18, 2026 00:17
coderabbitai bot commented Feb 18, 2026

No actionable comments were generated in the recent review. 🎉


Walkthrough

The copy_directories function in lib/copy.sh now does a two-step search: for basename patterns it first searches shallowly using -maxdepth 1, and if no matches are found it falls back to a recursive find -type d -name search. Path-containing patterns use a similar shallow-then-recursive approach.
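
A shallow-then-recursive lookup of this kind might be sketched as follows for the slash-less case. The helper name `find_dirs_with_fallback` is an assumption made for this example, and the real `copy_directories` code may be structured differently; the sketch also leaves out path-containing patterns, since the exact shallow command used for them is not shown in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the two-step lookup for a basename
# pattern; not the code from lib/copy.sh.
find_dirs_with_fallback() {
  local pattern="$1"
  local results

  # Step 1: cheap lookup among immediate children only.
  results=$(find . -maxdepth 1 -type d -name "$pattern" 2>/dev/null || true)

  # Step 2: only if nothing matched, pay for the full recursive scan.
  if [ -z "$results" ]; then
    results=$(find . -type d -name "$pattern" 2>/dev/null || true)
  fi

  # Print matches one per line; print nothing when there are none.
  if [ -n "$results" ]; then
    printf '%s\n' "$results"
  fi
}
```

The design trade-off is that the common case (a top-level directory) stays fast, while a pattern that only exists deeper in the tree still costs a full scan.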

Changes

| Cohort / File(s) | Summary |
|---|---|
| Find Pattern Matching: lib/copy.sh | Reworked directory search strategy: for slash-less basenames use a shallow -maxdepth 1 find first, with a fallback to recursive -type d -name if no matches; path-containing patterns also try a shallow path search before full recursion. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through dirs both near and far,
I learned to peek just where the children are,
If nothing shows in that shallow light,
I dive deep once more into the night,
A careful hop, then onward flight. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding -maxdepth 1 to limit find depth for simple directory patterns in the copy functionality, which is the core performance optimization in this PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/copy.sh`:
- Around line 334-339: The change to use "-maxdepth 1" in the directory lookup
(the find invocation that sets find_results in lib/copy.sh) silently restricts
matches to top-level directories and drops nested matches; revert or make this
depth limitation opt-in and add a single log_info when depth is restricted so
callers are warned. Specifically, update the case handling around pattern and
find_results: either remove "-maxdepth 1" for the "*" branch to preserve
recursive behavior, or add a flag/variable that preserves previous recursive
semantics for patterns without a slash; in either approach emit one log_info
(e.g., via log_info) when the non-recursive (-maxdepth 1) path is taken, and
update the function/doc comment that describes the parameter to call out the
depth restriction if you keep it. Ensure you reference and modify the find
invocation that assigns find_results and the function comment near the existing
lines.
- Around line 334-339: The change restricting basename-only finds to "find .
-maxdepth 1 -type d -name \"$pattern\"" silently drops nested matches; revert or
relax that behavior and update the function doc comment: either remove the
"-maxdepth 1" branch for the "*" case so find_results=$(find . -type d -name
"$pattern" 2>/dev/null) matches at any depth (preserving previous behavior), or
add a fallback that re-runs the non-maxdepth find when the maxdepth search
returns no results; also update the function documentation (the docstring that
mentions directory names to copy) to explicitly document the depth semantics,
and add a log_info/log_warn when find_results is empty so users are warned when
a pattern yields zero matches (use the existing pattern and find_results
variables and the surrounding function that performs the copy).
- Line 338: The find invocation that sets find_results ("find . -maxdepth 1
-type d -name \"$pattern\" 2>/dev/null") can return non-zero and break the
script under set -e; update that command to append "|| true" so failures are
ignored (consistent with other find uses such as the -path "./$pattern"
occurrences and lib/commands/clean.sh) and ensure find_results assignment and
subsequent logic still behave the same when no results or an error occurs.
- Line 338: The two unguarded find command substitutions that set find_results
inside the case statement (the occurrences assigning to find_results) can cause
the script to exit under set -e if find returns non-zero; update both
assignments so the command substitution appends "|| true" after the find
invocation (i.e., run find ... 2>/dev/null || true) to ensure the substitution
never fails and the script won't abort unexpectedly.
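
The `set -e` concern raised in the last two findings can be reproduced in isolation. In the sketch below, a `find` over a path that does not exist stands in for any `find` failure; the path and the variable names `unguarded` and `guarded` are made up for the demonstration, and each variant runs in its own child `bash` so the abort can be observed safely.

```shell
#!/usr/bin/env bash
# A path that should not exist; it stands in for any condition that makes
# find exit non-zero (hypothetical, for illustration only).
missing_dir="./no-such-dir-$$"

# Unguarded: under set -e, the failing command substitution in a plain
# assignment aborts the child shell before "reached" is printed.
unguarded=$(bash -c 'set -e
find_results=$(find "$1" -maxdepth 1 -type d 2>/dev/null)
echo "reached"' _ "$missing_dir" || true)

# Guarded: "|| true" makes the substitution succeed, so the child shell
# continues with an empty find_results.
guarded=$(bash -c 'set -e
find_results=$(find "$1" -maxdepth 1 -type d 2>/dev/null || true)
echo "reached"' _ "$missing_dir" || true)

printf 'unguarded: [%s]\nguarded: [%s]\n' "$unguarded" "$guarded"
```

Only the guarded variant gets as far as printing "reached", which is the behavior the `|| true` suggestion is meant to ensure.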

…ories

- Add || true to all find invocations to prevent silent exits under set -e
- Fall back to recursive find when shallow (-maxdepth 1) search finds nothing,
  preserving backward compatibility for nested directory patterns
@helizaga helizaga merged commit 0f3ef38 into coderabbitai:main Feb 18, 2026
4 checks passed
2 participants