perf(copy): limit find depth for simple directory patterns#130
Conversation
When copying directories with simple basename patterns (e.g., `.serena`, `node_modules`), `find` was scanning the entire repository tree recursively. In repos with large directories like `node_modules` (2GB+), this caused ~11 seconds of unnecessary I/O per pattern. Add `-maxdepth 1` for non-slash patterns since `gtr.copy.includeDirs` entries are typically top-level directories. Patterns with slashes (e.g., `vendor/bundle`) retain recursive behavior via `-path`. Before: ~11s per pattern (full recursive find) After: ~0.03s per pattern (top-level only) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 WalkthroughThe Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/copy.sh`:
- Around line 334-339: The change to use "-maxdepth 1" in the directory lookup
(the find invocation that sets find_results in lib/copy.sh) silently restricts
matches to top-level directories and drops nested matches; revert or make this
depth limitation opt-in and add a single log_info when depth is restricted so
callers are warned. Specifically, update the case handling around pattern and
find_results: either remove "-maxdepth 1" for the "*" branch to preserve
recursive behavior, or add a flag/variable that preserves previous recursive
semantics for patterns without a slash; in either approach emit one log_info
(e.g., via log_info) when the non-recursive (-maxdepth 1) path is taken, and
update the function/doc comment that describes the parameter to call out the
depth restriction if you keep it. Ensure you reference and modify the find
invocation that assigns find_results and the function comment near the existing
lines.
- Around line 334-339: The change restricting basename-only finds to "find .
-maxdepth 1 -type d -name \"$pattern\"" silently drops nested matches; revert or
relax that behavior and update the function doc comment: either remove the
"-maxdepth 1" branch for the "*" case so find_results=$(find . -type d -name
"$pattern" 2>/dev/null) matches at any depth (preserving previous behavior), or
add a fallback that re-runs the non-maxdepth find when the maxdepth search
returns no results; also update the function documentation (the docstring that
mentions directory names to copy) to explicitly document the depth semantics,
and add a log_info/log_warn when find_results is empty so users are warned when
a pattern yields zero matches (use the existing pattern and find_results
variables and the surrounding function that performs the copy).
- Line 338: The find invocation that sets find_results ("find . -maxdepth 1
-type d -name \"$pattern\" 2>/dev/null") can return non-zero and break the
script under set -e; update that command to append "|| true" so failures are
ignored (consistent with other find uses such as the -path "./$pattern"
occurrences and lib/commands/clean.sh) and ensure find_results assignment and
subsequent logic still behave the same when no results or an error occurs.
- Line 338: The two unguarded find command substitutions that set find_results
inside the case statement (the occurrences assigning to find_results) can cause
the script to exit under set -e if find returns non-zero; update both
assignments so the command substitution appends "|| true" after the find
invocation (i.e., run find ... 2>/dev/null || true) to ensure the substitution
never fails and the script won't abort unexpectedly.
…ories - Add || true to all find invocations to prevent silent exits under set -e - Fall back to recursive find when shallow (-maxdepth 1) search finds nothing, preserving backward compatibility for nested directory patterns
Summary
-maxdepth 1tofindfor simple basename patterns incopy_directories()to avoid scanning the entire repository treeProblem
When
gtr.copy.includeDirscontains simple basenames like.serenaor.osgrep, thefind . -type d -name "$pattern"command recursively scans the entire repository. In repos with large directories (e.g.,node_modulesat 2GB+, 3770 packages), this adds ~5-6 seconds per pattern — totaling ~11 seconds of unnecessary I/O just for directory discovery.This is particularly impactful for worktree creation workflows where the total time budget is tight (e.g., 30-second timeouts in CI/MCP tool integrations).
Solution
For patterns without slashes (simple basenames), use
find . -maxdepth 1sincegtr.copy.includeDirsentries are typically top-level directories. Patterns with slashes (e.g.,vendor/bundle) retain full recursive behavior via-path.Benchmark
Tested on a monorepo with 2GB
node_modules:findper patterngit gtr new🤖 Generated with Claude Code
Summary by CodeRabbit