⚡ Bolt: Optimize DependencyGraph traversal algorithms#152
⚡ Bolt: Optimize DependencyGraph traversal algorithms#152bashandbone wants to merge 1 commit intomainfrom
Conversation
Reduces `PathBuf` cloning and memory allocation in `DependencyGraph::find_affected_files` and `topological_sort` by passing and storing borrowed `&Path` values directly, converting back to owned strings only at the end. Benchmark results show ~10% improvement in graph traversals. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideOptimizes DependencyGraph traversal APIs by switching internal traversal state from owned PathBuf values to &Path references, reducing cloning and allocations while preserving external interfaces and behavior. Class diagram for optimized DependencyGraph traversal methodsclassDiagram
class DependencyGraph {
+find_affected_files(changed_files: &RapidSet_PathBuf_) RapidSet_PathBuf_
+topological_sort(self: &DependencyGraph, files: &RapidSet_PathBuf_) Result_Vec_PathBuf__GraphError_
-visit_node(self: &DependencyGraph, file: &Path, subset: &RapidSet_PathBuf_, visited: &mut RapidSet_PathRef_, temp_mark: &mut RapidSet_PathRef_, sorted: &mut Vec_PathBuf_) Result___GraphError_
+get_dependents(file: &Path) Iterator_DependencyEdge_
+get_dependencies(file: &Path) Iterator_DependencyEdge_
}
class RapidSet_PathBuf_ {
}
class RapidSet_PathRef_ {
}
class Vec_PathBuf_ {
}
class Path {
}
class PathBuf {
+as_path() &Path
+to_path_buf() PathBuf
}
class DependencyEdge {
+from PathBuf
+to PathBuf
+effective_strength() DependencyStrength
}
class DependencyStrength {
<<enumeration>>
Strong
Weak
}
class GraphError {
}
DependencyGraph --> RapidSet_PathBuf_ : uses
DependencyGraph --> RapidSet_PathRef_ : uses
DependencyGraph --> Vec_PathBuf_ : produces
DependencyGraph --> Path : borrows
DependencyGraph --> PathBuf : returns
DependencyGraph --> DependencyEdge : traverses
DependencyEdge --> DependencyStrength
RapidSet_PathRef_ ..> Path : holds_references
RapidSet_PathBuf_ ..> PathBuf : owns_values
Vec_PathBuf_ ..> PathBuf : owns_values
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The new
topological_sortsignature ties the lifetimes of&selfand&RapidSet<PathBuf>together (<'a>(&'a self, files: &'a RapidSet<_>)), which is a stricter API than before and could break existing callers that borrowselfandfileswith different lifetimes; consider keeping the original signature and letting the compiler infer lifetimes in the body. - In
topological_sort/visit_node, you now useRapidSet<&Path>; double-check that theRapidSetinstance is guaranteed not to outlive the underlyingPathBufowners (graph storage andfilesset), and consider adding a brief comment near its initialization to document this lifetime assumption for future maintainers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `topological_sort` signature ties the lifetimes of `&self` and `&RapidSet<PathBuf>` together (`<'a>(&'a self, files: &'a RapidSet<_>)`), which is a stricter API than before and could break existing callers that borrow `self` and `files` with different lifetimes; consider keeping the original signature and letting the compiler infer lifetimes in the body.
- In `topological_sort`/`visit_node`, you now use `RapidSet<&Path>`; double-check that the `RapidSet` instance is guaranteed not to outlive the underlying `PathBuf` owners (graph storage and `files` set), and consider adding a brief comment near its initialization to document this lifetime assumption for future maintainers.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
Optimizes DependencyGraph traversal routines to reduce PathBuf cloning during BFS/DFS by using &Path references during traversal and only allocating PathBuf values when producing final outputs.
Changes:
- Updated
find_affected_filesBFS to track visited/queue entries as&Pathand materializePathBufonly at the end. - Updated
topological_sort/visit_nodeto use&Path-keyed visited sets and pre-allocate the output vector.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn topological_sort<'a>(&'a self, files: &'a RapidSet<PathBuf>) -> Result<Vec<PathBuf>, GraphError> { | ||
| // Optimization: Pre-allocate `sorted` capacity and use `&Path` sets | ||
| // for `visited` and `temp_mark` to avoid intermediate `PathBuf` allocations. | ||
| let mut sorted = Vec::with_capacity(files.len()); | ||
| let mut visited: RapidSet<&'a Path> = thread_utilities::get_set(); | ||
| let mut temp_mark: RapidSet<&'a Path> = thread_utilities::get_set(); |
There was a problem hiding this comment.
topological_sort now ties &self and files to the same lifetime (pub fn topological_sort<'a>(&'a self, files: &'a RapidSet<PathBuf>)). This makes the function type more restrictive than before (e.g., DependencyGraph::topological_sort can no longer be used where independent lifetimes are required), which is a potential breaking change for downstream callers. Consider keeping the original signature (&self, &RapidSet<PathBuf>) and restructuring the traversal so visited/temp_mark only store references originating from files (e.g., look up edge.to via subset.get(edge.to.as_path()) and recurse on that reference) to avoid needing a shared lifetime in the public API.
| pub fn topological_sort<'a>(&'a self, files: &'a RapidSet<PathBuf>) -> Result<Vec<PathBuf>, GraphError> { | |
| // Optimization: Pre-allocate `sorted` capacity and use `&Path` sets | |
| // for `visited` and `temp_mark` to avoid intermediate `PathBuf` allocations. | |
| let mut sorted = Vec::with_capacity(files.len()); | |
| let mut visited: RapidSet<&'a Path> = thread_utilities::get_set(); | |
| let mut temp_mark: RapidSet<&'a Path> = thread_utilities::get_set(); | |
| pub fn topological_sort(&self, files: &RapidSet<PathBuf>) -> Result<Vec<PathBuf>, GraphError> { | |
| // Optimization: Pre-allocate `sorted` capacity and use `&Path` sets | |
| // for `visited` and `temp_mark` to avoid intermediate `PathBuf` allocations. | |
| let mut sorted = Vec::with_capacity(files.len()); | |
| let mut visited: RapidSet<&Path> = thread_utilities::get_set(); | |
| let mut temp_mark: RapidSet<&Path> = thread_utilities::get_set(); |
| fn visit_node<'a>( | ||
| &'a self, | ||
| file: &'a Path, | ||
| subset: &'a RapidSet<PathBuf>, | ||
| visited: &mut RapidSet<&'a Path>, | ||
| temp_mark: &mut RapidSet<&'a Path>, |
There was a problem hiding this comment.
visit_node currently uses a single lifetime parameter for self, file, subset, and the visited/temp_mark sets. This couples traversal bookkeeping lifetimes to both the graph and the subset and is what forces the public topological_sort signature to share a lifetime. If you switch the traversal to always recurse using references obtained from subset (via subset.get(...)), visited/temp_mark can be keyed by subset references and this helper can drop the explicit lifetime plumbing, reducing complexity and avoiding accidental API restrictions.
💡 What: Optimized DependencyGraph traversal (find_affected_files and topological_sort) to use
&Pathreferences instead of cloningPathBufvalues repeatedly.🎯 Why:
PathBufcloning during frequent iterative algorithms causes significant heap allocation overhead and slows down caching invalidation.📊 Impact: Benchmark results show an average ~10% performance improvement during
find_affected_filestraversals in heavily connected graphs.🔬 Measurement: Verified with
cargo bench -p thread-flow --bench bench_graph_traversal.PR created automatically by Jules for task 14065492637516231669 started by @bashandbone
Summary by Sourcery
Optimize dependency graph traversals to reduce path allocations and improve performance.
Enhancements: