Conversation
|
Detected 1 possible performance regressions:
|
alextwoods
left a comment
There was a problem hiding this comment.
Nice - this is looking good! Great test coverage and documentation. Good thread safety.
| end | ||
|
|
||
| def validate_path(path, key) | ||
| segments = path.split('/') |
There was a problem hiding this comment.
Should this be File::SEPERATOR rather than / here? I think the path at this point will come from File.join and so would have os seperator? I might be wrong about that though.
There was a problem hiding this comment.
I think I should reorder the normalization/validation here. You are correct - File.join will always use / separator based on documentation.
What I should actually do
- Validate that object key does not contain
.or..path segments first - Remove s3-prefix if applies
File.jointo combine destination directory and new key path- Update file separator if relevant
| @transferred_bytes += bytes_transferred | ||
| @transferred_files += 1 | ||
|
|
||
| @progress_callback.call(@transferred_bytes, @transferred_files) |
There was a problem hiding this comment.
I see why we're calling the progress_callback inside the synchronize block - it does ensure progress is linear for users. However, depending on what they're doing in the callback (things like IO like printing/writing to file, ect) - this could end up being a small bottleneck. I'm not sure how much that matters vs the fully ordered progress callbacks.
There was a problem hiding this comment.
Good callout. How does Java handle their directory progress?
There was a problem hiding this comment.
Java actually does not provide any directory upload/download level progress.
I guess we can leave this as is for now - but its a potential optimization - maybe we should just add a comment about that here.
richardwang1124
left a comment
There was a problem hiding this comment.
Nice! Looks pretty good overall. Left a few comments.
alextwoods
left a comment
There was a problem hiding this comment.
LGTM - just minor comments.
| class DirectoryDownloader | ||
| def initialize(options = {}) | ||
| @client = options[:client] | ||
| @executor = options[:executor] |
There was a problem hiding this comment.
I know this is API private, but should we validate that these required options are set? (I can't remember what we do for other similar cases....)
|
|
||
| private | ||
|
|
||
| def build_opts(destination, bucket, opts) |
There was a problem hiding this comment.
nit, feel free to ignore - the download and producer opts feel unrelated in this code - they aren't really sharing anything by being in the same method - I'd probably lean towards having them as seperate methods.
| module Aws | ||
| module S3 | ||
| # @api private | ||
| class DirectoryDownloader |
There was a problem hiding this comment.
The parallelization and logic of this class is fairly complex - The interaction of queue executor, producer, provided executor/file downloader, ect. It might be worth documenting that here.
|
|
||
| def process_download_queue(downloader, opts) | ||
| progress = DirectoryProgress.new(opts[:progress_callback]) if opts[:progress_callback] | ||
| queue_executor = DefaultExecutor.new |
There was a problem hiding this comment.
What are the defaults on the DefaultExector? (ie, how many threads does this create?) Do we want to hard code that default here to make it more explicit and... should this executor be configurable? Trying to think through cases where someone wants to leverage async more instead of using threads.
| @transferred_bytes += bytes_transferred | ||
| @transferred_files += 1 | ||
|
|
||
| @progress_callback.call(@transferred_bytes, @transferred_files) |
There was a problem hiding this comment.
Java actually does not provide any directory upload/download level progress.
I guess we can leave this as is for now - but its a potential optimization - maybe we should just add a comment about that here.
Adds directory upload/download to Transfer Manager.
upload_directorydownload_directoryBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
To make sure we include your contribution in the release notes, please make sure to add description entry for your changes in the "unreleased changes" section of the
CHANGELOG.mdfile (at corresponding gem). For the description entry, please make sure it lives in one line and starts withFeatureorIssuein the correct format.For generated code changes, please checkout below instructions first:
https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md
Thank you for your contribution!