Skip to content

Comments

S3 Directory Support#3304

Open
jterapin wants to merge 92 commits intoversion-3from
s3-directory-support
Open

S3 Directory Support#3304
jterapin wants to merge 92 commits intoversion-3from
s3-directory-support

Conversation

@jterapin
Copy link
Contributor

@jterapin jterapin commented Oct 13, 2025

Adds directory upload/download to Transfer Manager.

upload_directory

  • Upload all files from a local directory to S3
  • Optional recursive traversal of subdirectories
  • Symlink handling with circular reference detection
  • S3 key prefix support
  • Filter callback to selectively upload files
  • Request callback to modify upload parameters per file
  • Progress callback for transfer monitoring
  • Configurable failure handling (fail-fast or continue on error)

download_directory

  • Download all objects from an S3 bucket/prefix to a local directory
  • S3 prefix stripping for clean local paths
  • Path traversal detection
  • Filter callback to selectively download objects
  • Request callback to modify download parameters per object
  • Progress callback for transfer monitoring
  • Configurable failure handling (fail-fast or continue on error)
  • Automatic directory creation for nested structures

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

  1. To make sure we include your contribution in the release notes, please make sure to add description entry for your changes in the "unreleased changes" section of the CHANGELOG.md file (at corresponding gem). For the description entry, please make sure it lives in one line and starts with Feature or Issue in the correct format.

  2. For generated code changes, please checkout below instructions first:
    https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md

Thank you for your contribution!

@github-actions
Copy link

github-actions bot commented Oct 13, 2025

Detected 1 possible performance regressions:

  • aws-sdk-s3.put_object_small_allocated_kb - z-score regression: 83.98 -> 84.12. Z-score: 46.84

@jterapin jterapin marked this pull request as ready for review January 14, 2026 17:30
Copy link
Contributor

@alextwoods alextwoods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - this is looking good! Great test coverage and documentation. Good thread safety.

end

def validate_path(path, key)
segments = path.split('/')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be File::SEPERATOR rather than / here? I think the path at this point will come from File.join and so would have os seperator? I might be wrong about that though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should reorder the normalization/validation here. You are correct - File.join will always use / separator based on documentation.

What I should actually do

  1. Validate that object key does not contain . or .. path segments first
  2. Remove s3-prefix if applies
  3. File.join to combine destination directory and new key path
  4. Update file separator if relevant

@transferred_bytes += bytes_transferred
@transferred_files += 1

@progress_callback.call(@transferred_bytes, @transferred_files)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why we're calling the progress_callback inside the synchronize block - it does ensure progress is linear for users. However, depending on what they're doing in the callback (things like IO like printing/writing to file, ect) - this could end up being a small bottleneck. I'm not sure how much that matters vs the fully ordered progress callbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout. How does Java handle their directory progress?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java actually does not provide any directory upload/download level progress.

I guess we can leave this as is for now - but its a potential optimization - maybe we should just add a comment about that here.

Copy link
Contributor

@richardwang1124 richardwang1124 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Looks pretty good overall. Left a few comments.

Copy link
Contributor

@alextwoods alextwoods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - just minor comments.

class DirectoryDownloader
def initialize(options = {})
@client = options[:client]
@executor = options[:executor]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is API private, but should we validate that these required options are set? (I can't remember what we do for other similar cases....)


private

def build_opts(destination, bucket, opts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, feel free to ignore - the download and producer opts feel unrelated in this code - they aren't really sharing anything by being in the same method - I'd probably lean towards having them as seperate methods.

module Aws
module S3
# @api private
class DirectoryDownloader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parallelization and logic of this class is fairly complex - The interaction of queue executor, producer, provided executor/file downloader, ect. It might be worth documenting that here.


def process_download_queue(downloader, opts)
progress = DirectoryProgress.new(opts[:progress_callback]) if opts[:progress_callback]
queue_executor = DefaultExecutor.new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the defaults on the DefaultExector? (ie, how many threads does this create?) Do we want to hard code that default here to make it more explicit and... should this executor be configurable? Trying to think through cases where someone wants to leverage async more instead of using threads.

@transferred_bytes += bytes_transferred
@transferred_files += 1

@progress_callback.call(@transferred_bytes, @transferred_files)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java actually does not provide any directory upload/download level progress.

I guess we can leave this as is for now - but its a potential optimization - maybe we should just add a comment about that here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants