S3 Directory Support by jterapin · Pull Request #3304 · aws/aws-sdk-ruby

jterapin · 2025-10-13T18:50:58Z

Adds directory upload/download to Transfer Manager.

`upload_directory`

- Upload all files from a local directory to S3
- Optional recursive traversal of subdirectories
- Symlink handling with circular reference detection
- S3 key prefix support
- Filter callback to selectively upload files
- Request callback to modify upload parameters per file
- Progress callback for transfer monitoring
- Configurable failure handling (fail-fast or continue on error)
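
To illustrate the symlink handling with circular reference detection mentioned above, here is a minimal sketch using only the Ruby standard library. It is not the code from this PR: the method name `list_files` and its keyword arguments are hypothetical, and the real uploader may detect cycles differently. The idea is to resolve every directory to its real path and descend into each real directory only once.

```ruby
require 'set'

# Hypothetical helper (not the SDK's implementation): lists regular files
# under +root+, following symlinked directories but descending into each
# real directory at most once so that circular links cannot loop forever.
def list_files(root, recursive: true, follow_symlinks: true)
  visited = Set.new([File.realpath(root)])
  files = []
  stack = [root]

  until stack.empty?
    dir = stack.pop
    Dir.children(dir).sort.each do |name|
      path = File.join(dir, name)
      next if File.symlink?(path) && !follow_symlinks

      if File.directory?(path)
        next unless recursive

        # Set#add? returns nil when the real directory was already seen,
        # which is how a circular reference shows up here.
        stack.push(path) if visited.add?(File.realpath(path))
      elsif File.file?(path)
        files << path
      end
    end
  end

  files
end
```

Broken symlinks are neither files nor directories in this sketch, so they are skipped silently.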

`download_directory`

- Download all objects from an S3 bucket/prefix to a local directory
- S3 prefix stripping for clean local paths
- Path traversal detection
- Filter callback to selectively download objects
- Request callback to modify download parameters per object
- Progress callback for transfer monitoring
- Configurable failure handling (fail-fast or continue on error)
- Automatic directory creation for nested structures

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


github-actions · 2025-10-13T18:56:23Z

Detected 1 possible performance regressions:

aws-sdk-s3.put_object_small_allocated_kb - z-score regression: 83.98 -> 84.12. Z-score: 46.84

gems/aws-sdk-s3/lib/aws-sdk-s3/transfer_manager.rb

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_uploader.rb

alextwoods

Nice - this is looking good! Great test coverage and documentation. Good thread safety.

gems/aws-sdk-s3/spec/transfer_manager_spec_helper.rb

alextwoods · 2026-02-04T16:54:42Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        end
+
+        def validate_path(path, key)
+          segments = path.split('/')


Should this be `File::SEPARATOR` rather than `/` here? I think the path at this point will come from `File.join` and so would have the OS separator? I might be wrong about that though.

I think I should reorder the normalization/validation here. You are correct - `File.join` will always use the `/` separator, based on the documentation.

What I should actually do:

1. Validate that the object key does not contain `.` or `..` path segments first
2. Remove the S3 prefix, if one applies
3. Use `File.join` to combine the destination directory and the new key path
4. Update the file separator if relevant
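
The ordering proposed in this reply can be sketched as follows. This is only an illustration under stated assumptions, not the code that landed in the PR: the method name `local_path_for`, its keyword arguments, and the choice of `ArgumentError` are all hypothetical. It relies on the documented behavior that `File.join` joins with `/`, and on `File::ALT_SEPARATOR` being `nil` on platforms that have no alternate separator.

```ruby
# Hypothetical helper; names and error type are assumptions for illustration.
def local_path_for(key, destination, s3_prefix: nil,
                   separator: File::ALT_SEPARATOR || File::SEPARATOR)
  # 1. Reject keys with '.' or '..' segments before any normalization.
  if key.split('/').any? { |segment| ['.', '..'].include?(segment) }
    raise ArgumentError, "key #{key.inspect} contains a relative path segment"
  end

  # 2. Strip the S3 prefix when the key starts with it.
  relative = key
  if s3_prefix && !s3_prefix.empty?
    prefix = s3_prefix.end_with?('/') ? s3_prefix : "#{s3_prefix}/"
    relative = key.delete_prefix(prefix)
  end

  # 3. Combine the destination directory and the relative key path.
  path = File.join(destination, relative)

  # 4. Swap in the platform separator only when it differs from '/'.
  #    The block form of gsub avoids backslash-escaping surprises.
  separator == '/' ? path : path.gsub('/') { separator }
end
```

Validating the raw key first means a key such as `a/../b` is rejected before prefix stripping or joining can hide the `..` segment.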

gems/aws-sdk-s3/lib/aws-sdk-s3/customizations.rb

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

alextwoods · 2026-02-04T17:19:58Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_progress.rb

+          @transferred_bytes += bytes_transferred
+          @transferred_files += 1
+
+          @progress_callback.call(@transferred_bytes, @transferred_files)


I see why we're calling the progress_callback inside the synchronize block: it ensures progress is linear for users. However, depending on what they're doing in the callback (I/O such as printing or writing to a file, etc.), this could end up being a small bottleneck. I'm not sure how much that matters versus fully ordered progress callbacks.

Good callout. How does Java handle their directory progress?

Java actually does not provide any directory upload/download level progress.

I guess we can leave this as is for now, but it's a potential optimization. Maybe we should just add a comment about that here.
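
The trade-off discussed in this thread can be made concrete with a small sketch. It is not the `DirectoryProgress` class from the PR: the class name `ProgressSketch` and the `call_inside_lock` switch are hypothetical and exist only to show both options side by side.

```ruby
# Hypothetical sketch of the two options; not the SDK's DirectoryProgress.
class ProgressSketch
  def initialize(callback, call_inside_lock: true)
    @callback = callback
    @call_inside_lock = call_inside_lock
    @mutex = Mutex.new
    @bytes = 0
    @files = 0
  end

  # Called by worker threads after each file finishes.
  def record(bytes_transferred)
    snapshot = @mutex.synchronize do
      @bytes += bytes_transferred
      @files += 1
      # Inside the lock: callbacks see strictly increasing totals, but a
      # slow callback (for example, one doing I/O) blocks other workers.
      @callback.call(@bytes, @files) if @call_inside_lock
      [@bytes, @files]
    end

    # Outside the lock: workers are not held up by the callback, but
    # callbacks from different threads may run out of order.
    @callback.call(*snapshot) unless @call_inside_lock
  end
end
```

With `call_inside_lock: true` the sketch matches the behavior described in the review comment, where ordering is guaranteed at the cost of serializing the callback.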

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_uploader.rb

richardwang1124

Nice! Looks pretty good overall. Left a few comments.

gems/aws-sdk-s3/lib/aws-sdk-s3/transfer_manager.rb

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_uploader.rb

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

alextwoods

LGTM - just minor comments.

alextwoods · 2026-02-24T17:45:13Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+    class DirectoryDownloader
+      def initialize(options = {})
+        @client = options[:client]
+        @executor = options[:executor]


I know this is API private, but should we validate that these required options are set? (I can't remember what we do for other similar cases....)
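
One way to do the validation asked about here is to fail fast in the constructor. The sketch below is an assumption about what such a check could look like, not what the SDK does in similar private classes: `DownloaderSketch`, `REQUIRED_OPTIONS`, and the error message are hypothetical.

```ruby
# Hypothetical sketch; the real DirectoryDownloader may not validate at all.
class DownloaderSketch
  REQUIRED_OPTIONS = %i[client executor].freeze

  def initialize(options = {})
    # Collect every missing option so the error names all of them at once.
    missing = REQUIRED_OPTIONS.reject { |name| options[name] }
    unless missing.empty?
      raise ArgumentError, "missing required option(s): #{missing.join(', ')}"
    end

    @client = options[:client]
    @executor = options[:executor]
  end
end
```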

alextwoods · 2026-02-24T17:46:21Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+
+      private
+
+      def build_opts(destination, bucket, opts)


Nit, feel free to ignore: the download and producer opts feel unrelated in this code. They aren't really sharing anything by being in the same method, so I'd probably lean towards having them as separate methods.

alextwoods · 2026-02-24T17:49:39Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+module Aws
+  module S3
+    # @api private
+    class DirectoryDownloader


The parallelization and logic of this class are fairly complex: the interaction of the queue executor, producer, provided executor/file downloader, etc. It might be worth documenting that here.

alextwoods · 2026-02-24T17:51:13Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+
+      def process_download_queue(downloader, opts)
+        progress = DirectoryProgress.new(opts[:progress_callback]) if opts[:progress_callback]
+        queue_executor = DefaultExecutor.new


What are the defaults on the `DefaultExecutor` (i.e., how many threads does this create)? Do we want to hard-code that default here to make it more explicit? And should this executor be configurable? I'm trying to think through cases where someone wants to leverage async more instead of using threads.
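
To make the question about an explicit thread count concrete, here is a rough sketch of a producer feeding a closed queue that a fixed number of worker threads drain. It uses only Ruby's core `Queue` and `Thread`; it is not the SDK's `DefaultExecutor`, and the method name `process_queue` and the `thread_count` keyword are hypothetical. The default of 10 is taken from the commit message "Feedback - update to use 10 threads only"; whether that is the executor's actual default is an assumption.

```ruby
# Hypothetical sketch; not the SDK's DefaultExecutor or download queue.
# Items must be truthy, because a nil from Queue#pop ends the worker loop.
def process_queue(items, thread_count: 10, &block)
  queue = Queue.new
  results = Queue.new

  # Producer: enqueue all work, then close the queue to signal completion.
  producer = Thread.new do
    items.each { |item| queue << item }
    queue.close
  end

  # Workers: Queue#pop returns nil once the queue is closed and drained.
  workers = Array.new(thread_count) do
    Thread.new do
      while (item = queue.pop)
        results << block.call(item)
      end
    end
  end

  producer.join
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end
```

Passing the count as a keyword makes the concurrency visible at the call site, which is the explicitness the comment asks about. Results come back in completion order, not input order.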

alextwoods · 2026-02-24T17:57:47Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_progress.rb

+          @transferred_bytes += bytes_transferred
+          @transferred_files += 1
+
+          @progress_callback.call(@transferred_bytes, @transferred_files)


Java actually does not provide any directory upload/download level progress.

I guess we can leave this as is for now, but it's a potential optimization. Maybe we should just add a comment about that here.

jterapin added 29 commits October 7, 2025 11:35

Add executor support

8c7ca45

Add changelog entry

c21969a

Update TM with executor changes

39ecf0a

Remove thread count support from MPU

a3f2b9f

Update Object usage of executor

3156f7c

Add documentation/remove unused methods from DefaultExecutor

84c9966

Add Default Executor specs

8e16a3b

Update TM docs and impl

db1cb62

Update streaming MPU to use executor

f907c3b

More MP Stream updates

7cb940a

Update specs

4003536

Update interfaces

7dddda9

Update specs

481f198

Update changelog

88bf44a

Minor updates

c1a25cd

Fix failing specs

7522a16

Merge branch 'version-3' into s3-executor-support

89cffe7

Feedback - address sleep in specs

9eea233

Feedback - update method name for cleanup_team_file

75b0d96

Feedback - wrap checksum callback

ad943ee

Feedback - update method name in MPU

f1fc86a

Feedback - streamline handling of progress callbacks

09eae68

Feedback - streamline docs

e824de0

Merge branch 'version-3' into s3-executor-support

c073349

Feedback - streamline opts

cd91eb7

Feedback - remove sleep from specs when possible

abf78d6

Feedback - update to use 10 threads only

04a287f

Add directory features

54b9add

Add temp changelog entry

ca6c2ae

jterapin added 8 commits January 13, 2026 17:07

More streamlining

a76bd21

Merge branch 'version-3' into s3-directory-support

a9f89c2

Update documentation

ea4f261

Clean specs

9d9f6c6

Block path traversal keys

c5238ca

Remove comments

735c2b3

Refactor

9341672

Refactors

bc13fae

jterapin marked this pull request as ready for review January 14, 2026 17:30

jterapin commented Jan 14, 2026

jterapin added 3 commits January 14, 2026 12:41

Mini refactors

78200c9

Merge version-3 into branch

36042f2

Scope queue executor

7812d29

alextwoods reviewed Feb 4, 2026

richardwang1124 reviewed Feb 4, 2026

jterapin added 14 commits February 6, 2026 11:17

Merge from version-3

1232eeb

Address feedback - duplicate executor

25502d8

Address feedback - Update abort_requested

79cbea3

Address Feedback - rename abort method

077786c

Address feedback - correct file naming

5090606

Merge branch 'version-3' into s3-directory-support

f3c2302

Update key path building

413a52c

Refactor to use ClosedQueue

bf575d2

Refactor abort mechanism

a9b0205

Add fixes to handle edge cases

93a4238

Simplify logic

3c586af

Add tests

a4cd1b6

Final feedbacks

12c1652

Merge branch 'version-3' into s3-directory-support

7b9e0ed

alextwoods approved these changes Feb 24, 2026

3 participants
