fix: Add backpressure on S3 stream upload to prevent unbounded buffering#3405
Open
guille-moe wants to merge 2 commits into
Open
fix: Add backpressure on S3 stream upload to prevent unbounded buffering#3405guille-moe wants to merge 2 commits into
guille-moe wants to merge 2 commits into
Conversation
on IO.pipe work as expect to prevent memory leak.
969b9f3 to
1621e7f
Compare
to prevent memory or disk leaks.
1621e7f to
644899a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TL;DR
The S3 stream uploader does not currently apply backpressure between reading from the input stream and uploading multipart parts.
When the input is written faster than parts can be uploaded, the uploader can read ahead and queue buffered parts indefinitely, causing unbounded memory usage or disk usage (with
tempfile: true).Context
MultipartStreamUploaderreads stream data into part buffers (StringIOorTempfile) on a dedicated thread and posts each part toDefaultExecutorfor upload when a executorThreadbecome available.Previously, the reader
loopcould read ahead without waiting for executor, so parts could accumulate in the executor queue, causing unbounded memory use withStringIOor unbounded disk use ofTempfilewithtempfile: true.MultipartFileUploaderdoes not have this issue because parts offsets are computed ahead and read inside the executor Thread.This change tracks in-flight read slots and defers further reads until a posted part completes (or fails), keeping buffered data bounded by executor concurrency.
Testing
Added an example reproducing unbounded read in
gems/aws-sdk-s3/spec/multipart_stream_uploader_spec.rb.bundle exec rspec gems/aws-sdk-s3/spec/multipart_stream_uploader_spec.rbReal-world
This issue resulted in memory leaks (near file size !) during a stream upload, to investigate we enabled the tempfile option which push to a
Tempfileto follow what happen, this what we show:Screencast.from.2026-07-02.11-06-15.webm
For context, here we just read a big local file and send chunks to the stream upload.
Changes
MultipartStreamUploaderto@executor.max_threads, so the reader does not pull data fromIO.pipefaster than the executor can upload partsmax_threadsonDefaultExecutorso the uploader can align read concurrency with executor capacity