Fix LZ4_RAW heap decompression failure on chunked BytesInput (#3478) #3486
Open
yadavay-amzn wants to merge 1 commit into apache:master from
…3478) Eagerly materialize the decompressed stream for LZ4_RAW in CodecFactory, matching the existing pattern used for ZSTD. Without this, the lazy StreamBytesInput.writeInto() path reads via Channels.newChannel() in ~8KB chunks, but LZ4_RAW requires all compressed input in a single buffer for one-shot decompression. Added a regression test that compresses and decompresses a 16KB page through the CodecFactory heap path, then calls BytesInput.copy() to exercise the chunked materialization code path. The test fails without the fix and passes with it.
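The ~8KB figure comes from the JDK, not from Parquet. Assuming OpenJDK's implementation, the ReadableByteChannel returned by Channels.newChannel(InputStream) copies through an internal transfer buffer and asks the wrapped stream for at most 8192 bytes per read call. The sketch below fills a 16KB heap buffer through such a channel and records every length the channel requests from the underlying stream. ChannelChunkDemo, RecordingInputStream and readThroughChannel are hypothetical names used only for illustration; they are not part of Parquet.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows how Channels.newChannel() splits one large read.
final class ChannelChunkDemo {
    // Wraps a stream and records the 'len' argument of every bulk read issued against it.
    static final class RecordingInputStream extends FilterInputStream {
        final List<Integer> requestedLengths = new ArrayList<>();

        RecordingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requestedLengths.add(len);
            return super.read(b, off, len);
        }
    }

    // Fills a heap buffer of pageSize bytes through Channels.newChannel(), roughly the way a lazy
    // writeInto(ByteBuffer) would, and returns the read lengths the channel asked the stream for.
    static List<Integer> readThroughChannel(int pageSize) throws IOException {
        RecordingInputStream in = new RecordingInputStream(new ByteArrayInputStream(new byte[pageSize]));
        try (ReadableByteChannel channel = Channels.newChannel(in)) {
            ByteBuffer dst = ByteBuffer.allocate(pageSize);
            while (dst.hasRemaining()) {
                if (channel.read(dst) < 0) {
                    break; // stream exhausted before the buffer filled
                }
            }
        }
        return in.requestedLengths;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readThroughChannel(16 * 1024));
    }
}
```

On OpenJDK every recorded request should be 8192 bytes or less. That means a 16KB page is pulled from the stream behind the channel in several pieces rather than in one call. Other JDK implementations may use a different transfer size.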
b1e91f5 to 5998c9d
arouel approved these changes on Apr 18, 2026
What changes were proposed?
Eagerly materialize the decompressed stream for LZ4_RAW in CodecFactory, matching the existing pattern used for ZSTD. Added a regression test.

Closes #3478
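Conceptually, the change chooses between two ways of handing out a decompressed page. One returns a stream that is read lazily later. The other drains the whole page onto the heap immediately. Below is a JDK-only sketch of that choice under stated assumptions. Codec, needsEagerMaterialization, materialize and ChunkyInputStream are hypothetical names, not Parquet's API. The real change lives in CodecFactory and works on Parquet's BytesInput rather than on a raw byte[].

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch of the eager-materialization choice; not Parquet code.
final class EagerMaterializationSketch {
    // Hypothetical stand-in for Parquet's codec names; only the constants matter here.
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP, ZSTD, LZ4_RAW }

    // Hypothetical predicate: before the fix only ZSTD took the eager path; the PR adds LZ4_RAW.
    static boolean needsEagerMaterialization(Codec codec) {
        return codec == Codec.ZSTD || codec == Codec.LZ4_RAW;
    }

    // Eager path: pull exactly decompressedSize bytes out of the decompressing stream now,
    // then close it, so later consumers only ever see a fully populated heap array.
    static byte[] materialize(InputStream decompressed, int decompressedSize) throws IOException {
        byte[] page = new byte[decompressedSize];
        try (DataInputStream in = new DataInputStream(decompressed)) {
            in.readFully(page); // loops over short reads; throws EOFException if the stream ends early
        }
        return page;
    }

    // Hypothetical test double: a stream that returns at most maxChunk bytes per read call.
    static final class ChunkyInputStream extends FilterInputStream {
        private final int maxChunk;

        ChunkyInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[16 * 1024];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }
        InputStream source = new ChunkyInputStream(new ByteArrayInputStream(original), 8192);
        byte[] page = materialize(source, original.length);
        System.out.println(needsEagerMaterialization(Codec.LZ4_RAW) + " " + Arrays.equals(original, page));
    }
}
```

Draining with DataInputStream.readFully() guarantees the page is complete, or that an EOFException is raised, before any caller touches it. The trade-off is that the whole page is copied onto the heap up front rather than streamed on demand.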
Why are the changes needed?
When reading LZ4_RAW-compressed data through the heap codec path, decompression fails if the decompressed page exceeds ~8KB. The lazy StreamBytesInput.writeInto() path reads via Channels.newChannel() in ~8KB chunks, but LZ4_RAW requires all compressed input in a single buffer for one-shot decompression. In production reads this surfaces as MalformedInputException: all input must be consumed, particularly during dictionary filter evaluation (DictionaryPageReader.reusableCopy()).

The fix extends the existing ZSTD eager-materialization pattern (added for parquet-format#398) to also cover LZ4_RAW.

How was this tested?
Added testLz4RawHeapDecompressorCanCopyLargePage to TestCompressionCodec. The test:
- compresses and decompresses a 16KB page through the CodecFactory heap path
- calls BytesInput.copy() to exercise the chunked materialization path

All existing tests continue to pass (6/6 in TestCompressionCodec, 11/11 across all codec tests).