Skip to content

GH-50216: [C++][Parquet] Add RleBitPackedToBitmapDecoder#50217

Open
AntoinePrv wants to merge 14 commits into
apache:mainfrom
AntoinePrv:rle-bitmap
Open

GH-50216: [C++][Parquet] Add RleBitPackedToBitmapDecoder#50217
AntoinePrv wants to merge 14 commits into
apache:mainfrom
AntoinePrv:rle-bitmap

Conversation

@AntoinePrv

@AntoinePrv AntoinePrv commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Rationale for this change

Add a RleBitPackedToBitmapDecoder capable of decoding a Parquet mixed Rle / BitPacked byte stream directly into a bitmap. This is an optimization to be used when target and source bit_width = 1 that will improve decoding nullable columns in a follow-up PR.

What changes are included in this PR?

New classes, that reuse previous Runs and parser.
Their API is slighly different from the existing one as it fits a partilcular case.

  • RleRunToBitmapDecoder
  • BitPackedRunToBitmapDecoder
  • RleRunToBitmapDecoder

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@AntoinePrv AntoinePrv requested a review from pitrou as a code owner June 18, 2026 09:25
Copilot AI review requested due to automatic review settings June 18, 2026 09:25
@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #50216 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions Bot added the awaiting review Awaiting review label Jun 18, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a specialized path for decoding Parquet mixed RLE / BitPacked streams with bit_width=1 directly into Arrow bitmaps, aiming to speed up validity/definition-level decoding.

Changes:

  • Added bitmap-oriented decoders for RleRun and BitPackedRun, plus RleBitPackedToBitmapDecoder for mixed streams.
  • Refactored RleBitPackedParser to provide ParseWithCallable() and adjusted RleBitPackedDecoder to use it.
  • Added new bit utilities (Get32Bits, CopyBits) and comprehensive unit tests for the new bitmap decoders.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
cpp/src/arrow/util/rle_encoding_internal.h Adds RleRun::value(), moves callable-based parsing into the parser, and updates RleBitPackedDecoder accordingly.
cpp/src/arrow/util/rle_bitmap_internal.h Introduces bitmap span + run-to-bitmap decoders and the new RleBitPackedToBitmapDecoder.
cpp/src/arrow/util/rle_bitmap_test.cc Adds tests covering alignment, chunking, reset, and truncated input scenarios for bitmap decoding.
cpp/src/arrow/util/CMakeLists.txt Registers the new test file in the util test target.
cpp/src/arrow/util/bit_util.h Adds Get32Bits and CopyBits helpers used by the new bitmap decoder implementation.
cpp/src/arrow/util/bit_util_test.cc Adds unit tests for Get32Bits and CopyBits.

Comment thread cpp/src/arrow/util/rle_encoding_internal.h Outdated
Comment thread cpp/src/arrow/util/rle_bitmap_internal.h
Comment thread cpp/src/arrow/util/bit_util.h
Comment thread cpp/src/arrow/util/rle_encoding_internal.h
@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 18, 2026
Copilot AI review requested due to automatic review settings June 18, 2026 12:03

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment on lines +169 to +176
void Reset(const RunType& run) noexcept {
values_left_ = run.values_count();
if (run.value_little_endian() == 0) {
value_pattern_ = uint8_t{0};
} else {
value_pattern_ = uint8_t{0xFF};
}
}
@AntoinePrv

Copy link
Copy Markdown
Collaborator Author

@pitrou this is ready

rle_size_t batch_size) -> rle_size_t {
using ControlFlow = RleBitPackedParser::ControlFlow;

if (ARROW_PREDICT_FALSE(batch_size == 0 || exhausted())) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason this became necessary?

}

// Writing full bytes
const auto n_vals = derived()->GetBatchFast(out, batch_size);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rename GetBatchFast to GetBatchFullBytes?

// If we exhausted the values in this decoder, or we filled what was required,
// then the following recursive call will return 0.
// If in the opposite case, then we will continue on a byte-aligned boundary.
return n_vals + GetBatch(out.NewStartingAt(n_vals), batch_size - n_vals);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recursive call won't be terrific for performance, right? I don't expect the compiler to understand the actual semantics of this function such as to inline the recursive call.

}

// HEADER: Writing inside the first byte if caller gives a non-aligned input
if (ARROW_PREDICT_FALSE(out_bit_offset != 0)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it ARROW_PREDICT_FALSE? Note that the compiler may consider the following code "cold" and decide it doesn't need to be optimized for speed.

/// Get batch in full bytes using memcpy.
[[nodiscard]] rle_size_t GetBatchFast(BitmapSpanMut out, rle_size_t batch_size) {
ARROW_DCHECK_EQ(out.bit_start(), 0);
const auto n_bytes = std::min(batch_size, remaining()) / 8;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should DCHECK unread_values_bit_offset() is equal to 0 here? Otherwise you can't just memcpy...

/// Bits `out[out_offset..out_offset + count]` must equal `expected[skip..skip + count]`.
/// The `out_offset` bits before them must still be zero.
void CheckDecodedBits(const std::vector<uint8_t>& out, const std::vector<bool>& expected,
rle_size_t count, rle_size_t out_offset = 0, rle_size_t skip = 0) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip could perhaps be named expected_offset or expected_skip for clarity?

rle_size_t count, rle_size_t out_offset = 0, rle_size_t skip = 0) {
ARROW_SCOPED_TRACE("out_offset = ", out_offset, ", skip = ", skip);
for (rle_size_t i = 0; i < out_offset; ++i) {
EXPECT_FALSE(bit_util::GetBit(out.data(), i)) << "clobbered bit " << i;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rudimentary way of testing that existing bits are not clobbered. An implementation that zeroes any existing bits would pass this test successfully.

/// only the aligned path runs.
template <typename Decoder>
void CheckChunkedDecode(const typename Decoder::RunType& run,
const std::vector<bool>& expected, rle_size_t chunk = 1,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunk_size rather than chunk?


// Decode the whole run in several chunks.
for (const rle_size_t chunk : {rle_size_t{1}, rle_size_t{3}, rle_size_t{7},
rle_size_t{8}, rle_size_t{9}, n_vals}) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also n_vals + 1 to make sure that passing a larger chunk size than the number of values is allowed?

Comment on lines +190 to +193
// The repeated boolean value of the run.
bool value;
// The number of values in the run.
rle_size_t count;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not parameterize just the count, and check both true and false values for each count?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants