GH-49266: [C++][Parquet] Optimize delta bit-packed decoding when bit-width = 0 by pitrou · Pull Request #49296 · apache/arrow

pitrou · 2026-02-16T08:49:59Z

Rationale for this change

DELTA_BINARY_PACKED decoding has limited performance due to a back-to-back dependency between the computations of value N and value N+1.

However, we can do better if we know that all deltas are 0 in a miniblock. This happens when a miniblock's delta bit width is 0.

What changes are included in this PR?

Avoid reading and accumulating deltas when the delta bit width is 0. Instead, use a condensed formula that computes each value without waiting for the previous one.
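To illustrate the idea, here is a minimal sketch, not the code from this PR. It assumes the DELTA_BINARY_PACKED layout in which the packed values are `delta - min_delta`, so a bit width of 0 means every delta in the miniblock equals `min_delta`. The function names `DecodeSequential` and `DecodeZeroWidth` and their signatures are hypothetical, chosen only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; not taken from the Arrow code base.

// General path: each output depends on the previous one (a serial prefix sum).
// `adjusted_deltas` holds the unpacked values, i.e. (delta - min_delta).
std::vector<int32_t> DecodeSequential(int32_t last_value, int32_t min_delta,
                                      const std::vector<uint32_t>& adjusted_deltas) {
  std::vector<int32_t> out(adjusted_deltas.size());
  // Accumulate in unsigned arithmetic so that wraparound is well defined.
  uint32_t acc = static_cast<uint32_t>(last_value);
  for (size_t i = 0; i < out.size(); ++i) {
    acc += static_cast<uint32_t>(min_delta) + adjusted_deltas[i];
    out[i] = static_cast<int32_t>(acc);
  }
  return out;
}

// Bit width 0: every adjusted delta is 0, so every delta equals min_delta and
// value i can be computed directly, without reading value i-1.
std::vector<int32_t> DecodeZeroWidth(int32_t last_value, int32_t min_delta,
                                     size_t count) {
  std::vector<int32_t> out(count);
  const uint32_t base = static_cast<uint32_t>(last_value);
  const uint32_t step = static_cast<uint32_t>(min_delta);
  for (size_t i = 0; i < count; ++i) {
    // Iterations are independent, so the loop can be pipelined or vectorized.
    out[i] = static_cast<int32_t>(base + static_cast<uint32_t>(i + 1) * step);
  }
  return out;
}
```

For instance, `DecodeZeroWidth(10, 3, 4)` yields 13, 16, 19, 22, the same result as `DecodeSequential` fed four zero adjusted deltas. The second loop has no loop-carried dependency, which is the property the optimization relies on.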

Benchmark results on constant ranges of integers (on my local machine, AMD Zen 2 CPU):

benchmark                                    baseline       contender       change %  counters
BM_DeltaBitPackingDecode_Int32_Fixed/4096    3.821 GiB/sec  12.164 GiB/sec  218.323   {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/4096', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70160}
BM_DeltaBitPackingDecode_Int32_Fixed/65536   3.897 GiB/sec  12.378 GiB/sec  217.678   {'family_index': 11, 'per_family_instance_index': 3, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4487}
BM_DeltaBitPackingDecode_Int32_Fixed/32768   3.909 GiB/sec  12.325 GiB/sec  215.309   {'family_index': 11, 'per_family_instance_index': 2, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/32768', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9004}
BM_DeltaBitPackingDecode_Int32_Fixed/1024    3.542 GiB/sec  10.468 GiB/sec  195.538   {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259535}
BM_DeltaBitPackingDecode_Int64_Fixed/32768   9.761 GiB/sec  14.040 GiB/sec   43.847   {'family_index': 12, 'per_family_instance_index': 2, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/32768', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11192}
BM_DeltaBitPackingDecode_Int64_Fixed/65536   9.814 GiB/sec  14.056 GiB/sec   43.222   {'family_index': 12, 'per_family_instance_index': 3, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5617}
BM_DeltaBitPackingDecode_Int64_Fixed/4096    9.672 GiB/sec  13.543 GiB/sec   40.014   {'family_index': 12, 'per_family_instance_index': 1, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/4096', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88891}
BM_DeltaBitPackingDecode_Int64_Fixed/1024    8.850 GiB/sec  12.207 GiB/sec   37.923   {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 323720}

Are these changes tested?

Yes, by an additional test meant to stress this specific situation.

Are there any user-facing changes?

No.

GitHub Issue: [C++][Parquet] Optimize delta bit-packed decoding when bit-width = 0 #49266

pitrou · 2026-02-16T10:09:43Z

FTR @AntoinePrv :)

github-actions bot added the Component: Parquet, Component: C++, and awaiting review labels Feb 16, 2026

pitrou force-pushed the delta-zero-opt branch from 3e048a6 to ec8c4a3 Compare February 16, 2026 09:57

pitrou marked this pull request as ready for review February 16, 2026 10:09

pitrou requested a review from wgtmac as a code owner February 16, 2026 10:09

pitrou requested review from mapleFU and rok February 16, 2026 10:09

pitrou wants to merge 1 commit into apache:main from pitrou:delta-zero-opt
