Optimize ArrowBytesViewMap with direct value access and simplified API #21348
Dandandan wants to merge 2 commits into apache:main
Hash functions now write all positions, including nulls (using a consistent null sentinel hash), when rehash=false (first column). This allows with_hashes to skip the buffer zero-fill, saving ~0.5µs per 8192-element batch on the no-nulls hot path.

Changes:
- with_hashes: use unsafe set_len instead of resize(n, 0)
- hash_array_primitive/hash_array: fill with null sentinel, then overwrite valid positions via valid_indices()
- hash_string_view_array_inner: write null sentinel instead of skipping
- hash_dictionary_inner: write null sentinel for null keys/values
- hash_run_array_inner: fill null run ranges with sentinel
- create_hashes: zero-fill only for complex types (struct, list, map, union) whose hash functions always combine with existing values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
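The reason the zero-fill can be dropped is that the first-column hash pass now writes every slot, so no slot is ever read before it is initialized. The following is a minimal std-only sketch of that contract, not the DataFusion code: `NULL_HASH`, `toy_hash`, and `hash_first_column` are hypothetical names, validity is modeled as a `&[bool]` rather than an Arrow null buffer, and the sketch writes the sentinel and the valid hashes in a single pass instead of filling first and overwriting via `valid_indices()`.

```rust
/// Hypothetical sentinel used for null slots (the real value is not given in this PR).
const NULL_HASH: u64 = 0x9E37_79B9_7F4A_7C15;

/// Stand-in hash function; the real code uses a proper hasher.
fn toy_hash(v: u64) -> u64 {
    v.wrapping_mul(0xFF51_AFD7_ED55_8CCD).rotate_left(31)
}

/// Hash the first column into `out` without zero-filling it first.
/// Every one of the `n` slots is written exactly once, nulls included.
fn hash_first_column(values: &[u64], validity: Option<&[bool]>, out: &mut Vec<u64>) {
    let n = values.len();
    out.clear();
    out.reserve(n);
    // Uninitialized tail of the vector; capacity is at least `n` after `reserve`.
    let spare = &mut out.spare_capacity_mut()[..n];
    match validity {
        // No-nulls hot path: one write per slot, no prior fill.
        None => {
            for (slot, v) in spare.iter_mut().zip(values) {
                slot.write(toy_hash(*v));
            }
        }
        // With nulls: null slots receive the sentinel instead of being skipped.
        Some(valid) => {
            assert_eq!(valid.len(), n);
            for ((slot, v), ok) in spare.iter_mut().zip(values).zip(valid) {
                slot.write(if *ok { toy_hash(*v) } else { NULL_HASH });
            }
        }
    }
    // SAFETY: the first `n` slots were all initialized by the loops above.
    unsafe { out.set_len(n) };
}
```

If any hash function skipped null positions, as before this change, the `set_len` call would expose uninitialized memory, which is why the sentinel writes and the `set_len` change have to land together.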
Three optimizations for the BytesView hash map hot path:

1. Direct value bytes access: Replace values.value(i).as_ref() with direct pointer arithmetic on input_views + input_buffers, avoiding the GenericByteViewArray::value() accessor overhead on every hash table probe for >12 byte strings.
2. Skip append for inline strings: For strings <=12 bytes, the input view is self-contained. Instead of decoding to &[u8] and re-encoding via append_value -> make_view, push the input view directly.
3. Simplify make_payload_fn: Change from FnMut(Option<&[u8]>) to FnMut() since no caller uses the value bytes parameter. This eliminates unnecessary value decoding on the insert path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
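The first two optimizations rely on the 16-byte view layout from the Arrow columnar format: a 4-byte length, followed either by up to 12 inline bytes or by a 4-byte prefix, a 4-byte buffer index, and a 4-byte offset. The sketch below illustrates that layout with plain `u128` values and `Vec<u8>` buffers. It is an illustration under assumptions, not the code in this PR: `make_view` here is a local toy with its own signature, `is_inline`, `long_value`, and `copy_into_output` are hypothetical helpers, and safe slicing stands in for the pointer arithmetic the PR describes.

```rust
/// Build a 16-byte view (little-endian) for `data`.
/// `buffer_index` and `offset` are only used when `data` is longer than 12 bytes.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if len <= 12 {
        // Short string: the bytes live inside the view itself.
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long string: 4-byte prefix, then the location in the data buffers.
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// A view is self-contained when its length field is at most 12.
fn is_inline(view: u128) -> bool {
    (view as u32) <= 12
}

/// Resolve a long (> 12 byte) view directly against the data buffers.
fn long_value<'a>(view: u128, buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = view as u32 as usize;
    let buffer_index = (view >> 64) as u32 as usize;
    let offset = (view >> 96) as u32 as usize;
    &buffers[buffer_index][offset..offset + len]
}

/// Copy one input value into the output views and single output buffer.
fn copy_into_output(
    view: u128,
    input_buffers: &[Vec<u8>],
    out_views: &mut Vec<u128>,
    out_buffer: &mut Vec<u8>,
) {
    if is_inline(view) {
        // Inline view: push it as-is, no decode and re-encode round trip.
        out_views.push(view);
    } else {
        let bytes = long_value(view, input_buffers);
        let offset = out_buffer.len() as u32;
        out_buffer.extend_from_slice(bytes);
        out_views.push(make_view(bytes, 0, offset));
    }
}
```

Only long values need to touch the data buffers at all; for inline values the 16-byte view can be compared and stored directly, which is what makes skipping the append path safe.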
run benchmarks
🤖 Benchmark running (GKE): comparing optimize-bytes-view-map-intern (414fc69) to 1e93a67 (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing optimize-bytes-view-map-intern (414fc69) to 1e93a67 (merge-base) using tpch
🤖 Benchmark running (GKE): comparing optimize-bytes-view-map-intern (414fc69) to 1e93a67 (merge-base) using tpcds
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
run benchmark clickbench clickbench_extended
🤖 Criterion benchmark running (GKE): comparing optimize-bytes-view-map-intern (414fc69) to 1e93a67 (merge-base)
Benchmark for this request failed.
🤖 Benchmark running (GKE): comparing optimize-bytes-view-map-intern (414fc69) to 1e93a67 (merge-base) using clickbench_extended
🤖 Benchmark completed (GKE): clickbench_extended, base (merge-base) vs. branch
run benchmark clickbench_partitioned
🤖 Benchmark running (GKE): comparing optimize-bytes-view-map-intern (414fc69) to 1e93a67 (merge-base) using clickbench_partitioned
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Which issue does this PR close?
N/A - performance optimization
Are these changes tested?
Existing tests pass. Test updated to match the simplified `make_payload_fn` signature.
Are there any user-facing changes?
`ArrowBytesViewMap::insert_if_new` has a changed `make_payload_fn` signature (breaking API change for downstream users of this internal API).
🤖 Generated with Claude Code