perf[buffer]: iteration for fallible operations with validity#8120
joseph-isaacs wants to merge 21 commits into
Merging this PR will improve performance by 16.14%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Simulation | cast_i32_to_u32[65536] | N/A | 832.9 µs | N/A |
| 🆕 | Simulation | cast_u32_to_u8[65536] | N/A | 250.5 µs | N/A |
| 🆕 | Simulation | cast_u16_to_u32[65536] | N/A | 210.6 µs | N/A |
| ⚡ | Simulation | patched_take_10k_dispersed | 316.3 µs | 286 µs | +10.61% |
| ⚡ | Simulation | patched_take_10k_first_chunk_only | 302.6 µs | 272.3 µs | +11.14% |
| ⚡ | Simulation | patched_take_10k_adversarial | 257.2 µs | 226.9 µs | +13.37% |
| ⚡ | Simulation | take_10k_dispersed | 284.8 µs | 239.8 µs | +18.76% |
| ⚡ | Simulation | take_10k_first_chunk_only | 271.1 µs | 226.2 µs | +19.86% |
| 🆕 | Simulation | map_with_mask_widen_u16_u32[65536] | N/A | 189.6 µs | N/A |
| 🆕 | Simulation | try_map_masked_into_widen_u16_u32[65536] | N/A | 190 µs | N/A |
| 🆕 | Simulation | try_map_into_narrow_u64_u32[65536] | N/A | 424.1 µs | N/A |
| 🆕 | Simulation | try_map_masked_into_narrow_i32_u32[65536] | N/A | 292.3 µs | N/A |
| 🆕 | Simulation | try_map_masked_in_place_narrow_i32_u32[65536] | N/A | 172.7 µs | N/A |
| 🆕 | Simulation | map_with_mask_narrow_u64_u32[65536] | N/A | 387.1 µs | N/A |
| 🆕 | Simulation | lanezip_checked_add_u32[65536] | N/A | 452.7 µs | N/A |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 304.4 ns | 246.1 ns | +23.7% |
Comparing ji/fast-iter-valid (fc9b5e8) with develop (a2323f1)
Footnote: 1 benchmark was skipped, so the baseline result was used instead.
4b444dd to 72bca8b
Open question: where should this code live?

Sounds like we want a crate in between the array code and vortex-buffer, or this could be a feature flag in vortex-buffer.
Currently we (and Arrow) handle fallible operations with scalar (non-SIMD) code.
This PR adds a trait and methods for fast SIMD checked operations (including casts); I have verified elsewhere that `checked_add` benefits as well.
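
For readers unfamiliar with the idea, here is a minimal sketch of a fallible narrowing cast with validity that keeps the hot loop free of early returns. It is not the trait or the methods added by this PR: `try_narrow_masked` is a hypothetical name, validity is assumed to be a plain `&[bool]` rather than a packed bitmask, and the sketch relies on compiler auto-vectorization rather than explicit SIMD.

```rust
/// Hypothetical helper for illustration only; not the API added by this PR.
/// Narrows u64 values to u32, ignoring overflow in positions marked invalid.
/// On failure, returns the index of the first valid element that does not fit.
fn try_narrow_masked(values: &[u64], validity: &[bool]) -> Result<Vec<u32>, usize> {
    assert_eq!(values.len(), validity.len());

    let mut out = vec![0u32; values.len()];
    let mut any_failed = false;

    // Hot loop: no early return and no data-dependent branch, so the compiler
    // is free to auto-vectorize it. Failures are OR-ed into a single flag.
    for ((dst, &v), &valid) in out.iter_mut().zip(values).zip(validity) {
        // Truncating cast; the result is only used if no failure is reported.
        *dst = v as u32;
        any_failed |= valid & (v > u32::MAX as u64);
    }

    if any_failed {
        // Cold path: rescan with scalar code to locate the first offender.
        let idx = values
            .iter()
            .zip(validity)
            .position(|(&v, &valid)| valid && v > u32::MAX as u64)
            .expect("flag was set, so an offending element exists");
        return Err(idx);
    }
    Ok(out)
}
```

The design choice is to pay for a second scalar pass only when something actually failed. An out-of-range value in a null slot never sets the flag, so a column whose only bad values sit under nulls still takes the fast path.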