Faster true count using AVX2 and AVX512 instructions #6931
robert3005 wants to merge 3 commits into develop
44dacba to 1acfa00
Merging this PR will improve performance by 88.24%
Results on Zen 5 and on Zen 3 (no AVX512), each with an "after" plot (images not preserved), made me realise there's a discontinuity where the length threshold is reached, due to feature detection. Need to figure out a better structure.
ba177b2 to f9e3725
Before we merge this, we need to figure out the slowdown due to feature detection.
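One possible structure, sketched under assumptions rather than taken from this PR: run feature detection once, cache the chosen kernel as a function pointer, and have every call go through that pointer regardless of input length. All names here (`Kernel`, `select_kernel`, `true_count_words`) are hypothetical, and the "AVX2" kernel is just the scalar loop compiled with `avx2` and `popcnt` enabled, not a hand-written intrinsics kernel. Whether this removes the slowdown seen in the plots above is untested.

```rust
use std::sync::OnceLock;

/// Hypothetical kernel signature: count set bits across 64-bit words.
type Kernel = fn(&[u64]) -> usize;

/// Portable fallback that works on every target.
fn count_words_portable(words: &[u64]) -> usize {
    words.iter().map(|w| w.count_ones() as usize).sum()
}

/// Same loop, compiled with AVX2 and POPCNT enabled so the compiler is
/// free to use those instructions. A real kernel would use intrinsics.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,popcnt")]
unsafe fn count_words_avx2_impl(words: &[u64]) -> usize {
    words.iter().map(|w| w.count_ones() as usize).sum()
}

/// Safe wrapper so the kernel fits the plain `fn` pointer type.
#[cfg(target_arch = "x86_64")]
fn count_words_avx2(words: &[u64]) -> usize {
    // SAFETY: `select_kernel` only returns this function after runtime
    // detection has confirmed that AVX2 and POPCNT are available.
    unsafe { count_words_avx2_impl(words) }
}

/// Runs feature detection and picks a kernel. Called at most once.
fn select_kernel() -> Kernel {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2")
            && std::is_x86_feature_detected!("popcnt")
        {
            return count_words_avx2;
        }
    }
    count_words_portable
}

/// Entry point: the first call pays for detection, later calls only
/// load the cached pointer and make an indirect call.
pub fn true_count_words(words: &[u64]) -> usize {
    static KERNEL: OnceLock<Kernel> = OnceLock::new();
    let kernel = *KERNEL.get_or_init(select_kernel);
    kernel(words)
}
```

With this shape there is no per-call branch on length before detection, so any remaining length threshold would live inside the selected kernel rather than in the dispatch path.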
Benchmarks: TPC-DS SF=1 on NVME
Summary
Verdict: No clear signal
Statistical Summary
datafusion / vortex-file-compressed (1.081x ➖, 0↑ 45↓)
datafusion / vortex-compact (0.931x ➖, 36↑ 5↓)
datafusion / parquet (1.065x ➖, 0↑ 20↓)
duckdb / vortex-file-compressed (0.967x ➖, 19↑ 5↓)
duckdb / vortex-compact (0.987x ➖, 4↑ 5↓)
duckdb / parquet (0.935x ➖, 29↑ 1↓)
duckdb / duckdb (0.943x ➖, 15↑ 0↓)
Benchmarks: PolarSignals Profiling
Summary
datafusion / vortex-file-compressed (1.012x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME
Summary
Verdict: No clear signal
Statistical Summary
datafusion / vortex-file-compressed (0.898x ✅, 11↑ 0↓)
datafusion / vortex-compact (0.902x ➖, 13↑ 0↓)
datafusion / parquet (0.924x ➖, 5↑ 0↓)
datafusion / arrow (0.871x ✅, 17↑ 0↓)
duckdb / vortex-file-compressed (0.909x ➖, 6↑ 0↓)
duckdb / vortex-compact (0.924x ➖, 3↑ 0↓)
duckdb / parquet (0.956x ➖, 0↑ 0↓)
duckdb / duckdb (0.958x ➖, 1↑ 0↓)
Benchmarks: Clickbench on NVME
Summary
Verdict: No clear signal
Statistical Summary
datafusion / vortex-file-compressed (1.005x ➖, 1↑ 0↓)
datafusion / parquet (1.001x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.998x ➖, 0↑ 0↓)
duckdb / parquet (0.998x ➖, 0↑ 0↓)
duckdb / duckdb (1.040x ➖, 0↑ 5↓)
Benchmarks: TPC-H SF=1 on NVME
Summary
Verdict: No clear signal
Statistical Summary
datafusion / vortex-file-compressed (1.009x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.002x ➖, 0↑ 0↓)
datafusion / parquet (0.994x ➖, 1↑ 1↓)
datafusion / arrow (0.997x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.990x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.001x ➖, 0↑ 0↓)
duckdb / parquet (0.972x ➖, 5↑ 2↓)
duckdb / duckdb (1.003x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=1 on S3
Summary
Verdict: No clear signal
Statistical Summary
datafusion / vortex-file-compressed (1.168x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.096x ➖, 0↑ 5↓)
datafusion / parquet (1.021x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.977x ➖, 0↑ 1↓)
duckdb / vortex-compact (0.962x ➖, 0↑ 0↓)
duckdb / parquet (0.977x ➖, 0↑ 0↓)
Benchmarks: FineWeb NVMe
Summary
Verdict: No clear signal
Statistical Summary
datafusion / vortex-file-compressed (0.958x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.987x ➖, 0↑ 0↓)
datafusion / parquet (0.920x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.975x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.953x ➖, 1↑ 0↓)
duckdb / parquet (0.966x ➖, 0↑ 0↓)
Benchmarks: FineWeb S3
Summary
Verdict: No clear signal
Statistical Summary
datafusion / vortex-file-compressed (0.989x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.035x ➖, 0↑ 0↓)
datafusion / parquet (1.050x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.035x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.968x ➖, 0↑ 0↓)
duckdb / parquet (1.005x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on S3
Summary
Verdict: No clear signal
Statistical Summary
datafusion / vortex-file-compressed (1.066x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.057x ➖, 1↑ 2↓)
datafusion / parquet (0.978x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.034x ➖, 0↑ 0↓)
duckdb / parquet (1.053x ➖, 0↑ 1↓)
Benchmarks: Statistical and Population Genetics
Summary
Verdict: No clear signal
Statistical Summary
duckdb / vortex-file-compressed (0.952x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.978x ➖, 0↑ 0↓)
duckdb / parquet (0.970x ➖, 0↑ 0↓)
f9e3725 to 970af97
Benchmarks: Random Access
Summary
unknown / unknown (0.901x ➖, 13↑ 0↓)
Benchmarks: Compression
Summary
unknown / unknown (0.960x ➖, 21↑ 0↓)
Signed-off-by: Robert Kruszewski <github@robertk.io>
970af97 to d3c062d
Add a faster true count using AVX2 and AVX512 intrinsics.
True count happens a lot in our codebase, so it would definitely benefit from optimisations.
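For reference, a minimal scalar sketch of what a true count computes: the number of set bits in the first `len` bits of a packed bitmap. It assumes an LSB-first byte layout, and `true_count_scalar` is a hypothetical name, not this PR's code. An AVX2 or AVX512 kernel would stand in for the word-at-a-time popcount loop.

```rust
/// Counts the set bits among the first `len` bits of `bytes`.
/// Assumes bit `i` lives in byte `i / 8` at bit position `i % 8`.
pub fn true_count_scalar(bytes: &[u8], len: usize) -> usize {
    assert!(len <= bytes.len() * 8, "len exceeds bitmap capacity");

    let full_bytes = len / 8;
    let mut count = 0usize;

    // Popcount eight bytes at a time as one 64-bit word.
    let mut chunks = bytes[..full_bytes].chunks_exact(8);
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        count += u64::from_le_bytes(word).count_ones() as usize;
    }

    // Whole bytes that did not fill a 64-bit word.
    for &byte in chunks.remainder() {
        count += byte.count_ones() as usize;
    }

    // Trailing bits of a partially used final byte.
    let rem_bits = len % 8;
    if rem_bits != 0 {
        let mask = (1u8 << rem_bits) - 1;
        count += (bytes[full_bytes] & mask).count_ones() as usize;
    }

    count
}
```

Masking the final byte matters when the bitmap length is not a multiple of eight, since padding bits past `len` must not be counted.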