bool ree #8133
Signed-off-by: blaginin <github@blaginin.me> Co-authored-by: Codex <codex@openai.com>
///
/// The `values` array is a [`BoolArray`] for ordinary inputs. For all-invalid inputs it is a
/// single-row null [`ConstantArray`].
pub fn runend_encode_bool(
We used to have a custom run-end-bool encoding at some point.
The point is you don't need to store the "values" array because you know it just flip-flops. So you could say, values starts at true, and maybe offsets are 0, 0, 4, 10, ...
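A minimal sketch of the flip-flop idea in the comment above, assuming the first run is always `true` and that only exclusive run ends are stored. The function names `encode_flip_flop` and `decode_flip_flop` are hypothetical; this is not the old Vortex encoding, just an illustration of why the `values` array becomes redundant.

```rust
/// Encode a bool slice as alternating run ends.
/// Assumption: the first run is `true`. If the input starts with `false`,
/// the first run end is 0, i.e. an empty `true` run.
fn encode_flip_flop(bits: &[bool]) -> Vec<usize> {
    let mut ends = Vec::new();
    let mut current = true; // value of the run currently being scanned
    for (i, &b) in bits.iter().enumerate() {
        if b != current {
            ends.push(i); // close the current run at position i
            current = b;
        }
    }
    if !bits.is_empty() {
        ends.push(bits.len()); // close the final run
    }
    ends
}

/// Decode alternating run ends back into a flat Vec<bool>.
/// No values are stored: each run simply negates the previous one.
fn decode_flip_flop(ends: &[usize]) -> Vec<bool> {
    let mut out = Vec::with_capacity(ends.last().copied().unwrap_or(0));
    let mut value = true;
    let mut start = 0;
    for &end in ends {
        out.extend(std::iter::repeat(value).take(end - start));
        start = end;
        value = !value;
    }
    out
}
```

With four `false` values followed by six `true` values, this sketch produces the run ends `[0, 4, 10]`: the leading 0 marks the empty initial `true` run.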
But also... we already support run-end bools, because we sometimes push down a predicate expression over non-bool run-end types. So we can use this as a compression scheme now, and update later to a preferred run-end bool encoding if we want one.
Whenever we push down an expression that ends up with run-end bools, we immediately canonicalise them. The format we have is really wasteful for bools.
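To make the push-down discussion concrete, here is a rough sketch under stated assumptions: `RunEnd<T>`, `push_down`, and `canonicalize` are hypothetical names, not Vortex APIs. It shows a predicate evaluated once per run over non-bool run values, giving a run-end bool result that still carries an explicit `values` array, followed by expansion into a flat bool vector.

```rust
/// Hypothetical run-end container: `ends[i]` is the exclusive end of run `i`.
struct RunEnd<T> {
    ends: Vec<usize>,
    values: Vec<T>,
}

/// Push a predicate down onto the run values: one evaluation per run,
/// with the run ends reused unchanged.
fn push_down<T>(input: &RunEnd<T>, pred: impl Fn(&T) -> bool) -> RunEnd<bool> {
    RunEnd {
        ends: input.ends.clone(),
        values: input.values.iter().map(pred).collect(),
    }
}

/// Canonicalise by expanding every run into a flat Vec<bool>.
fn canonicalize(input: &RunEnd<bool>) -> Vec<bool> {
    let mut out = Vec::with_capacity(input.ends.last().copied().unwrap_or(0));
    let mut start = 0;
    for (&end, &v) in input.ends.iter().zip(&input.values) {
        out.extend(std::iter::repeat(v).take(end - start));
        start = end;
    }
    out
}
```

Note that adjacent runs can map to the same bool after the predicate is applied, so the result may hold more runs than strictly needed, on top of storing a value per run.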
Merging this PR will degrade performance by 15.9%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 187.8 µs | 225 µs | -16.52% |
| ❌ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 301.1 µs | 355.4 µs | -15.28% |
Comparing db/bool-ree (d8244cd) with develop (e065c33)
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.064x ➖
datafusion / vortex-file-compressed (1.064x ➖, 0↑ 4↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.970x ➖, 2↑ 1↓)
datafusion / vortex-compact (0.997x ➖, 0↑ 1↓)
datafusion / parquet (1.037x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.027x ➖, 1↑ 2↓)
duckdb / vortex-compact (1.028x ➖, 0↑ 1↓)
duckdb / parquet (1.081x ➖, 0↑ 2↓)
File Sizes: FineWeb NVMe
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.040x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.031x ➖, 0↑ 0↓)
datafusion / parquet (1.015x ➖, 0↑ 1↓)
datafusion / arrow (1.000x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (1.040x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.013x ➖, 0↑ 0↓)
duckdb / parquet (1.044x ➖, 1↑ 3↓)
duckdb / duckdb (1.016x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.040x ➖, 0↑ 0↓)
datafusion / parquet (1.055x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.121x ➖, 0↑ 1↓)
duckdb / vortex-compact (0.930x ➖, 2↑ 0↓)
duckdb / parquet (1.061x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.964x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.036x ➖, 0↑ 3↓)
datafusion / parquet (1.053x ➖, 0↑ 3↓)
duckdb / vortex-file-compressed (0.981x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.999x ➖, 0↑ 0↓)
duckdb / parquet (0.999x ➖, 0↑ 0↓)
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.091x ➖, 0↑ 42↓)
datafusion / vortex-compact (1.035x ➖, 5↑ 19↓)
datafusion / parquet (1.026x ➖, 3↑ 15↓)
duckdb / vortex-file-compressed (1.094x ➖, 0↑ 43↓)
duckdb / vortex-compact (1.065x ➖, 0↑ 31↓)
duckdb / parquet (1.007x ➖, 3↑ 4↓)
duckdb / duckdb (1.004x ➖, 1↑ 9↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.011x ➖, 0↑ 1↓)
duckdb / vortex-compact (0.981x ➖, 0↑ 0↓)
duckdb / parquet (0.973x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
File Size Changes (2 files changed, -0.0% overall, 0↑ 2↓)
Benchmarks: Random Access
Vortex (geomean): 0.913x ➖
unknown / unknown (0.957x ➖, 9↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.167x ❌, 0↑ 21↓)
datafusion / vortex-compact (1.143x ❌, 0↑ 19↓)
datafusion / parquet (1.111x ❌, 0↑ 15↓)
datafusion / arrow (1.158x ❌, 0↑ 18↓)
duckdb / vortex-file-compressed (1.120x ❌, 0↑ 16↓)
duckdb / vortex-compact (1.105x ❌, 0↑ 11↓)
duckdb / parquet (1.065x ➖, 0↑ 5↓)
duckdb / duckdb (1.070x ➖, 0↑ 2↓)
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.046x ➖, 0↑ 5↓)
datafusion / parquet (1.040x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.052x ➖, 1↑ 7↓)
duckdb / parquet (1.016x ➖, 0↑ 1↓)
duckdb / duckdb (1.052x ➖, 0↑ 7↓)
File Sizes: Clickbench on NVME
File Size Changes (167 files changed, -0.0% overall, 166↑ 1↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.963x ➖, 2↑ 3↓)
datafusion / vortex-compact (0.940x ➖, 0↑ 0↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.995x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.026x ➖, 0↑ 0↓)
duckdb / parquet (1.089x ➖, 0↑ 0↓)
Benchmarks: Appian on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.954x ➖, 1↑ 0↓)
datafusion / parquet (0.995x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.990x ➖, 0↑ 0↓)
duckdb / parquet (1.004x ➖, 0↑ 0↓)
duckdb / duckdb (1.006x ➖, 0↑ 0↓)
File Sizes: Appian on NVME
File Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Benchmarks: Compression
Vortex (geomean): 1.008x ➖
unknown / unknown (1.004x ➖, 1↑ 2↓)
No description provided.