Add MaskNullAsFalse for nullable boolean predicates #8121
Mask execution now requires a non-nullable boolean array and errors on nullable input. The new MaskNullAsFalse executable preserves the previous null-as-false coercion for filter and pruning predicates over nullable data, where SQL semantics treat NULL as not matching. Predicate-evaluation call sites (filter, prune, dict filter, is_constant) use MaskNullAsFalse; validity-array call sites keep the stricter Mask.
Signed-off-by: Claude <noreply@anthropic.com>
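As a rough illustration of the null-as-false coercion described in this commit message, the sketch below models a nullable predicate result as a slice of `Option<bool>` in plain Rust. The function names (`greater_than`, `null_as_false`, `filter_rows`) are hypothetical and are not part of the Vortex API; the point is only that a row whose predicate evaluates to NULL is dropped by a filter, just like a row whose predicate is false.

```rust
/// Hypothetical predicate: `value > threshold`, where a NULL input yields a
/// NULL result (SQL comparison semantics).
fn greater_than(column: &[Option<i64>], threshold: i64) -> Vec<Option<bool>> {
    column.iter().map(|v| v.map(|x| x > threshold)).collect()
}

/// Coerce a nullable predicate result into a plain selection mask.
/// NULL (`None`) counts as "does not match", mirroring SQL WHERE semantics.
fn null_as_false(predicate: &[Option<bool>]) -> Vec<bool> {
    predicate.iter().map(|p| p.unwrap_or(false)).collect()
}

/// Keep only the rows whose mask bit is set.
fn filter_rows<T: Clone>(rows: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(rows.len(), mask.len(), "mask length must match row count");
    rows.iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(row, _)| row.clone())
        .collect()
}
```

With ages `[30, NULL, 12]` and the predicate `age > 18`, the predicate result is `[true, NULL, false]`, the coerced mask is `[true, false, false]`, and only the first row survives the filter.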
Remove the shared NullHandling enum and execute_mask helper in favor of a self-contained Executable impl for each target. Mask still requires a non-nullable boolean array; MaskNullAsFalse coerces nulls to false.
Signed-off-by: Claude <noreply@anthropic.com>
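The "one self-contained impl per target" shape can be sketched in standalone Rust as follows. Everything here is an assumption made for illustration: `BoolColumn` is a stand-in for a boolean array, the `Executable` trait signature is invented, and both targets wrap a `Vec<bool>` rather than a real mask type. The actual trait and types in the PR may look different.

```rust
/// Hypothetical stand-in for a boolean array. `validity: None` models a
/// non-nullable dtype; `Some(bits)` models a nullable one (true = present).
#[derive(Debug, Clone)]
struct BoolColumn {
    values: Vec<bool>,
    validity: Option<Vec<bool>>,
}

/// Hypothetical execution-target trait; the real signature may differ.
trait Executable: Sized {
    fn execute(array: &BoolColumn) -> Result<Self, String>;
}

/// Strict target: only accepts non-nullable input.
#[derive(Debug, PartialEq)]
struct Mask(Vec<bool>);

/// Lenient target: nulls are coerced to false.
#[derive(Debug, PartialEq)]
struct MaskNullAsFalse(Vec<bool>);

impl Executable for Mask {
    fn execute(array: &BoolColumn) -> Result<Self, String> {
        match &array.validity {
            None => Ok(Mask(array.values.clone())),
            // Rejected based on nullability alone, even if every value is valid.
            Some(_) => Err("Mask requires a non-nullable boolean array".to_string()),
        }
    }
}

impl Executable for MaskNullAsFalse {
    fn execute(array: &BoolColumn) -> Result<Self, String> {
        let bits = match &array.validity {
            None => array.values.clone(),
            // A bit is set only when the value is both valid and true.
            Some(valid) => array
                .values
                .iter()
                .zip(valid)
                .map(|(value, ok)| *value && *ok)
                .collect(),
        };
        Ok(MaskNullAsFalse(bits))
    }
}

impl BoolColumn {
    /// The caller picks the null handling by naming the target type.
    fn execute<T: Executable>(&self) -> Result<T, String> {
        T::execute(self)
    }
}
```

In this sketch a call site chooses its semantics through the turbofish: `column.execute::<Mask>()` for validity-style inputs and `column.execute::<MaskNullAsFalse>()` for predicate results, with no shared enum or helper in between.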
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.007x ➖
datafusion / vortex-file-compressed (1.007x ➖, 1↑ 0↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.001x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.999x ➖, 0↑ 0↓)
datafusion / parquet (0.992x ➖, 1↑ 0↓)
datafusion / arrow (1.005x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (0.988x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.993x ➖, 0↑ 0↓)
duckdb / parquet (1.014x ➖, 1↑ 3↓)
duckdb / duckdb (0.994x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.981x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.013x ➖, 0↑ 0↓)
datafusion / parquet (0.987x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.035x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.015x ➖, 0↑ 1↓)
duckdb / parquet (1.005x ➖, 0↑ 0↓)
File Sizes: FineWeb NVMe
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.987x ➖, 3↑ 0↓)
datafusion / vortex-compact (1.002x ➖, 1↑ 2↓)
datafusion / parquet (0.982x ➖, 4↑ 1↓)
duckdb / vortex-file-compressed (0.985x ➖, 5↑ 1↓)
duckdb / vortex-compact (0.991x ➖, 3↑ 1↓)
duckdb / parquet (0.992x ➖, 2↑ 0↓)
duckdb / duckdb (1.012x ➖, 1↑ 1↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.034x ➖, 1↑ 2↓)
datafusion / vortex-compact (1.081x ➖, 0↑ 0↓)
datafusion / parquet (1.144x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.145x ➖, 0↑ 2↓)
duckdb / vortex-compact (0.983x ➖, 1↑ 0↓)
duckdb / parquet (1.083x ➖, 0↑ 1↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.008x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.005x ➖, 0↑ 0↓)
duckdb / parquet (0.996x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
No file size changes detected.
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.107x ❌, 0↑ 15↓)
datafusion / vortex-compact (1.003x ➖, 5↑ 2↓)
datafusion / parquet (1.072x ➖, 0↑ 6↓)
datafusion / arrow (1.056x ➖, 0↑ 6↓)
duckdb / vortex-file-compressed (1.080x ➖, 0↑ 4↓)
duckdb / vortex-compact (1.081x ➖, 0↑ 3↓)
duckdb / parquet (1.039x ➖, 0↑ 0↓)
duckdb / duckdb (1.049x ➖, 0↑ 1↓)
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.002x ➖, 1↑ 1↓)
datafusion / parquet (0.999x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.967x ➖, 6↑ 1↓)
duckdb / parquet (0.999x ➖, 0↑ 0↓)
duckdb / duckdb (0.970x ➖, 1↑ 0↓)
File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.418x ❌, 0↑ 13↓)
datafusion / vortex-compact (1.458x ❌, 0↑ 14↓)
datafusion / parquet (1.158x ➖, 0↑ 6↓)
duckdb / vortex-file-compressed (1.101x ➖, 0↑ 3↓)
duckdb / vortex-compact (1.150x ➖, 0↑ 4↓)
duckdb / parquet (1.123x ➖, 0↑ 1↓)
Benchmarks: Compression
Vortex (geomean): 0.993x ➖
unknown / unknown (0.991x ➖, 4↑ 1↓)
Benchmarks: Random Access
Vortex (geomean): 0.802x ✅
unknown / unknown (0.824x ✅, 34↑ 0↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.976x ➖, 2↑ 1↓)
datafusion / vortex-compact (0.988x ➖, 3↑ 7↓)
datafusion / parquet (1.082x ➖, 2↑ 7↓)
duckdb / vortex-file-compressed (0.963x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.991x ➖, 0↑ 0↓)
duckdb / parquet (0.901x ➖, 1↑ 0↓)
…ing-1eKcy
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
# Conflicts:
#   vortex-array/public-api.lock
#   vortex/public-api.lock
Why is this better than before?
First the semantics of the
I'm not sure the semantics are that surprising...? We can remove the alloc/& by checking the nullability of the dtype. |
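The suggestion about skipping the allocation and bitwise AND can be sketched as below. This is a standalone model, not Vortex code: `PackedBools` and `null_as_false_bits` are hypothetical names, values are bit-packed into `u64` words, and `validity: None` stands in for a non-nullable dtype. When the input is non-nullable the values are borrowed as-is; only the nullable case allocates and ANDs values with validity.

```rust
use std::borrow::Cow;

/// Hypothetical bit-packed boolean column, 64 values per word.
struct PackedBools {
    values: Vec<u64>,
    /// `None` models a non-nullable dtype; `Some` holds validity bits
    /// (1 = value present, 0 = null).
    validity: Option<Vec<u64>>,
}

/// Produce mask words with nulls treated as false.
/// Non-nullable input takes the shortcut: no allocation, no AND.
fn null_as_false_bits(col: &PackedBools) -> Cow<'_, [u64]> {
    match &col.validity {
        None => Cow::Borrowed(col.values.as_slice()),
        Some(valid) => {
            assert_eq!(valid.len(), col.values.len(), "validity length mismatch");
            // A bit survives only if it is set in both values and validity.
            Cow::Owned(
                col.values
                    .iter()
                    .zip(valid)
                    .map(|(value, ok)| value & ok)
                    .collect(),
            )
        }
    }
}
```

Returning a `Cow` is one way to make the two paths visible in the type; a real implementation could equally return a shared buffer handle.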
Mapping null to false is a hidden default (useful only in layouts) as seen in this PR. |
What are cases where you would want to discard validity? |
I really don't understand the alternative. Don't execute to a mask? Or force the user to call array.null_as_false().execute::(). execute::() just feels clunky to me. |
I think the only possible interpretation of executing to a mask is that we fold nulls to false. The problem you run into is that this is not always correct, and in those cases you should execute to BoolArray. If you know you have a non-nullable bool array, like validity, you just get a shortcut.
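One concrete case where folding nulls to false too early gives the wrong answer is negation under SQL three-valued logic: NOT NULL is NULL, so the row should still be excluded, but folding first turns NULL into false and the negation then selects it. The sketch below uses plain `Option<bool>` values and hypothetical helper names; it is an illustration of the reasoning in this thread, not code from the PR.

```rust
/// SQL-style (Kleene) NOT over a nullable boolean: NULL stays NULL.
fn not3(p: Option<bool>) -> Option<bool> {
    p.map(|b| !b)
}

/// Fold a nullable boolean into a mask bit, treating NULL as false.
fn fold_null(p: Option<bool>) -> bool {
    p.unwrap_or(false)
}

/// Correct order: evaluate in three-valued logic, fold only at the end.
fn not_then_fold(pred: &[Option<bool>]) -> Vec<bool> {
    pred.iter().map(|p| fold_null(not3(*p))).collect()
}

/// Incorrect for nullable input: fold first, then negate the mask.
fn fold_then_not(pred: &[Option<bool>]) -> Vec<bool> {
    pred.iter().map(|p| !fold_null(*p)).collect()
}
```

For the predicate `[true, NULL, false]`, `not_then_fold` yields `[false, false, true]`, while `fold_then_not` yields `[false, true, true]` and wrongly selects the NULL row. This is why the fold is only safe at the outermost predicate, and why intermediate results need to keep their validity.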
Summary
This PR introduces MaskNullAsFalse, a new Executable target that executes boolean arrays into Mask objects while coercing null elements to false. This addresses the previous TODO comment about handling nullable boolean arrays in mask execution.