perf: Reorder predicates in conjuncts via simple heuristic #22343

Open
neilconway wants to merge 6 commits into apache:main from neilconway:neilc/perf-predicate-reorder

@neilconway neilconway commented May 18, 2026

Which issue does this PR close?

Rationale for this change

If a filter mixes cheap and expensive predicates, evaluating the cheap predicates first can improve performance because it reduces the number of rows on which the expensive predicates must be evaluated. This PR implements that idea by reordering the predicates in a conjunction to place "cheap" predicates first.

Predicates are assessed as "cheap" or "expensive" using an intentionally simple heuristic: "cheap" predicates are expressions that consist of only cheap operations like binary comparisons, negations, and casts, and "expensive" predicates are everything else (e.g., LIKE, regexp matching, subqueries, and function calls). Importantly, we use a stable sort when reordering predicates, which means that the user-specified order of operations is preserved within these two classes.
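
A minimal sketch of the classification and stable reordering described above. It assumes a simplified, hypothetical `Pred` type in place of DataFusion's real `Expr`; the names `Pred`, `is_cheap`, and `reorder_cheap_first` are illustrative and are not taken from the PR's code.

```rust
/// Hypothetical, heavily simplified stand-in for a predicate expression.
/// DataFusion's real `Expr` has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    /// A binary comparison such as `a <> 0`.
    Compare(String),
    /// Negation of another predicate.
    Not(Box<Pred>),
    /// A cast wrapped around another expression.
    Cast(Box<Pred>),
    /// A LIKE pattern match.
    Like(String),
    /// A scalar function call.
    ScalarFn(String),
}

/// A predicate counts as cheap only if every node in it is a cheap operation.
fn is_cheap(p: &Pred) -> bool {
    match p {
        Pred::Compare(_) => true,
        Pred::Not(inner) | Pred::Cast(inner) => is_cheap(inner),
        Pred::Like(_) | Pred::ScalarFn(_) => false,
    }
}

/// Move cheap conjuncts ahead of expensive ones.
/// `sort_by_key` is a stable sort, so the original order is preserved
/// within the cheap group and within the expensive group.
fn reorder_cheap_first(mut conjuncts: Vec<Pred>) -> Vec<Pred> {
    // `false < true`, so cheap predicates (key `false`) sort first.
    conjuncts.sort_by_key(|p| !is_cheap(p));
    conjuncts
}
```

With only two cost classes, a stable sort on a boolean key is equivalent to a stable partition, which is why the relative order written by the user survives inside each class.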

Arbitrarily more sophisticated schemes for predicting predicate evaluation cost (and selectivity) are possible, but a simple approach seems like a good place to start.

We avoid reordering predicates if the filter contains a volatile expression, to be safe. We could be a bit fancier and reorder conjuncts in the prefix of the filter list before the volatile expression, but we don't attempt to do that for now.

We don't reorder operands to OR: I believe this would be worth doing if #22342 is implemented.

On ClickBench, this improves performance by ~10-13% on Q21 and ~5% on Q22, in both cases by reordering simple comparisons to run before LIKE predicates.

What changes are included in this PR?

  • Add a new reorder_predicates helper
  • Invoke reorder_predicates as part of the PushDownFilter rewrite pass
  • Add unit tests for reorder_predicates
  • Update expected query plans in SLT
  • Add migration guide note for change to predicate evaluation order

Are these changes tested?

Yes. Added new unit tests for predicate reordering behavior, updated some expected EXPLAIN output.

Are there any user-facing changes?

Yes. Users who expect their predicates to be evaluated strictly left to right might see changes in performance and/or behavior. Performance changes could be improvements or regressions. Behavioral changes are possible if the query includes fallible operations like certain casts or division by zero. Note that the SQL standard is clear that implementations are allowed to evaluate predicates in any order, so user queries that depend on an evaluation order are fundamentally fragile. Users can rewrite predicates using CASE if they need to enforce an evaluation order.

DataFusion's vectorized AND evaluator already short-circuits the
right-hand side when the LHS keeps few rows. Until now the order of
conjuncts in a Filter was whatever the user wrote, so expensive
predicates like LIKE and regex could run on the full batch even when a
cheap comparison would have filtered most rows first.
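
To illustrate why conjunct order matters, here is a rough model that counts how often the second predicate runs. It is a row-at-a-time simplification written for this note; DataFusion's actual evaluator works on Arrow batches and applies its own short-circuit heuristic, and the function name `and_with_selection` is hypothetical.

```rust
/// Evaluate `first AND second` over `rows`, calling `second` only on rows
/// that `first` kept. Returns the boolean mask and the number of times
/// `second` was evaluated.
fn and_with_selection<T>(
    rows: &[T],
    first: impl Fn(&T) -> bool,
    second: impl Fn(&T) -> bool,
) -> (Vec<bool>, usize) {
    let mut second_calls = 0usize;
    let mask: Vec<bool> = rows
        .iter()
        .map(|row| {
            if !first(row) {
                // Row already rejected: skip the second predicate entirely.
                return false;
            }
            second_calls += 1;
            second(row)
        })
        .collect();
    (mask, second_calls)
}
```

Under this model the result mask is the same for either order, but placing a selective cheap predicate first means the expensive one is evaluated on far fewer rows.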

This change classifies each conjunct as cheap or expensive (LIKE,
SIMILAR TO, regex operators, scalar functions, and subqueries are
expensive; everything else is cheap) and does a stable partition that
puts cheap predicates first. The helper reports whether any reorder
actually happened so the caller skips rebuilding the conjunction when
the input was already cheap-first.
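
The stable partition and the "did anything change" flag could look roughly like the sketch below. It also folds in the volatile-expression guard mentioned in the PR description. The `Conjunct` struct, the `Cost` enum, and `partition_cheap_first` are hypothetical names for illustration and do not reproduce the PR's `reorder_predicates` helper.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cost {
    Cheap,
    Expensive,
}

/// Hypothetical pre-classified conjunct; the real code inspects `Expr` trees.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Conjunct {
    text: &'static str,
    cost: Cost,
    volatile: bool,
}

/// Returns the conjuncts in cheap-first order and a flag that is `true`
/// only if the order actually changed.
fn partition_cheap_first(conjuncts: Vec<Conjunct>) -> (Vec<Conjunct>, bool) {
    // Be conservative: leave the filter untouched if anything is volatile.
    if conjuncts.iter().any(|c| c.volatile) {
        return (conjuncts, false);
    }

    // The input is already cheap-first unless a cheap conjunct
    // appears after an expensive one.
    let mut seen_expensive = false;
    let mut needs_reorder = false;
    for c in &conjuncts {
        match c.cost {
            Cost::Expensive => seen_expensive = true,
            Cost::Cheap if seen_expensive => {
                needs_reorder = true;
                break;
            }
            Cost::Cheap => {}
        }
    }
    if !needs_reorder {
        return (conjuncts, false);
    }

    // `Iterator::partition` keeps the original relative order in each half.
    let (cheap, expensive): (Vec<Conjunct>, Vec<Conjunct>) = conjuncts
        .into_iter()
        .partition(|c| c.cost == Cost::Cheap);
    (cheap.into_iter().chain(expensive).collect(), true)
}
```

Returning the flag lets a caller keep the existing filter expression when nothing moved, instead of rebuilding an identical conjunction.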

On ClickBench (hits_partitioned, 5 iterations) the reorder yields
+13-16% on Q21 and +7-9% on Q22, the two queries that mix LIKE with a
cheap `<>` predicate; other queries are unchanged within noise.
@github-actions (bot) added the documentation (Improvements or additions to documentation), optimizer (Optimizer rules), and sqllogictest (SQL Logic Tests (.slt)) labels on May 18, 2026
let LogicalPlan::Filter(mut filter) = plan else {
    return Ok(Transformed::no(plan));
};
let plan_schema = Arc::clone(filter.input.schema());

Unrelated to the core change but should save a few cycles.

@neilconway

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4480099627-184-cfkxz 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: tpcds
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4480099627-183-ppdg8 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4480099627-185-c22sx 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and neilc_perf-predicate-reorder
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃   neilc_perf-predicate-reorder ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 38.37 / 39.92 ±1.76 / 43.12 ms │ 38.94 / 40.45 ±0.84 / 41.37 ms │ no change │
│ QQuery 2  │ 20.26 / 20.48 ±0.22 / 20.91 ms │ 20.45 / 20.62 ±0.16 / 20.91 ms │ no change │
│ QQuery 3  │ 32.98 / 34.99 ±2.20 / 39.02 ms │ 32.95 / 34.82 ±2.04 / 38.52 ms │ no change │
│ QQuery 4  │ 17.47 / 18.16 ±0.77 / 19.28 ms │ 17.34 / 17.62 ±0.27 / 18.12 ms │ no change │
│ QQuery 5  │ 42.03 / 43.02 ±1.12 / 45.14 ms │ 41.36 / 42.59 ±0.74 / 43.60 ms │ no change │
│ QQuery 6  │ 16.42 / 16.52 ±0.08 / 16.64 ms │ 16.47 / 17.15 ±0.87 / 18.87 ms │ no change │
│ QQuery 7  │ 46.80 / 47.63 ±1.07 / 49.75 ms │ 47.19 / 48.65 ±1.20 / 50.15 ms │ no change │
│ QQuery 8  │ 44.94 / 45.37 ±0.42 / 46.16 ms │ 45.02 / 45.25 ±0.19 / 45.50 ms │ no change │
│ QQuery 9  │ 49.14 / 50.67 ±1.07 / 52.24 ms │ 49.97 / 50.52 ±0.40 / 51.06 ms │ no change │
│ QQuery 10 │ 63.50 / 63.66 ±0.18 / 63.99 ms │ 63.44 / 63.59 ±0.15 / 63.86 ms │ no change │
│ QQuery 11 │ 13.30 / 13.63 ±0.20 / 13.90 ms │ 13.29 / 13.51 ±0.17 / 13.82 ms │ no change │
│ QQuery 12 │ 24.46 / 25.18 ±1.05 / 27.25 ms │ 24.62 / 25.21 ±1.01 / 27.22 ms │ no change │
│ QQuery 13 │ 33.62 / 35.39 ±1.88 / 39.04 ms │ 34.22 / 35.93 ±1.81 / 39.45 ms │ no change │
│ QQuery 14 │ 25.49 / 25.80 ±0.20 / 26.10 ms │ 25.57 / 26.03 ±0.61 / 27.23 ms │ no change │
│ QQuery 15 │ 31.59 / 32.07 ±0.49 / 32.77 ms │ 31.57 / 31.91 ±0.30 / 32.46 ms │ no change │
│ QQuery 16 │ 14.78 / 14.98 ±0.12 / 15.14 ms │ 14.99 / 15.24 ±0.21 / 15.57 ms │ no change │
│ QQuery 17 │ 73.97 / 74.76 ±0.45 / 75.17 ms │ 74.37 / 75.67 ±1.30 / 77.72 ms │ no change │
│ QQuery 18 │ 62.69 / 63.82 ±0.88 / 65.23 ms │ 62.24 / 65.09 ±3.19 / 71.24 ms │ no change │
│ QQuery 19 │ 35.24 / 35.71 ±0.54 / 36.75 ms │ 35.34 / 35.93 ±0.90 / 37.72 ms │ no change │
│ QQuery 20 │ 37.71 / 38.08 ±0.47 / 38.94 ms │ 37.90 / 38.27 ±0.48 / 39.20 ms │ no change │
│ QQuery 21 │ 56.23 / 56.73 ±0.38 / 57.38 ms │ 57.17 / 59.04 ±1.51 / 61.01 ms │ no change │
│ QQuery 22 │ 23.45 / 25.05 ±1.77 / 28.05 ms │ 23.50 / 23.92 ±0.39 / 24.43 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 821.62ms │
│ Total Time (neilc_perf-predicate-reorder)   │ 827.01ms │
│ Average Time (HEAD)                         │  37.35ms │
│ Average Time (neilc_perf-predicate-reorder) │  37.59ms │
│ Queries Faster                              │        0 │
│ Queries Slower                              │        0 │
│ Queries with No Change                      │       22 │
│ Queries with Failure                        │        0 │
└─────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric        Value
Wall time     5.0s
Peak memory   5.5 GiB
Avg memory    5.0 GiB
CPU user      29.7s
CPU sys       2.1s
Peak spill    0 B

tpch — branch

Metric        Value
Wall time     5.0s
Peak memory   5.7 GiB
Avg memory    5.1 GiB
CPU user      29.8s
CPU sys       2.1s
Peak spill    0 B


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and neilc_perf-predicate-reorder
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃          neilc_perf-predicate-reorder ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.46 / 7.00 ±0.81 / 8.61 ms │           6.59 / 7.05 ±0.82 / 8.69 ms │     no change │
│ QQuery 2  │        82.95 / 83.26 ±0.29 / 83.63 ms │        83.29 / 83.57 ±0.23 / 83.86 ms │     no change │
│ QQuery 3  │        29.76 / 30.04 ±0.27 / 30.52 ms │        29.23 / 29.56 ±0.19 / 29.83 ms │     no change │
│ QQuery 4  │     545.51 / 555.45 ±7.21 / 563.95 ms │     542.26 / 551.85 ±5.48 / 556.72 ms │     no change │
│ QQuery 5  │        53.62 / 54.17 ±0.47 / 54.99 ms │        52.92 / 53.52 ±0.48 / 54.34 ms │     no change │
│ QQuery 6  │        36.91 / 37.44 ±0.33 / 37.80 ms │        37.10 / 37.51 ±0.44 / 38.31 ms │     no change │
│ QQuery 7  │     109.89 / 110.73 ±0.70 / 111.74 ms │     108.84 / 109.97 ±1.03 / 111.82 ms │     no change │
│ QQuery 8  │        39.95 / 40.26 ±0.20 / 40.52 ms │        39.61 / 39.88 ±0.27 / 40.35 ms │     no change │
│ QQuery 9  │        53.20 / 55.41 ±1.43 / 57.13 ms │        54.87 / 56.04 ±0.94 / 56.87 ms │     no change │
│ QQuery 10 │        82.81 / 84.02 ±1.78 / 87.53 ms │        83.19 / 84.18 ±1.47 / 87.06 ms │     no change │
│ QQuery 11 │     349.81 / 355.08 ±4.01 / 361.81 ms │     343.89 / 348.48 ±2.95 / 352.78 ms │     no change │
│ QQuery 12 │        29.74 / 30.09 ±0.27 / 30.51 ms │        29.38 / 30.06 ±0.35 / 30.31 ms │     no change │
│ QQuery 13 │     129.77 / 130.38 ±0.71 / 131.77 ms │     129.16 / 129.86 ±0.53 / 130.50 ms │     no change │
│ QQuery 14 │     514.42 / 515.64 ±0.96 / 517.09 ms │     514.20 / 516.22 ±1.55 / 517.98 ms │     no change │
│ QQuery 15 │        64.67 / 65.34 ±0.46 / 65.95 ms │        63.95 / 64.85 ±0.91 / 66.11 ms │     no change │
│ QQuery 16 │           7.41 / 7.60 ±0.14 / 7.78 ms │           7.43 / 7.51 ±0.10 / 7.71 ms │     no change │
│ QQuery 17 │        83.17 / 83.90 ±0.93 / 85.74 ms │        82.46 / 83.55 ±1.39 / 86.25 ms │     no change │
│ QQuery 18 │     154.27 / 156.11 ±1.00 / 157.23 ms │     155.12 / 155.70 ±0.50 / 156.64 ms │     no change │
│ QQuery 19 │        42.21 / 42.97 ±1.06 / 45.06 ms │        42.29 / 42.49 ±0.20 / 42.86 ms │     no change │
│ QQuery 20 │        36.18 / 36.86 ±0.37 / 37.22 ms │        36.46 / 37.09 ±0.39 / 37.58 ms │     no change │
│ QQuery 21 │        18.99 / 19.12 ±0.11 / 19.26 ms │        18.47 / 18.82 ±0.28 / 19.14 ms │     no change │
│ QQuery 22 │        64.77 / 65.58 ±0.76 / 66.63 ms │        64.44 / 65.61 ±0.79 / 66.83 ms │     no change │
│ QQuery 23 │     497.38 / 502.15 ±3.41 / 505.65 ms │     497.01 / 500.31 ±2.61 / 503.87 ms │     no change │
│ QQuery 24 │     238.15 / 241.22 ±4.87 / 250.86 ms │     236.73 / 239.38 ±2.03 / 241.56 ms │     no change │
│ QQuery 25 │     115.62 / 118.26 ±2.77 / 123.33 ms │     114.54 / 115.64 ±0.99 / 117.38 ms │     no change │
│ QQuery 26 │        72.31 / 73.15 ±0.49 / 73.83 ms │        71.84 / 73.50 ±2.51 / 78.46 ms │     no change │
│ QQuery 27 │           7.32 / 7.51 ±0.13 / 7.70 ms │           7.45 / 7.55 ±0.12 / 7.76 ms │     no change │
│ QQuery 28 │        63.46 / 64.19 ±0.82 / 65.71 ms │        58.99 / 61.60 ±2.19 / 64.16 ms │     no change │
│ QQuery 29 │     100.71 / 101.12 ±0.29 / 101.58 ms │      99.84 / 101.54 ±1.43 / 104.11 ms │     no change │
│ QQuery 30 │        31.84 / 32.26 ±0.42 / 32.94 ms │        31.72 / 33.18 ±1.41 / 35.48 ms │     no change │
│ QQuery 31 │     114.51 / 116.31 ±1.56 / 118.52 ms │     114.53 / 115.03 ±0.33 / 115.38 ms │     no change │
│ QQuery 32 │        22.41 / 22.65 ±0.20 / 22.88 ms │        21.57 / 22.47 ±1.10 / 24.58 ms │     no change │
│ QQuery 33 │        40.20 / 41.04 ±0.64 / 41.91 ms │        39.86 / 40.28 ±0.25 / 40.62 ms │     no change │
│ QQuery 34 │        10.31 / 10.50 ±0.26 / 11.01 ms │        10.47 / 10.73 ±0.18 / 11.01 ms │     no change │
│ QQuery 35 │        83.09 / 83.72 ±0.55 / 84.67 ms │        82.56 / 83.96 ±1.97 / 87.88 ms │     no change │
│ QQuery 36 │           6.89 / 7.03 ±0.10 / 7.18 ms │           6.64 / 6.83 ±0.16 / 7.08 ms │     no change │
│ QQuery 37 │           7.77 / 7.96 ±0.16 / 8.16 ms │           7.42 / 7.61 ±0.13 / 7.80 ms │     no change │
│ QQuery 38 │        71.83 / 73.16 ±0.99 / 74.49 ms │        71.32 / 72.19 ±0.75 / 73.44 ms │     no change │
│ QQuery 39 │     104.30 / 104.97 ±0.80 / 106.47 ms │     103.35 / 104.38 ±1.31 / 106.92 ms │     no change │
│ QQuery 40 │        24.20 / 25.24 ±0.95 / 27.03 ms │        23.95 / 24.62 ±0.41 / 25.07 ms │     no change │
│ QQuery 41 │        15.12 / 15.51 ±0.44 / 16.32 ms │        14.79 / 14.92 ±0.08 / 14.98 ms │     no change │
│ QQuery 42 │        24.60 / 24.90 ±0.26 / 25.38 ms │        24.50 / 24.62 ±0.11 / 24.78 ms │     no change │
│ QQuery 43 │           5.58 / 5.68 ±0.11 / 5.89 ms │           5.44 / 5.55 ±0.11 / 5.77 ms │     no change │
│ QQuery 44 │        11.79 / 11.91 ±0.15 / 12.19 ms │        11.29 / 11.53 ±0.14 / 11.70 ms │     no change │
│ QQuery 45 │        44.09 / 44.44 ±0.39 / 45.11 ms │        43.10 / 43.84 ±0.53 / 44.61 ms │     no change │
│ QQuery 46 │        14.13 / 14.40 ±0.19 / 14.60 ms │        14.29 / 14.51 ±0.22 / 14.84 ms │     no change │
│ QQuery 47 │     246.92 / 250.95 ±2.78 / 254.26 ms │     247.66 / 249.87 ±1.67 / 252.28 ms │     no change │
│ QQuery 48 │     105.51 / 107.11 ±1.40 / 108.95 ms │     105.60 / 106.08 ±0.70 / 107.43 ms │     no change │
│ QQuery 49 │        81.35 / 81.88 ±0.48 / 82.77 ms │        81.78 / 83.00 ±1.19 / 84.99 ms │     no change │
│ QQuery 50 │        61.08 / 63.04 ±2.47 / 67.54 ms │        60.83 / 61.58 ±0.48 / 62.03 ms │     no change │
│ QQuery 51 │       91.92 / 95.87 ±2.68 / 100.23 ms │        92.21 / 95.20 ±2.12 / 98.40 ms │     no change │
│ QQuery 52 │        24.76 / 25.19 ±0.28 / 25.51 ms │        24.46 / 24.68 ±0.12 / 24.82 ms │     no change │
│ QQuery 53 │        31.34 / 31.49 ±0.14 / 31.67 ms │        30.85 / 31.13 ±0.15 / 31.28 ms │     no change │
│ QQuery 54 │        56.47 / 58.07 ±2.35 / 62.73 ms │        55.60 / 56.48 ±0.63 / 57.22 ms │     no change │
│ QQuery 55 │        24.10 / 24.37 ±0.24 / 24.75 ms │        23.99 / 24.93 ±1.19 / 27.24 ms │     no change │
│ QQuery 56 │        41.50 / 41.69 ±0.21 / 42.07 ms │        40.43 / 40.96 ±0.55 / 41.97 ms │     no change │
│ QQuery 57 │     182.97 / 187.00 ±4.47 / 195.47 ms │     182.33 / 184.30 ±1.42 / 185.96 ms │     no change │
│ QQuery 58 │     121.88 / 123.01 ±0.66 / 123.94 ms │     118.32 / 118.81 ±0.44 / 119.43 ms │     no change │
│ QQuery 59 │     119.94 / 121.35 ±1.30 / 123.60 ms │     120.05 / 121.08 ±0.54 / 121.58 ms │     no change │
│ QQuery 60 │        41.03 / 41.49 ±0.34 / 41.96 ms │        40.60 / 41.11 ±0.41 / 41.63 ms │     no change │
│ QQuery 61 │        14.59 / 14.84 ±0.16 / 15.03 ms │        14.25 / 14.42 ±0.15 / 14.69 ms │     no change │
│ QQuery 62 │        48.34 / 49.74 ±2.01 / 53.72 ms │        46.76 / 47.31 ±0.38 / 47.94 ms │     no change │
│ QQuery 63 │        31.51 / 31.66 ±0.13 / 31.86 ms │        31.18 / 31.37 ±0.12 / 31.56 ms │     no change │
│ QQuery 64 │     472.81 / 475.03 ±2.39 / 479.33 ms │     469.27 / 473.93 ±3.87 / 480.69 ms │     no change │
│ QQuery 65 │     148.12 / 150.45 ±2.50 / 155.04 ms │     145.47 / 148.09 ±1.63 / 150.05 ms │     no change │
│ QQuery 66 │        86.21 / 86.72 ±0.49 / 87.54 ms │        84.14 / 86.77 ±3.32 / 93.25 ms │     no change │
│ QQuery 67 │     263.65 / 268.45 ±5.14 / 276.94 ms │     265.24 / 268.40 ±3.41 / 274.38 ms │     no change │
│ QQuery 68 │        14.52 / 14.63 ±0.07 / 14.72 ms │        14.41 / 14.59 ±0.11 / 14.74 ms │     no change │
│ QQuery 69 │        78.63 / 80.31 ±2.61 / 85.47 ms │        78.16 / 78.78 ±0.71 / 80.15 ms │     no change │
│ QQuery 70 │     110.40 / 111.99 ±2.23 / 116.39 ms │     106.52 / 113.35 ±5.99 / 122.62 ms │     no change │
│ QQuery 71 │        36.58 / 36.65 ±0.08 / 36.81 ms │        36.06 / 36.89 ±1.20 / 39.27 ms │     no change │
│ QQuery 72 │ 2124.12 / 2213.61 ±52.37 / 2274.74 ms │ 2128.60 / 2246.57 ±86.51 / 2376.59 ms │     no change │
│ QQuery 73 │        10.16 / 12.84 ±4.76 / 22.33 ms │        10.19 / 10.34 ±0.14 / 10.56 ms │ +1.24x faster │
│ QQuery 74 │     199.80 / 201.22 ±1.44 / 203.76 ms │     191.46 / 194.95 ±3.51 / 201.50 ms │     no change │
│ QQuery 75 │     150.81 / 151.60 ±0.86 / 153.28 ms │     150.47 / 151.76 ±1.61 / 154.89 ms │     no change │
│ QQuery 76 │        36.12 / 36.64 ±0.31 / 37.09 ms │        35.69 / 36.18 ±0.26 / 36.50 ms │     no change │
│ QQuery 77 │        62.90 / 63.66 ±0.71 / 64.76 ms │        62.60 / 64.50 ±2.30 / 68.90 ms │     no change │
│ QQuery 78 │     195.26 / 197.04 ±1.08 / 198.30 ms │     194.61 / 196.69 ±2.28 / 200.53 ms │     no change │
│ QQuery 79 │        69.58 / 70.35 ±1.34 / 73.02 ms │        68.30 / 68.77 ±0.40 / 69.47 ms │     no change │
│ QQuery 80 │     102.98 / 105.49 ±2.91 / 111.13 ms │     102.14 / 103.77 ±1.11 / 104.90 ms │     no change │
│ QQuery 81 │        25.68 / 25.94 ±0.20 / 26.16 ms │        25.60 / 26.79 ±1.36 / 29.40 ms │     no change │
│ QQuery 82 │        17.16 / 17.64 ±0.56 / 18.75 ms │        17.46 / 17.77 ±0.20 / 18.10 ms │     no change │
│ QQuery 83 │        38.74 / 39.74 ±1.58 / 42.90 ms │        38.51 / 39.18 ±0.48 / 39.94 ms │     no change │
│ QQuery 84 │        44.42 / 44.79 ±0.28 / 45.24 ms │        44.45 / 45.42 ±1.34 / 48.06 ms │     no change │
│ QQuery 85 │     138.86 / 140.46 ±1.59 / 142.99 ms │     139.50 / 140.45 ±0.69 / 141.52 ms │     no change │
│ QQuery 86 │        26.00 / 26.16 ±0.20 / 26.53 ms │        25.83 / 26.26 ±0.48 / 27.18 ms │     no change │
│ QQuery 87 │        71.92 / 72.75 ±0.65 / 73.88 ms │        71.98 / 73.19 ±0.78 / 74.29 ms │     no change │
│ QQuery 88 │        66.48 / 67.90 ±2.33 / 72.54 ms │        66.38 / 67.13 ±0.41 / 67.54 ms │     no change │
│ QQuery 89 │        36.94 / 37.35 ±0.32 / 37.86 ms │        37.34 / 37.97 ±0.53 / 38.94 ms │     no change │
│ QQuery 90 │        18.66 / 20.03 ±2.36 / 24.75 ms │        18.37 / 18.74 ±0.25 / 18.99 ms │ +1.07x faster │
│ QQuery 91 │        53.43 / 54.01 ±0.37 / 54.49 ms │        53.32 / 54.21 ±0.86 / 55.81 ms │     no change │
│ QQuery 92 │        30.22 / 30.48 ±0.30 / 30.98 ms │        29.94 / 30.37 ±0.29 / 30.73 ms │     no change │
│ QQuery 93 │        51.27 / 51.95 ±0.47 / 52.58 ms │        51.59 / 51.84 ±0.20 / 52.21 ms │     no change │
│ QQuery 94 │        38.73 / 39.16 ±0.30 / 39.59 ms │        39.33 / 39.71 ±0.39 / 40.46 ms │     no change │
│ QQuery 95 │        86.08 / 86.97 ±1.13 / 89.09 ms │        85.89 / 87.13 ±1.08 / 89.05 ms │     no change │
│ QQuery 96 │        25.20 / 25.68 ±0.39 / 26.34 ms │        25.14 / 25.36 ±0.18 / 25.63 ms │     no change │
│ QQuery 97 │        47.21 / 47.48 ±0.18 / 47.76 ms │        46.99 / 47.62 ±0.48 / 48.38 ms │     no change │
│ QQuery 98 │        42.78 / 44.13 ±0.88 / 45.46 ms │        43.48 / 44.16 ±0.79 / 45.65 ms │     no change │
│ QQuery 99 │        72.18 / 72.46 ±0.23 / 72.77 ms │        70.81 / 71.74 ±0.65 / 72.53 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 10886.26ms │
│ Total Time (neilc_perf-predicate-reorder)   │ 10860.81ms │
│ Average Time (HEAD)                         │   109.96ms │
│ Average Time (neilc_perf-predicate-reorder) │   109.71ms │
│ Queries Faster                              │          2 │
│ Queries Slower                              │          0 │
│ Queries with No Change                      │         97 │
│ Queries with Failure                        │          0 │
└─────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

| Metric      | Value   |
|-------------|---------|
| Wall time   | 55.0s   |
| Peak memory | 6.6 GiB |
| Avg memory  | 6.0 GiB |
| CPU user    | 244.9s  |
| CPU sys     | 5.3s    |
| Peak spill  | 0 B     |

tpcds — branch

| Metric      | Value   |
|-------------|---------|
| Wall time   | 55.0s   |
| Peak memory | 6.8 GiB |
| Avg memory  | 6.2 GiB |
| CPU user    | 241.2s  |
| CPU sys     | 5.4s    |
| Peak spill  | 0 B     |

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and neilc_perf-predicate-reorder
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃          neilc_perf-predicate-reorder ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.22 / 4.74 ±6.91 / 18.56 ms │          1.25 / 4.81 ±6.93 / 18.67 ms │     no change │
│ QQuery 1  │        12.37 / 12.77 ±0.25 / 13.07 ms │        12.58 / 13.04 ±0.29 / 13.36 ms │     no change │
│ QQuery 2  │        37.17 / 37.55 ±0.38 / 38.07 ms │        35.95 / 36.20 ±0.38 / 36.95 ms │     no change │
│ QQuery 3  │        31.03 / 31.79 ±0.54 / 32.61 ms │        31.88 / 32.84 ±0.83 / 34.34 ms │     no change │
│ QQuery 4  │    222.56 / 236.76 ±12.87 / 259.32 ms │     240.83 / 245.21 ±3.72 / 249.89 ms │     no change │
│ QQuery 5  │     268.66 / 280.28 ±8.27 / 292.42 ms │     288.03 / 295.85 ±4.28 / 299.28 ms │  1.06x slower │
│ QQuery 6  │           6.28 / 6.71 ±0.29 / 7.05 ms │           6.14 / 6.63 ±0.55 / 7.63 ms │     no change │
│ QQuery 7  │        13.66 / 13.77 ±0.09 / 13.87 ms │        14.30 / 14.59 ±0.20 / 14.89 ms │  1.06x slower │
│ QQuery 8  │    321.47 / 333.23 ±12.39 / 350.16 ms │     321.88 / 331.11 ±9.96 / 344.81 ms │     no change │
│ QQuery 9  │     458.83 / 467.84 ±8.18 / 482.64 ms │     456.02 / 464.06 ±8.41 / 478.35 ms │     no change │
│ QQuery 10 │        69.71 / 71.87 ±2.04 / 75.62 ms │        70.19 / 72.66 ±1.52 / 74.97 ms │     no change │
│ QQuery 11 │        81.30 / 85.67 ±4.61 / 93.63 ms │        84.86 / 85.48 ±0.46 / 86.07 ms │     no change │
│ QQuery 12 │    263.44 / 273.85 ±11.80 / 296.32 ms │     285.50 / 290.38 ±5.13 / 299.42 ms │  1.06x slower │
│ QQuery 13 │    358.03 / 375.84 ±15.07 / 395.05 ms │     368.44 / 378.59 ±7.33 / 387.11 ms │     no change │
│ QQuery 14 │    276.98 / 287.98 ±11.44 / 307.13 ms │     283.64 / 288.24 ±4.80 / 296.63 ms │     no change │
│ QQuery 15 │    273.21 / 291.43 ±12.39 / 312.00 ms │    267.62 / 284.47 ±12.61 / 298.99 ms │     no change │
│ QQuery 16 │     618.62 / 637.72 ±9.77 / 646.04 ms │    609.40 / 637.13 ±19.89 / 657.85 ms │     no change │
│ QQuery 17 │     638.36 / 643.07 ±3.72 / 648.67 ms │    621.87 / 642.25 ±11.97 / 659.27 ms │     no change │
│ QQuery 18 │ 1311.64 / 1332.85 ±13.92 / 1350.76 ms │ 1280.47 / 1301.21 ±20.06 / 1331.61 ms │     no change │
│ QQuery 19 │        28.71 / 32.34 ±3.96 / 37.75 ms │        27.90 / 30.31 ±4.63 / 39.58 ms │ +1.07x faster │
│ QQuery 20 │    525.47 / 540.51 ±15.98 / 568.87 ms │     520.45 / 525.64 ±3.61 / 530.08 ms │     no change │
│ QQuery 21 │     600.54 / 602.49 ±1.28 / 604.28 ms │     523.74 / 530.97 ±6.28 / 542.27 ms │ +1.13x faster │
│ QQuery 22 │ 1067.19 / 1078.47 ±11.06 / 1098.24 ms │ 1003.29 / 1019.80 ±13.00 / 1038.00 ms │ +1.06x faster │
│ QQuery 23 │ 3300.58 / 3323.51 ±14.12 / 3342.32 ms │ 3201.35 / 3237.89 ±33.05 / 3291.06 ms │     no change │
│ QQuery 24 │        42.48 / 42.84 ±0.25 / 43.18 ms │        42.73 / 46.60 ±4.07 / 54.29 ms │  1.09x slower │
│ QQuery 25 │     112.77 / 118.77 ±6.76 / 131.94 ms │     111.16 / 115.54 ±2.72 / 118.42 ms │     no change │
│ QQuery 26 │        42.53 / 43.52 ±0.64 / 44.23 ms │        43.72 / 44.74 ±0.93 / 46.15 ms │     no change │
│ QQuery 27 │    667.92 / 684.46 ±10.62 / 700.94 ms │     682.06 / 687.57 ±2.89 / 690.26 ms │     no change │
│ QQuery 28 │ 3040.07 / 3074.66 ±22.96 / 3107.51 ms │ 3046.17 / 3085.16 ±28.67 / 3122.35 ms │     no change │
│ QQuery 29 │        42.87 / 52.04 ±7.54 / 60.33 ms │       41.55 / 59.10 ±13.93 / 80.53 ms │  1.14x slower │
│ QQuery 30 │     311.80 / 320.81 ±5.78 / 329.40 ms │     301.57 / 314.79 ±9.17 / 325.38 ms │     no change │
│ QQuery 31 │     288.02 / 297.61 ±8.23 / 309.39 ms │    288.80 / 301.87 ±11.18 / 317.31 ms │     no change │
│ QQuery 32 │   943.55 / 967.83 ±24.90 / 1005.96 ms │   975.64 / 987.07 ±10.39 / 1000.10 ms │     no change │
│ QQuery 33 │ 1503.31 / 1515.49 ±10.58 / 1530.55 ms │ 1414.56 / 1468.40 ±48.86 / 1550.63 ms │     no change │
│ QQuery 34 │ 1421.13 / 1504.85 ±50.78 / 1581.24 ms │ 1497.60 / 1518.62 ±18.77 / 1546.10 ms │     no change │
│ QQuery 35 │    302.24 / 326.70 ±37.12 / 400.16 ms │    297.34 / 306.06 ±10.18 / 325.93 ms │ +1.07x faster │
│ QQuery 36 │        71.84 / 77.86 ±5.97 / 86.60 ms │        67.74 / 72.53 ±5.42 / 82.60 ms │ +1.07x faster │
│ QQuery 37 │        37.34 / 38.34 ±1.13 / 40.37 ms │        36.42 / 38.71 ±2.87 / 44.15 ms │     no change │
│ QQuery 38 │        41.04 / 43.54 ±3.00 / 48.79 ms │        42.70 / 49.91 ±6.17 / 59.73 ms │  1.15x slower │
│ QQuery 39 │     144.06 / 154.30 ±7.01 / 161.17 ms │     143.14 / 156.98 ±9.25 / 172.22 ms │     no change │
│ QQuery 40 │        15.12 / 15.46 ±0.38 / 16.14 ms │        14.88 / 17.23 ±2.78 / 22.47 ms │  1.11x slower │
│ QQuery 41 │        14.41 / 17.27 ±5.19 / 27.65 ms │        14.51 / 18.83 ±4.97 / 26.98 ms │  1.09x slower │
│ QQuery 42 │        14.02 / 14.26 ±0.18 / 14.57 ms │        14.02 / 14.14 ±0.11 / 14.32 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 20313.62ms │
│ Total Time (neilc_perf-predicate-reorder)   │ 20073.22ms │
│ Average Time (HEAD)                         │   472.41ms │
│ Average Time (neilc_perf-predicate-reorder) │   466.82ms │
│ Queries Faster                              │          5 │
│ Queries Slower                              │          8 │
│ Queries with No Change                      │         30 │
│ Queries with Failure                        │          0 │
└─────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric      | Value    |
|-------------|----------|
| Wall time   | 105.0s   |
| Peak memory | 30.8 GiB |
| Avg memory  | 23.1 GiB |
| CPU user    | 1052.2s  |
| CPU sys     | 69.3s    |
| Peak spill  | 0 B      |

clickbench_partitioned — branch

| Metric      | Value    |
|-------------|----------|
| Wall time   | 105.0s   |
| Peak memory | 31.6 GiB |
| Avg memory  | 23.3 GiB |
| CPU user    | 1040.0s  |
| CPU sys     | 69.7s    |
| Peak spill  | 0 B      |

@asolimando
Member

I agree on the general idea, but considering short-circuiting and also conjuncts' selectivity, this can actually backfire in practice, so a config knob seems important to have (I might have missed it, but I couldn't see it in the PR).

Ideally reorder_predicates should be overridable, both to give downstream systems a chance to change behavior if needed and to let them supply knowledge about UDFs that core DF can't know.

@neilconway
Contributor Author

neilconway commented May 18, 2026

@asolimando Thanks for the feedback!

> I agree on the general idea, but considering short-circuiting and also conjuncts' selectivity, this can actually backfire in practice, so a config knob seems important to have (I might have missed it, but I couldn't see it in the PR).

It's possible to add a knob if we feel like there is a need for one, but I'd rather not add one reflexively. I made the definition of "cheap" vs. "expensive" very conservative partly in hopes of avoiding a config knob. Looking more closely at the previous rewriting logic, we actually have not been respecting the predicate order in the query text for a while: simplify_predicates already reordered predicates quite freely, and AFAIK no one has complained about that. Users that need to fix an evaluation order should very likely be using CASE ... WHEN anyway.

> Ideally reorder_predicates should be overridable, to give downstream systems a chance to change behavior if needed, but also to complement knowledge about UDFs which core DF can't know.

I think per-UDF extensibility to express some notion of cost or selectivity could definitely make sense, although that's a much bigger task to take on.

@neilconway
Contributor Author

BTW I checked all of the ClickBench queries where we see minor regressions (40, 41, 38, 24, 12, 5, and 7), and none of them have predicates that will be reordered by this PR. So I suspect those regressions are just noise.

@adriangb
Contributor

adriangb commented May 18, 2026

Agreed that the regressions look like noise. But also the only real win seems to be Q73 in tpcds? What is your intuition for where the win is coming from? I'm wondering if it's just happening to hit a positive case that would be handled by #22144 already or if it's completely unrelated (e.g. in a complex join key).

@neilconway
Contributor Author

neilconway commented May 19, 2026

@adriangb

> But also the only real win seems to be Q73 in tpcds? What is your intuition for where the win is coming from? I'm wondering if it's just happening to hit a positive case that would be handled by #22144 already or if it's completely unrelated (e.g. in a complex join key).

I see consistent improvements on ClickBench Q21 (~10-13%) and Q22 (~5%), which are both cases where we now reorder LIKE predicates after simple comparisons.
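
For readers skimming the thread, here is a minimal sketch of the cheap-first idea. It is an illustration only, not the PR's code: the `Expr` enum, `is_cheap`, and `reorder_conjuncts` are hypothetical stand-ins for DataFusion's real types, and the volatile-expression check is not modeled.

```rust
/// Toy expression tree; a hypothetical stand-in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    /// Binary comparison such as `a > 1`.
    Compare(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
    Cast(Box<Expr>),
    /// `expr LIKE pattern`.
    Like(Box<Expr>, String),
    /// Any function call, treated as expensive in this sketch.
    Call(String, Vec<Expr>),
}

/// An expression counts as "cheap" only if every node in it is a cheap
/// operation (column reference, literal, comparison, negation, cast).
fn is_cheap(e: &Expr) -> bool {
    match e {
        Expr::Column(_) | Expr::Literal(_) => true,
        Expr::Compare(l, r) => is_cheap(l) && is_cheap(r),
        Expr::Not(inner) | Expr::Cast(inner) => is_cheap(inner),
        Expr::Like(..) | Expr::Call(..) => false,
    }
}

/// Move cheap conjuncts ahead of expensive ones.
/// `sort_by_key` is a stable sort and `false < true`, so cheap conjuncts
/// (key = false) come first and the original order is kept within each class.
fn reorder_conjuncts(mut conjuncts: Vec<Expr>) -> Vec<Expr> {
    conjuncts.sort_by_key(|e| !is_cheap(e));
    conjuncts
}
```

In this sketch, a filter written as a LIKE predicate followed by a simple comparison would come out with the comparison first, which is the shape of the change described for Q21 and Q22.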

Interestingly, this PR does not fire for Q73 in TPC-DS, so I'm not sure what is going on there 😊 I couldn't repro an improvement locally, so I guess it is just benchmark noise.

To see improvements for a broader class of queries, we'd need to extend the heuristics to consider more criteria.

@2010YOUY01
Contributor

One challenge is that "cheap" means different things depending on where the predicate is evaluated:

  • In FilterExec, for in-memory evaluation, I think this is a great default heuristic.
  • In Parquet decoding with late materialization, things can get trickier. For example, given (c1 LIKE '%foo%bar%') AND (c2 > 0) AND (c3 > 0), if the LIKE pattern is very selective while the other predicates are not selective and c2 / c3 are heavily compressed, we might want to decode and evaluate the LIKE conjunct first.

Perhaps this kind of reordering could be implemented as a runtime optimization inside FilterExec: for the first batch, track each conjunct's evaluation time and selectivity, then decide the order dynamically. One nice benefit of this approach is that we don't have to hardcode whether an expression is "expensive" or "cheap".
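
A rough sketch of what such a runtime decision could look like, under stated assumptions: the `ConjunctStats` struct and the function names are hypothetical, conjuncts are treated as statistically independent, and the ranking rule (cost divided by the fraction of rows dropped) is the classic textbook ordering for short-circuited conjunctions rather than anything taken from this PR or #22144.

```rust
/// Per-conjunct measurements from a sample batch (hypothetical struct).
#[derive(Debug, Clone)]
struct ConjunctStats {
    name: String,
    /// Measured evaluation cost per input row.
    nanos_per_row: f64,
    /// Fraction of rows that pass the conjunct, in [0, 1].
    pass_rate: f64,
}

/// Rank = cost / fraction of rows dropped. Lower is better: the conjunct is
/// cheap, selective, or both. A conjunct that drops nothing ranks last.
fn rank(s: &ConjunctStats) -> f64 {
    let drop_rate = 1.0 - s.pass_rate;
    if drop_rate <= 0.0 {
        f64::INFINITY
    } else {
        s.nanos_per_row / drop_rate
    }
}

/// Order conjuncts by ascending rank (stable for equal ranks).
fn order_by_rank(mut stats: Vec<ConjunctStats>) -> Vec<ConjunctStats> {
    stats.sort_by(|a, b| rank(a).total_cmp(&rank(b)));
    stats
}

/// Expected cost per input row when conjuncts are evaluated in the given
/// order and each one only sees rows that survived the earlier ones.
fn expected_cost(order: &[ConjunctStats]) -> f64 {
    let mut surviving = 1.0;
    let mut total = 0.0;
    for s in order {
        total += surviving * s.nanos_per_row;
        surviving *= s.pass_rate;
    }
    total
}
```

With measurements shaped like the Parquet example above (an expensive but highly selective LIKE, and two comparisons that are costly to decode and drop few rows), this rule would place the LIKE conjunct first, which a static cheap/expensive split would not.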

@adriangb
Contributor

> Perhaps this kind of reordering could be implemented as a runtime optimization inside FilterExec: for the first batch, track each conjunct's evaluation time and selectivity, then decide the order dynamically. One nice benefit of this approach is that we don't have to hardcode whether an expression is "expensive" or "cheap".

That is exactly what #22144 does 😃. I think we could re-use pretty much the exact same machinery. It took a lot of iterations to arrive at the right metrics: you want to take into account time spent on compute, not just selectivity, etc.

Someone please correct me if I'm wrong but IIRC currently because of the tree structure we compute each side of a binary expression and apply the slice to the array, then compute the next side, etc. I wonder if an approach like apache/arrow-rs#9659 might be helpful to mitigate overheads from non-selective masks?

@asolimando
Member

> Perhaps this kind of reordering could be implemented as a runtime optimization inside FilterExec: for the first batch, track each conjunct's evaluation time and selectivity, then decide the order dynamically. One nice benefit of this approach is that we don't have to hardcode whether an expression is "expensive" or "cheap".

I think it's still useful to be able to reorder "statically", since you might want to use statistics for that. Static ordering can be more stable than dynamic approaches, which are usually sensitive to the "shape" of the first part of the data and usually do not revisit the choice (and even when they do, the order might fluctuate, whereas in some cases the static order is the optimal one).

I think it's good to have multiple options, as long as downstream users can mix and match what works best for them, and they can "easily" correct course for problematic queries without the need of code changes.

Labels

documentation (Improvements or additions to documentation), optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

Reorder boolean expressions (including filter predicates) according to evaluation cost / selectivity

5 participants