perf: Reorder predicates in conjuncts via simple heuristic#22343
perf: Reorder predicates in conjuncts via simple heuristic#22343neilconway wants to merge 6 commits into
Conversation
DataFusion's vectorized AND evaluator already short-circuits the right-hand side when the LHS keeps few rows. Until now the order of conjuncts in a Filter was whatever the user wrote, so expensive predicates like LIKE and regex could run on the full batch even when a cheap comparison would have filtered most rows first. This change classifies each conjunct as cheap or expensive (LIKE, SIMILAR TO, regex operators, scalar functions, and subqueries are expensive; everything else is cheap) and does a stable partition that puts cheap predicates first. The helper reports whether any reorder actually happened so the caller skips rebuilding the conjunction when the input was already cheap-first. On ClickBench (hits_partitioned, 5 iterations) the reorder yields +13-16% on Q21 and +7-9% on Q22, the two queries that mix LIKE with a cheap `<>` predicate; other queries are unchanged within noise.
| let LogicalPlan::Filter(mut filter) = plan else { | ||
| return Ok(Transformed::no(plan)); | ||
| }; | ||
| let plan_schema = Arc::clone(filter.input.schema()); |
There was a problem hiding this comment.
Unrelated to the core change but should save a few cycles.
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing neilc/perf-predicate-reorder (7284fe1) to dc80bd7 (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
I agree on the general idea, but considering short-circuiting and also conjuncts' selectivity, this can actually backfire in practice, so a config knob seems important to have (I might have missed it, but I couldn't see it in the PR). Ideally |
|
@asolimando Thanks for the feedback!
It's possible to add a knob if we feel like there is a need for one, but I'd rather not add one reflexively. I made the definition of "cheap" vs. "expensive" very conservative partly in hopes of avoiding a config knob. Looking more closely at the previous rewriting logic, we actually have not been respecting the predicate order in the query text for a while:
I think per-UDF extensibility to express some notion of cost or selectivity could definitely make sense, although that's a much bigger task to take on. |
|
BTW I checked all of the ClickBench queries were we see minor regressions (40, 41, 38, 24, 12, 5, and 7), and none of them have predicates that will be reordered by this PR. So I suspect those regressions are just noise. |
|
Agreed that the regressions look like noise. But also the only real win seems |
I see consistent improvements on ClickBench Q21 (~10-13%) and Q22 (~5%), which are both cases where we now reorder Interestingly, this PR does not fire for Q73 in TPC-DS, so I'm not sure what is going on there 😊 I couldn't repro an improvement locally, so I guess it is just benchmark noise. To see improvements for a broader class of queries, we'd need to extend the heuristics to consider more criteria. |
|
One challenge is that "cheap" means different things depending on where the predicate is evaluated:
Perhaps this kind of reordering could be implemented as a runtime optimization inside |
That is exactly what #22144 does 😃. I think we could re-use pretty much the exact same machinery. It took a lot of iterations to arrive at the right metrics: you want to take into account time spent on compute no just selectivity, etc. Someone please correct me if I'm wrong but IIRC currently because of the tree structure we compute each side of a binary expression and apply the slice to the array, then compute the next side, etc. I wonder if an approach like apache/arrow-rs#9659 might be helpful to mitigate overheads from non-selective masks? |
I think it's still useful to be able to re-order "statically" as you might want to use statistics for that, which might be more stable then dynamic approaches, which are usually sensitive to the "shape" of the first part of the data, and the choice is usually not revisited (and even in that case, it might fluctuate, while in some cases the static order could be the optimal one). I think it's good to have multiple options, as long as downstream users can mix and match what works best for them, and they can "easily" correct course for problematic queries without the need of code changes. |
Which issue does this PR close?
Rationale for this change
If a filter consists of a mix of cheap and expensive predicates, evaluating the cheap predicates first can improve performance, because it reduces the number of rows that the expensive predicate must be evaluated on. This PR implements this idea, by reordering predicates in a conjunction to place "cheap" predicates first.
Predicates are assessed as "cheap" or "expensive" using an intentionally simple heuristic: "cheap" predicates are expressions that consist of only cheap operations like binary comparisons, negations, and casts, and "expensive" predicates are everything else (e.g.,
LIKE, regexp matching, subqueries, and function calls). Importantly, we use a stable sort when reordering predicates, which means that the user-specified order of operations is preserved within these two classes.Arbitrarily more sophisticated schemes for predicting predicate evaluation cost (and selectivity) are possible, but a simple approach seems like a good place to start.
We avoid reordering predicates if the filter contains a volatile expression, to be safe. We could be a bit fancier and reorder conjuncts in the prefix of the filter list before the volatile expression, but we don't attempt to do that for now.
We don't reorder operands to
OR: I believe this would be worth doing if #22342 is implemented.On ClickBench, this improves performance by ~10-13% on Q21 and ~5% on Q22, in both cases by reordering simple comparisons to run before
LIKEpredicates.What changes are included in this PR?
reorder_predicateshelperreorder_predicatesas part of thePushDownFilterrewrite passreorder_predicatesAre these changes tested?
Yes. Added new unit tests for predicate reordering behavior, updated some expected
EXPLAINoutput.Are there any user-facing changes?
Yes. Users that expect their predicates to be evaluated in a strictly left-to-right manner might see changes in performance and/or behavior. Performance changes could be improvements or regressions. Behavioral changes are possible if the query includes fallible operations like certain casts or division by zero. Note that the SQL standard is clear that implementations are allowed to evaluate predicates in any order, so user queries that depend on an evaluation order are fundamentally fragile. Users can rewrite predicates using
CASEif they need to enforce an evaluation order.