Defer task spawning in SortPreservingMergeExec to first poll#21328
Defer task spawning in SortPreservingMergeExec to first poll#21328Dandandan wants to merge 1 commit intoapache:mainfrom
Conversation
Previously, SortPreservingMergeExec eagerly executed all input partitions and spawned buffered tasks in execute(). This meant that even if the output stream was never polled, all tasks would be spawned. This changes the multi-partition path to use a lazy stream that defers spawning and building the streaming merge until first poll_next(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-sort-preserving-merge (6eff550) to 4084a18 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-sort-preserving-merge (6eff550) to 4084a18 (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-sort-preserving-merge (6eff550) to 4084a18 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-sort-preserving-merge (6eff550) to 4084a18 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-sort-preserving-merge (6eff550) to 4084a18 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-sort-preserving-merge (6eff550) to 4084a18 (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
Which issue does this PR close?
N/A
Rationale for this change
SortPreservingMergeExec::execute()eagerly executes all input partitions and spawns buffered tasks immediately, even before any batch is polled from the output stream. This wastes resources when streams are created but never consumed (e.g., query cancelled before first poll), and creates an unnecessary burst of concurrent tasks.What changes are included in this PR?
Introduces a
LazySortPreservingMergeStreamthat defers executing input partitions, spawning buffered tasks, and building the streaming merge until the firstpoll_next()call. The single-partition and zero-partition paths are unchanged.How are these changes tested?
All existing tests pass (17/17), covering:
🤖 Generated with Claude Code