Defer task spawning in CoalescePartitionsExec to first poll#21326
Defer task spawning in CoalescePartitionsExec to first poll#21326Dandandan wants to merge 1 commit intoapache:mainfrom
Conversation
Previously, CoalescePartitionsExec eagerly spawned tasks for all input partitions in execute(). This meant that even if the output stream was never polled (e.g., query cancelled), all tasks would be spawned. This changes the multi-partition path to use a lazy stream that defers spawning until first poll_next(), reducing unnecessary work when streams are created but not consumed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-coalesce-partitions (e2339fb) to 4084a18 (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-coalesce-partitions (e2339fb) to 4084a18 (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-coalesce-partitions (e2339fb) to 4084a18 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
run benchmark tpcds10 tpch10 |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-coalesce-partitions (e2339fb) to 4084a18 (merge-base) diff using: tpch10 File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-coalesce-partitions (e2339fb) to 4084a18 (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
run benchmark tpcds10 |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing lazy-coalesce-partitions (e2339fb) to 4084a18 (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch10 — base (merge-base)
tpch10 — branch
File an issue against this benchmark runner |
Which issue does this PR close?
N/A
Rationale for this change
CoalescePartitionsExec::execute()eagerly spawns tasks for all input partitions immediately, even before any batch is polled from the output stream. This wastes resources when streams are created but never consumed (e.g., query cancelled before first poll), and increases peak memory usage.I am benchmarking it to see the current perf impact.
What changes are included in this PR?
Introduces a
CoalescePartitionsStreamthat defers spawning input partition tasks until the firstpoll_next()call, rather than inexecute(). The single-partition fast path is unchanged.How are these changes tested?
All existing tests pass (8/8), covering:
🤖 Generated with Claude Code