fix(datafusion): keep hot runs in one session by kumarUjjawal · Pull Request #833 · ClickHouse/ClickBench

kumarUjjawal · 2026-04-20T15:29:46Z

Summary

This fixes the partitioned DataFusion runner so hot runs are really hot. See apache/datafusion#21696

Before this change, the script started a new datafusion-cli process for every try. That made each try use a cold process, so tries 2 and 3 did not keep the warmed metadata cache.

What changed

drop OS cache once before each query
build one SQL file with create.sql and the same query 3 times
run that file in one datafusion-cli session
skip the setup Elapsed lines from create.sql
keep failed later tries as null instead of mixing in setup timings

This follows the same main idea used by the DuckDB partitioned runner: one process per query, with repeated runs inside that process.

Thank you for review 🙏

Thank You for Your Contribution!

We appreciate your effort and contribution to the project. To ensure that your Pull Request (PR) adheres to our guidelines, please ensure to review the rules mentioned in our contribution guidelines:

ClickHouse/ClickBench Contribution Rules

Thank you for your attention to these details and for helping us maintain the quality and integrity of the project.

CLAassistant · 2026-04-20T15:29:53Z

All committers have signed the CLA.

kumarUjjawal · 2026-04-22T06:25:17Z

@qoega If I could get your eyes on this?

alamb · 2026-04-27T14:58:03Z

Hi @kumarUjjawal -- thank you. How did you test this change?

Maybe it would be a good exercise to try and get DataFusion 53 numbers with this script? That way we both double check the script works and update the results for DF 53

alamb · 2026-04-27T14:58:42Z

I can try and help do this (run for DF 53) if needed

kumarUjjawal · 2026-04-27T15:19:13Z

Hi @kumarUjjawal -- thank you. How did you test this change?

Maybe it would be a good exercise to try and get DataFusion 53 numbers with this script? That way we both double check the script works and update the results for DF 53

I asked Codex to build a small bash script for local test.

It used a fake datafusion-cli output:

one datafusion-cli session per query
setup timings from create.sql are skipped
if the shared session fails after try 1, the output is [t1, null, null]

If you could help run the DF 53 bench that would be great. 🙏

alamb · 2026-04-27T15:39:13Z

Will try later this week -- basically we need to run it on aws.

kumarUjjawal · 2026-04-27T15:54:38Z

Thank you!

alamb · 2026-04-30T18:48:54Z

I am somewhat overwhelmed this week with other things. I'll see what I can do later.

Maybe @pmcgleenon has some time to review / rerun the numbers (he used to be our star number runner 🦾 )

AdamGS · 2026-05-01T11:58:11Z

I'm running the benchmarks now, I'll test this change out.

AdamGS · 2026-05-01T12:03:38Z

@kumarUjjawal this seems to also be true for the non-partitioned version.
FWIW, a lot of the difference an OS level cache that does persist between runs. running the current setup definitely shows a difference between the first run of every query and the other two.

[0.066, 0.001, 0.001],
[0.142, 0.036, 0.035],
[0.193, 0.060, 0.059],
...

AdamGS · 2026-05-01T12:56:02Z

One weird thing that this change has caused consistently (for both partitioned and single) - the last iteration of q23 (which has a dynamic filter) becomes a lot slower! Something like [52.698, 8.065, 37.231] where the current submissions (and my updated ones with the current scripts) all have a similar patterns of [~55, ~9, ~9].

kumarUjjawal · 2026-05-01T14:51:53Z

Is there some deeper issue here as q23 shouldn't behave like this right?

AdamGS · 2026-05-01T14:55:00Z

yeah definitely, might be already fixed upstream. For now I'm still running everything with the existing scripts, but once I get through that I'll go back and check if it reproduces with DataFusion's latest main.

alamb · 2026-05-01T20:56:00Z

One weird thing that this change has caused consistently (for both partitioned and single) - the last iteration of q23 (which has a dynamic filter) becomes a lot slower! Something like [52.698, 8.065, 37.231] where the current submissions (and my updated ones with the current scripts) all have a similar patterns of [~55, ~9, ~9].

I think they are very dependent on timing and when the file is opened vs when the top value is seen

If the last value is seen quickly enough then it can skip several files entirely, but if the top value isnt' seen fast enough then the files get processed

rschu1ze

I think that for symmetry, it would make sense to apply the same changes to datafusion/run.sh.

rschu1ze · 2026-05-03T16:45:26Z

+
 echo $1
-cat queries.sql | while read -r query; do
+while IFS= read -r query || [ -n "$query" ]; do


It is not clear what IFS means.

rschu1ze · 2026-05-03T16:48:00Z

    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null

-    echo "$query" > /tmp/query.sql
+    QUERY_FILE="${TMP_DIR}/query-${QUERY_NUM}.sql"


There is too much copying going on here, at least for my taste, and there are no code comments that explain the rationale. Please try to simplify.

fix(datafusion): keep hot runs in one session

b66bef5

kumarUjjawal mentioned this pull request Apr 20, 2026

[EPIC] Make DataFusion the top of the ClickBench Parquet leaderboard apache/datafusion#18489

Open

16 tasks

rschu1ze reviewed May 3, 2026

View reviewed changes

Conversation

kumarUjjawal commented Apr 20, 2026

Summary

What changed

Uh oh!

CLAassistant commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kumarUjjawal commented Apr 22, 2026

Uh oh!

alamb commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb commented Apr 27, 2026

Uh oh!

kumarUjjawal commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb commented Apr 27, 2026

Uh oh!

kumarUjjawal commented Apr 27, 2026

Uh oh!

alamb commented Apr 30, 2026

Uh oh!

AdamGS commented May 1, 2026

Uh oh!

AdamGS commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AdamGS commented May 1, 2026

Uh oh!

kumarUjjawal commented May 1, 2026

Uh oh!

AdamGS commented May 1, 2026

Uh oh!

alamb commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rschu1ze left a comment

Choose a reason for hiding this comment

Uh oh!

rschu1ze May 3, 2026

Choose a reason for hiding this comment

Uh oh!

rschu1ze May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CLAassistant commented Apr 20, 2026 •

edited

Loading

alamb commented Apr 27, 2026 •

edited

Loading

kumarUjjawal commented Apr 27, 2026 •

edited

Loading

AdamGS commented May 1, 2026 •

edited

Loading

alamb commented May 1, 2026 •

edited

Loading