In general the multi group by option seems to make certain scenarios worse. #17850

@alamb

In general the multi group by option seems to make certain scenarios worse.
To make it easy to toggle, I added a flag in this commit: ashdnazg@6a0b13b (see enable-gby.patch for the patch).
I then ran:

create or replace table foo as
select (random() * 4)::integer as int_val,
       (random() * 4)::integer as int_val2,
       (random() * 4)::integer as int_val3
from generate_series(1, 1000000000);
set datafusion.execution.enable_multi_group_by to false;
select int_val, int_val2, int_val3 from foo GROUP BY int_val, int_val2, int_val3;
set datafusion.execution.enable_multi_group_by to true;
select int_val, int_val2, int_val3 from foo GROUP BY int_val, int_val2, int_val3;

With the flag set to false I get "Elapsed 1.037 seconds."; with the flag set to true I get "Elapsed 1.382 seconds."
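
To put the two timings in proportion, here is a small sketch of the arithmetic. The two numbers are copied from the measurements above; the variable names are only illustrative, and a single run per setting is not a rigorous benchmark.

```python
# Timings in seconds, copied from the two runs reported above.
elapsed_flag_off = 1.037  # enable_multi_group_by = false
elapsed_flag_on = 1.382   # enable_multi_group_by = true

# Ratio of the slower run to the faster one, and the extra time as a percentage.
slowdown = elapsed_flag_on / elapsed_flag_off
extra_pct = (slowdown - 1.0) * 100.0

print(f"{slowdown:.2f}x, {extra_pct:.0f}% slower")  # prints "1.33x, 33% slower"
```

So in this scenario the multi group by path takes roughly a third longer than the fallback.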

Perhaps the logic for when to use it should be more careful than just "whenever it's supported". But I suspect this PR is not the right spot for that discussion.

Originally posted by @ashdnazg in #17726 (comment)

Labels: enhancement, performance