add more projection pushdown slt tests#20015
Conversation
| ### | ||
| # Test 11.4: Core augmentation with column reordering | ||
| # Projects: s['value'], s['label'], id (note 'id' is LAST in the SELECT list) | ||
| # Predicate: id > 1 | ||
| # When 'id' is added for filtering, columns are reordered to match input schema order. | ||
| # Result: 'id' moves from SELECT position 3 to physical plan index 2. | ||
| # Physical plan shows: FilterExec: id@2 > 1 (note the index after reordering). | ||
| ### | ||
|
|
||
| query TT | ||
| EXPLAIN SELECT s['value'], s['label'], id FROM simple_struct WHERE id > 1; | ||
| ---- | ||
| logical_plan | ||
| 01)Projection: get_field(simple_struct.s, Utf8("value")), get_field(simple_struct.s, Utf8("label")), simple_struct.id | ||
| 02)--Filter: simple_struct.id > Int64(1) | ||
| 03)----TableScan: simple_struct projection=[id, s], partial_filters=[simple_struct.id > Int64(1)] | ||
| physical_plan | ||
| 01)ProjectionExec: expr=[get_field(s@1, value) as simple_struct.s[value], get_field(s@1, label) as simple_struct.s[label], id@0 as id] | ||
| 02)--FilterExec: id@0 > 1 | ||
| 03)----DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/projection_pushdown/simple.parquet]]}, projection=[id, s], file_type=parquet, predicate=id@0 > 1, pruning_predicate=id_null_count@1 != row_count@2 AND id_max@0 > 1, required_guarantees=[] |
There was a problem hiding this comment.
Could you help explain this test to me? Comments state
Physical plan shows: FilterExec: id@2 > 1 (note the index after reordering).
But the plan shows
02)--FilterExec: id@0 > 1
Is this expected?
There was a problem hiding this comment.
Probably should just remove the comment. These were pulled out of another PR and updated, the index will change once the ProjectionExec is pushed through the FilterExec.
Remove comments that imply projections will be pushed down, keeping only generic descriptions of what each test covers. This reflects that these tests are now regression tests rather than optimization development tests. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
|
||
| ### | ||
| # Test 4.4: Projection with duplicate column through Sort | ||
| # The projection expands the logical output (3→4 columns) but reduces physical columns |
There was a problem hiding this comment.
I don't understand the comment "reduces physical columns" as there are three physical column and all three are scanned 🤔
There was a problem hiding this comment.
The point is that after the projection there are 4 columns but before the projection there are 3 columns. But I agree the comment is confusing, I'll reword it
|
|
||
| # Verify correctness | ||
| query IIII | ||
| SELECT col_a, col_b, col_c, col_b as col_b_dup FROM three_cols ORDER BY col_a; |
There was a problem hiding this comment.
given this data was inserted in order of col_a it might be better to check a different sort order (e.g. sort by col_a desc)
Adds more interesting test cases extracted from #19538
The reasoning is that it's (1) better to split the additional tests out from that PR to make that PR easier to review and (2) it's a good thing to document current behavior with complex edge cases like these, so even if that PR doesn't get merged there is still value in having these as regression tests.