Add FixedSizeList struct support for nested schema evolution (planner/runtime parity) #21284
Open
kosiew wants to merge 7 commits into apache:main
Implement FixedSizeList casting and validation logic. Add 5 regression tests for schema evolution scenarios. Ensure full compatibility checks with existing functionality. All tests pass with no regressions.
Add SLT tests for fixed-size list schema evolution. Converted tests include additive nullable fields, NULL value handling, and field reordering for comprehensive coverage of schema changes.
Extract shared recursive child-cast for list-like containers. Add runtime size check for FixedSizeList to enforce consistency with planning. Update SQL tests to use proper DDL syntax, add DESCRIBE assertions, and document the fixed-size assumptions. Note that no SET cleanup is required.
Add reusable downcast helper in nested_struct.rs to reduce casting boilerplate. Implement planning-side parity test in cast.rs for FixedSizeList<Struct> size mismatch rejection. Enhance SQL coverage in schema_evolution_fixed_size_list.slt with additional all-null FixedSizeList rows and a negative case for schema compatibility checks.
Inline redundant casts, merge identical logic for casting. Shorten comments and reduce test duplication by using local closures for FixedSizeList setup and simplifying error assertions. Clean up repetitive commentary in SLT file while retaining SQL statements and assertions.
…ma evolution tests
Extend the expr_adapter.rs to accommodate FixedSizeList alongside List and LargeList. Update shared helpers and generated test macro for these list types. Integrate the FixedSizeList happy-path case into the existing schema evolution tests and remove the duplicate test file.
Which issue does this PR close?
List<Struct>/ nested container types in Parquet scans #20835
Rationale for this change
DataFusion already supports recursive schema evolution and casting for nested container types such as List, LargeList, ListView, and Dictionary. However, FixedSizeList was missing equivalent support, leading to inconsistencies when working with nested Struct types inside FixedSizeList columns.
This gap caused:
This PR brings FixedSizeList to feature parity with other nested container types and ensures consistent behavior across planning and execution.
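To illustrate what planner/runtime parity means for FixedSizeList<Struct>, here is a minimal sketch of a recursive compatibility check. It is a toy model only: `DataType`, `Field`, `validate_compat`, and `point_list` are hypothetical stand-ins written for this example, not the Arrow or DataFusion types and not the code in this PR. It assumes the rules described in the commits: additive nullable fields and field reordering are accepted, and a list size mismatch is rejected.

```rust
// Toy model: hypothetical stand-ins, NOT Arrow's or DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<Field>),
    List(Box<Field>),
    FixedSizeList(Box<Field>, i32),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Returns Ok(()) when a value of type `source` could be adapted to `target`.
/// Struct fields are matched by name, so reordering is allowed; a field that
/// is missing from the source is accepted only if the target field is nullable.
fn validate_compat(source: &DataType, target: &DataType) -> Result<(), String> {
    match (source, target) {
        (DataType::Struct(src), DataType::Struct(tgt)) => {
            for t in tgt {
                match src.iter().find(|s| s.name == t.name) {
                    Some(s) => validate_compat(&s.data_type, &t.data_type)?,
                    None if t.nullable => {} // would be filled with NULLs
                    None => {
                        return Err(format!("missing non-nullable field '{}'", t.name))
                    }
                }
            }
            Ok(())
        }
        (DataType::List(s), DataType::List(t)) => {
            validate_compat(&s.data_type, &t.data_type)
        }
        (DataType::FixedSizeList(s, s_len), DataType::FixedSizeList(t, t_len)) => {
            // The size is part of the type, so it must match before recursing.
            if s_len != t_len {
                return Err(format!(
                    "FixedSizeList size mismatch: {} vs {}",
                    s_len, t_len
                ));
            }
            validate_compat(&s.data_type, &t.data_type)
        }
        (s, t) if s == t => Ok(()),
        (s, t) => Err(format!("cannot adapt {:?} to {:?}", s, t)),
    }
}

/// Hypothetical helper: builds FixedSizeList<Struct<fields>, size>.
fn point_list(fields: Vec<Field>, size: i32) -> DataType {
    let item = Field::new("item", DataType::Struct(fields), true);
    DataType::FixedSizeList(Box::new(item), size)
}
```

If the same check runs at planning time and again before the runtime cast, a query that plans successfully should not then fail on a size mismatch during execution, which is the parity this PR aims for.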
What changes are included in this PR?
Core functionality
cast_column via cast_fixed_size_list_column
downcast_list_array to reduce duplication across list-like casting paths
Validation and compatibility
Extended validate_data_type_compatibility to:
Planner/runtime parity
requires_nested_struct_cast to include FixedSizeList
Test coverage
Added regression tests for:
Extended parquet + schema evolution tests to include FixedSizeList variant
Generalized test macros to cover List, LargeList, and FixedSizeList uniformly
SQL logic tests
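The macro generalization noted above (one body covering List, LargeList, and FixedSizeList) could look roughly like the sketch below. The names `ListKind`, `describe_wrapped`, and `list_kind_cases!` are hypothetical and chosen for this example; the actual macro in the PR may be shaped differently. In a real test module the generated functions would carry `#[test]` and run the schema evolution scenario instead of returning a string.

```rust
// Hypothetical stand-in for the list-like container kinds named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ListKind {
    List,
    LargeList,
    FixedSizeList(i32),
}

// Shared body used by every generated case. Here it only renders a type
// name; a real test would build the column and run the evolution scenario.
fn describe_wrapped(kind: ListKind, inner: &str) -> String {
    match kind {
        ListKind::List => format!("List<{}>", inner),
        ListKind::LargeList => format!("LargeList<{}>", inner),
        ListKind::FixedSizeList(n) => format!("FixedSizeList<{}, {}>", inner, n),
    }
}

// Generates one function per container kind from a single shared body.
macro_rules! list_kind_cases {
    ($($name:ident => $kind:expr),+ $(,)?) => {
        $(
            fn $name() -> String {
                describe_wrapped($kind, "Struct")
            }
        )+
    };
}

list_kind_cases! {
    list_case => ListKind::List,
    large_list_case => ListKind::LargeList,
    fixed_size_list_case => ListKind::FixedSizeList(2),
}
```

Adding another container kind then only needs one more line in the macro invocation, which keeps the variants from drifting apart.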
Are these changes tested?
Yes. This PR includes comprehensive test coverage:
Unit tests for:
Integration tests:
SQL logic tests:
These tests:
Are there any user-facing changes?
Yes.
Users can now:
Additionally:
No breaking API changes are introduced.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.