Add FixedSizeList struct support for nested schema evolution (planner/runtime parity) #21284

Open
kosiew wants to merge 7 commits into apache:main from kosiew:schema-02-20835
Contributor

@kosiew kosiew commented Mar 31, 2026

Which issue does this PR close?


Rationale for this change

DataFusion already supports recursive schema evolution and casting for nested container types such as List, LargeList, ListView, and Dictionary. However, FixedSizeList was missing equivalent support, leading to inconsistencies when working with nested Struct types inside FixedSizeList columns.

This gap caused:

  • Failure to adapt evolved schemas containing FixedSizeList
  • Missing validation for size mismatches at planning and runtime
  • Inconsistent behavior compared to other list-like container types

This PR brings FixedSizeList to feature parity with other nested container types and ensures consistent behavior across planning and execution.


What changes are included in this PR?

Core functionality

  • Added FixedSizeList support in cast_column via cast_fixed_size_list_column
  • Introduced shared helper downcast_list_array to reduce duplication across list-like casting paths
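The recursive child cast described above can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the PR's implementation: `StructValue`, `TargetField`, `cast_struct`, and `cast_fixed_size_list` are hypothetical names that stand in for Arrow arrays and fields, and values are plain `Option<i64>`. It shows each struct element being adapted by field name while the list length stays fixed.

```rust
// Simplified stand-in model; these types are NOT Arrow's or DataFusion's.

/// One struct value as (field name, optional value) pairs.
type StructValue = Vec<(String, Option<i64>)>;

/// A field of the target (evolved) struct schema.
#[derive(Debug, Clone)]
struct TargetField {
    name: String,
    nullable: bool,
}

/// Adapt one struct value to the target fields, matching by name.
/// A field missing from the source becomes None when it is nullable
/// and is an error otherwise.
fn cast_struct(value: &StructValue, target: &[TargetField]) -> Result<StructValue, String> {
    target
        .iter()
        .map(|field| match value.iter().find(|(name, _)| name == &field.name) {
            Some((_, v)) => Ok((field.name.clone(), *v)),
            None if field.nullable => Ok((field.name.clone(), None)),
            None => Err(format!("cannot add non-nullable field '{}'", field.name)),
        })
        .collect()
}

/// Cast every element of one fixed-size list value.
/// The element count must equal the target size; it is never padded or truncated.
fn cast_fixed_size_list(
    list: &[StructValue],
    target_size: usize,
    target: &[TargetField],
) -> Result<Vec<StructValue>, String> {
    if list.len() != target_size {
        return Err(format!(
            "FixedSizeList size mismatch: source {}, target {}",
            list.len(),
            target_size
        ));
    }
    list.iter().map(|value| cast_struct(value, target)).collect()
}
```

In this model, casting a two-element list whose structs only carry `a` to a target of `a` plus nullable `b` yields structs with `b` set to `None`, while asking for a target size of 3 returns an error.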

Validation and compatibility

  • Extended validate_data_type_compatibility to:

    • Enforce FixedSizeList size equality
    • Validate nested field compatibility
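To make the two validation rules concrete, here is a minimal sketch using a toy type model. The `DataType` and `Field` definitions and the function `validate_compat` are assumptions made for illustration; they are not the Arrow types or the actual body of `validate_data_type_compatibility`. In particular, this toy version requires leaf types to be identical, which may be stricter than the real check.

```rust
// Toy type model for illustration only; not Arrow's DataType or Field.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<Field>),
    FixedSizeList(Box<Field>, usize),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Hypothetical compatibility check between a source and a target type.
fn validate_compat(source: &DataType, target: &DataType) -> Result<(), String> {
    match (source, target) {
        // Rule 1: fixed-size lists must agree on size before the element is checked.
        (DataType::FixedSizeList(s, s_len), DataType::FixedSizeList(t, t_len)) => {
            if s_len != t_len {
                return Err(format!(
                    "FixedSizeList size mismatch: source {}, target {}",
                    s_len, t_len
                ));
            }
            validate_compat(&s.data_type, &t.data_type)
        }
        // Rule 2: every target field must either match a compatible source field
        // by name or be nullable, so it can be filled with nulls.
        (DataType::Struct(s_fields), DataType::Struct(t_fields)) => {
            for t in t_fields {
                match s_fields.iter().find(|s| s.name == t.name) {
                    Some(s) => validate_compat(&s.data_type, &t.data_type)?,
                    None if t.nullable => {}
                    None => {
                        return Err(format!(
                            "non-nullable field '{}' is missing from the source",
                            t.name
                        ))
                    }
                }
            }
            Ok(())
        }
        // Toy rule for leaves: identical types only.
        (s, t) if s == t => Ok(()),
        (s, t) => Err(format!("incompatible types: {:?} -> {:?}", s, t)),
    }
}
```

Running the same function at planning time and again before the runtime cast is one way to keep the two phases in agreement, since both would reject exactly the same inputs.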

Planner/runtime parity

  • Updated requires_nested_struct_cast to include FixedSizeList
  • Ensured planner checks align with runtime casting behavior
  • Added explicit planning-time failure for size mismatch cases
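As a rough illustration of what including FixedSizeList in the planner predicate means, the sketch below walks a toy type tree and reports whether a struct-aware cast is needed. The enum and the function name `needs_struct_aware_cast` are hypothetical, and the real signature and rules of `requires_nested_struct_cast` may differ.

```rust
// Toy type tree for illustration only; not Arrow's DataType.
#[derive(Debug, Clone)]
enum DataType {
    Int64,
    Struct(Vec<DataType>),
    List(Box<DataType>),
    FixedSizeList(Box<DataType>, usize),
}

/// Hypothetical predicate: true when both sides reach a Struct through the
/// same kind of list-like container, so a plain cast is not enough.
fn needs_struct_aware_cast(source: &DataType, target: &DataType) -> bool {
    match (source, target) {
        (DataType::Struct(_), DataType::Struct(_)) => true,
        (DataType::List(s), DataType::List(t)) => needs_struct_aware_cast(s, t),
        // Without this arm, FixedSizeList<Struct> would fall through to `false`,
        // which is the kind of gap the PR description says it closes.
        (DataType::FixedSizeList(s, _), DataType::FixedSizeList(t, _)) => {
            needs_struct_aware_cast(s, t)
        }
        _ => false,
    }
}
```

The predicate ignores the list size on purpose; in this sketch, size equality is a separate validation concern rather than part of deciding which cast path to take.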

Test coverage

  • Added regression tests for:

    • Successful schema evolution with additive nullable fields
    • Incompatible nested type changes
    • Non-nullable field addition failures
    • FixedSizeList size mismatch failures
  • Extended parquet + schema evolution tests to include FixedSizeList variant
  • Generalized test macros to cover List, LargeList, and FixedSizeList uniformly

SQL logic tests

  • Added end-to-end FixedSizeList schema evolution coverage in sqllogictests

Are these changes tested?

Yes. This PR includes comprehensive test coverage:

  • Unit tests for:

    • Casting behavior
    • Validation logic
    • Planner/runtime mismatch scenarios
  • Integration tests:

    • Parquet schema evolution with FixedSizeList
  • SQL logic tests:

    • End-to-end validation of schema evolution behavior

These tests:

  • Capture previous failure modes
  • Verify correct handling of valid evolution scenarios
  • Ensure invalid transformations fail consistently

Are there any user-facing changes?

Yes.

Users can now:

  • Perform schema evolution on FixedSizeList columns containing Struct values
  • Reliably query datasets where FixedSizeList schemas evolve with additional nullable fields

Additionally:

  • More consistent error messages are returned for invalid casts (e.g., size mismatch)
  • Behavior is now aligned with other list-like container types

No breaking API changes are introduced.


LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 5 commits March 31, 2026 17:35

  • Implement FixedSizeList casting and validation logic. Add 5 regression tests for schema evolution scenarios. Ensure full compatibility checks with existing functionality. All tests pass with no regressions.

  • Add SLT tests for fixed-size list schema evolution. Converted tests include additive nullable fields, NULL value handling, and field reordering for comprehensive coverage of schema changes.

  • Extract shared recursive child-cast for list-like containers. Add runtime size check for FixedSizeList to enforce consistency with planning. Update SQL tests to use proper DDL syntax, add DESCRIBE assertions, and document the fixed-size assumptions. Note that no SET cleanup is required.

  • Add reusable downcast helper in nested_struct.rs to reduce casting boilerplate. Implement planning-side parity test in cast.rs for FixedSizeList<Struct> size mismatch rejection. Enhance SQL coverage in schema_evolution_fixed_size_list.slt with additional all-null FixedSizeList rows and a negative case for schema compatibility checks.

  • Inline redundant casts, merge identical logic for casting. Shorten comments and reduce test duplication by using local closures for FixedSizeList setup and simplifying error assertions. Clean up repetitive commentary in SLT file while retaining SQL statements and assertions.

@github-actions github-actions bot added the logical-expr, physical-expr, sqllogictest, and common labels Mar 31, 2026
@kosiew kosiew marked this pull request as ready for review March 31, 2026 12:48
Extend the expr_adapter.rs to accommodate FixedSizeList
alongside List and LargeList. Update shared helpers and
generated test macro for these list types. Integrate the
FixedSizeList happy-path case into the existing schema
evolution tests and remove the duplicate test file.
@github-actions github-actions bot added the core label Mar 31, 2026