feat: UNNEST with ordinality and offset support #22310

Open
MartinSahlen wants to merge 3 commits into apache:main from MartinSahlen:feat/unnest-with-ordinality-and-offset

@MartinSahlen

Which issue does this PR close?

  • Closes #.

Rationale for this change

UNNEST as a table factor previously rejected both WITH ORDINALITY (Postgres / SQL standard) and WITH OFFSET (BigQuery) with not_impl_err!. Both spellings express the same semantic need: emit a per-element position
alongside the unnested value. The SQL parser (sqlparser-rs) already parses both into a single AST node that distinguishes them by keyword.

Supporting both is justified by:

  • They're syntactic siblings, not alternatives. sqlparser-rs parses both into the same TableFactor::UNNEST. The physical execution and logical-plan shape are identical modulo a constant index base.
  • No SQL dialect supports both at once; each dialect picks one. Accepting both makes DataFusion a clean target for queries written against either Postgres/Trino (WITH ORDINALITY, 1-indexed) or BigQuery (WITH OFFSET,
    0-indexed), with no rewriting required.
  • The two keywords carry their standard semantics, not a configurable flag: WITH ORDINALITY is always 1-indexed (SQL:2003), WITH OFFSET is always 0-indexed (BigQuery). This mirrors BigQuery's own precedent for array
    indexing (arr[OFFSET(0)] vs arr[ORDINAL(1)]): the keyword carries the semantics, so there are no surprises.
  • The two are mutually exclusive in the same statement; the planner rejects them combined.
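
To make the "identical modulo a constant index base" point concrete, here is a minimal standalone sketch in plain Rust. It is not the code in this PR: the `IndexBase` enum below is a local stand-in, `unnest_with_position` is a hypothetical helper, and it assumes the position restarts for every input list.

```rust
/// Index base implied by the keyword (local stand-in, not the DataFusion type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IndexBase {
    /// WITH OFFSET (BigQuery): positions start at 0.
    Zero,
    /// WITH ORDINALITY (Postgres / SQL standard): positions start at 1.
    One,
}

impl IndexBase {
    /// First position value emitted for each list.
    fn start(self) -> i64 {
        match self {
            IndexBase::Zero => 0,
            IndexBase::One => 1,
        }
    }
}

/// Hypothetical helper: flatten a slice of lists and pair every element with
/// its position inside its own list. Assumption: numbering restarts per list.
fn unnest_with_position<T: Clone>(rows: &[Vec<T>], base: IndexBase) -> Vec<(T, i64)> {
    rows.iter()
        .flat_map(|list| {
            list.iter()
                .cloned()
                .enumerate()
                .map(move |(i, value)| (value, base.start() + i as i64))
        })
        .collect()
}
```

Under those assumptions, the two keywords differ only in the value returned by `start()`; the flattening logic is shared.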

What changes are included in this PR?

  • datafusion-common: IndexBase enum (Zero / One), PositionColumn { name, base }, UnnestOptions.position: Option<PositionColumn>, builder method with_position.
  • Logical plan: Unnest::try_new appends a nullable Int64 field to the output schema when options.position is set.
  • SQL planner (relation/mod.rs): handles WITH ORDINALITY and WITH OFFSET [alias], defaults position column to "ordinality" / "offset", rejects the both-at-once combination. Postgres-style AS t(v, ord) column-list aliasing
    works through the existing alias mechanism.
  • try_process_unnest threads the position option through to unnest_columns_with_options and projects the position column on the outer SELECT.
  • Physical plan (UnnestExec): create_position_indices materializes the position column at the leaf unnest level, using the supplied IndexBase.
  • Proto: new IndexBase enum and PositionColumn message; UnnestOptions.position round-trips end-to-end through to_proto / from_proto.
  • Not included: unparser support (kept best-effort; can be added later based on feedback).
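
A rough sketch of how the pieces listed above could fit together. The type and method names follow the bullets, but the field layout, the signature of `with_position`, and the helper `plan_position` (with its boolean inputs and error text) are assumptions made for illustration and are not copied from the diff.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IndexBase {
    Zero,
    One,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct PositionColumn {
    name: String,
    base: IndexBase,
}

/// Simplified stand-in: only the new field is modelled here.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct UnnestOptions {
    position: Option<PositionColumn>,
}

impl UnnestOptions {
    /// Builder-style setter (assumed signature).
    fn with_position(mut self, position: PositionColumn) -> Self {
        self.position = Some(position);
        self
    }
}

/// Hypothetical planner step: map the parsed keyword flags to a position
/// column, apply the default names, and reject the combined form.
fn plan_position(
    with_ordinality: bool,
    with_offset: bool,
    offset_alias: Option<&str>,
) -> Result<Option<PositionColumn>, String> {
    match (with_ordinality, with_offset) {
        (true, true) => Err("UNNEST cannot combine WITH ORDINALITY and WITH OFFSET".to_string()),
        (true, false) => Ok(Some(PositionColumn {
            name: "ordinality".to_string(),
            base: IndexBase::One,
        })),
        (false, true) => Ok(Some(PositionColumn {
            name: offset_alias.unwrap_or("offset").to_string(),
            base: IndexBase::Zero,
        })),
        (false, false) => Ok(None),
    }
}
```

In this sketch the keyword alone decides the base, so there is no path that produces a 0-indexed ordinality column or a 1-indexed offset column.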

Are these changes tested?

Yes:

  • datafusion/sqllogictest/test_files/unnest.slt: execution result tests for WITH OFFSET, WITH OFFSET with an alias, WITH ORDINALITY, WITH ORDINALITY AS t(v, ord), plus both error cases.
  • datafusion/physical-plan/src/unnest.rs: unit tests for create_position_indices (0-indexed and 1-indexed).
  • datafusion/proto/tests/cases/roundtrip_logical_plan.rs: logical-plan proto round-trip tests for both spellings.
  • The prior expected-failure cases at unnest.slt:425-430 are now positive cases.

Are there any user-facing changes?

Yes. The SQL surface accepts UNNEST(...) WITH ORDINALITY [AS t(value_alias, ord_alias)] and UNNEST(...) WITH OFFSET [alias] in the FROM clause. The UnnestOptions public type gains a position field (default
None). Existing DataFrame callers are source-compatible, but code that builds UnnestOptions with an exhaustive struct literal must add the new field.
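
A small illustration of what adding a public field means for callers, using a stand-in struct rather than the real UnnestOptions. `OptionsAfter`, its `position: Option<String>` field type, and `built_with_default` are hypothetical names chosen for this sketch.

```rust
/// Stand-in for a public options struct after a new `position` field is added.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct OptionsAfter {
    pub preserve_nulls: bool,
    pub position: Option<String>,
}

/// Callers that fill the remaining fields from `Default` keep compiling,
/// because the new field silently takes its default value (`None`).
pub fn built_with_default() -> OptionsAfter {
    OptionsAfter {
        preserve_nulls: true,
        ..Default::default()
    }
}

// Callers that list every field by hand stop compiling once the field exists:
//
// pub fn built_exhaustively() -> OptionsAfter {
//     OptionsAfter { preserve_nulls: true } // error[E0063]: missing field `position`
// }
```

Construction through `Default` or builder methods is unaffected, while exhaustive struct literals need a one-line update.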

@github-actions (bot) added the following labels on May 17, 2026: sql (SQL Planner), logical-expr (Logical plan and expressions), sqllogictest (SQL Logic Tests (.slt)), common (Related to common crate), proto (Related to proto crate), physical-plan (Changes to the physical-plan crate)
@github-actions

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-common v53.1.0 (current)
       Built [  35.770s] (current)
     Parsing datafusion-common v53.1.0 (current)
      Parsed [   0.060s] (current)
    Building datafusion-common v53.1.0 (baseline)
       Built [  31.541s] (baseline)
     Parsing datafusion-common v53.1.0 (baseline)
      Parsed [   0.060s] (baseline)
    Checking datafusion-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.895s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field UnnestOptions.position in /home/runner/work/datafusion/datafusion/datafusion/common/src/unnest.rs:84

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  70.232s] datafusion-common
    Building datafusion-expr v53.1.0 (current)
       Built [  25.156s] (current)
     Parsing datafusion-expr v53.1.0 (current)
      Parsed [   0.075s] (current)
    Building datafusion-expr v53.1.0 (baseline)
       Built [  25.146s] (baseline)
     Parsing datafusion-expr v53.1.0 (baseline)
      Parsed [   0.077s] (baseline)
    Checking datafusion-expr v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   1.706s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  54.090s] datafusion-expr
    Building datafusion-physical-plan v53.1.0 (current)
       Built [  31.056s] (current)
     Parsing datafusion-physical-plan v53.1.0 (current)
      Parsed [   0.129s] (current)
    Building datafusion-physical-plan v53.1.0 (baseline)
       Built [  31.083s] (baseline)
     Parsing datafusion-physical-plan v53.1.0 (baseline)
      Parsed [   0.132s] (baseline)
    Checking datafusion-physical-plan v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.844s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  65.478s] datafusion-physical-plan
    Building datafusion-proto v53.1.0 (current)
       Built [  52.352s] (current)
     Parsing datafusion-proto v53.1.0 (current)
      Parsed [   0.142s] (current)
    Building datafusion-proto v53.1.0 (baseline)
       Built [  52.127s] (baseline)
     Parsing datafusion-proto v53.1.0 (baseline)
      Parsed [   0.142s] (baseline)
    Checking datafusion-proto v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   2.463s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field UnnestOptions.position in /home/runner/work/datafusion/datafusion/datafusion/proto/src/generated/prost.rs:538
  field UnnestOptions.position in /home/runner/work/datafusion/datafusion/datafusion/proto/src/generated/prost.rs:538

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [ 110.120s] datafusion-proto
    Building datafusion-sql v53.1.0 (current)
       Built [  38.989s] (current)
     Parsing datafusion-sql v53.1.0 (current)
      Parsed [   0.031s] (current)
    Building datafusion-sql v53.1.0 (baseline)
       Built [  38.759s] (baseline)
     Parsing datafusion-sql v53.1.0 (baseline)
      Parsed [   0.033s] (baseline)
    Checking datafusion-sql v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.315s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  79.826s] datafusion-sql
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 165.278s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.022s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 165.332s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.023s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.114s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 334.648s] datafusion-sqllogictest

@github-actions (bot) added the "auto detected api change" label (Auto detected API change) on May 18, 2026