[spark] Case-insensitive field matching when casting nested struct on write. #7743

Open
JunRuiLee wants to merge 1 commit into apache:master from
JunRuiLee:spark-case-insensitive-nested-struct

Conversation

@JunRuiLee
Contributor

Purpose

When writing BY NAME into a Paimon table whose nested struct field names differ from the source only in case, the insert fails with "field does not exist", because the lookup is case-sensitive and ignores spark.sql.caseSensitive.

Example:

CREATE TABLE t1 (id INT, info STRUCT<Age: INT, Name: STRING>);
CREATE TABLE t2 (id INT, info STRUCT<name: STRING, age: INT>);
INSERT INTO t1 VALUES (1, struct(30, 'Alice'));
INSERT INTO t2 BY NAME SELECT * FROM t1;  -- fails
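
The failure can be modeled outside Spark with a small sketch. This is an illustration only, not Paimon code: struct types are represented as lists of (name, type) pairs, and `find_field`, `case_sensitive`, and `case_insensitive` are hypothetical helpers. The case-insensitive function approximates what a resolver does when spark.sql.caseSensitive is false.

```python
# Hypothetical model: a struct type is an ordered list of (field_name, type_name).
source_info = [("Age", "INT"), ("Name", "STRING")]   # t1.info
target_info = [("name", "STRING"), ("age", "INT")]   # t2.info


def case_sensitive(a, b):
    # Exact string comparison, as in the lookup described above.
    return a == b


def case_insensitive(a, b):
    # Approximation of a resolver that ignores case.
    return a.lower() == b.lower()


def find_field(fields, wanted, resolver):
    """Return the names in `fields` that `resolver` matches against `wanted`."""
    return [name for name, _ in fields if resolver(name, wanted)]


# Target fields that have no match in the source under each comparison.
missing_before = [t for t, _ in target_info
                  if not find_field(source_info, t, case_sensitive)]
missing_after = [t for t, _ in target_info
                 if not find_field(source_info, t, case_insensitive)]

print(missing_before)  # ['name', 'age']
print(missing_after)   # []
```

With exact comparison every target field is reported missing, which corresponds to the "field does not exist" error; with the case-insensitive comparison each target field finds its source counterpart.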

Tests

  • insert by name with case-insensitive nested struct field matching
  • insert by name with case-insensitive matching inside array<struct<...>>
  • insert by name with case-insensitive matching inside nested struct
  • insert by name rejects ambiguous source nested struct fields
  • insert by name rejects nested struct target fields colliding under resolver

… write

  Writing BY NAME into a Paimon table whose nested struct field names
  differ from the source only in case failed with "field does not exist",
  because addCastToStructByName used case-sensitive lookups and ignored
  spark.sql.caseSensitive.

  - Use conf.resolver for source-field lookup and the extra-field guard.
  - Reject ambiguous source matches and conflicting target field names
    under the active resolver, mirroring the top-level column-name-conflict
    behavior.
  - Add a struct length check to schemaCompatible so paimonWriteResolved
    no longer short-circuits past the new nested validations when source
    and target struct arities differ.
  - Stop casting the intermediate struct before recursing; Spark casts
    structs positionally, which would silently misread fields when the
    nested layout is reordered. Descend over the original source layout
    and cast only at leaf fields.
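
  The points above can be sketched in plain Python. This is a simplified
  model under stated assumptions, not the code in this PR: struct types are
  lists of (name, type) pairs, struct values are tuples, and
  project_by_name, CASTS, and resolver are hypothetical names. It shows
  resolver-based lookup, rejection of ambiguous or conflicting names, an
  arity check, and descending by name while casting only at leaf fields.

```python
CASTS = {"INT": int, "STRING": str}  # hypothetical leaf casts


def resolver(a, b):
    # Approximates case-insensitive name resolution.
    return a.lower() == b.lower()


def is_struct(t):
    return isinstance(t, list)


def project_by_name(src_type, src_value, tgt_type, resolver):
    """Reorder src_value (laid out like src_type) into tgt_type's layout."""
    if len(src_type) != len(tgt_type):
        raise ValueError("struct arity mismatch")
    tgt_names = [n for n, _ in tgt_type]
    for i, a in enumerate(tgt_names):
        # Two target fields that the resolver treats as equal are rejected.
        if any(resolver(a, b) for b in tgt_names[i + 1:]):
            raise ValueError(f"conflicting target fields: {a}")
    out = []
    for tgt_name, tgt_field_type in tgt_type:
        hits = [i for i, (n, _) in enumerate(src_type) if resolver(n, tgt_name)]
        if not hits:
            raise ValueError(f"field {tgt_name} does not exist")
        if len(hits) > 1:
            raise ValueError(f"ambiguous source field: {tgt_name}")
        idx = hits[0]
        src_field_type = src_type[idx][1]
        if is_struct(tgt_field_type) != is_struct(src_field_type):
            raise ValueError(f"struct/leaf mismatch at field {tgt_name}")
        if is_struct(tgt_field_type):
            # Recurse over the original source layout; no intermediate cast.
            out.append(project_by_name(src_field_type, src_value[idx],
                                       tgt_field_type, resolver))
        else:
            # Cast only at leaf fields.
            out.append(CASTS[tgt_field_type](src_value[idx]))
    return tuple(out)


src_type = [("Age", "INT"), ("Name", "STRING")]
tgt_type = [("name", "STRING"), ("age", "INT")]
row = (30, "Alice")

# Pairing values with target names by position (no casting modeled here)
# attaches each value to the wrong field when the layout is reordered.
positional = dict(zip([n for n, _ in tgt_type], row))
by_name = project_by_name(src_type, row, tgt_type, resolver)

print(positional)  # {'name': 30, 'age': 'Alice'}
print(by_name)     # ('Alice', 30)
```

  The positional pairing puts 30 under name and 'Alice' under age, which is
  the misread the last point guards against; the by-name projection returns
  the values in the target's order.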