Skip to content

cast expression audit follow-ups (from #4493) #4501

@andygrove

Description

@andygrove

Tracking issue for follow-up work surfaced by the cast expression audit in #4493. Each item is a support-level / serde consistency fix that the audit deliberately deferred because CometCast.scala is load-bearing.

The relevant file is spark/src/main/scala/org/apache/comet/expressions/CometCast.scala.

High priority

1. canCastFromFloat / canCastFromDouble / canCastFromDecimal ignore evalMode

canCastFromFloat (CometCast.scala:376-385), canCastFromDouble (CometCast.scala:387-395), and canCastFromDecimal (CometCast.scala:397-403) take no evalMode argument, even though their callers in isSupported (CometCast.scala:176, 188, 190) thread evalMode through every other arm. Spark's overflow / null behaviour for float and double to integral casts differs between LEGACY, ANSI, and TRY, and the native side has explicit per-evalMode branches for these narrowing casts. The support level should reflect the eval mode so that EXPLAIN and the auto-generated compatibility doc carry the right reason per mode, and so the support matrix matches what the native code actually does.

2. getUnsupportedReasons / getIncompatibleReasons do not enumerate per-pair reasons

getUnsupportedReasons (CometCast.scala:55-56) and getIncompatibleReasons (CometCast.scala:51-53) return a single generic sentence each. The per-pair isSupported matrix returns specific Unsupported(Some(reason)) / Incompatible(Some(reason)) values that never reach the static API, so the auto-generated compatibility docs and the dispatcher cannot surface them. The static reasons should enumerate the actual per-pair reasons (or the API should be changed so callers get the per-pair reason directly).

3. case _: Incompatible() with no reason in the array-of-binary branch

CometCast.scala:154-155 returns a bare Incompatible() for CAST(ARRAY<BINARY> AS STRING) with no reason string. The audit skill requires every Incompatible to carry a concise reason that names the divergence, so EXPLAIN and the compatibility doc can show why. The reason here is the same String::from_utf8_unchecked UB tracked in #4488; fill it in and cross-reference that issue.

Medium priority

4. Literal short-circuit may swallow per-pair Incompatible reasons

getSupportLevel (CometCast.scala:58-66) short-circuits any cast.child.isInstanceOf[Literal] to Compatible() and defers to CometLiteral. For cases that the per-pair matrix would have marked Incompatible(Some(...)) (for example float-to-decimal rounding under #1371), the short-circuit hides the reason from EXPLAIN even though the runtime semantics still differ. Worth confirming whether CometLiteral re-checks the (from, to) pair or whether literal casts genuinely bypass these divergences, and if not, route through isSupported instead.


Surfaced by the audit-comet-expression skill run in #4493.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions