Tracking issue for follow-up work surfaced by the cast expression audit in #4493. Each item is a support-level / serde consistency fix that the audit deliberately deferred because CometCast.scala is load-bearing.
The relevant file is spark/src/main/scala/org/apache/comet/expressions/CometCast.scala.
High priority
1. canCastFromFloat / canCastFromDouble / canCastFromDecimal ignore evalMode
canCastFromFloat (CometCast.scala:376-385), canCastFromDouble (CometCast.scala:387-395), and canCastFromDecimal (CometCast.scala:397-403) take no evalMode argument, even though their callers in isSupported (CometCast.scala:176, 188, 190) thread evalMode through every other arm. Spark's overflow / null behaviour for float and double to integral casts differs between LEGACY, ANSI, and TRY, and the native side has explicit per-evalMode branches for these narrowing casts. The support level should reflect the eval mode so that EXPLAIN and the auto-generated compatibility doc carry the right reason per mode, and so the support matrix matches what the native code actually does.
2. getUnsupportedReasons / getIncompatibleReasons do not enumerate per-pair reasons
getUnsupportedReasons (CometCast.scala:55-56) and getIncompatibleReasons (CometCast.scala:51-53) return a single generic sentence each. The per-pair isSupported matrix returns specific Unsupported(Some(reason)) / Incompatible(Some(reason)) values that never reach the static API, so the auto-generated compatibility docs and the dispatcher cannot surface them. The static reasons should enumerate the actual per-pair reasons (or the API should be changed so callers get the per-pair reason directly).
3. case _: Incompatible() with no reason in the array-of-binary branch
CometCast.scala:154-155 returns a bare Incompatible() for CAST(ARRAY<BINARY> AS STRING) with no reason string. The audit skill requires every Incompatible to carry a concise reason that names the divergence, so EXPLAIN and the compatibility doc can show why. The reason here is the same String::from_utf8_unchecked UB tracked in #4488; fill it in and cross-reference that issue.
Medium priority
4. Literal short-circuit may swallow per-pair Incompatible reasons
getSupportLevel (CometCast.scala:58-66) short-circuits any cast.child.isInstanceOf[Literal] to Compatible() and defers to CometLiteral. For cases that the per-pair matrix would have marked Incompatible(Some(...)) (for example float-to-decimal rounding under #1371), the short-circuit hides the reason from EXPLAIN even though the runtime semantics still differ. Worth confirming whether CometLiteral re-checks the (from, to) pair or whether literal casts genuinely bypass these divergences, and if not, route through isSupported instead.
Surfaced by the audit-comet-expression skill run in #4493.
Tracking issue for follow-up work surfaced by the cast expression audit in #4493. Each item is a support-level / serde consistency fix that the audit deliberately deferred because
CometCast.scalais load-bearing.The relevant file is
spark/src/main/scala/org/apache/comet/expressions/CometCast.scala.High priority
1.
canCastFromFloat/canCastFromDouble/canCastFromDecimalignoreevalModecanCastFromFloat(CometCast.scala:376-385),canCastFromDouble(CometCast.scala:387-395), andcanCastFromDecimal(CometCast.scala:397-403) take noevalModeargument, even though their callers inisSupported(CometCast.scala:176, 188, 190) threadevalModethrough every other arm. Spark's overflow / null behaviour for float and double to integral casts differs betweenLEGACY,ANSI, andTRY, and the native side has explicit per-evalModebranches for these narrowing casts. The support level should reflect the eval mode so that EXPLAIN and the auto-generated compatibility doc carry the right reason per mode, and so the support matrix matches what the native code actually does.2.
getUnsupportedReasons/getIncompatibleReasonsdo not enumerate per-pair reasonsgetUnsupportedReasons(CometCast.scala:55-56) andgetIncompatibleReasons(CometCast.scala:51-53) return a single generic sentence each. The per-pairisSupportedmatrix returns specificUnsupported(Some(reason))/Incompatible(Some(reason))values that never reach the static API, so the auto-generated compatibility docs and the dispatcher cannot surface them. The static reasons should enumerate the actual per-pair reasons (or the API should be changed so callers get the per-pair reason directly).3.
case _: Incompatible()with no reason in the array-of-binary branchCometCast.scala:154-155returns a bareIncompatible()forCAST(ARRAY<BINARY> AS STRING)with no reason string. The audit skill requires everyIncompatibleto carry a concise reason that names the divergence, so EXPLAIN and the compatibility doc can show why. The reason here is the sameString::from_utf8_uncheckedUB tracked in #4488; fill it in and cross-reference that issue.Medium priority
4. Literal short-circuit may swallow per-pair
IncompatiblereasonsgetSupportLevel(CometCast.scala:58-66) short-circuits anycast.child.isInstanceOf[Literal]toCompatible()and defers toCometLiteral. For cases that the per-pair matrix would have markedIncompatible(Some(...))(for example float-to-decimal rounding under #1371), the short-circuit hides the reason from EXPLAIN even though the runtime semantics still differ. Worth confirming whetherCometLiteralre-checks the (from, to) pair or whether literal casts genuinely bypass these divergences, and if not, route throughisSupportedinstead.Surfaced by the
audit-comet-expressionskill run in #4493.