Tracking issue for follow-up work surfaced by the math expression audit in #4486. Each item below is either a support-level / serde correctness fix that the audit deliberately deferred, or a coverage gap the audit documented but did not implement.
High priority
1. Lift convert-time withInfo fallbacks in arithmetic.scala into getSupportLevel
CometAdd, CometSubtract, CometMultiply, CometDivide, CometIntegralDivide, CometRemainder, CometUnaryMinus, and CometRound (spark/src/main/scala/org/apache/comet/serde/arithmetic.scala:94-271, 300-321, 355) currently bail out from convert with withInfo(...) + return None for cases that are knowable from the expression alone (unsupported datatype, EvalMode.TRY on Remainder, negative-scale Decimal on Round, Float / Double on Round, etc.).
The audit-comet-expression skill requires expression-shape restrictions to be declared as Unsupported(Some(reason)) / Incompatible(Some(reason)) branches in getSupportLevel so that:
- EXPLAIN surfaces the reason at planning time
- the auto-generated compatibility doc picks them up
- the dispatcher can route around them
This was deferred from #4486 because arithmetic.scala is load-bearing and the fix is mechanical but wide.
2. UnaryPositive has no serde on Spark 3.4 / 3.5
Projections containing +col silently disable Comet for the enclosing projection on Spark 3.4 / 3.5 because there is no CometUnaryPositive registration. On Spark 4.0+ UnaryPositive is RuntimeReplaceable and the optimizer removes it before serde, so the gap is transparent there. Wire a passthrough serde for the 3.4 / 3.5 shim path so the projection stays native.
3. hex / unhex Spark 4.x collation propagation
Spark 4.x widens Hex / Unhex input types to StringTypeWithCollation and preserves collation in dataType. CometHex / CometUnhex pass expr.dataType to the native SparkHex / spark_unhex UDFs, which always return Utf8, so collation does not propagate. Per the audit skill, the support level should be Incompatible(Some(reason)) for non-default collations rather than the current Compatible. Cross-reference #2190 / #4496.
Medium priority (robustness / coverage)
4. greatest / least do not gate input types
Registered as bare CometScalarFunction(\"greatest\") / CometScalarFunction(\"least\") in QueryPlanSerde.scala. Spark accepts interval inputs (and a few Spark-specific orderings) that the native UDF may not handle; there is no explicit fallback path, so a runtime error from the native UDF is the failure mode. Add an input-type guard that falls back to Spark for the unsupported cases.
5. ceil(expr, scale) / floor(expr, scale) (RoundCeil / RoundFloor) not wired
The two-argument forms always fall back to Spark. Wire them via the same path as Round.
6. BRound (HALF_EVEN rounding) not wired
Listed as [ ] bround in the support doc. Banker's rounding is independent of the BigDecimal-via-toString issue that blocks Float / Double Round, so the integer / decimal paths could be supported.
7. signum rejects interval inputs
Spark Signum accepts DayTimeIntervalType / YearMonthIntervalType via inputTypes; Comet only handles DoubleType. Either add an explicit fallback or extend the native path.
Surfaced by the audit-comet-expression skill run in #4486.
Tracking issue for follow-up work surfaced by the math expression audit in #4486. Each item below is either a support-level / serde correctness fix that the audit deliberately deferred, or a coverage gap the audit documented but did not implement.
High priority
1. Lift
convert-timewithInfofallbacks inarithmetic.scalaintogetSupportLevelCometAdd,CometSubtract,CometMultiply,CometDivide,CometIntegralDivide,CometRemainder,CometUnaryMinus, andCometRound(spark/src/main/scala/org/apache/comet/serde/arithmetic.scala:94-271, 300-321, 355) currently bail out fromconvertwithwithInfo(...) + return Nonefor cases that are knowable from the expression alone (unsupported datatype,EvalMode.TRYonRemainder, negative-scaleDecimalonRound,Float/DoubleonRound, etc.).The
audit-comet-expressionskill requires expression-shape restrictions to be declared asUnsupported(Some(reason))/Incompatible(Some(reason))branches ingetSupportLevelso that:This was deferred from #4486 because
arithmetic.scalais load-bearing and the fix is mechanical but wide.2.
UnaryPositivehas no serde on Spark 3.4 / 3.5Projections containing
+colsilently disable Comet for the enclosing projection on Spark 3.4 / 3.5 because there is noCometUnaryPositiveregistration. On Spark 4.0+UnaryPositiveisRuntimeReplaceableand the optimizer removes it before serde, so the gap is transparent there. Wire a passthrough serde for the 3.4 / 3.5 shim path so the projection stays native.3.
hex/unhexSpark 4.x collation propagationSpark 4.x widens
Hex/Unhexinput types toStringTypeWithCollationand preserves collation indataType.CometHex/CometUnhexpassexpr.dataTypeto the nativeSparkHex/spark_unhexUDFs, which always returnUtf8, so collation does not propagate. Per the audit skill, the support level should beIncompatible(Some(reason))for non-default collations rather than the currentCompatible. Cross-reference #2190 / #4496.Medium priority (robustness / coverage)
4.
greatest/leastdo not gate input typesRegistered as bare
CometScalarFunction(\"greatest\")/CometScalarFunction(\"least\")inQueryPlanSerde.scala. Spark accepts interval inputs (and a few Spark-specific orderings) that the native UDF may not handle; there is no explicit fallback path, so a runtime error from the native UDF is the failure mode. Add an input-type guard that falls back to Spark for the unsupported cases.5.
ceil(expr, scale)/floor(expr, scale)(RoundCeil / RoundFloor) not wiredThe two-argument forms always fall back to Spark. Wire them via the same path as
Round.6.
BRound(HALF_EVEN rounding) not wiredListed as
[ ] broundin the support doc. Banker's rounding is independent of the BigDecimal-via-toString issue that blocks Float / DoubleRound, so the integer / decimal paths could be supported.7.
signumrejects interval inputsSpark
SignumacceptsDayTimeIntervalType/YearMonthIntervalTypeviainputTypes; Comet only handlesDoubleType. Either add an explicit fallback or extend the native path.Surfaced by the
audit-comet-expressionskill run in #4486.