Skip to content

Math expression audit follow-ups (from #4486) #4500

@andygrove

Description

@andygrove

Tracking issue for follow-up work surfaced by the math expression audit in #4486. Each item below is either a support-level / serde correctness fix that the audit deliberately deferred, or a coverage gap the audit documented but did not implement.

High priority

1. Lift convert-time withInfo fallbacks in arithmetic.scala into getSupportLevel

CometAdd, CometSubtract, CometMultiply, CometDivide, CometIntegralDivide, CometRemainder, CometUnaryMinus, and CometRound (spark/src/main/scala/org/apache/comet/serde/arithmetic.scala:94-271, 300-321, 355) currently bail out from convert with withInfo(...) + return None for cases that are knowable from the expression alone (unsupported datatype, EvalMode.TRY on Remainder, negative-scale Decimal on Round, Float / Double on Round, etc.).

The audit-comet-expression skill requires expression-shape restrictions to be declared as Unsupported(Some(reason)) / Incompatible(Some(reason)) branches in getSupportLevel so that:

  • EXPLAIN surfaces the reason at planning time
  • the auto-generated compatibility doc picks them up
  • the dispatcher can route around them

This was deferred from #4486 because arithmetic.scala is load-bearing and the fix is mechanical but wide.

2. UnaryPositive has no serde on Spark 3.4 / 3.5

Projections containing +col silently disable Comet for the enclosing projection on Spark 3.4 / 3.5 because there is no CometUnaryPositive registration. On Spark 4.0+ UnaryPositive is RuntimeReplaceable and the optimizer removes it before serde, so the gap is transparent there. Wire a passthrough serde for the 3.4 / 3.5 shim path so the projection stays native.

3. hex / unhex Spark 4.x collation propagation

Spark 4.x widens Hex / Unhex input types to StringTypeWithCollation and preserves collation in dataType. CometHex / CometUnhex pass expr.dataType to the native SparkHex / spark_unhex UDFs, which always return Utf8, so collation does not propagate. Per the audit skill, the support level should be Incompatible(Some(reason)) for non-default collations rather than the current Compatible. Cross-reference #2190 / #4496.

Medium priority (robustness / coverage)

4. greatest / least do not gate input types

Registered as bare CometScalarFunction(\"greatest\") / CometScalarFunction(\"least\") in QueryPlanSerde.scala. Spark accepts interval inputs (and a few Spark-specific orderings) that the native UDF may not handle; there is no explicit fallback path, so a runtime error from the native UDF is the failure mode. Add an input-type guard that falls back to Spark for the unsupported cases.

5. ceil(expr, scale) / floor(expr, scale) (RoundCeil / RoundFloor) not wired

The two-argument forms always fall back to Spark. Wire them via the same path as Round.

6. BRound (HALF_EVEN rounding) not wired

Listed as [ ] bround in the support doc. Banker's rounding is independent of the BigDecimal-via-toString issue that blocks Float / Double Round, so the integer / decimal paths could be supported.

7. signum rejects interval inputs

Spark Signum accepts DayTimeIntervalType / YearMonthIntervalType via inputTypes; Comet only handles DoubleType. Either add an explicit fallback or extend the native path.


Surfaced by the audit-comet-expression skill run in #4486.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions