Tracking issue for follow-up work surfaced by the date/time expression audit in #4448. Each item below is either a support-level / serde consistency fix that the audit deliberately deferred (because it needs a semantics decision rather than a mechanical edit), or a divergence the audit documented but did not file separately. Findings already covered by an issue filed during the PR (#4449 ANSI next_day, #4450 next_day trim, #4451 ANSI make_date, #3180 Hour/Minute/Second TimestampNTZ, #2013 legacy timezone forms, #2649 non-UTC date_trunc, #16594 / #16577 from_unixtime upstream) are intentionally excluded.
High priority
1. CometUnixTimestamp: getUnsupportedReasons and isSupportedInputType disagree on TimestampNTZType
spark/src/main/scala/org/apache/comet/serde/datetime.scala:309-362. getUnsupportedReasons claims TimestampNTZType is not supported because Comet incorrectly applies timezone conversion to TimestampNTZ values, but isSupportedInputType returns true for TimestampNTZType and getSupportLevel reports Compatible() for it. Either the predicate or the reason is wrong. Per the audit-comet-expression skill, the EXPLAIN-time reason must match the runtime behaviour: if the native path really does apply session-timezone conversion to TIMESTAMP_NTZ (as the documented divergence on unix_timestamp in spark_expressions_support.md suggests), the support level for that input type should be Incompatible(Some(...)) and cross-reference #3180; if it does not, the reason text should be removed.
2. CometDateFormat: gate the Compatible decision in getSupportLevel, not convert
spark/src/main/scala/org/apache/comet/serde/datetime.scala:609-700. getSupportLevel returns Compatible() unconditionally, while convert reads CometConf.isExprAllowIncompat(...) and branches on UTC vs non-UTC session timezone, on whether the format is a whitelisted literal, and on whether the codegen dispatcher is enabled. The audit-comet-expression skill (rule 10) requires expression-shape restrictions to be declared in getSupportLevel so EXPLAIN, the auto-generated compatibility doc, and the dispatcher can all see them. Moving this gating changes the dispatch flow slightly (non-UTC + whitelisted format becomes Incompatible rather than silently routed through the codegen dispatcher) and needs a separate review.
3. Group A codegen-dispatched datetime expressions report Compatible while silently falling back when spark.comet.exec.scalaUDF.codegen.enabled=false
For the codegen-dispatched datetime expressions (notably to_unix_timestamp and date_format per the documented divergence in docs/source/contributor-guide/spark_expressions_support.md), getSupportLevel returns Compatible() while convert returns None when the Scala UDF codegen dispatcher is disabled. The dispatcher flag is not surfaced in EXPLAIN or in the auto-generated compatibility guide, so users see Compatible while the operator actually falls back to Spark. Per the audit skill, this should either be reported as Incompatible(Some(\"requires spark.comet.exec.scalaUDF.codegen.enabled=true\")) or the dispatcher requirement should be surfaced in getCompatibleNotes.
4. CometMakeTimestamp does not honour spark.sql.ansi.enabled
MakeTimestamp honours spark.sql.ansi.enabled (throws on invalid (year, month, day, hour, min, sec[, timezone]), else NULL) in Spark 3.4 / 3.5 / 4.0. The Comet wiring does not check failOnError, so under ANSI mode Comet returns NULL where Spark would throw ansiDateTimeArgumentOutOfRange / invalidFractionOfSecondError. Same shape as the make_date ANSI divergence captured in #4451 / make_date_ansi.sql. Capture an analogous make_timestamp_ansi.sql regression test once a fix lands, and gate the support level appropriately in the meantime.
Medium priority
5. Spark 4.0 StringTypeWithCollation not gated for datetime expressions
The audit documents (in docs/source/contributor-guide/spark_expressions_support.md) that Spark 4.0 widens many string-typed inputTypes on datetime expressions (convert_timezone, date_format, date_trunc, from_unixtime, make_timestamp, next_day, to_unix_timestamp, trunc, unix_timestamp) to StringTypeWithCollation(supportsTrimCollation = true). Per the audit skill (rule 11), non-default collations should flip the support level to Incompatible(Some(...)). Today the serdes accept those inputs without distinguishing collation, so non-default collations are silently treated as compatible. Cross-references #2190 and #4496.
6. CometDateDiff non-wrapping i32 - panics in debug builds on extreme inputs
Documented divergence (line 30 of the PR diff): the native date_diff impl subtracts with i32 -, which Rust panics on in debug builds when the result overflows; Spark wraps. Practically unreachable for DateType inputs (the valid range is far smaller than i32::MAX), but it is an unmasked panic path. Either switch to wrapping_sub to match Spark, or document that release builds wrap (they do) and add a debug-assert pre-check.
Surfaced by the audit-comet-expression skill run in #4448.
Tracking issue for follow-up work surfaced by the date/time expression audit in #4448. Each item below is either a support-level / serde consistency fix that the audit deliberately deferred (because it needs a semantics decision rather than a mechanical edit), or a divergence the audit documented but did not file separately. Findings already covered by an issue filed during the PR (
#4449ANSInext_day,#4450next_daytrim,#4451ANSImake_date,#3180Hour/Minute/SecondTimestampNTZ,#2013legacy timezone forms,#2649non-UTCdate_trunc,#16594/#16577from_unixtimeupstream) are intentionally excluded.High priority
1.
CometUnixTimestamp:getUnsupportedReasonsandisSupportedInputTypedisagree onTimestampNTZTypespark/src/main/scala/org/apache/comet/serde/datetime.scala:309-362.getUnsupportedReasonsclaimsTimestampNTZType is not supported because Comet incorrectly applies timezone conversion to TimestampNTZ values, butisSupportedInputTypereturnstrueforTimestampNTZTypeandgetSupportLevelreportsCompatible()for it. Either the predicate or the reason is wrong. Per the audit-comet-expression skill, the EXPLAIN-time reason must match the runtime behaviour: if the native path really does apply session-timezone conversion toTIMESTAMP_NTZ(as the documented divergence onunix_timestampinspark_expressions_support.mdsuggests), the support level for that input type should beIncompatible(Some(...))and cross-reference #3180; if it does not, the reason text should be removed.2.
CometDateFormat: gate theCompatibledecision ingetSupportLevel, notconvertspark/src/main/scala/org/apache/comet/serde/datetime.scala:609-700.getSupportLevelreturnsCompatible()unconditionally, whileconvertreadsCometConf.isExprAllowIncompat(...)and branches on UTC vs non-UTC session timezone, on whether the format is a whitelisted literal, and on whether the codegen dispatcher is enabled. The audit-comet-expression skill (rule 10) requires expression-shape restrictions to be declared ingetSupportLevelso EXPLAIN, the auto-generated compatibility doc, and the dispatcher can all see them. Moving this gating changes the dispatch flow slightly (non-UTC + whitelisted format becomesIncompatiblerather than silently routed through the codegen dispatcher) and needs a separate review.3. Group A codegen-dispatched datetime expressions report
Compatiblewhile silently falling back whenspark.comet.exec.scalaUDF.codegen.enabled=falseFor the codegen-dispatched datetime expressions (notably
to_unix_timestampanddate_formatper the documented divergence indocs/source/contributor-guide/spark_expressions_support.md),getSupportLevelreturnsCompatible()whileconvertreturnsNonewhen the Scala UDF codegen dispatcher is disabled. The dispatcher flag is not surfaced in EXPLAIN or in the auto-generated compatibility guide, so users seeCompatiblewhile the operator actually falls back to Spark. Per the audit skill, this should either be reported asIncompatible(Some(\"requires spark.comet.exec.scalaUDF.codegen.enabled=true\"))or the dispatcher requirement should be surfaced ingetCompatibleNotes.4.
CometMakeTimestampdoes not honourspark.sql.ansi.enabledMakeTimestamphonoursspark.sql.ansi.enabled(throws on invalid(year, month, day, hour, min, sec[, timezone]), else NULL) in Spark 3.4 / 3.5 / 4.0. The Comet wiring does not checkfailOnError, so under ANSI mode Comet returns NULL where Spark would throwansiDateTimeArgumentOutOfRange/invalidFractionOfSecondError. Same shape as themake_dateANSI divergence captured in #4451 /make_date_ansi.sql. Capture an analogousmake_timestamp_ansi.sqlregression test once a fix lands, and gate the support level appropriately in the meantime.Medium priority
5. Spark 4.0
StringTypeWithCollationnot gated for datetime expressionsThe audit documents (in
docs/source/contributor-guide/spark_expressions_support.md) that Spark 4.0 widens many string-typedinputTypeson datetime expressions (convert_timezone,date_format,date_trunc,from_unixtime,make_timestamp,next_day,to_unix_timestamp,trunc,unix_timestamp) toStringTypeWithCollation(supportsTrimCollation = true). Per the audit skill (rule 11), non-default collations should flip the support level toIncompatible(Some(...)). Today the serdes accept those inputs without distinguishing collation, so non-default collations are silently treated as compatible. Cross-references #2190 and #4496.6.
CometDateDiffnon-wrappingi32 -panics in debug builds on extreme inputsDocumented divergence (line 30 of the PR diff): the native
date_diffimpl subtracts withi32 -, which Rust panics on in debug builds when the result overflows; Spark wraps. Practically unreachable forDateTypeinputs (the valid range is far smaller thani32::MAX), but it is an unmasked panic path. Either switch towrapping_subto match Spark, or document that release builds wrap (they do) and add a debug-assert pre-check.Surfaced by the
audit-comet-expressionskill run in #4448.