
build: change default Maven profile to Spark 4.1 / Scala 2.13 #4140

Merged

andygrove merged 8 commits into apache:main from andygrove:default-spark-4.0 on May 7, 2026

Conversation

@andygrove (Member) commented Apr 29, 2026

Summary

  • Changes the default Maven build profile from Spark 3.5 / Scala 2.12 to Spark 4.1 / Scala 2.13 (a sketch of the resulting pom.xml defaults follows this list)
  • Updates spark-3.4 and spark-3.5 profiles to explicitly set scala.binary.version=2.12, shims.majorVerSrc=spark-3.x, and semanticdb.version=4.8.8 (previously inherited from defaults)
  • Populates the scala-2.12 profile with explicit properties and empties scala-2.13 (which now matches the defaults)
  • Removes FIXME comment from spark-4.0 profile
  • Updates Dockerfile and Docker publish workflow to Spark 4.1 / Scala 2.13 / JDK 17
  • Updates all user guide and contributor guide documentation to reflect the new defaults
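
For illustration, a minimal sketch of the root pom.xml defaults after this PR. Only scala.binary.version, shims.majorVerSrc, and semanticdb.version are named explicitly in this PR; the other property names are assumptions following common Maven conventions, and the versions are taken from the commit messages below:

```xml
<!-- Hypothetical sketch; property names beyond scala.binary.version and
     shims.majorVerSrc are assumptions, versions come from this PR's commits. -->
<properties>
  <spark.version>4.1.1</spark.version>
  <scala.version>2.13.17</scala.version>
  <scala.binary.version>2.13</scala.binary.version>
  <shims.majorVerSrc>spark-4.1</shims.majorVerSrc>
  <parquet.version>1.16.0</parquet.version>
  <slf4j.version>2.0.17</slf4j.version>
</properties>
```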

Which issue does this PR close?

N/A

Test plan

Existing tests run across all Spark versions

andygrove and others added 3 commits April 28, 2026 18:29
Update the default build configuration from Spark 3.5 / Scala 2.12 to
Spark 4.0 / Scala 2.13. The spark-3.4 and spark-3.5 profiles now
explicitly set scala.binary.version, shims.majorVerSrc, and
semanticdb.version since those defaults have changed. The scala-2.12
profile is populated and scala-2.13 is now empty (matching defaults).

Also updates the Dockerfile, the Docker publish workflow, and all
documentation to reflect the new defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
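
For illustration, a sketch of what the spark-3.5 profile pins after this commit (values and property names are from the PR summary; the surrounding layout is assumed):

```xml
<!-- Hypothetical sketch: spark-3.5 must now state explicitly what it used
     to inherit from the defaults. spark-3.4 is analogous. -->
<profile>
  <id>spark-3.5</id>
  <properties>
    <scala.binary.version>2.12</scala.binary.version>
    <shims.majorVerSrc>spark-3.x</shims.majorVerSrc>
    <semanticdb.version>4.8.8</semanticdb.version>
  </properties>
</profile>
```
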
…K to 17

The scala-2.13 profile must retain its properties so that
`-Pspark-3.x -Pscala-2.13` correctly overrides the Spark profile's
scala.binary.version=2.12. Without this, Iceberg CI builds produce
_2.12 artifacts when _2.13 is expected.

The TPC-DS/TPC-H verification jobs used JDK 11 with no explicit Spark
profile, so they now inherit the Spark 4.0 default, which requires
JDK 17.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
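
A sketch of the override mechanics, assuming the profiles are declared in this order (when several active profiles define the same property, Maven lets the later-declared profile's value win):

```xml
<!-- Hypothetical sketch: declared after the spark-3.x profiles, scala-2.13
     keeps its properties so `-Pspark-3.x -Pscala-2.13` overrides the Spark
     profile's scala.binary.version=2.12. -->
<profile>
  <id>scala-2.13</id>
  <properties>
    <scala.version>2.13.17</scala.version>
    <scala.binary.version>2.13</scala.binary.version>
  </properties>
</profile>
```
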
The spark/pom.xml Iceberg dependency profiles use activeByDefault to
provide the right Iceberg version when no -Pspark-* is passed. Since
the default is now Spark 4.0, the activeByDefault must be on the
spark-4.0 profile (Iceberg 1.10.0) rather than spark-3.5 (Iceberg
1.8.1), otherwise Maven resolves the non-existent artifact
iceberg-spark-runtime-4.0_2.13:1.8.1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
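
For illustration, a sketch of the spark/pom.xml arrangement this commit describes. The artifact coordinates and Iceberg versions come from the commit message; the property names and layout are assumptions. The key Maven behavior is that an activeByDefault profile is deactivated as soon as any other profile in the same pom is activated explicitly, so the default profile must pair the default Spark version with its matching Iceberg version:

```xml
<!-- Hypothetical sketch of the Iceberg dependency profiles. -->
<profile>
  <id>spark-3.5</id> <!-- active only when -Pspark-3.5 is passed -->
  <properties>
    <iceberg.version>1.8.1</iceberg.version>
  </properties>
</profile>
<profile>
  <id>spark-4.0</id>
  <activation>
    <!-- With Spark 4.0 as the build default, a bare `./mvnw` build must
         resolve iceberg-spark-runtime-4.0_2.13:1.10.0, not the non-existent
         iceberg-spark-runtime-4.0_2.13:1.8.1. -->
    <activeByDefault>true</activeByDefault>
  </activation>
  <properties>
    <iceberg.version>1.10.0</iceberg.version>
  </properties>
</profile>
```
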
@andygrove andygrove marked this pull request as ready for review April 29, 2026 13:19
@andygrove andygrove added this to the 0.16.0 milestone May 6, 2026
Updates root pom.xml defaults to Spark 4.1.1, Scala 2.13.17, Parquet
1.16.0, slf4j 2.0.17, and shim sources to spark-4.1. Moves
activeByDefault from spark-4.0 to spark-4.1 in spark/pom.xml. Bumps
kube/Dockerfile to apache/spark:4.1.1, and updates the docker-publish
image tag and all docs and example commands that the earlier
default-bump commits had set to spark4.0_2.13.
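
Continuing the sketch above, after this commit the activeByDefault flag sits on the spark-4.1 profile instead of spark-4.0 (layout assumed, as before):

```xml
<!-- Hypothetical sketch of the updated spark/pom.xml default. -->
<profile>
  <id>spark-4.1</id>
  <activation>
    <activeByDefault>true</activeByDefault> <!-- moved here from spark-4.0 -->
  </activation>
  <!-- Iceberg version for Spark 4.1 is not stated in this PR. -->
</profile>
```
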
@andygrove changed the title from "build: change default Maven profile to Spark 4.0 / Scala 2.13" to "build: change default Maven profile to Spark 4.1 / Scala 2.13" on May 6, 2026
@coderfender (Contributor) commented:

Seems like the benchmark queries failed: q60 *** FAILED ***

@andygrove (Member, Author) commented:

> Seems like the benchmark queries failed: q60 *** FAILED ***

Ah, these queries contain union so we need to merge #4207 first

Review comment threads: docs/source/contributor-guide/benchmarking_spark_sql_perf.md, docs/source/user-guide/latest/source.md
@comphead (Contributor) left a comment:

Thanks @andygrove, mostly LGTM pending CI

andygrove added 2 commits May 6, 2026 14:08
The TPC-H data generation step launches Spark via `mvnw exec:java`,
which falls outside surefire's argLine, so the --add-opens flags from
pom.xml do not apply. With this PR bumping these jobs from JDK 11 to
JDK 17, GenTPCHData hangs in shuffle when Kryo reflectively probes
java.nio.ByteBuffer.hb. Set JAVA_TOOL_OPTIONS at the job level so
both exec:java and surefire forks get the required flags.
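
A sketch of the wiring described above (plugin configuration assumed; extraJavaTestArgs is the pom property named in the next commit). Surefire forks a separate test JVM and passes argLine to it, while exec:java runs inside the Maven JVM itself and never sees argLine; a JAVA_TOOL_OPTIONS environment variable set at the job level is read by every JVM the job starts, so it covers both paths:

```xml
<!-- Hypothetical sketch of the surefire configuration. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Applied only to the forked test JVM, never to exec:java. -->
    <argLine>${extraJavaTestArgs}</argLine>
  </configuration>
</plugin>
```
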
@andygrove andygrove moved this from Todo to In progress in Comet Development May 6, 2026
…DS jobs

The previous flag set only opened java.lang and java.nio, but with the
default profile now Spark 4.1 / Scala 2.13 the TPC-H GenTPCHData run
trips on SerializedLambda.capturingClass and dies with
InaccessibleObjectException for java.lang.invoke. Mirror the full list
from pom.xml's extraJavaTestArgs so any other reflection-driven access
into java.base also has the matching --add-opens grant.
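
For illustration, the kind of list extraJavaTestArgs carries (hypothetical sketch: java.lang, java.lang.invoke, and java.nio are named in the commit messages; the remaining entries follow the pattern Spark itself uses on JDK 17+ and may not match the pom exactly):

```xml
<extraJavaTestArgs>
  --add-opens=java.base/java.lang=ALL-UNNAMED
  --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
  --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
  --add-opens=java.base/java.io=ALL-UNNAMED
  --add-opens=java.base/java.net=ALL-UNNAMED
  --add-opens=java.base/java.nio=ALL-UNNAMED
  --add-opens=java.base/java.util=ALL-UNNAMED
  --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
  --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
  --add-opens=java.base/sun.security.action=ALL-UNNAMED
</extraJavaTestArgs>
```
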
@andygrove andygrove merged commit 9feb58c into apache:main May 7, 2026
140 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in Comet Development May 7, 2026

Labels

None yet

Projects

Status: Done


3 participants