
build: change default Maven profile to Spark 4.1 / Scala 2.13 #4140

Merged

andygrove merged 8 commits into apache:main from andygrove:default-spark-4.0 on May 7, 2026

Conversation

@andygrove (Member) commented Apr 29, 2026

Summary

  • Changes the default Maven build profile from Spark 3.5 / Scala 2.12 to Spark 4.1 / Scala 2.13 (a sketch of the resulting pom.xml defaults follows this list)
  • Updates spark-3.4 and spark-3.5 profiles to explicitly set scala.binary.version=2.12, shims.majorVerSrc=spark-3.x, and semanticdb.version=4.8.8 (previously inherited from defaults)
  • Populates the scala-2.12 profile with explicit properties and empties scala-2.13 (which now matches the defaults)
  • Removes FIXME comment from spark-4.0 profile
  • Updates Dockerfile and Docker publish workflow to Spark 4.1 / Scala 2.13 / JDK 17
  • Updates all user guide and contributor guide documentation to reflect the new defaults
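
For illustration, a minimal sketch of the root pom.xml defaults after this PR. Only scala.binary.version, shims.majorVerSrc, and semanticdb.version are named explicitly in this PR; the other property names are assumptions following common Maven conventions, and the versions are taken from the commit messages below:

```xml
<!-- Hypothetical sketch; property names beyond scala.binary.version and
     shims.majorVerSrc are assumptions, versions come from this PR's commits. -->
<properties>
  <spark.version>4.1.1</spark.version>
  <scala.version>2.13.17</scala.version>
  <scala.binary.version>2.13</scala.binary.version>
  <shims.majorVerSrc>spark-4.1</shims.majorVerSrc>
  <parquet.version>1.16.0</parquet.version>
  <slf4j.version>2.0.17</slf4j.version>
</properties>
```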

Which issue does this PR close?

N/A

Test plan

Existing tests run across all Spark versions

andygrove and others added 3 commits April 28, 2026 18:29
Update the default build configuration from Spark 3.5 / Scala 2.12 to
Spark 4.0 / Scala 2.13. The spark-3.4 and spark-3.5 profiles now
explicitly set scala.binary.version, shims.majorVerSrc, and
semanticdb.version since those defaults have changed. The scala-2.12
profile is populated and scala-2.13 is now empty (matching defaults).

Also updates the Dockerfile, the Docker publish workflow, and all
documentation to reflect the new defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
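
For illustration, a sketch of what the spark-3.5 profile pins after this commit (values and property names are from the PR summary; the surrounding layout is assumed):

```xml
<!-- Hypothetical sketch: spark-3.5 must now state explicitly what it used
     to inherit from the defaults. spark-3.4 is analogous. -->
<profile>
  <id>spark-3.5</id>
  <properties>
    <scala.binary.version>2.12</scala.binary.version>
    <shims.majorVerSrc>spark-3.x</shims.majorVerSrc>
    <semanticdb.version>4.8.8</semanticdb.version>
  </properties>
</profile>
```
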
…K to 17

The scala-2.13 profile must retain its properties so that
`-Pspark-3.x -Pscala-2.13` correctly overrides the Spark profile's
scala.binary.version=2.12. Without this, Iceberg CI builds produce
_2.12 artifacts when _2.13 is expected.

The TPC-DS/TPC-H verification jobs used JDK 11 with no explicit Spark
profile, so they now inherit the Spark 4.0 default, which requires
JDK 17.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
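
A sketch of the override mechanics, assuming the profiles are declared in this order (when several active profiles define the same property, Maven lets the later-declared profile's value win):

```xml
<!-- Hypothetical sketch: declared after the spark-3.x profiles, scala-2.13
     keeps its properties so `-Pspark-3.x -Pscala-2.13` overrides the Spark
     profile's scala.binary.version=2.12. -->
<profile>
  <id>scala-2.13</id>
  <properties>
    <scala.version>2.13.17</scala.version>
    <scala.binary.version>2.13</scala.binary.version>
  </properties>
</profile>
```
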
The spark/pom.xml Iceberg dependency profiles use activeByDefault to
provide the right Iceberg version when no -Pspark-* is passed. Since
the default is now Spark 4.0, the activeByDefault must be on the
spark-4.0 profile (Iceberg 1.10.0) rather than spark-3.5 (Iceberg
1.8.1), otherwise Maven resolves the non-existent artifact
iceberg-spark-runtime-4.0_2.13:1.8.1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
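
For illustration, a sketch of the spark/pom.xml arrangement this commit describes. The artifact coordinates and Iceberg versions come from the commit message; the property names and layout are assumptions. The key Maven behavior is that an activeByDefault profile is deactivated as soon as any other profile in the same pom is activated explicitly, so the default profile must pair the default Spark version with its matching Iceberg version:

```xml
<!-- Hypothetical sketch of the Iceberg dependency profiles. -->
<profile>
  <id>spark-3.5</id> <!-- active only when -Pspark-3.5 is passed -->
  <properties>
    <iceberg.version>1.8.1</iceberg.version>
  </properties>
</profile>
<profile>
  <id>spark-4.0</id>
  <activation>
    <!-- With Spark 4.0 as the build default, a bare `./mvnw` build must
         resolve iceberg-spark-runtime-4.0_2.13:1.10.0, not the non-existent
         iceberg-spark-runtime-4.0_2.13:1.8.1. -->
    <activeByDefault>true</activeByDefault>
  </activation>
  <properties>
    <iceberg.version>1.10.0</iceberg.version>
  </properties>
</profile>
```
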
@andygrove andygrove marked this pull request as ready for review April 29, 2026 13:19
@andygrove andygrove added this to the 0.16.0 milestone May 6, 2026
Updates root pom.xml defaults to Spark 4.1.1, Scala 2.13.17, Parquet
1.16.0, slf4j 2.0.17, and shim sources to spark-4.1. Moves
activeByDefault from spark-4.0 to spark-4.1 in spark/pom.xml. Bumps
kube/Dockerfile to apache/spark:4.1.1, and updates the docker-publish
image tag and all docs and example commands that the earlier
default-bump commits had set to spark4.0_2.13.
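
Continuing the sketch above, after this commit the activeByDefault flag sits on the spark-4.1 profile instead of spark-4.0 (layout assumed, as before):

```xml
<!-- Hypothetical sketch of the updated spark/pom.xml default. -->
<profile>
  <id>spark-4.1</id>
  <activation>
    <activeByDefault>true</activeByDefault> <!-- moved here from spark-4.0 -->
  </activation>
  <!-- Iceberg version for Spark 4.1 is not stated in this PR. -->
</profile>
```
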
@andygrove changed the title from "build: change default Maven profile to Spark 4.0 / Scala 2.13" to "build: change default Maven profile to Spark 4.1 / Scala 2.13" on May 6, 2026
@coderfender (Contributor) commented:

Seems like the benchmark queries failed: q60 *** FAILED ***

@andygrove (Member, Author) commented:

> Seems like the benchmark queries failed: q60 *** FAILED ***

Ah, these queries contain union so we need to merge #4207 first

Review comment threads: docs/source/contributor-guide/benchmarking_spark_sql_perf.md, docs/source/user-guide/latest/source.md
@comphead (Contributor) left a comment:

Thanks @andygrove, mostly LGTM pending CI

andygrove added 2 commits May 6, 2026 14:08
The TPC-H data generation step launches Spark via `mvnw exec:java`,
which falls outside surefire's argLine, so the --add-opens flags from
pom.xml do not apply. With this PR bumping these jobs from JDK 11 to
JDK 17, GenTPCHData hangs in shuffle when Kryo reflectively probes
java.nio.ByteBuffer.hb. Set JAVA_TOOL_OPTIONS at the job level so
both exec:java and surefire forks get the required flags.
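
A sketch of the wiring described above (plugin configuration assumed; extraJavaTestArgs is the pom property named in the next commit). Surefire forks a separate test JVM and passes argLine to it, while exec:java runs inside the Maven JVM itself and never sees argLine; a JAVA_TOOL_OPTIONS environment variable set at the job level is read by every JVM the job starts, so it covers both paths:

```xml
<!-- Hypothetical sketch of the surefire configuration. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Applied only to the forked test JVM, never to exec:java. -->
    <argLine>${extraJavaTestArgs}</argLine>
  </configuration>
</plugin>
```
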
@andygrove andygrove moved this from Todo to In progress in Comet Development May 6, 2026
…DS jobs

The previous flag set only opened java.lang and java.nio, but with the
default profile now Spark 4.1 / Scala 2.13 the TPC-H GenTPCHData run
trips on SerializedLambda.capturingClass and dies with
InaccessibleObjectException for java.lang.invoke. Mirror the full list
from pom.xml's extraJavaTestArgs so any other reflection-driven access
into java.base also has the matching --add-opens grant.
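
For illustration, the kind of list extraJavaTestArgs carries (hypothetical sketch: java.lang, java.lang.invoke, and java.nio are named in the commit messages; the remaining entries follow the pattern Spark itself uses on JDK 17+ and may not match the pom exactly):

```xml
<extraJavaTestArgs>
  --add-opens=java.base/java.lang=ALL-UNNAMED
  --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
  --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
  --add-opens=java.base/java.io=ALL-UNNAMED
  --add-opens=java.base/java.net=ALL-UNNAMED
  --add-opens=java.base/java.nio=ALL-UNNAMED
  --add-opens=java.base/java.util=ALL-UNNAMED
  --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
  --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
  --add-opens=java.base/sun.security.action=ALL-UNNAMED
</extraJavaTestArgs>
```
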
@andygrove andygrove merged commit 9feb58c into apache:main May 7, 2026
140 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in Comet Development May 7, 2026

Labels

None yet

Projects

Status: Done


3 participants