Skip to content

chore: remove dead native_iceberg_compat code path#4363

Merged
andygrove merged 22 commits into
apache:mainfrom
andygrove:followup/remove-scan-impl-constants
May 19, 2026
Merged

chore: remove dead native_iceberg_compat code path#4363
andygrove merged 22 commits into
apache:mainfrom
andygrove:followup/remove-scan-impl-constants

Conversation

@andygrove
Copy link
Copy Markdown
Member

@andygrove andygrove commented May 19, 2026

Which issue does this PR close?

Part of #4020.

Rationale for this change

With #4019 already restricting V1 Parquet scans to native_datafusion, the rest of the native_iceberg_compat machinery is dead: the scan-impl constants/config, the JVM-mediated V1 read path (CometScanWrapper + CometParquetFileFormat + JVM column readers), the column-level Native JNI methods, the corresponding Rust modules, and the @IcebergApi surface (Iceberg no longer integrates against Comet).

What changes are included in this PR?

  • Config / constants: drop COMET_NATIVE_SCAN_IMPL, SCAN_NATIVE_DATAFUSION, SCAN_NATIVE_ICEBERG_COMPAT, SCAN_AUTO, the scanImpl parameter, and all test/benchmark withSQLConf usages.
  • JVM read path: CometExecRule V1 fallback now reverts to wrapped FileSourceScanExec instead of CometScanWrapper; CometScanExec becomes a planning-only intermediate (execution methods throw); file-format swap to CometParquetFileFormat removed.
  • Deleted Scala/Java: CometParquetFileFormat, the JVM Parquet reader chain (NativeBatchReader, NativeColumnReader, ColumnReader, RowGroupReader, FileReader, plus all page/index/bloom/options/utility classes), IcebergCometNativeBatchReader, CometLazyVector, IcebergApi, CometSchemaImporter, AbstractCometSchemaImporter. Native.java trimmed to the Arrow-native scan APIs; @IcebergApi stripped from CometVector.
  • Deleted Rust: column-reader JNI methods (initColumnReader, setPageV1/V2, setDictionaryPage, readBatch, skipBatch, currentBatch, closeColumnReader, resetBatch); the parquet::{read, mutable_vector, data_type} modules; parquet::util::{bit_packing, buffer, memory, test_common}; the entire common/ module; the parquet_read / bit_util / parquet_decode benches.
  • Tests: drop orphaned TestFileReader, TestUtils, TestCometInputFile; drop the closeColumnReader NPE assertion in CometNativeSuite.

Docs cleanup is in #4362.

Follow-ups: merging CometScanExec into CometNativeScanExec (still kept as a planning intermediate).

How are these changes tested?

./mvnw -pl spark -am test-compile -Pspark-3.5 and cargo test --no-run --all both pass. CI will run on this branch.

andygrove added 14 commits May 11, 2026 12:39
COMET_NATIVE_SCAN_IMPL config now only allows auto or native_datafusion.
CometScanRule always uses native_datafusion. Tests no longer parameterize
on scan impl and iceberg_compat-specific tests are removed.

Golden files are regenerated in a follow-up commit.
Plan-stability suites no longer parameterize on scan impl, so each query
has a single golden directory. native_datafusion dirs are renamed to the
unsuffixed name, and native_iceberg_compat dirs are removed.
# Conflicts:
#	.github/workflows/pr_build_linux.yml
#	.github/workflows/spark_sql_test.yml
# Conflicts:
#	.github/workflows/spark_sql_test.yml
Removes the four deprecated symbols from CometConf along with all
references in main code, tests, and benchmarks:

- COMET_NATIVE_SCAN_IMPL (spark.comet.scan.impl)
- SCAN_NATIVE_DATAFUSION
- SCAN_NATIVE_ICEBERG_COMPAT
- SCAN_AUTO

With a single Parquet scan implementation, the scanImpl field on
CometScanExec is dropped, CometScanTypeChecker becomes parameterless,
and per-impl conditionals in CometScanRule and CometExecRule collapse.
…impl-constants

# Conflicts:
#	spark/src/main/scala/org/apache/comet/CometConf.scala
#	spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala
#	spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala
#	spark/src/test/scala/org/apache/comet/CometCsvExpressionSuite.scala
#	spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
#	spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala
#	spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala
#	spark/src/test/scala/org/apache/comet/rules/CometScanRuleSuite.scala
#	spark/src/test/scala/org/apache/spark/sql/benchmark/CometBenchmarkBase.scala
#	spark/src/test/scala/org/apache/spark/sql/comet/ParquetEncryptionITCase.scala
Now that native_iceberg_compat is gone, the JVM-mediated V1 Parquet read
path (CometScanExec → CometScanWrapper → CometParquetFileFormat →
NativeBatchReader / column-level Native APIs) is unreachable. Iceberg
also no longer integrates against Comet's @IcebergApi surface.

- CometExecRule: V1 CometScanExec fallback now reverts to wrapped
  FileSourceScanExec (Spark) instead of CometScanWrapper.
- CometScanExec: keep as planning intermediate, throw on
  doExecute/doExecuteColumnar/executeCollect/inputRDDs.
- Drop file-format swap to CometParquetFileFormat in both
  CometScanExec.apply and CometNativeScanExec.apply.
- Delete CometParquetFileFormat and the JVM Parquet reader chain
  (NativeBatchReader, NativeColumnReader, ColumnReader, RowGroupReader,
  FileReader, plus all supporting page/index/options/utility classes).
- Delete IcebergCometNativeBatchReader and CometLazyVector.
- Trim Native.java to the Arrow-native scan APIs.
- Delete IcebergApi annotation, CometSchemaImporter,
  AbstractCometSchemaImporter; strip @IcebergApi from CometVector.
- Delete now-orphaned tests (TestFileReader, TestUtils,
  TestCometInputFile); drop closeColumnReader NPE assertion in
  CometNativeSuite.
The JVM-side Parquet column readers were deleted in the previous commit;
the Rust JNI methods and supporting modules they called are now dead.

- parquet/mod.rs: drop JNI methods initColumnReader, setDictionaryPage,
  setPageV1, setPageV2, resetBatch, readBatch, skipBatch, currentBatch,
  closeColumnReader, plus the Context/get_reader helpers.
- Delete parquet/read/ (column, levels, values, mod), mutable_vector.rs,
  data_type.rs, and the util sub-modules bit_packing, buffer, memory,
  test_common that only existed for the column readers.
- Trim parquet/util/jni.rs to deserialize_schema (the only helper used
  by the surviving Arrow-native scan path).
- Delete native/core/src/common/ entirely; bit.rs and buffer.rs were
  only consumed by the deleted parquet column-reader chain.
- Drop the parquet_read, bit_util, and parquet_decode benches that
  exercised the removed code; deregister them from Cargo.toml.
@andygrove andygrove changed the title chore: remove COMET_NATIVE_SCAN_IMPL and related scan-impl constants chore: remove dead native_iceberg_compat code path May 19, 2026
andygrove added 6 commits May 18, 2026 20:03
Updates the Spark test diffs to compile against the trimmed CometConf
API and the new 10-arg CometScanExec signature:

- Drop references to COMET_NATIVE_SCAN_IMPL, SCAN_NATIVE_DATAFUSION,
  SCAN_NATIVE_ICEBERG_COMPAT, and SCAN_AUTO in patched Spark sources.
- Fix the CometScanExec extractor pattern in SubquerySuite to use 10
  placeholders now that scanImpl is gone.
- Collapse the SQLTestUtils per-impl branching to a single check for
  IgnoreCometNativeDataFusion / IgnoreCometNativeScan, since
  native_datafusion is the only Parquet scan impl.
- Remove the unused IgnoreCometNativeIcebergCompat tag.
With native_datafusion as the only Parquet scan implementation, the
per-impl tag variants are equivalent to the plain IgnoreComet tag.

- Replace all IgnoreCometNativeDataFusion and IgnoreCometNativeScan
  call sites with IgnoreComet.
- Drop the corresponding case-class definitions from IgnoreComet.scala.
- Remove the now-redundant secondary check in SQLTestUtils; the
  existing IgnoreComet check handles all usages.
CometScanExec is a planning-only intermediate after the
native_iceberg_compat removal, so it never reaches the executed plan
inspected by ParquetRowIndexSuite. Its inputRDD lazy val is also gone,
breaking the sql_core/sql_hive test compile.
@andygrove andygrove marked this pull request as ready for review May 19, 2026 14:21
andygrove added 2 commits May 19, 2026 08:48
CometScanRule.validateObjectStoreConfig had no production caller; only
NativeConfigSuite invoked it. Drop the Scala wrapper, its configValidity
cache, the Native.java JNI declaration (which leaves the file empty),
the backing Rust JNI impl, and the tests that exercised the wrapper.
Keep the extractObjectStoreOptions tests since that utility is still
used by the native scan paths.
ParquetFilters and its helper SourceFilterSerde have no instantiation
sites left after the scan-impl cleanups; CometReaderThreadPool /
CometFileReaderThreadPool likewise has no callers. Drop all three.
Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove CI pending

Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending CI. I will get #4201 into reviewable shape after this merges.

@andygrove andygrove merged commit 0fd52b7 into apache:main May 19, 2026
199 of 202 checks passed
@andygrove andygrove deleted the followup/remove-scan-impl-constants branch May 19, 2026 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants