Skip to content

chore: remove dead vector classes left over from native_iceberg_compat removal#4416

Open
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:worktree-remove-dead-vector-classes
Open

chore: remove dead vector classes left over from native_iceberg_compat removal#4416
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:worktree-remove-dead-vector-classes

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #.

Rationale for this change

CometSelectionVector and CometDelegateVector became unreachable when the native_iceberg_compat JVM Parquet reader was removed (commits 7203fb5 and 0fd52b7). The Iceberg row-deletion path that produced selection vectors is now handled natively via iceberg-rust in CometIcebergNativeScanExec, and CometDelegateVector had zero callers or subclasses.

With no live creator for these classes, several other surfaces became dead:

  • The hasSelectionVectors / exportSelectionIndices JNI methods on CometBatchIterator and their callers in ScanExec::get_next always took the no-op path.
  • The CometSelectionVector case in NativeUtil.exportBatch and the orphan exportSingleVector helper were unreachable.
  • The setNumNulls / setNumValues mutation API on CometVector / CometDecodedVector / CometPlainVector had its only callers inside the two deleted classes.
  • The isReused field on CometPlainVector is never set to true (no constructor passes it, no caller of setReused), so the if (!cometPlainColumnVector.isReused()) branch in the CometColumnarToRowExec SPARK-50235 cleanup loop always executed.

Removing these tightens the API surface and makes it clearer that CometVector instances are immutable after construction in the live native execution path.

What changes are included in this PR?

  • Delete CometSelectionVector.java and CometDelegateVector.java.
  • Remove hasSelectionVectors / exportSelectionIndices from CometBatchIterator, the matching JNI bindings in native/jni-bridge/src/batch_iterator.rs, and the get_selection_indices branch in native/core/src/execution/operators/scan.rs.
  • Remove the CometSelectionVector case and the now-orphan exportSingleVector from NativeUtil.scala.
  • Remove setNumNulls / setNumValues from CometVector, CometDecodedVector, and CometPlainVector. Mark the now-final numNulls / numValues / hasNull fields on CometDecodedVector final.
  • Remove isReused / setReused (and the 4-arg constructor) from CometPlainVector, and simplify the SPARK-50235 cleanup loop in CometColumnarToRowExec to always close per-batch vectors that are neither WritableColumnVector nor ConstantColumnVector.
  • Add class-level Javadoc to the remaining vector classes (CometVector, CometDecodedVector, CometPlainVector, CometDictionaryVector, CometDictionary, CometListVector, CometMapVector, CometStructVector) explaining why a custom hierarchy is needed (shaded Arrow vs. Spark's ArrowColumnVector, FFI lifecycle, dictionary lazy decoding).

How are these changes tested?

Existing tests cover the affected paths:

  • CometExecSuite "TopK operator should return correct results on dictionary column with nulls" exercises CometDictionaryVector end-to-end.
  • CometExecSuite "CometBroadcastExchange could be converted to rows using CometColumnarToRow" exercises the simplified CometColumnarToRowExec cleanup loop.
  • CometExecSuite "project + filter on arrays" exercises CometListVector through native execution.
  • TestColumnReader constructs CometPlainVector directly via the surviving 2-arg constructor.

All four pass locally. Native cargo build and cargo clippy --all-targets --workspace -- -D warnings are clean.

…t removal

CometSelectionVector and CometDelegateVector became unreachable when the
native_iceberg_compat JVM Parquet reader was removed (7203fb5, 0fd52b7).
Nothing on the live execution path constructs them, so the matching JNI
methods, codegen branches, and vector-mutation API are also dead.

- Delete CometSelectionVector and CometDelegateVector
- Remove hasSelectionVectors/exportSelectionIndices JNI methods and the
  get_selection_indices branch in the native ScanExec::get_next path
- Remove the now-orphaned setNumNulls/setNumValues API from CometVector,
  CometDecodedVector, and CometPlainVector
- Remove the isReused/setReused field and accessors on CometPlainVector
  (no constructor sets it to true and no caller invokes setReused), and
  simplify the SPARK-50235 cleanup loop in CometColumnarToRowExec to
  always close per-batch vectors
- Add class-level documentation to the remaining vector classes explaining
  why a custom hierarchy is needed (Arrow shading, FFI lifecycle, lazy
  dictionary decoding)
@andygrove andygrove force-pushed the worktree-remove-dead-vector-classes branch from 1398407 to 050221b Compare May 23, 2026 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant