chore: remove dead vector classes left over from native_iceberg_compat removal#4416
Open
andygrove wants to merge 1 commit into
Open
chore: remove dead vector classes left over from native_iceberg_compat removal#4416andygrove wants to merge 1 commit into
andygrove wants to merge 1 commit into
Conversation
…t removal CometSelectionVector and CometDelegateVector became unreachable when the native_iceberg_compat JVM Parquet reader was removed (7203fb5, 0fd52b7). Nothing on the live execution path constructs them, so the matching JNI methods, codegen branches, and vector-mutation API are also dead. - Delete CometSelectionVector and CometDelegateVector - Remove hasSelectionVectors/exportSelectionIndices JNI methods and the get_selection_indices branch in the native ScanExec::get_next path - Remove the now-orphaned setNumNulls/setNumValues API from CometVector, CometDecodedVector, and CometPlainVector - Remove the isReused/setReused field and accessors on CometPlainVector (no constructor sets it to true and no caller invokes setReused), and simplify the SPARK-50235 cleanup loop in CometColumnarToRowExec to always close per-batch vectors - Add class-level documentation to the remaining vector classes explaining why a custom hierarchy is needed (Arrow shading, FFI lifecycle, lazy dictionary decoding)
1398407 to
050221b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #.
Rationale for this change
CometSelectionVectorandCometDelegateVectorbecame unreachable when thenative_iceberg_compatJVM Parquet reader was removed (commits 7203fb5 and 0fd52b7). The Iceberg row-deletion path that produced selection vectors is now handled natively viaiceberg-rustinCometIcebergNativeScanExec, andCometDelegateVectorhad zero callers or subclasses.With no live creator for these classes, several other surfaces became dead:
hasSelectionVectors/exportSelectionIndicesJNI methods onCometBatchIteratorand their callers inScanExec::get_nextalways took the no-op path.CometSelectionVectorcase inNativeUtil.exportBatchand the orphanexportSingleVectorhelper were unreachable.setNumNulls/setNumValuesmutation API onCometVector/CometDecodedVector/CometPlainVectorhad its only callers inside the two deleted classes.isReusedfield onCometPlainVectoris never set totrue(no constructor passes it, no caller ofsetReused), so theif (!cometPlainColumnVector.isReused())branch in theCometColumnarToRowExecSPARK-50235 cleanup loop always executed.Removing these tightens the API surface and makes it clearer that
CometVectorinstances are immutable after construction in the live native execution path.What changes are included in this PR?
CometSelectionVector.javaandCometDelegateVector.java.hasSelectionVectors/exportSelectionIndicesfromCometBatchIterator, the matching JNI bindings innative/jni-bridge/src/batch_iterator.rs, and theget_selection_indicesbranch innative/core/src/execution/operators/scan.rs.CometSelectionVectorcase and the now-orphanexportSingleVectorfromNativeUtil.scala.setNumNulls/setNumValuesfromCometVector,CometDecodedVector, andCometPlainVector. Mark the now-finalnumNulls/numValues/hasNullfields onCometDecodedVectorfinal.isReused/setReused(and the 4-arg constructor) fromCometPlainVector, and simplify the SPARK-50235 cleanup loop inCometColumnarToRowExecto always close per-batch vectors that are neitherWritableColumnVectornorConstantColumnVector.CometVector,CometDecodedVector,CometPlainVector,CometDictionaryVector,CometDictionary,CometListVector,CometMapVector,CometStructVector) explaining why a custom hierarchy is needed (shaded Arrow vs. Spark'sArrowColumnVector, FFI lifecycle, dictionary lazy decoding).How are these changes tested?
Existing tests cover the affected paths:
CometExecSuite"TopK operator should return correct results on dictionary column with nulls" exercisesCometDictionaryVectorend-to-end.CometExecSuite"CometBroadcastExchange could be converted to rows using CometColumnarToRow" exercises the simplifiedCometColumnarToRowExeccleanup loop.CometExecSuite"project + filter on arrays" exercisesCometListVectorthrough native execution.TestColumnReaderconstructsCometPlainVectordirectly via the surviving 2-arg constructor.All four pass locally. Native
cargo buildandcargo clippy --all-targets --workspace -- -D warningsare clean.