Make record extractors self-contained; drop BaseRecordExtractor.convert dispatcher#18436
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18436 +/- ##
============================================
+ Coverage 63.57% 63.58% +0.01%
- Complexity 1717 1729 +12
============================================
Files 3252 3252
Lines 199132 199111 -21
Branches 30875 30874 -1
============================================
+ Hits 126595 126603 +8
+ Misses 62461 62427 -34
- Partials 10076 10081 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR refactors Pinot’s record-extractor layer to remove the shared BaseRecordExtractor.convert(...) dispatcher and related helpers, making each input-format extractor responsible for its own type/shape conversion while keeping BaseRecordExtractor focused on include-list resolution (init) and shared map-key stringification (stringifyMapKey).
Changes:
- Removed the conversion dispatcher and helper methods from
BaseRecordExtractor, retaining onlyinit/include-list handling andstringifyMapKey. - Inlined conversion logic into
JSONRecordExtractor,CLPLogRecordExtractor, andThriftRecordExtractor. - Updated tests to reflect the new ownership boundaries (base tests slimmed; format tests updated) and aligned
CSVRecordExtractor.convertnullability with its behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/src/test/java/org/apache/pinot/spi/data/readers/BaseRecordExtractorTest.java | Slimmed test coverage to init include-list behavior and stringifyMapKey. |
| pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/BaseRecordExtractor.java | Removed shared conversion dispatcher; keeps include-list resolution and stringifyMapKey. |
| pinot-plugins/pinot-input-format/pinot-thrift/src/test/java/org/apache/pinot/plugin/inputformat/thrift/ThriftRecordExtractorTest.java | Updated commentary to match Thrift extractor’s new self-contained conversion path. |
| pinot-plugins/pinot-input-format/pinot-thrift/src/main/java/org/apache/pinot/plugin/inputformat/thrift/ThriftRecordExtractor.java | Added Thrift-local conversion for collections/maps/nested structs and scalar normalization. |
| pinot-plugins/pinot-input-format/pinot-json/src/test/java/org/apache/pinot/plugin/inputformat/json/JSONRecordExtractorTest.java | Updated commentary to reflect extractor-owned (not base-owned) BigInteger widening. |
| pinot-plugins/pinot-input-format/pinot-json/src/main/java/org/apache/pinot/plugin/inputformat/json/JSONRecordExtractor.java | Added JSON-local conversion for BigInteger/List/Map recursion. |
| pinot-plugins/pinot-input-format/pinot-csv/src/main/java/org/apache/pinot/plugin/inputformat/csv/CSVRecordExtractor.java | Marked convert as @Nullable and simplified empty/null check via StringUtils.isEmpty. |
| pinot-plugins/pinot-input-format/pinot-clp-log/src/main/java/org/apache/pinot/plugin/inputformat/clplog/CLPLogRecordExtractor.java | Added JSON-like local conversion for non-CLP-encoded fields and refreshed class documentation. |
xiangfu0
left a comment
There was a problem hiding this comment.
Found one high-signal SPI compatibility issue; see inline comment.
ccd1a3a to
8cd9073
Compare
…rt dispatcher Remove the shared `BaseRecordExtractor.convert` dispatcher and its helpers (`convertSingleValue` / `convertMultiValue` / `convertCollection` / `convertArray` / `convertPrimitiveArray` / `convertMap` / `convertRecord` / `isMultiValue` / `isMap` / `isRecord`). Each format extractor now owns its own conversion logic. Inline format-specific `convert` (single-value, list, map, record dispatch order) in `JSONRecordExtractor`, `CLPLogRecordExtractor`, `ThriftRecordExtractor`. The other extractors (Avro / Parquet native + avro / Arrow / ORC / ProtoBuf / CSV) were already self-contained. `BaseRecordExtractor` now provides only: - `init` / `_fields` / `_extractAll` / `initConfig` — include-list resolution - `stringifyMapKey` — shared map-key contract serialization `@Nullable` annotations on the per-format `convert` methods now reflect each implementation's actual null-return behavior (none of the inlined ones return null; CSV's gains `@Nullable` since empty input → null).
8cd9073 to
113244b
Compare
|
Opened a follow-up docs PR for this change: pinot-contrib/pinot-docs#803 |
## Summary - update the custom record-reader plugin guide so it no longer says BaseRecordExtractor provides the shared value-conversion dispatcher - describe BaseRecordExtractor as shared scaffolding for include-list handling and map-key stringification - restate the current RecordExtractor contract for custom extractor authors, cross-checked against apache/pinot#18436 ## Validation - git diff --check - relative-link-check on edited pages - cross-checked against apache/pinot source for #18436
Summary
Remove the shared
BaseRecordExtractor.convertdispatcher and its helpers (convertSingleValue,convertMultiValue,convertCollection,convertArray,convertPrimitiveArray,convertMap,convertRecord,isMultiValue,isMap,isRecord). Each format extractor now owns its conversion logic.BaseRecordExtractorkeeps only:init/_fields/_extractAll/initConfig— include-list resolutionstringifyMapKey— shared map-key contract serializationWhat changed per extractor
JSONRecordExtractorsuper.convert(...)convert/convertList/convertMap(BigInteger → BigDecimal;List→Object[];Map→Map<String, Object>; scalar pass-through)CLPLogRecordExtractorsuper.convert(...)ThriftRecordExtractorsuper.convert(...); overrodeisRecord/convertRecordconvert/convertSingleValue/convertCollection/convertMap/convertRecordOther adjustments
convertmethods drop the inherited@Nullablewhere the new impl never returns null (JSON / CLPLog / Thrift).CSVRecordExtractor.convertgains@Nullableto match its existing null-on-empty behavior.BaseRecordExtractorTestslimmed to cover the surviving surface (init,stringifyMapKey); the dispatcher tests are exercised through the per-format extractor tests.🤖 Generated with Claude Code