Added UDT for IP and Binary support#5463
Conversation
PR Reviewer Guide 🔍(Review updated until commit 0918d6b)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 0918d6b Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit b7fd95b
Suggestions up to commit 540074b
|
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
540074b to
b7fd95b
Compare
|
Persistent review updated to latest commit b7fd95b |
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
|
Persistent review updated to latest commit 0918d6b |
Description
Companion change for opensearch-project/OpenSearch#21807 which adds Calcite UDTs (
IpType,BinaryType) for OpenSearchipandbinarycolumns to the analytics-engine path. This PR teaches the SQL plugin to recognize the new UDTs at the response-schema boundary so:byte[]cells from the analytics backend get rendered as canonical IP strings / base64 strings (was:unsupported object class [B)."type": "ip"/"type": "binary"correctly (was: both labelled"binary").CIDRMATCHquery shapes keep working unchanged — the UDT is invisible to Calcite's planner-internal coercion machinery.Background
OpenSearchSchemaBuilder(in the OpenSearch sandbox) used to map bothipandbinaryfield types to plainSqlTypeName.VARBINARY. That collapse caused three downstream bugs in this plugin's PPL path through the analytics engine:AnalyticsExecutionEngine.convertRowscallsExprValueUtils.fromObjectValue(byte[])which has nobyte[]case →unsupported object class [B."type": "binary"for an IP column (becauseconvertRelDataTypeToExprType(VARBINARY) → BINARYExprType).PPLTypeChecker.family(SqlTypeFamily.BINARY, SqlTypeFamily.STRING), with the byte-range expansion living inline at the registration site.The companion OpenSearch PR introduces
IpType/BinaryType(Calcite UDTs that extendAbstractSqlTypewithVARBINARYunderneath) and moves CIDRMATCH dispatch into a backendScalarFunctionAdapter.Approach
Two response-boundary changes plus one cleanup, all minimal:
1. Result-column UDT recognition (
OpenSearchTypeFactory).New sibling function
convertAnalyticsEngineRelDataTypeToExprTypedoes aninstanceofdispatch:The original
convertRelDataTypeToExprTypeis deliberately unchanged — Calcite's coercion machinery round-trips through it, so returningIPExprType for a VARBINARY column synthesizesIP(string)casts that DataFusion can't resolve. Keeping UDT recognition in a sibling function isolates it to the response-schema path.2. Per-column UDT dispatch in row conversion (
AnalyticsExecutionEngine.convertRows).New static helper
toExprValue(Object value, RelDataType type):byte[]+IpType→InetAddresses.toAddrString(InetAddress.getByAddress(bytes))(matchesIpFieldMapper.valueFetchersemantics: dotted-quad for IPv4 / IPv4-mapped, RFC 5952 for pure IPv6).byte[]+BinaryType→Base64.getEncoder().encodeToString(bytes)(matches the OpenSearchbinaryfield wire contract).ExprValueUtils.fromObjectValue(value)unchanged.UnknownHostExceptionfrom a malformed IP buffer falls through to the default handler so the user sees a clear error rather than a malformed address string.3. CIDRMATCH cleanup (
PPLFuncImpTable).udf/ip/CidrMatchAdapterand its inline registration. CIDRMATCH dispatch now lives inCidrMatchFunctionAdapteron the backend (in the companion OpenSearch PR), which means it serves both the production SQL-plugin variant and the sandbox test front-end with a single implementation.withRexBuilderShim) to a single typecheck-onlyregisterOperator(CIDRMATCH, PPLBuiltinOperators.CIDRMATCH).CidrMatchFunctionUDF stays as the dynamic last-resort fallback.Testing
Unit tests added:
OpenSearchTypeFactoryTest:testConvertResultColumnIpTypeReturnsIpExprType—IpType→ExprCoreType.IP.testConvertResultColumnBinaryTypeReturnsBinaryExprType—BinaryType→ExprCoreType.BINARY.testConvertResultColumnPlainVarbinaryFallsBackToBinary— plainVARBINARY(no UDT) keeps returningBINARY.testConvertResultColumnDelegatesParityForNonUdtTypes— for every non-UDTRelDataType, the result-column variant must agree with the planner-internal variant. Drift here would mean response-schema labels diverge from what Calcite's coercion sees.AnalyticsExecutionEngineTest:executeRelNode_ipColumnRendersAsAddressString— both ipv4-mapped IPv6 (::ffff:1.2.3.4→"1.2.3.4") and pure IPv6 (::1) buffers render to the right canonical form, schema reports"ip".executeRelNode_binaryColumnRendersAsBase64—byte[]payload base64-encodes to match OpenSearch wire format, schema reports"binary".Existing tests still pass.
[B-class regression caught at:physicalPlanExecute_callsOnFailure(already in the file, exercises the same converter dispatch).End-to-end (against a single-node
gradle runcluster with the companion OpenSearch PR applied):cast(host as STRING)returns"1.2.3.4"/"::1"(was a 16-byte garbage buffer).cast(blob as STRING)matches| fields blob(base64-encoded).where host = '1.2.3.4'coerces cleanly with noUnable to convert call IP(string)regression.cidrmatch(host, '1.2.3.0/24')(column form) andCIDRMATCH('1.2.3.4', '1.2.3.0/24')(literal form) both return correct row counts via the backend adapter."type": "ip"foripcolumns,"type": "binary"forbinarycolumns.