feat: add optional JSON functions crate with json_get_str scaffolding #21353

Open
crm26 wants to merge 1 commit into apache:main from crm26:json-functions-scaffolding

crm26 commented Apr 4, 2026

Summary

Adds a new datafusion-functions-json crate to the workspace with json_get_str as the initial function, establishing the integration pattern for bringing JSON functions into core DataFusion.

Per @alamb's suggestion in #21301:

> Perhaps we could make it smaller -- like one that implements scaffolding / new crate / etc and one function. And then we can add the other functions as follow on PRs

What's included

  • New crate datafusion/functions-json/ with register_all() hook
  • json_get_str(json, *keys) -> str — extracts a string value from a JSON string at the given path (supports nested keys and array indices)
  • Optional feature flag json_expressions on the core crate (not enabled by default)
  • Handles Utf8, LargeUtf8, and Utf8View string types
  • Minimum 2 args enforced at planning time via coerce_types
  • sqllogictest coverage (json_functions.slt)
  • 9 unit tests + 1 doctest
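
To make the path semantics of json_get_str concrete, here is a minimal std-only sketch of the traversal, not the code in this PR. The `Json` enum is a toy stand-in for a parsed document (the PR itself parses with serde_json), and `PathKey`, `get_str`, and `sample` are hypothetical names used only for illustration. It also assumes that a missing key, an out-of-range index, or a non-string value at the end of the path yields no result (SQL NULL); the PR description does not spell that out.

```rust
/// Toy stand-in for a parsed JSON document (assumption: the real crate
/// would work on serde_json values instead).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// One path segment: an object key or an array index.
#[derive(Debug, Clone, PartialEq)]
pub enum PathKey {
    Key(String),
    Index(usize),
}

/// Hypothetical helper: walk `path` through `doc` and return the value
/// at the end of the path only if it is a JSON string.
pub fn get_str<'a>(doc: &'a Json, path: &[PathKey]) -> Option<&'a str> {
    let mut current = doc;
    for segment in path {
        current = match (current, segment) {
            // Object + key: look the key up among the fields.
            (Json::Object(fields), PathKey::Key(k)) => {
                &fields.iter().find(|(name, _)| name == k)?.1
            }
            // Array + index: bounds-checked element access.
            (Json::Array(items), PathKey::Index(i)) => items.get(*i)?,
            // Any other combination (e.g. a key applied to an array) has no result.
            _ => return None,
        };
    }
    match current {
        Json::Str(s) => Some(s.as_str()),
        // Assumed behavior: non-string values map to None (SQL NULL).
        _ => None,
    }
}

/// Sample document equivalent to
/// {"user": {"name": "ada", "tags": ["x", "y"], "age": 36}}.
pub fn sample() -> Json {
    Json::Object(vec![(
        "user".to_string(),
        Json::Object(vec![
            ("name".to_string(), Json::Str("ada".to_string())),
            (
                "tags".to_string(),
                Json::Array(vec![Json::Str("x".to_string()), Json::Str("y".to_string())]),
            ),
            ("age".to_string(), Json::Number(36.0)),
        ]),
    )])
}
```

With the sample document, the path `user`, `tags`, `1` resolves to `"y"`, while `user`, `age` yields no result because the value at that path is a number rather than a string.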

Design notes

  • Uses serde_json (already a workspace dependency) for JSON parsing. The existing datafusion-contrib/datafusion-functions-json crate uses jiter for better performance — follow-on PRs can switch the implementation while keeping the same interface.
  • Feature flag is json_expressions, matching the pattern of nested_expressions, unicode_expressions, etc.
  • Registration wired through session_state_defaults.rs, same as other function crates.

Follow-on PRs

Will add the remaining functions from datafusion-functions-json: json_get, json_get_int, json_get_float, json_get_bool, json_get_json, json_get_array, json_as_text, json_length, json_contains, and -> / ->> operators.

Ref: #21301

Adds a new `datafusion-functions-json` crate to the workspace with
`json_get_str` as the initial function, establishing the integration
pattern for bringing JSON functions into core DataFusion.

This PR implements the scaffolding discussed in apache#21301:
- New crate `datafusion/functions-json/` with registration hook
- `json_get_str(json, *keys) -> str` extracts a string value from
  a JSON string at the given path (supports nested keys and array indices)
- Optional feature flag `json_expressions` on the core crate (not default)
- Handles Utf8, LargeUtf8, and Utf8View string types
- Minimum 2 args enforced at planning time via coerce_types
- SQL-level tests via sqllogictest
- 9 unit tests + 1 doctest

Follow-on PRs will add the remaining functions from
datafusion-functions-json (json_get, json_get_int, json_get_float,
json_get_bool, json_get_json, json_get_array, json_as_text,
json_length, json_contains, and -> / ->> operators).

Closes apache#21301 (partial)
github-actions bot added the core (Core DataFusion crate) and sqllogictest (SQL Logic Tests (.slt)) labels Apr 4, 2026