Skip to content

TEST - troubleshoot #21315#21334

Draft
kosiew wants to merge 8 commits intoapache:mainfrom
kosiew:test-21150
Draft

TEST - troubleshoot #21315#21334
kosiew wants to merge 8 commits intoapache:mainfrom
kosiew:test-21150

Conversation

@kosiew
Copy link
Copy Markdown
Contributor

@kosiew kosiew commented Apr 3, 2026

NOT FOR REVIEW

Trying to reproduce a CI error from #21315

kosiew added 8 commits April 2, 2026 16:32
Return ScalarValue::Dictionary(...) in dictionary batches instead of
unwrapping to inner scalars. Enhance min_max! logic to safely handle
dictionary-vs-dictionary and dictionary-vs-non-dictionary comparisons.
Add regression tests for raw-dictionary covering no-coercion,
null-containing, and multi-batch scenarios.
Centralize dictionary batch handling for min/max operations.
Streamline min_max_batch_generic to initialize from the first
non-null element. Implement shared setup/assert helpers in
dictionary tests to reduce repetition while preserving test
coverage.
Refactor dictionary min/max flow by removing the wrap macro arm,
making re-wrapping explicit through a private helper. This
separates the "choose inner winner" from the "wrap as
dictionary" step for easier auditing.

In `datafusion/functions-aggregate/src/min_max.rs`, update
`string_dictionary_batch` to accept slices instead of owned
Vecs, and introduce a small `evaluate_dictionary_accumulator`
helper to streamline min/max assertions with a shared
accumulator execution path, reducing repeated setup.
Update min_max.rs to ensure dictionary batches iterate
actual array rows, comparing referenced scalar values.
Unreferenced dictionary entries no longer affect MIN/MAX,
and referenced null values are correctly skipped.
Expanded tests to cover these changes and updated
expectations
Added regression tests for unreferenced and referenced
null dictionary values.
Update get_min_max_result_type to maintain Dictionary<K, V>
instead of unwrapping to V, allowing planned MIN/MAX execution to
utilize the dictionary-aware accumulator. Add end-to-end SQL
regression test to ensure MIN/MAX properly ignores unreferenced
dictionary values and preserves dictionary-typed output schema.
Adjust unit expectations for dictionary coercion tests to reflect
new planned-path behavior.
Update the MIN/MAX integration test in basic.rs to use two MemTable
partitions, ensuring the physical plan includes both partial and final
aggregate stages. Retain checks for dictionary-typed output schema
and results. In min_max.rs, remove the no-op min_max_dictionary!
macro and inline the existing generic comparison helper for
improved clarity and efficiency.
- Refactored the formatting in `basic.rs` to enhance readability by breaking long lines into shorter segments.
- Updated `min_max.rs` for consistent formatting in the `test_min_max_dictionary_ignores_unreferenced_values` function.
Ensure MIN/MAX(Dictionary(..., T)) returns T at SQL boundary
while retaining the new dictionary comparison logic.
Update regression tests to verify that dictionary inputs
are accepted and that the result is the underlying scalar type.
Adjust planner-level regression test in basic.rs to expect
final output schema to be Utf8 instead of dictionary-typed.
@github-actions github-actions bot added core Core DataFusion crate functions Changes to functions implementation labels Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant