Draft
Conversation
Return ScalarValue::Dictionary(...) in dictionary batches instead of unwrapping to inner scalars. Enhance min_max! logic to safely handle dictionary-vs-dictionary and dictionary-vs-non-dictionary comparisons. Add regression tests for raw-dictionary covering no-coercion, null-containing, and multi-batch scenarios.
Centralize dictionary batch handling for min/max operations. Streamline min_max_batch_generic to initialize from the first non-null element. Implement shared setup/assert helpers in dictionary tests to reduce repetition while preserving test coverage.
Refactor dictionary min/max flow by removing the wrap macro arm, making re-wrapping explicit through a private helper. This separates the "choose inner winner" from the "wrap as dictionary" step for easier auditing. In `datafusion/functions-aggregate/src/min_max.rs`, update `string_dictionary_batch` to accept slices instead of owned Vecs, and introduce a small `evaluate_dictionary_accumulator` helper to streamline min/max assertions with a shared accumulator execution path, reducing repeated setup.
Update min_max.rs to ensure dictionary batches iterate actual array rows, comparing referenced scalar values. Unreferenced dictionary entries no longer affect MIN/MAX, and referenced null values are correctly skipped. Expanded tests to cover these changes and updated expectations Added regression tests for unreferenced and referenced null dictionary values.
Update get_min_max_result_type to maintain Dictionary<K, V> instead of unwrapping to V, allowing planned MIN/MAX execution to utilize the dictionary-aware accumulator. Add end-to-end SQL regression test to ensure MIN/MAX properly ignores unreferenced dictionary values and preserves dictionary-typed output schema. Adjust unit expectations for dictionary coercion tests to reflect new planned-path behavior.
Update the MIN/MAX integration test in basic.rs to use two MemTable partitions, ensuring the physical plan includes both partial and final aggregate stages. Retain checks for dictionary-typed output schema and results. In min_max.rs, remove the no-op min_max_dictionary! macro and inline the existing generic comparison helper for improved clarity and efficiency.
- Refactored the formatting in `basic.rs` to enhance readability by breaking long lines into shorter segments. - Updated `min_max.rs` for consistent formatting in the `test_min_max_dictionary_ignores_unreferenced_values` function.
Ensure MIN/MAX(Dictionary(..., T)) returns T at SQL boundary while retaining the new dictionary comparison logic. Update regression tests to verify that dictionary inputs are accepted and that the result is the underlying scalar type. Adjust planner-level regression test in basic.rs to expect final output schema to be Utf8 instead of dictionary-typed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
NOT FOR REVIEW
Trying to reproduce a CI error from #21315