fix: use spill writer's schema instead of the first batch schema for spill files#21293
Open
gruuya wants to merge 1 commit intoapache:mainfrom
Open
fix: use spill writer's schema instead of the first batch schema for spill files#21293gruuya wants to merge 1 commit intoapache:mainfrom
gruuya wants to merge 1 commit intoapache:mainfrom
Conversation
debff98 to
fd58942
Compare
635b511 to
d0bcb4e
Compare
d0bcb4e to
2f08c97
Compare
alamb
approved these changes
Apr 2, 2026
| } | ||
| } | ||
|
|
||
| #[cfg(test)] |
Contributor
There was a problem hiding this comment.
I wonder if the unit tests are needed give the end to end tests?
Contributor
Author
There was a problem hiding this comment.
I was actually of the opposite opinion—the e2e test is slightly unwieldy to me, and convoluted in a way, since it tries to reproduce the bug using non-standard primitives and hit the right spill conditions in an unnatural way (because otherwise it's hard to repro). For instance union/repartition exec used directly instead of sql/dataframe API, special task ctx, sort_batch called directly on batches instead of wrapping with SortExec.
Also InProgressSpillFile spill file unit tests were missing anyway.
Either way, I'm fine with dropping either one (or none) of those, let me know what you think.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
InProgressSpillFilewill use the first batch schema when it has no writer, which could have different nullability from the subsequent batches (e.g. due to union-ing a literal projection and a table column projection). This can then lead to a panic insort_batch.InProgressSpillFilealready has access to the canonical schema that the spill file should have (self.spill_writer.schema()).What changes are included in this PR?
Single line fix: instead of using the first batch schema for the spill file schema, use the spill_writer's schema instead.
The rest of the changes are two new tests.
Are these changes tested?
Yes, two new tests are added.
Are there any user-facing changes?