
fix: prevent duplicate alias collision with user-provided __datafusion_extracted names (#20432) #81

Draft
Broshen wants to merge 1 commit into DataDog:main from Broshen:boshen/cherry-pick-fix-name-tracker

Broshen commented Feb 24, 2026

  • Fixes a bug where the optimizer's `AliasGenerator` could produce alias names that collide with existing `__datafusion_extracted_N` aliases, causing a "Schema contains duplicate unqualified field name" error
  • I don't expect users to create these aliases themselves, but running the optimizer twice (with different `AliasGenerator` instances) triggers this.
  • Adds `AliasGenerator::update_min_id()` to advance the counter past existing aliases
  • Scans each plan node's expressions during the `ExtractLeafExpressions` traversal to seed the generator before any extraction occurs
  • Switches the rule to controlling its own traversal, which also means the config-based short circuit now more clearly skips the entire rule.
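To make the collision concrete: an alias generator that numbers names from a counter will, when freshly constructed, re-issue any suffix that is already present in the plan. The Rust sketch below is a minimal, standalone model of that behavior and of the `update_min_id` fix. `AliasGen`, `seed_from_existing`, and the starting id of 1 are hypothetical stand-ins, not DataFusion's actual `AliasGenerator` API; only the `__datafusion_extracted` prefix comes from this PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Prefix taken from the PR; everything else here is a hypothetical model.
const PREFIX: &str = "__datafusion_extracted";

/// Hypothetical alias generator: hands out `{prefix}_{id}` from a shared counter.
struct AliasGen {
    next_id: AtomicUsize,
}

impl AliasGen {
    fn new() -> Self {
        Self { next_id: AtomicUsize::new(1) }
    }

    /// Returns `{prefix}_{id}` and advances the counter.
    fn next(&self, prefix: &str) -> String {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        format!("{prefix}_{id}")
    }

    /// Ensures the next id handed out is at least `min_id` (never moves backwards).
    fn update_min_id(&self, min_id: usize) {
        self.next_id.fetch_max(min_id, Ordering::Relaxed);
    }
}

/// Advances `generator` past every `{PREFIX}_{N}` name already in use.
fn seed_from_existing<'a>(generator: &AliasGen, names: impl IntoIterator<Item = &'a str>) {
    for name in names {
        let id = name
            .strip_prefix(PREFIX)
            .and_then(|rest| rest.strip_prefix('_'))
            .filter(|d| !d.is_empty() && d.bytes().all(|b| b.is_ascii_digit()))
            .and_then(|d| d.parse::<usize>().ok());
        if let Some(n) = id {
            generator.update_min_id(n.saturating_add(1));
        }
    }
}

fn main() {
    // Names already present in a plan node, e.g. left behind by an earlier optimizer run.
    let existing = ["a", "__datafusion_extracted_2"];

    // Unseeded: a fresh generator re-issues `_2` on its second call.
    let fresh = AliasGen::new();
    let issued: Vec<String> = (0..2).map(|_| fresh.next(PREFIX)).collect();
    assert!(existing.contains(&issued[1].as_str())); // the collision

    // Seeded: scanning first moves the counter past 2.
    let seeded = AliasGen::new();
    seed_from_existing(&seeded, existing.iter().copied());
    let next = seeded.next(PREFIX);
    assert_eq!(next, "__datafusion_extracted_3");
    assert!(!existing.contains(&next.as_str()));
}
```

Using `fetch_max` in this model means seeding can only move the counter forward, so ids already issued earlier in the same run are never handed out again.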

Closes apache#20430

  • Unit test: `test_user_provided_extracted_alias_no_collision` in `extract_leaf_expressions`
  • SLT regression test in `projection_pushdown.slt` with an explicit `__datafusion_extracted_2` alias
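The regression scenario (a plan that already contains `__datafusion_extracted_2`, optimized twice with fresh generators) can be modeled in a few dozen lines. In this hypothetical Rust sketch, `Node`, `seed`, `extract`, and `optimize` are illustrative stand-ins rather than DataFusion types. It also simplifies the PR's approach: it seeds in a separate pass over the whole plan before extracting anything, and it shows how a rule that drives its own traversal can skip all work with a single config check.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};

const PREFIX: &str = "__datafusion_extracted";

/// Hypothetical, heavily simplified plan node: output names plus one optional input.
struct Node {
    names: Vec<String>,
    input: Option<Box<Node>>,
}

/// Hypothetical alias generator with a monotonically increasing counter.
struct AliasGen {
    next_id: AtomicUsize,
}

impl AliasGen {
    fn new() -> Self {
        Self { next_id: AtomicUsize::new(1) }
    }
    fn next(&self, prefix: &str) -> String {
        format!("{prefix}_{}", self.next_id.fetch_add(1, Ordering::Relaxed))
    }
    fn update_min_id(&self, min_id: usize) {
        self.next_id.fetch_max(min_id, Ordering::Relaxed);
    }
}

/// Parses `N` out of `__datafusion_extracted_N`, if `name` has that shape.
fn extracted_id(name: &str) -> Option<usize> {
    let digits = name.strip_prefix(PREFIX)?.strip_prefix('_')?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// Seeding pass: advance the generator past every reserved name in the plan.
fn seed(node: &Node, generator: &AliasGen) {
    for id in node.names.iter().filter_map(|n| extracted_id(n)) {
        generator.update_min_id(id.saturating_add(1));
    }
    if let Some(input) = &node.input {
        seed(input, generator);
    }
}

/// Extraction pass (children first): each node gains one freshly aliased name.
fn extract(node: &mut Node, generator: &AliasGen) {
    if let Some(input) = node.input.as_mut() {
        extract(input, generator);
    }
    node.names.push(generator.next(PREFIX));
}

/// The rule owns the traversal, so one config check skips all of it.
fn optimize(plan: &mut Node, enabled: bool) {
    if !enabled {
        return;
    }
    let generator = AliasGen::new(); // a fresh generator per run
    seed(plan, &generator);
    extract(plan, &generator);
}

fn all_names(node: &Node, out: &mut Vec<String>) {
    out.extend(node.names.iter().cloned());
    if let Some(input) = &node.input {
        all_names(input, out);
    }
}

fn main() {
    let mut plan = Node {
        names: vec!["a".into(), "__datafusion_extracted_2".into()],
        input: Some(Box::new(Node { names: vec!["b".into()], input: None })),
    };
    optimize(&mut plan, true); // first run
    optimize(&mut plan, true); // second run with a new generator
    optimize(&mut plan, false); // disabled by config: no-op

    let mut names = Vec::new();
    all_names(&plan, &mut names);
    let unique: HashSet<&String> = names.iter().collect();
    assert_eq!(unique.len(), names.len(), "duplicate alias: {names:?}");
    println!("{names:?}");
}
```

Without the `seed` call, the first run in this model would append `__datafusion_extracted_2` to the root a second time, which is the duplicate-field situation the PR describes.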

🤖 Generated with Claude Code



Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Successfully merging this pull request may close these issues.

Optimizer rule error for ExtractLeafExpressions