[SPARK-56001][SQL][FOLLOWUP] Reject table alias for INSERT ... REPLACE WHERE#55871
Closed
cloud-fan wants to merge 1 commit into
Closed
[SPARK-56001][SQL][FOLLOWUP] Reject table alias for INSERT ... REPLACE WHERE#55871cloud-fan wants to merge 1 commit into
cloud-fan wants to merge 1 commit into
Conversation
…E WHERE The grammar for INSERT ... REPLACE WHERE | ON unifies the two variants into `#insertIntoReplaceBooleanCond` and accepts a `tableAlias` for both, because REPLACE ON's condition can reference the target via the alias (e.g. `t.col`). The REPLACE WHERE branch in the visitor never reads `ctx.tableAlias()`, so an alias supplied to REPLACE WHERE is silently ignored. A query like INSERT INTO t AS s REPLACE WHERE s.a = 1 SELECT * FROM source parses successfully, then fails at analysis with a confusing "column s.a not found" because the underlying `UnresolvedRelation` was not wrapped with the alias. Reject the alias at parse time so users get a clear error pointing at the right place. The grammar stays unified (no rule split); the visitor adds a single guard before the WHERE branch's existing logic and throws the new `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` parse error suggesting REPLACE ON when an alias is needed. The two existing `insert table: REPLACE WHERE with tableAlias …` parser tests in `DDLParserSuite` documented the silent-ignore behavior; they are rewritten to assert the new parse error.
Contributor
Author
gengliangwang
approved these changes
May 15, 2026
Member
|
Thanks, merging to master/4.x/4.2 |
Member
|
cc @huaxingao this is a follow-up to improve user-experience. |
gengliangwang
pushed a commit
that referenced
this pull request
May 15, 2026
…E WHERE Followup to #54722. ### What changes were proposed in this pull request? The grammar for INSERT ... REPLACE WHERE | ON unifies the two variants into `#insertIntoReplaceBooleanCond` and accepts a `tableAlias` for both, because REPLACE ON's condition can reference the target via the alias (e.g. `t.col`). The REPLACE WHERE branch in `AstBuilder` never reads `ctx.tableAlias()`, so an alias supplied to REPLACE WHERE is silently ignored. A query like ```sql INSERT INTO t AS s REPLACE WHERE s.a = 1 SELECT * FROM source ``` parses successfully, then fails at analysis with a confusing "column s.a not found" because the underlying `UnresolvedRelation` was not wrapped with the alias. This PR rejects the alias at parse time so users get a clear error pointing at the right place. The grammar stays unified (no rule split); the visitor adds a single guard before the WHERE branch's existing logic and throws a new `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` parse error that suggests REPLACE ON when an alias is needed. ### Why are the changes needed? The current behavior — silently ignoring the alias and then failing at analysis — is misleading. Either the alias should be wired through (a semantic change requiring more invasive plumbing through `OverwriteByExpression`'s write resolution path) or it should be rejected. Rejecting it at parse time is the smaller, safer fix and matches the natural reading of the grammar (an alias only makes sense when the condition references the target via the alias, which is REPLACE ON's case, not REPLACE WHERE's). ### Does this PR introduce *any* user-facing change? Yes. `INSERT INTO t AS s REPLACE WHERE …` now fails with `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` at parse time instead of silently dropping the alias and failing later (or, for queries whose WHERE doesn't reference the alias, silently producing the same plan as if the alias were absent). The new error message suggests using REPLACE ON for cases that need the alias. ### How was this patch tested? - Two existing `DDLParserSuite` tests (`insert table: REPLACE WHERE with tableAlias [and / without] BY NAME`) documented the silent-ignore behavior; they are rewritten to assert the new parse error. - Verified the rewritten tests fail without the AstBuilder guard and pass with it. ### Was this patch authored or co-authored using generative AI tooling? Yes — written with assistance from Claude. Closes #55871 from cloud-fan/SPARK-56001-followup. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit 6d96d19) Signed-off-by: Gengliang Wang <gengliang@apache.org>
gengliangwang
pushed a commit
that referenced
this pull request
May 15, 2026
…E WHERE Followup to #54722. ### What changes were proposed in this pull request? The grammar for INSERT ... REPLACE WHERE | ON unifies the two variants into `#insertIntoReplaceBooleanCond` and accepts a `tableAlias` for both, because REPLACE ON's condition can reference the target via the alias (e.g. `t.col`). The REPLACE WHERE branch in `AstBuilder` never reads `ctx.tableAlias()`, so an alias supplied to REPLACE WHERE is silently ignored. A query like ```sql INSERT INTO t AS s REPLACE WHERE s.a = 1 SELECT * FROM source ``` parses successfully, then fails at analysis with a confusing "column s.a not found" because the underlying `UnresolvedRelation` was not wrapped with the alias. This PR rejects the alias at parse time so users get a clear error pointing at the right place. The grammar stays unified (no rule split); the visitor adds a single guard before the WHERE branch's existing logic and throws a new `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` parse error that suggests REPLACE ON when an alias is needed. ### Why are the changes needed? The current behavior — silently ignoring the alias and then failing at analysis — is misleading. Either the alias should be wired through (a semantic change requiring more invasive plumbing through `OverwriteByExpression`'s write resolution path) or it should be rejected. Rejecting it at parse time is the smaller, safer fix and matches the natural reading of the grammar (an alias only makes sense when the condition references the target via the alias, which is REPLACE ON's case, not REPLACE WHERE's). ### Does this PR introduce *any* user-facing change? Yes. `INSERT INTO t AS s REPLACE WHERE …` now fails with `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` at parse time instead of silently dropping the alias and failing later (or, for queries whose WHERE doesn't reference the alias, silently producing the same plan as if the alias were absent). The new error message suggests using REPLACE ON for cases that need the alias. ### How was this patch tested? - Two existing `DDLParserSuite` tests (`insert table: REPLACE WHERE with tableAlias [and / without] BY NAME`) documented the silent-ignore behavior; they are rewritten to assert the new parse error. - Verified the rewritten tests fail without the AstBuilder guard and pass with it. ### Was this patch authored or co-authored using generative AI tooling? Yes — written with assistance from Claude. Closes #55871 from cloud-fan/SPARK-56001-followup. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit 6d96d19) Signed-off-by: Gengliang Wang <gengliang@apache.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Followup to #54722.
What changes were proposed in this pull request?
The grammar for INSERT ... REPLACE WHERE | ON unifies the two variants into
#insertIntoReplaceBooleanCondand accepts atableAliasfor both, because REPLACE ON's condition can reference the target via the alias (e.g.t.col). The REPLACE WHERE branch inAstBuildernever readsctx.tableAlias(), so an alias supplied to REPLACE WHERE is silently ignored. A query likeparses successfully, then fails at analysis with a confusing "column s.a not found" because the underlying
UnresolvedRelationwas not wrapped with the alias.This PR rejects the alias at parse time so users get a clear error pointing at the right place. The grammar stays unified (no rule split); the visitor adds a single guard before the WHERE branch's existing logic and throws a new
INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWEDparse error that suggests REPLACE ON when an alias is needed.Why are the changes needed?
The current behavior — silently ignoring the alias and then failing at analysis — is misleading. Either the alias should be wired through (a semantic change requiring more invasive plumbing through
OverwriteByExpression's write resolution path) or it should be rejected. Rejecting it at parse time is the smaller, safer fix and matches the natural reading of the grammar (an alias only makes sense when the condition references the target via the alias, which is REPLACE ON's case, not REPLACE WHERE's).Does this PR introduce any user-facing change?
Yes.
INSERT INTO t AS s REPLACE WHERE …now fails withINSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWEDat parse time instead of silently dropping the alias and failing later (or, for queries whose WHERE doesn't reference the alias, silently producing the same plan as if the alias were absent). The new error message suggests using REPLACE ON for cases that need the alias.How was this patch tested?
DDLParserSuitetests (insert table: REPLACE WHERE with tableAlias [and / without] BY NAME) documented the silent-ignore behavior; they are rewritten to assert the new parse error.Was this patch authored or co-authored using generative AI tooling?
Yes — written with assistance from Claude.