Skip to content

[SPARK-56001][SQL][FOLLOWUP] Reject table alias for INSERT ... REPLACE WHERE#55871

Closed
cloud-fan wants to merge 1 commit into
apache:masterfrom
cloud-fan:SPARK-56001-followup
Closed

[SPARK-56001][SQL][FOLLOWUP] Reject table alias for INSERT ... REPLACE WHERE#55871
cloud-fan wants to merge 1 commit into
apache:masterfrom
cloud-fan:SPARK-56001-followup

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

Followup to #54722.

What changes were proposed in this pull request?

The grammar for INSERT ... REPLACE WHERE | ON unifies the two variants into #insertIntoReplaceBooleanCond and accepts a tableAlias for both, because REPLACE ON's condition can reference the target via the alias (e.g. t.col). The REPLACE WHERE branch in AstBuilder never reads ctx.tableAlias(), so an alias supplied to REPLACE WHERE is silently ignored. A query like

INSERT INTO t AS s REPLACE WHERE s.a = 1 SELECT * FROM source

parses successfully, then fails at analysis with a confusing "column s.a not found" because the underlying UnresolvedRelation was not wrapped with the alias.

This PR rejects the alias at parse time so users get a clear error pointing at the right place. The grammar stays unified (no rule split); the visitor adds a single guard before the WHERE branch's existing logic and throws a new INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED parse error that suggests REPLACE ON when an alias is needed.

Why are the changes needed?

The current behavior — silently ignoring the alias and then failing at analysis — is misleading. Either the alias should be wired through (a semantic change requiring more invasive plumbing through OverwriteByExpression's write resolution path) or it should be rejected. Rejecting it at parse time is the smaller, safer fix and matches the natural reading of the grammar (an alias only makes sense when the condition references the target via the alias, which is REPLACE ON's case, not REPLACE WHERE's).

Does this PR introduce any user-facing change?

Yes. INSERT INTO t AS s REPLACE WHERE … now fails with INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED at parse time instead of silently dropping the alias and failing later (or, for queries whose WHERE doesn't reference the alias, silently producing the same plan as if the alias were absent). The new error message suggests using REPLACE ON for cases that need the alias.

How was this patch tested?

  • Two existing DDLParserSuite tests (insert table: REPLACE WHERE with tableAlias [and / without] BY NAME) documented the silent-ignore behavior; they are rewritten to assert the new parse error.
  • Verified the rewritten tests fail without the AstBuilder guard and pass with it.

Was this patch authored or co-authored using generative AI tooling?

Yes — written with assistance from Claude.

…E WHERE

The grammar for INSERT ... REPLACE WHERE | ON unifies the two variants
into `#insertIntoReplaceBooleanCond` and accepts a `tableAlias` for both,
because REPLACE ON's condition can reference the target via the alias
(e.g. `t.col`). The REPLACE WHERE branch in the visitor never reads
`ctx.tableAlias()`, so an alias supplied to REPLACE WHERE is silently
ignored. A query like

  INSERT INTO t AS s REPLACE WHERE s.a = 1 SELECT * FROM source

parses successfully, then fails at analysis with a confusing
"column s.a not found" because the underlying `UnresolvedRelation` was
not wrapped with the alias.

Reject the alias at parse time so users get a clear error pointing at
the right place. The grammar stays unified (no rule split); the visitor
adds a single guard before the WHERE branch's existing logic and throws
the new `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` parse error
suggesting REPLACE ON when an alias is needed.

The two existing `insert table: REPLACE WHERE with tableAlias …` parser
tests in `DDLParserSuite` documented the silent-ignore behavior; they
are rewritten to assert the new parse error.
@cloud-fan
Copy link
Copy Markdown
Contributor Author

@gengliangwang
Copy link
Copy Markdown
Member

Thanks, merging to master/4.x/4.2

@gengliangwang
Copy link
Copy Markdown
Member

gengliangwang commented May 15, 2026

cc @huaxingao this is a follow-up to improve user-experience.

gengliangwang pushed a commit that referenced this pull request May 15, 2026
…E WHERE

Followup to #54722.

### What changes were proposed in this pull request?

The grammar for INSERT ... REPLACE WHERE | ON unifies the two variants into `#insertIntoReplaceBooleanCond` and accepts a `tableAlias` for both, because REPLACE ON's condition can reference the target via the alias (e.g. `t.col`). The REPLACE WHERE branch in `AstBuilder` never reads `ctx.tableAlias()`, so an alias supplied to REPLACE WHERE is silently ignored. A query like

```sql
INSERT INTO t AS s REPLACE WHERE s.a = 1 SELECT * FROM source
```

parses successfully, then fails at analysis with a confusing "column s.a not found" because the underlying `UnresolvedRelation` was not wrapped with the alias.

This PR rejects the alias at parse time so users get a clear error pointing at the right place. The grammar stays unified (no rule split); the visitor adds a single guard before the WHERE branch's existing logic and throws a new `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` parse error that suggests REPLACE ON when an alias is needed.

### Why are the changes needed?

The current behavior — silently ignoring the alias and then failing at analysis — is misleading. Either the alias should be wired through (a semantic change requiring more invasive plumbing through `OverwriteByExpression`'s write resolution path) or it should be rejected. Rejecting it at parse time is the smaller, safer fix and matches the natural reading of the grammar (an alias only makes sense when the condition references the target via the alias, which is REPLACE ON's case, not REPLACE WHERE's).

### Does this PR introduce *any* user-facing change?

Yes. `INSERT INTO t AS s REPLACE WHERE …` now fails with `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` at parse time instead of silently dropping the alias and failing later (or, for queries whose WHERE doesn't reference the alias, silently producing the same plan as if the alias were absent). The new error message suggests using REPLACE ON for cases that need the alias.

### How was this patch tested?

- Two existing `DDLParserSuite` tests (`insert table: REPLACE WHERE with tableAlias [and / without] BY NAME`) documented the silent-ignore behavior; they are rewritten to assert the new parse error.
- Verified the rewritten tests fail without the AstBuilder guard and pass with it.

### Was this patch authored or co-authored using generative AI tooling?

Yes — written with assistance from Claude.

Closes #55871 from cloud-fan/SPARK-56001-followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 6d96d19)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
gengliangwang pushed a commit that referenced this pull request May 15, 2026
…E WHERE

Followup to #54722.

### What changes were proposed in this pull request?

The grammar for INSERT ... REPLACE WHERE | ON unifies the two variants into `#insertIntoReplaceBooleanCond` and accepts a `tableAlias` for both, because REPLACE ON's condition can reference the target via the alias (e.g. `t.col`). The REPLACE WHERE branch in `AstBuilder` never reads `ctx.tableAlias()`, so an alias supplied to REPLACE WHERE is silently ignored. A query like

```sql
INSERT INTO t AS s REPLACE WHERE s.a = 1 SELECT * FROM source
```

parses successfully, then fails at analysis with a confusing "column s.a not found" because the underlying `UnresolvedRelation` was not wrapped with the alias.

This PR rejects the alias at parse time so users get a clear error pointing at the right place. The grammar stays unified (no rule split); the visitor adds a single guard before the WHERE branch's existing logic and throws a new `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` parse error that suggests REPLACE ON when an alias is needed.

### Why are the changes needed?

The current behavior — silently ignoring the alias and then failing at analysis — is misleading. Either the alias should be wired through (a semantic change requiring more invasive plumbing through `OverwriteByExpression`'s write resolution path) or it should be rejected. Rejecting it at parse time is the smaller, safer fix and matches the natural reading of the grammar (an alias only makes sense when the condition references the target via the alias, which is REPLACE ON's case, not REPLACE WHERE's).

### Does this PR introduce *any* user-facing change?

Yes. `INSERT INTO t AS s REPLACE WHERE …` now fails with `INSERT_REPLACE_WHERE_TABLE_ALIAS_NOT_ALLOWED` at parse time instead of silently dropping the alias and failing later (or, for queries whose WHERE doesn't reference the alias, silently producing the same plan as if the alias were absent). The new error message suggests using REPLACE ON for cases that need the alias.

### How was this patch tested?

- Two existing `DDLParserSuite` tests (`insert table: REPLACE WHERE with tableAlias [and / without] BY NAME`) documented the silent-ignore behavior; they are rewritten to assert the new parse error.
- Verified the rewritten tests fail without the AstBuilder guard and pass with it.

### Was this patch authored or co-authored using generative AI tooling?

Yes — written with assistance from Claude.

Closes #55871 from cloud-fan/SPARK-56001-followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 6d96d19)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants