ClickHouse: Support scalar expressions in WITH clause#2333
Conversation
| #[test] | ||
| fn parse_with_clause_named_expression_unsupported_in_other_dialects() { | ||
| // The named-expression form is only enabled for ClickHouse; other | ||
| // dialects should still reject `WITH 42 AS answer …`. | ||
| let res = sqlparser::parser::Parser::parse_sql( | ||
| &GenericDialect {}, | ||
| "WITH 42 AS answer SELECT answer FROM t", | ||
| ); | ||
| assert!(res.is_err(), "expected parse error, got {res:?}"); | ||
| } |
There was a problem hiding this comment.
I think we can skip this, no need to test what the parser does for an invalid query for unsupported dialects
There was a problem hiding this comment.
| } | ||
|
|
||
| #[test] | ||
| fn parse_with_clause_named_expression_ast() { |
There was a problem hiding this comment.
we can merge this with the test above since they cover the same feature
There was a problem hiding this comment.
| #[test] | ||
| fn parse_with_clause_named_expression() { | ||
| // Plain literal scalar. | ||
| clickhouse().verified_stmt("WITH 42 AS answer SELECT answer FROM t"); |
There was a problem hiding this comment.
for the tests, lets use all_dialects_where instead of hardcoding it to clickhouse
There was a problem hiding this comment.
| // CTE form must start with an identifier. If the leading token | ||
| // can't begin one (e.g. `42`, `(SELECT …)`, `(x, y) -> …`), this | ||
| // is unambiguously the named-expression form. | ||
| if matches!(self.peek_token().token, Token::Word(_)) { | ||
| if let Some(cte) = self.maybe_parse(|p| p.parse_cte())? { | ||
| return Ok(WithItem::Cte(cte)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Can this be simplified to use peek_subquery_or_cte_start? e.g.
that we do if self.dialect.supports() and !self.peek_subquery_or_cte_start { parse expr } else { parse cte }
There was a problem hiding this comment.
tbh, not sure i fully understand this. with that, will WITH 42 AS answer SELECT answer FROM t parse? peek_subquery_or_cte_start returns false on 42 (not (SELECT / (WITH), so we'd go to parse_cte, which then errors on 42 not being an identifier?
There was a problem hiding this comment.
Yeah that would work right or? if we error on 42 its because the syntax was invalid and we expected a cte but got 42 instead?
There was a problem hiding this comment.
right, i was mixing things up. can you confirm this WITH (SELECT max(x) FROM t) AS m SELECT m will work with your approach? i tried and seems that the first tokens (SELECT make peek_subquery_or_cte_start return true, which routes to parse_cte, but parse_cte immediately fails on ( since it expects an identifier.
There was a problem hiding this comment.
I think the main goal with the comment was to simplify the cases a bit, so that ideally we have one branch handling the cte case, and another handling the cse case. Another way (there might be others) to accomplish that would be to try to parse a cte and if that fails fallback to cse e.g.
if let Some(cte) = self.maybe_parse(|parser| parser...)? else {
return cte
} else if self.supprots() {
return self.parse_cse
} else {
self.expected("CTE or CSE")
}maybe above woul be straightforward in that the cases that are covered is clearer?
| /// Parse a single item in a `WITH` clause. | ||
| /// | ||
| /// In standard SQL this is always a CTE (`name [(cols)] AS (query)`). | ||
| /// Dialects that enable [`Dialect::supports_with_clause_scalar_expression`] | ||
| /// — currently only ClickHouse — also accept the reversed form | ||
| /// `<expression> AS <identifier>`, which can be freely interleaved with | ||
| /// CTEs in the same comma-separated list. |
There was a problem hiding this comment.
| /// Parse a single item in a `WITH` clause. | |
| /// | |
| /// In standard SQL this is always a CTE (`name [(cols)] AS (query)`). | |
| /// Dialects that enable [`Dialect::supports_with_clause_scalar_expression`] | |
| /// — currently only ClickHouse — also accept the reversed form | |
| /// `<expression> AS <identifier>`, which can be freely interleaved with | |
| /// CTEs in the same comma-separated list. | |
| /// Parse a single item in a `WITH` clause. |
There was a problem hiding this comment.
fixed and minor renaming
3b86842#diff-fc6a8d66b8cb6bd48a119a70e489cbcef82e7f551e7cf08e8e972ef4e774ef49R14642
| #[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)] | ||
| #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] | ||
| #[cfg_attr(feature = "visitor", derive(Visit, VisitMut))] | ||
| pub enum WithItem { |
There was a problem hiding this comment.
repr wise can we do something like?
pub enum WithExpression {
Cte(...),
Cse(ExprWithAlias), // can we reuse exprwithalias?
}There was a problem hiding this comment.
| #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] | ||
| #[cfg_attr(feature = "visitor", derive(Visit, VisitMut))] | ||
| pub enum WithItem { | ||
| /// A traditional common table expression: `name [(cols)] AS [MATERIALIZED] (query)`. |
There was a problem hiding this comment.
| /// A traditional common table expression: `name [(cols)] AS [MATERIALIZED] (query)`. | |
| /// A common table expression. |
There was a problem hiding this comment.
| /// `<expr> AS <alias>` — binds an expression (literal, scalar subquery, | ||
| /// lambda, …) to a name visible in the surrounding query. | ||
| /// | ||
| /// See ClickHouse's [common scalar expressions][1]. | ||
| /// | ||
| /// [1]: https://clickhouse.com/docs/sql-reference/statements/select/with#common-scalar-expressions |
There was a problem hiding this comment.
| /// `<expr> AS <alias>` — binds an expression (literal, scalar subquery, | |
| /// lambda, …) to a name visible in the surrounding query. | |
| /// | |
| /// See ClickHouse's [common scalar expressions][1]. | |
| /// | |
| /// [1]: https://clickhouse.com/docs/sql-reference/statements/select/with#common-scalar-expressions | |
| /// A common scalar expression. | |
| /// | |
| /// [Clickhouse]: https://clickhouse.com/docs/sql-reference/statements/select/with#common-scalar-expressions |
| pub cte_tables: Vec<Cte>, | ||
| /// The items declared by this `WITH` clause: traditional CTEs and, | ||
| /// for dialects that support it, named expressions. | ||
| pub items: Vec<WithItem>, |
There was a problem hiding this comment.
| pub items: Vec<WithItem>, | |
| pub exprs: Vec<WithExpression>, |
There was a problem hiding this comment.
| /// The items declared by this `WITH` clause: traditional CTEs and, | ||
| /// for dialects that support it, named expressions. |
There was a problem hiding this comment.
| /// The items declared by this `WITH` clause: traditional CTEs and, | |
| /// for dialects that support it, named expressions. | |
| /// The expressions declared by this `WITH` clause. |
There was a problem hiding this comment.
|
|
||
| // String literal scalar (from the ClickHouse docs). | ||
| dialects.verified_stmt( | ||
| "WITH '2019-08-01 15:23:00' AS ts_upper_bound SELECT * FROM hits \ |
There was a problem hiding this comment.
for the line breaks can we use concat like e.g. here to format the sql strings?
Fixes #2221.
ClickHouse's
WITHclause supports a reversed-order form<expression> AS <identifier>(alongside, and freely interleaved with, traditional CTEs). Currently the parser hard-errors on the leading non-identifier token:Changes
WithItemenum (Cte(Cte)|Named { expr, alias });With::cte_tables: Vec<Cte>becomesWith::items: Vec<WithItem>.Dialect::supports_with_clause_scalar_expression(defaultfalse,trueforClickHouseDialectonly).parse_with_itemroutes each WITH-list element: when the leading token can't begin a CTE name, it parses<expr> AS <ident>directly; otherwise it tries the CTE form viamaybe_parseand falls back.parse_cteitself is unchanged.Tests
Added in
tests/sqlparser_clickhouse.rs, covering: literal scalar (the issue's example), string literal, aggregate, scalar subquery, bare-identifier disambiguation, mixing scalar + CTE in one list, lambda, and a negative test confirming non-ClickHouse dialects still reject the form.