Fix ORDER BY with named aggregates that expand to subqueries#34
Fix ORDER BY with named aggregates that expand to subqueries#34
Conversation
When AGGREGATE(...) AT (...) expressions expand to correlated subqueries and are referenced by alias in ORDER BY, DuckDB errors with "Alias has subquery - not yet supported." Fix by wrapping the inner query in a subquery when ORDER BY references a subquery-aliased SELECT item.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1b14ae6ec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
yardstick-rs/src/sql/measures.rs
Outdated
| let select_start = upper.find("SELECT").map(|p| p + 6).unwrap_or(0); | ||
| let select_text = &sql[select_start..from_pos]; |
There was a problem hiding this comment.
Find outer SELECT before extracting subquery aliases
extract_subquery_aliases_from_select uses upper.find("SELECT"), which picks the first SELECT anywhere in the SQL text rather than the outer query's SELECT. For statements with CTEs (for example WITH cte AS (SELECT ...) SELECT ..., (SELECT ...) AS year_total ... ORDER BY year_total), this slices the wrong SELECT list, so year_total is missed (or aliases from the CTE are picked instead). In that case needs_wrap stays false and the original DuckDB Alias has subquery error still occurs for ORDER BY on the subquery alias.
Useful? React with 👍 / 👎.
Fixes a bug where extract_subquery_aliases_from_select would match a SELECT inside a CTE rather than the outer query's SELECT, causing the ORDER BY wrapping to be skipped for CTE-based queries.
|
@codex review |
1 similar comment
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e25309ea64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
yardstick-rs/src/sql/measures.rs
Outdated
| while let Some(pos) = text_upper[search_from..].find(&ident_upper) { | ||
| let abs = search_from + pos; | ||
| let before_ok = abs == 0 || { | ||
| let c = text.as_bytes()[abs - 1]; | ||
| !c.is_ascii_alphanumeric() && c != b'_' && c != b'.' |
There was a problem hiding this comment.
Ignore literals/comments when detecting ORDER BY alias refs
identifier_appears_in does a raw substring scan and boundary check, so alias names inside string literals/comments are treated as real identifier references. In that case needs_wrap becomes true and the query is wrapped, which can invalidate otherwise-valid ORDER BY expressions that rely on original table qualifiers (for example o.col) because those qualifiers no longer exist in the outer SELECT * FROM (...) _q. This creates a regression for queries where the alias text appears only as literal/comment text.
Useful? React with 👍 / 👎.
yardstick-rs/src/sql/measures.rs
Outdated
| b'\'' => { | ||
| // Skip single-quoted strings | ||
| let mut j = i + 1; | ||
| while j < select_text.len() { | ||
| if bytes[j] == b'\'' { |
There was a problem hiding this comment.
Advance past quoted strings when scanning SELECT items
In the single-quote branch, the scanner computes j to skip a string literal but never updates the loop index, so commas inside '...' are still processed as top-level separators. That can split one SELECT item into fragments and hide a subquery alias (e.g. (SELECT 1) || 'a,b' AS x), preventing the wrap logic from triggering and leaving the original DuckDB alias/subquery error unfixed.
Useful? React with 👍 / 👎.
Fixes two issues from PR review: 1. extract_subquery_aliases_from_select did not advance past quoted strings, so commas inside literals could split SELECT items 2. identifier_appears_in matched alias names inside string literals and comments, potentially triggering unnecessary query wrapping
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50e133e095
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
yardstick-rs/src/sql/measures.rs
Outdated
| b'(' => depth += 1, | ||
| b')' => depth -= 1, | ||
| _ if depth == 0 => { | ||
| if i + 4 <= item.len() && upper[i..].starts_with(" AS ") { |
There was a problem hiding this comment.
Avoid UTF-8 boundary panics in alias AS scan
subquery_alias_from_item walks item by byte index and slices upper[i..] at each top-level byte. If a multibyte character appears outside parentheses (for example AGGREGATE(...) AT (...) || 'é' AS year_total), i can point into the middle of a UTF-8 code point and panic with byte index ... is not a char boundary. This path is executed for any query with top-level ORDER BY, so queries that previously executed can now crash before SQL reaches DuckDB.
Useful? React with 👍 / 👎.
yardstick-rs/src/sql/measures.rs
Outdated
| let c = stripped.as_bytes()[abs - 1]; | ||
| !c.is_ascii_alphanumeric() && c != b'_' && c != b'.' | ||
| }; |
There was a problem hiding this comment.
Exclude quoted qualified names from alias detection
The before_ok boundary check only rejects matches when the immediate previous byte is .. For ORDER BY o."year_total", the previous byte at year_total is ", so this function incorrectly treats it as an unqualified alias reference. That sets needs_wrap = true, and wrapping to SELECT * FROM (...) _q removes table alias o, causing valid qualified ORDER BY clauses to fail with missing-table binder errors.
Useful? React with 👍 / 👎.
- Use char_indices() in subquery_alias_from_item to avoid slicing at invalid UTF-8 boundaries when multibyte characters appear - Reject matches preceded by " or ` in identifier_appears_in to avoid treating o."year_total" as an unqualified alias reference
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c4bb0a6d89
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
yardstick-rs/src/sql/measures.rs
Outdated
| '(' => depth += 1, | ||
| ')' => depth -= 1, | ||
| _ if depth == 0 && ch == ' ' => { | ||
| if upper[i..].starts_with(" AS ") |
There was a problem hiding this comment.
Avoid indexing
upper with byte offsets from item
Fresh evidence: although this now uses char_indices(), i comes from item while the slice is taken from upper = item.to_uppercase(), whose byte length can differ for Unicode case mappings. For inputs like (SELECT 1) || 'ıı…' AS year_total, upper becomes shorter and upper[i..] can panic with an out-of-bounds/char-boundary error before SQL reaches DuckDB, so a valid query can crash the process.
Useful? React with 👍 / 👎.
yardstick-rs/src/sql/measures.rs
Outdated
| let abs = search_from + pos; | ||
| let before_ok = abs == 0 || { | ||
| let c = stripped.as_bytes()[abs - 1]; | ||
| !c.is_ascii_alphanumeric() && c != b'_' && c != b'.' && c != b'"' && c != b'`' |
There was a problem hiding this comment.
Match unqualified quoted aliases in ORDER BY detection
The boundary check rejects matches when the preceding byte is " or `, which avoids o."alias" false positives but also drops legitimate unqualified references like ORDER BY "year_total" (or backtick-quoted equivalents). In that case needs_wrap stays false, so queries that alias subquery expressions with quoted identifiers still hit DuckDB's Alias has subquery failure instead of being wrapped.
Useful? React with 👍 / 👎.
- subquery_alias_from_item: compare " AS " directly on item bytes instead of slicing the uppercased string, which can have different byte offsets for certain Unicode case mappings - identifier_appears_in: only reject " and ` before a match when preceded by . (qualified reference like o."alias"), allowing bare quoted identifiers like "year_total" to match correctly
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e630dbb883
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if before_ok && after_ok { | ||
| return true; | ||
| } | ||
| search_from = abs + 1; |
There was a problem hiding this comment.
Advance identifier scan by char boundary
identifier_appears_in advances with search_from = abs + 1, but abs is a byte offset in UTF-8. With non-ASCII aliases (for example AS "é"), a non-boundary match attempt (like scanning ORDER BY té) can set search_from into the middle of a multibyte character, and the next stripped_upper[search_from..] slice panics. This makes otherwise valid queries crash during SQL expansion instead of returning a planner error/result.
Useful? React with 👍 / 👎.
yardstick-rs/src/sql/measures.rs
Outdated
| ')' => depth -= 1, | ||
| ' ' if depth == 0 => { | ||
| if item[i..].len() >= 4 | ||
| && item[i..].as_bytes()[1..4].eq_ignore_ascii_case(b"AS ") |
There was a problem hiding this comment.
Parse AS aliases with general whitespace
subquery_alias_from_item only recognizes aliases when the token pattern is exactly " AS ", so valid SQL formatting like (SELECT ...) AS\nyear_total or (SELECT ...) AS\tyear_total is missed. In those cases the subquery alias is not collected, needs_wrap stays false, and ORDER BY on that alias still hits DuckDB's Alias has subquery failure that this patch is meant to avoid.
Useful? React with 👍 / 👎.
- identifier_appears_in: advance search_from to the next valid char boundary after a non-matching hit, preventing panics on non-ASCII text - subquery_alias_from_item: match AS with any surrounding whitespace (tabs, newlines, multiple spaces), not just single spaces
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d6f736c00
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
yardstick-rs/src/sql/measures.rs
Outdated
| if !item.to_uppercase().contains("(SELECT ") { | ||
| return None; |
There was a problem hiding this comment.
Relax subquery detection beyond literal "(SELECT "
subquery_alias_from_item only proceeds when the SELECT item text contains the exact substring "(SELECT ", so valid scalar subqueries formatted as ( SELECT ...), (SELECT\n...), or (WITH ... SELECT ...) are skipped. In those cases the alias is never collected, needs_wrap stays false, and ORDER BY on that alias still hits DuckDB’s "Alias has subquery" error this patch is meant to avoid.
Useful? React with 👍 / 👎.
| while let Some(pos) = stripped_upper[search_from..].find(&ident_upper) { | ||
| let abs = search_from + pos; | ||
| let before_ok = abs == 0 || { | ||
| let c = stripped.as_bytes()[abs - 1]; | ||
| if c == b'"' || c == b'`' { |
There was a problem hiding this comment.
Avoid matching aliases in ORDER BY keyword contexts
The alias check is a raw whole-word search, so aliases that are also SQL tokens (for example year) are treated as references when they appear in syntax like EXTRACT(YEAR FROM o.ts). That false positive sets needs_wrap = true and rewrites to SELECT * FROM (...) _q ORDER BY ...; after wrapping, qualified references such as o.ts are no longer valid and previously-working queries can fail with binder errors.
Useful? React with 👍 / 👎.
- item_has_subquery: detect subqueries with whitespace after opening paren and CTE-style (WITH ... SELECT ...) patterns - identifier_appears_in: track parenthesis depth and only match at top level, preventing false positives like EXTRACT(YEAR FROM ...) triggering unnecessary query wrapping
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ec199d84b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
yardstick-rs/src/sql/measures.rs
Outdated
| // Only match at top level (outside parentheses) | ||
| let at_top_level = depths[abs] == 0; | ||
| let before_ok = abs == 0 || { |
There was a problem hiding this comment.
Match aliases inside parenthesized ORDER BY expressions
identifier_appears_in only considers matches when depths[abs] == 0, so alias references nested in parentheses are ignored. That means common ORDER BY forms like COALESCE(year_total, 0) or (year_total) will not trigger wrapping even when year_total is a scalar-subquery alias, and DuckDB will still raise the same Alias has subquery error this patch is intended to avoid.
Useful? React with 👍 / 👎.
| if i + 2 <= bytes.len() | ||
| && bytes[i..i + 2].eq_ignore_ascii_case(b"AS") | ||
| && (i + 2 >= bytes.len() || bytes[i + 2].is_ascii_whitespace()) | ||
| { |
There was a problem hiding this comment.
Support subquery aliases declared without AS
subquery_alias_from_item only records aliases when it finds the AS token pattern, but SQL allows select-item aliases without AS (e.g. (SELECT ...) year_total). In that valid form, the subquery alias is never collected, needs_wrap stays false, and ORDER BY year_total still hits DuckDB's subquery-alias planner error.
Useful? React with 👍 / 👎.
- identifier_appears_in: remove depth=0 restriction so aliases inside COALESCE(year_total, 0) or (year_total) are correctly detected - subquery_alias_from_item: detect implicit aliases without AS keyword, e.g. (SELECT ...) year_total
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f83ac55ebb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
yardstick-rs/src/sql/measures.rs
Outdated
| if j + 6 <= bytes.len() && bytes[j..j + 6].eq_ignore_ascii_case(b"SELECT") { | ||
| return true; | ||
| } | ||
| if j + 4 <= bytes.len() && bytes[j..j + 4].eq_ignore_ascii_case(b"WITH") { | ||
| return true; |
There was a problem hiding this comment.
Require token boundaries for SELECT/WITH subquery checks
item_has_subquery treats any parenthesized text that starts with SELECT or WITH as a subquery, but it does not verify a keyword boundary. Expressions like (selected_value) AS x or (with_tax) AS x are therefore misclassified as subqueries, so expand_aggregate_with_at can wrap the whole query when ORDER BY x is present. That unnecessary wrap changes name resolution context and can break otherwise-valid qualified sort keys (for example ORDER BY x, o.col) because o is no longer visible outside SELECT * FROM (...) _q.
Useful? React with 👍 / 👎.
item_has_subquery now verifies the character after SELECT/WITH is not alphanumeric or underscore, preventing identifiers like selected_value or with_tax from being misclassified as subquery keywords.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d430c8a954
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
yardstick-rs/src/sql/measures.rs
Outdated
| if bytes[i] == b'(' { | ||
| let mut j = i + 1; | ||
| while j < bytes.len() && bytes[j].is_ascii_whitespace() { | ||
| j += 1; | ||
| } | ||
| if j + 6 <= bytes.len() | ||
| && bytes[j..j + 6].eq_ignore_ascii_case(b"SELECT") | ||
| && (j + 6 >= bytes.len() || !bytes[j + 6].is_ascii_alphanumeric() && bytes[j + 6] != b'_') |
There was a problem hiding this comment.
Skip literals/comments when detecting subquery aliases
item_has_subquery scans raw bytes and treats any ("SELECT"|"WITH") sequence as a real subquery, even when it appears inside a string literal or comment. That makes extract_subquery_aliases_from_select misclassify non-subquery SELECT items as subquery aliases, which can trigger the new outer wrapping path unnecessarily. In aggregate queries this can turn previously valid ORDER BY clauses (especially ones that also reference qualified columns like t.col) into binder errors after wrapping because the original table alias is no longer in scope.
Useful? React with 👍 / 👎.
item_has_subquery now skips single-quoted strings, line comments, and block comments so that text like '(SELECT ...)' inside a literal is not misclassified as a real subquery.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7740e275cb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| b'(' => depth += 1, | ||
| b')' => { | ||
| depth -= 1; |
There was a problem hiding this comment.
Skip string/comment parens in subquery alias scan
subquery_alias_from_item updates depth for every ( / ) byte but does not skip quoted strings or comments, so parentheses inside literals/comments after a scalar subquery can drive depth negative before the alias is parsed. For inputs like (SELECT ...) || ')' AS year_total, last_alias_start stays unset, the alias is missed, and the ORDER BY wrapper is not applied, so the original DuckDB "Alias has subquery" failure still occurs.
Useful? React with 👍 / 👎.
Summary
Fixes #28. When
AGGREGATE(...) AT (...)expressions expand to correlated subqueries and are referenced by alias inORDER BY, DuckDB errors with "Alias has subquery - not yet supported." This wraps the inner query in a subquery (SELECT * FROM (...) _q ORDER BY ...) whenORDER BYreferences a subquery-aliased SELECT item, so aliases resolve as plain column references.The wrapping is targeted: it only activates when
ORDER BYactually references a subquery alias (via word-boundary matching), avoiding breakage for queries that use qualified column names likeo.prodNameinORDER BY.Adds 3 integration tests and 3 unit tests covering expression-based ORDER BY (
revenue/year_total), simple alias ORDER BY, and DESC ordering.