[CALCITE-7551] Project/Filter/Join transpose and merge rules should not duplicate non-deterministic expressions (e.g. RAND())#4970
Conversation
…ot duplicate non-deterministic expressions (e.g. RAND())
| return null; | ||
| } | ||
| }.visitEach(nodes); | ||
| for (int i = 0; i < refs.length; i++) { |
There was a problem hiding this comment.
As discussed in Jira, this is a bit too conservative, since it will not distinguish CURRENT_TIMESTAMP from RAND. But fixing that may be in the scope of a separate PR - we need really two separate notions of nondeterminism.
There was a problem hiding this comment.
Agreed. We should complete the fine-grained judgment of the deterministic function first before completing this PR.
There was a problem hiding this comment.
As far as I understand, CURRENT_TIMESTAMP is actually different than non-deterministic functions like RAND().
-
Non-deterministic function: it may return different values for every evaluations.
1.1 Returns false toisDeterministic().
1.2 Returns false toisDynamicFunction(). -
Dynamic Function: It will return same value at every call site within one statement; can differ across executions
2.1 Returns true toisDeterministic().
2.2 Returns true toisDynamicFunction().
And this path only blocks when we get isDeterministic() method response as false. Dynamic functions like CURRENT_TIMESTAMP can be duplicated and actually it is safe to duplicate them.
| sql(sql).withConformance(SqlConformanceEnum.PRESTO).ok(); | ||
| } | ||
|
|
||
| /** Test case of |
There was a problem hiding this comment.
why is SqlToRel connected to these rules?
There was a problem hiding this comment.
Because SqlToRelConverter builds its RelNode tree through RelBuilder, and RelBuilder eagerly merges adjacent projects at construction time using the same helper that the planner rules use.
Concretely, RelBuilder.project_ method uses the method RelOptUtil.pushPastProjectUnlessBloat(nodeList, project, config.bloat()) which we are fixing as part of this PR, so I thought of putting a test here as well.
| // non-deterministic evaluation (e.g. RAND()) into two: one consumed by | ||
| // the new filter condition, and the original still produced by the | ||
| // project above. Refuse to transpose in that case. | ||
| if (!project.getProjects().stream().allMatch(RexUtil::isDeterministic)) { |
There was a problem hiding this comment.
Blocks transpose (because RAND() exists in project)? eg:
SELECT * FROM (
SELECT RAND() as r, col1 as b FROM emp
)
WHERE col1 > 0
…ferences a non-deterministic projected column
…ondition references a non-deterministic projected column
|



Jira Link
CALCITE-7551
Changes Proposed
Several optimizer and
RelBuilder-level transformations inline a lower-projection expression into multiple positions of an upper expression without checking whether the inlined expression is non-deterministic. When the lower expression is non-deterministic (e.g.RAND()), this silently turns one evaluation into many, changing query semantics.Reproduction
Before the fix,
SqlToRelConverterproduces:The two
RAND()calls are evaluated independently, soB - Ais no longer always1. After the fix the inner projection is preserved andA,Bshare a single evaluation:Affected code paths and fixes
The guard only kicks in when the transform would actually duplicate a non-deterministic evaluation — not whenever a non-deterministic column merely exists. The precise condition depends on whether the lower project is consumed by the transform or re-emitted above the new node:
RelOptUtil.pushPastProjectUnlessBloatpushShuttle(Project)substitutes everyRexInputRefwith the underlying projection. OnlyRexOverwas guarded; determinism was not. The bottom project is consumed (inlined away).RexInputRefuses per bottom slot; returnnullonly if a non-deterministic bottom expression is referenced more than once (a single reference yields a single inlined copy, which is safe). CoversProjectMergeRuleandFilterProjectTransposeRule's helper call, plus theRelBuilderbloat merge thatSqlToRelConverterrelies on.FilterProjectTransposeRuleRelOptUtil.InputFinder.bits). Filters on deterministic columns are pushed even if the project computes an unrelatedRAND().JoinProjectTransposeRuleRexProgramBuilder.mergePrograms+mergedProgram.expandLocalRef, inlining projected expressions into the new join condition while re-emitting the project above.nLeftFields).SemiJoinProjectTransposeRulemergePrograms+expandLocalRefpattern asJoinProjectTransposeRule.The "consumed vs re-emitted" distinction is why the threshold differs:
ProjectMergeRuleblocks at> 1references (the bottom project disappears, so one reference = one surviving copy), while the transpose rules block at>= 1(the project survives on top, so a single referenced occurrence already yields two evaluations).The fix uses
RexUtil.isDeterministic(...), which correctly distinguishes operators that may return different values across call sites within a single statement (RANDoverrides tofalse) from operators that are non-deterministic only across executions (CURRENT_TIMESTAMP, markedisDynamicFunction = truebut stillisDeterministic = true). The latter is safe to duplicate and is unaffected by this change.Tests
Blocked cases (transform must not fire / not duplicate):
RelOptRulesTest.testProjectMergeShouldIgnoreNonDeterministicRelOptRulesTest.testFilterProjectTransposeShouldIgnoreNonDeterministicRelOptRulesTest.testJoinProjectTransposeShouldIgnoreNonDeterministicRelOptRulesTest.testSemiJoinProjectTransposeShouldIgnoreNonDeterministicSqlToRelConverterTest.testRandNotDuplicatedInProjectionMergeAllowed cases (transform still fires when the non-deterministic column is unrelated to the condition):
RelOptRulesTest.testFilterProjectTransposeWithUnrelatedNonDeterministicRelOptRulesTest.testJoinProjectTransposeWithUnrelatedNonDeterministicRelOptRulesTest.testSemiJoinProjectTransposeWithUnrelatedNonDeterministicConfirmed unaffected (verified by additional probing during investigation)
CalcMergeRule(RexProgram preserves CSE via local refs),ProjectReduceExpressionsRule,ProjectValuesMergeRule,ProjectFilterTransposeRule,ProjectJoinTransposeRule,ProjectCorrelateTransposeRule,AggregateProjectMergeRule,FilterSetOpTransposeRule(textual duplication only — each row flows through one branch of aUNION ALL, so per-row evaluation count is unchanged),ProjectToWindowRule.