[CALCITE-7442] Getting Wrong index of Correlated variable inside Subquery after FilterJoinRule by yashlimbad · Pull Request #4840 · apache/calcite

yashlimbad · 2026-03-18T11:52:35Z

Jira Link

CALCITE-7442

Changes

Adjust offset of correlated variable inside subquery when pushing filter via FilterJoinRule.java

xiedeyantu · 2026-03-19T01:25:55Z

There is an error in the CI that needs to be resolved.

yashlimbad · 2026-03-19T03:04:57Z

Updated the Code.

Copilot

Pull request overview

Fixes a decorrelation/planning correctness issue (CALCITE-7442) where a correlated variable’s field index inside a subquery can become incorrect after FilterJoinRule pushes filters.

Changes:

Add a regression test that exercises FILTER_INTO_JOIN + FILTER_SUB_QUERY_TO_CORRELATE and then decorrelates, asserting expected plans.
Extend FilterJoinRule to propagate correlation-variable sets when creating pushed-down filters (and when keeping an above-join filter).
Enhance RelOptUtil.classifyFilters shifting logic so correlated field accesses inside RexSubQuery.rel can be adjusted during filter pushdown.
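
To make the index problem concrete, here is a minimal sketch in plain Java. It is a toy model, not Calcite code: `Ref`, `shift`, and the field counts are assumptions made up for illustration. It shows that when a predicate moves from the join row to the right input, every field index must drop by the left input's field count, and that a reference read through a correlation variable goes stale if only direct references are shifted.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the index shift; these are not Calcite's RexNode classes. */
final class ShiftSketch {
  /** A field reference, either direct ($i) or read through a correlation variable ($cor0.$i). */
  static final class Ref {
    final int index;
    final boolean correlated;

    Ref(int index, boolean correlated) {
      this.index = index;
      this.correlated = correlated;
    }
  }

  /** Shifts references by offset; correlated ones move only if shiftCorrelated is set. */
  static List<Ref> shift(List<Ref> refs, int offset, boolean shiftCorrelated) {
    List<Ref> out = new ArrayList<>();
    for (Ref r : refs) {
      boolean move = !r.correlated || shiftCorrelated;
      out.add(new Ref(move ? r.index + offset : r.index, r.correlated));
    }
    return out;
  }

  public static void main(String[] args) {
    int leftFieldCount = 3; // assumed: left input has 3 fields, right input has 2
    List<Ref> predicate = new ArrayList<>();
    predicate.add(new Ref(4, false)); // direct reference to join field 4
    predicate.add(new Ref(3, true));  // correlated reference to join field 3 inside a subquery

    // Pushing to the right input: indices must drop by leftFieldCount.
    List<Ref> stale = shift(predicate, -leftFieldCount, false);
    List<Ref> fixed = shift(predicate, -leftFieldCount, true);
    System.out.println("stale correlated index: " + stale.get(1).index);
    System.out.println("fixed correlated index: " + fixed.get(1).index);
  }
}
```

In this toy setup the stale correlated index still points at position 3, which does not exist in a 2-field input, while the shifted one points at position 0.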

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
core/src/test/java/org/apache/calcite/sql2rel/RelDecorrelatorTest.java	Adds a regression test covering correlated variable index behavior through filter pushdown + decorrelation.
core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java	Tracks correlation variables when constructing new Filters after pushdown.
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java	Extends filter-shifting to also adjust correlated field accesses inside subqueries.


core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

core/src/test/java/org/apache/calcite/sql2rel/RelDecorrelatorTest.java

xiedeyantu · 2026-03-19T07:54:25Z

I'm not sure if these comments will be helpful, as I'm not very familiar with this specific area. If someone more knowledgeable doesn't review this PR within the next few days, I'll give it a try myself.

yashlimbad · 2026-03-19T09:03:41Z

Some of the comments looked helpful, so I have updated the code. I will run the build to check that it passes and then push. 😄 It's fine if someone reviews the code in a few days, but I feel @julianhyde would be the best reviewer, because his is the only commit I see on the part of RelOptUtil that I updated, which is called from FilterJoinRule.

yashlimbad · 2026-03-19T11:31:51Z

The tests sometimes pass and sometimes fail at random in CI with the error below:

Execution failed for task ':arrow:test'.
> Could not resolve all dependencies for configuration ':arrow:jacocoAgent'.
   > Could not load module metadata from /home/jenkins/.gradle/caches/modules-2/metadata-2.106/descriptors/org.jacoco/org.jacoco.agent/0.8.11/26c913274550a0b2221f47a0fe2d2358/descriptor.bin

This happens even though the gradlew build passes.
It would be good if this CI run were consistent.

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

yashlimbad · 2026-03-24T04:41:51Z

Hey @xiedeyantu,
I think no one has reviewed the PR yet.
Could you please review it?

xiedeyantu · 2026-03-24T05:04:21Z

Sorry, could I get back to you in a few days? I'm currently on vacation and don't have access to a computer for debugging.

yashlimbad · 2026-03-24T05:42:54Z

sure, no problem!

caicancai · 2026-03-25T02:26:04Z

@yashlimbad Your PR headline, "fix correlated variable's index inside subquery", doesn't seem to match the Jira headline.

yashlimbad · 2026-03-25T03:57:10Z

My bad. Updated, @caicancai!

caicancai

Just two simple comments

caicancai · 2026-03-25T13:10:21Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

     * @param adjustments     the amount to adjust each field by
+     * @param offset          the amount to shift field accesses by when
+     *                        rewriting correlated subqueries
+     * @param correlateVariableChild the child relation providing the


The code comment format here is strange.

The variable name is long here, which is why it runs into its description. Any suggestions on formatting?

Indeed, I'll take a look at other Calcite code later to see if there are any good solutions.

Perhaps changing to a shorter, more concise variable name would solve the problem. 🤔

Got it. I will get back to you on this tomorrow.

Making the comment formatting look nice is not a good reason to use a shorter variable name.

Thanks for the reminder, I'll look into whether there's a better way to handle this tomorrow.

caicancai · 2026-03-25T13:45:07Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

+    @Override public RexNode visitSubQuery(RexSubQuery subQuery) {
+      boolean[] update = {false};
+      List<RexNode> clonedOperands = visitList(subQuery.operands, update);
+      if (update[0]) {


I suspect there might be an issue with the EXISTS subquery, but I'm not entirely sure. Could you add a similar test?

EXISTS fails in the same way as the IN clause. Nice catch, thanks! I will work on it.

Fixed, and added a test.

Besides update[0], is there a better representation? Could you explain why it must be update[0]?

caicancai · 2026-03-25T13:45:47Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

+        subQuery = subQuery.clone(subQuery.getType(), clonedOperands);
+        final Set<CorrelationId> variablesSet = RelOptUtil.getVariablesUsed(subQuery.rel);
+        if (!variablesSet.isEmpty() && correlateVariableChild != null) {
+          CorrelationId id = Iterables.getOnlyElement(variablesSet);


Are you assuming there's only one correlation variable?

My bad. I am now updating the whole variable set.
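
The difference between the two approaches can be sketched in plain Java. This is a hedged illustration only: `onlyElement` is a simplified stand-in for a "get the only element" helper (it rejects any set whose size is not one), and `adjustAll` and the id strings are hypothetical names, not code from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy comparison of "assume one correlation id" versus "handle every id". */
final class CorrelationSetSketch {
  /** Simplified stand-in for a get-the-only-element helper: rejects sets of any other size. */
  static <T> T onlyElement(Set<T> set) {
    if (set.size() != 1) {
      throw new IllegalArgumentException("expected one element, got " + set.size());
    }
    return set.iterator().next();
  }

  /** Processes every id in the set instead of assuming exactly one. */
  static List<String> adjustAll(Set<String> ids) {
    List<String> adjusted = new ArrayList<>();
    for (String id : ids) {
      adjusted.add("adjusted " + id);
    }
    return adjusted;
  }

  public static void main(String[] args) {
    Set<String> ids = new LinkedHashSet<>(Arrays.asList("$cor0", "$cor1"));
    try {
      onlyElement(ids);
    } catch (IllegalArgumentException e) {
      System.out.println("single-id assumption fails: " + e.getMessage());
    }
    System.out.println(adjustAll(ids));
  }
}
```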

caicancai · 2026-03-26T13:48:18Z

I have some questions that I might need to confirm with debugging. I will try my best to complete the review this week.

yashlimbad · 2026-03-26T13:56:52Z

Great! thank you @caicancai

xiedeyantu

LGTM! Please wait for @caicancai to finish the review. I have no further suggestions.

caicancai

overall LGTM

caicancai · 2026-03-29T12:54:00Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

+
+    @Override public RexNode visitSubQuery(RexSubQuery subQuery) {
+      boolean[] update = {false};
+      List<RexNode> clonedOperands = visitList(subQuery.operands, update);


boolean[] update = {false}; Is it necessary?

Yes.

visitList(List<? extends RexNode> exprs, boolean @Nullable [] update)

accepts a boolean array and sets index 0 to true if any expression was updated.
The pattern is the same everywhere.
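
For readers unfamiliar with the idiom, a one-element `boolean[]` acts as an out-parameter: Java passes the array reference by value, so the callee can flip `update[0]` and the caller sees it. The sketch below is a generic, self-contained imitation of that pattern and is not Calcite's implementation; the class name and the use of `UnaryOperator` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Generic imitation of the "boolean[] update" out-parameter pattern. */
final class UpdateFlagSketch {
  /** Applies f to each item; sets update[0] if any item was replaced (update may be null). */
  static <T> List<T> visitList(List<T> items, UnaryOperator<T> f, boolean[] update) {
    List<T> out = new ArrayList<>(items.size());
    for (T item : items) {
      T visited = f.apply(item);
      if (visited != item && update != null) {
        update[0] = true; // identity check: a different object means "rewritten"
      }
      out.add(visited);
    }
    return out;
  }

  public static void main(String[] args) {
    boolean[] update = {false};
    List<String> in = Arrays.asList("a", "b");
    List<String> out = visitList(in, s -> s.equals("b") ? "B" : s, update);
    // Only rebuild the parent node when something actually changed.
    System.out.println(out + " changed=" + update[0]);
  }
}
```

The caller allocates the flag, passes it down, and checks `update[0]` afterwards to decide whether a clone is needed, which avoids copying unchanged expressions.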

caicancai · 2026-03-29T12:55:24Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

+    @Override public RexNode visitSubQuery(RexSubQuery subQuery) {
+      boolean[] update = {false};
+      List<RexNode> clonedOperands = visitList(subQuery.operands, update);
+      if (update[0]) {


Besides update[0], is there a better representation? Could you explain why it must be update[0]?

silundong

Pushing down correlated subqueries may carry risks; sorry I didn't provide concrete cases. The logic here is indeed complex. Perhaps following the logic used for the top-level Filter in FilterJoinRule, i.e., not pushing down correlated subqueries, would be a good option.

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java

core/src/test/java/org/apache/calcite/sql2rel/RelDecorrelatorTest.java

yashlimbad · 2026-03-30T04:41:48Z

> Pushing down correlated subqueries may carry risks; sorry I didn't provide concrete cases. The logic here is indeed complex. Perhaps following the logic used for the top-level Filter in FilterJoinRule, i.e., not pushing down correlated subqueries, would be a good option.

The subquery is inside the Join condition, not inside a filter on top of the join. Decorrelating a join's condition is harder and can sometimes complicate the resulting tree. On the other hand, decorrelating a filter is logically better.

silundong · 2026-03-30T07:44:46Z

> Decorrelating a join's condition is harder and can sometimes complicate the resulting tree. On the other hand, decorrelating a filter is logically better.

Making FilterJoinRule handle correlated subqueries in the JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rules already provide an optimization, and their effect is essentially the same as pushing down correlated subqueries in the ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries.
That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review.

yashlimbad · 2026-03-30T11:09:26Z

>> Decorrelating a join's condition is harder and can sometimes complicate the resulting tree. On the other hand, decorrelating a filter is logically better.

> Making FilterJoinRule handle correlated subqueries in the JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rules already provide an optimization, and their effect is essentially the same as pushing down correlated subqueries in the ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries. That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review.

I agree that the subquery removal rule will do the same, but what if someone invokes FilterJoinRule but not SubQueryRemoveRule? The index would remain broken at that point. The fact that SubQueryRemoveRule handles it doesn't mean we shouldn't fix the pushed filter's broken index. And SubQueryRemoveRule still will not get the correct variable set, because we failed to extract the variable set from the subquery. Give me some time; I will think through your outer-scope review and come back.

asolimando · 2026-03-31T06:55:36Z

>>> Decorrelating a join's condition is harder and can sometimes complicate the resulting tree. On the other hand, decorrelating a filter is logically better.

>> Making FilterJoinRule handle correlated subqueries in the JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rules already provide an optimization, and their effect is essentially the same as pushing down correlated subqueries in the ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries. That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review.

> I agree that the subquery removal rule will do the same, but what if someone invokes FilterJoinRule but not SubQueryRemoveRule? The index would remain broken at that point. The fact that SubQueryRemoveRule handles it doesn't mean we shouldn't fix the pushed filter's broken index. And SubQueryRemoveRule still will not get the correct variable set, because we failed to extract the variable set from the subquery. Give me some time; I will think through your outer-scope review and come back.

Most of Calcite is based on the assumption that subquery removal is always a good first step, and that further rules don't have to deal with subqueries.

The assumption looks reasonable to me, and I don't see a valid reason to do otherwise here.

More code and more complex support for rules make them harder to maintain, evolve, and understand. Without a compelling reason, we should strive to keep things simple, IMO.

yashlimbad · 2026-03-31T07:42:39Z

>>>> Decorrelating a join's condition is harder and can sometimes complicate the resulting tree. On the other hand, decorrelating a filter is logically better.

>>> Making FilterJoinRule handle correlated subqueries in the JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rules already provide an optimization, and their effect is essentially the same as pushing down correlated subqueries in the ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries. That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review.

>> I agree that the subquery removal rule will do the same, but what if someone invokes FilterJoinRule but not SubQueryRemoveRule? The index would remain broken at that point. The fact that SubQueryRemoveRule handles it doesn't mean we shouldn't fix the pushed filter's broken index. And SubQueryRemoveRule still will not get the correct variable set, because we failed to extract the variable set from the subquery. Give me some time; I will think through your outer-scope review and come back.

> Most of Calcite is based on the assumption that subquery removal is always a good first step, and that further rules don't have to deal with subqueries.

> The assumption looks reasonable to me, and I don't see a valid reason to do otherwise here.

> More code and more complex support for rules make them harder to maintain, evolve, and understand. Without a compelling reason, we should strive to keep things simple, IMO.

@asolimando
Thanks, that makes sense from a maintenance/simplicity perspective.

I think the concern here is a bit narrower than "every rule should support subqueries". Once FilterJoinRule chooses to rewrite a filter that contains a correlated subquery, it is already operating on that structure. At that point, I think the rule should do one of two things:

- preserve a valid expression after the rewrite, or
- explicitly decline that transformation.

My concern with the current behavior is that it rewrites the predicate but can leave correlated field references inconsistent, which means correctness depends on a later SubQueryRemoveRule pass to repair the intermediate state. In other words, the issue is not only whether SubQueryRemoveRule can eventually handle it, but whether FilterJoinRule should emit an invalid expression for a shape it has already transformed. The PR description itself is fixing exactly that kind of planning/decorrelation correctness issue around correlated variable field indices after filter pushdown.
I agree that adding broad subquery support to FilterJoinRule would be too much. So if the preference is to keep subquery handling centralized, I think a conservative alternative would be to skip pushdown when the predicate contains correlated subqueries, rather than allowing a rewrite that can leave broken correlated references.

So from my perspective the acceptable outcomes are:

- fix the correlated-reference shifting/variable propagation during pushdown, or
- prohibit this specific pushdown for correlated subqueries.

But relying on a later rule to repair an intermediate invalid state seems brittle.

If the preference is to keep subquery handling centralized, I'm happy to revise this toward the conservative bail-out approach and avoid pushing conjuncts that contain correlated references.

silundong · 2026-04-01T00:52:56Z

@yashlimbad Unless absolutely necessary, please avoid force-pushing during review, as it makes it difficult for reviewers to compare incremental diffs. You can push as many commits as you need and squash them before the final merge.

yashlimbad · 2026-04-01T03:35:48Z

Oh, my bad, @silundong. I will push separate commits from now on. If you want the diff of the last commit, here it is:
https://github.com/apache/calcite/compare/4f4341c7a921a1e8c777d3e7c8172cfaee443eed..75f3b7a7fb0d7c2bfe7568087a3a1ff03a4980f8

silundong · 2026-04-02T06:29:25Z

No worries. I will complete the review over the next few days.

yashlimbad · 2026-04-03T05:14:55Z

Thank you! @silundong

silundong

I want to restate my concern that pushing down correlated subqueries may introduce risks and make the logic significantly more complex; therefore, avoiding such pushdowns is a good option.

core/src/main/java/org/apache/calcite/rel/core/Join.java

core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java

silundong · 2026-04-07T08:26:44Z

Thank you for your effort. I'm sorry that your implementation hasn't fully convinced me. I've outlined my concerns in detail in the review. If other reviewers accept it, I will respect their decision and have no further objections.

yashlimbad · 2026-04-13T09:54:07Z

@silundong I have addressed the concern you raised above: we no longer push a subquery down if it contains a correlation variable. I have updated the tests accordingly. Please review again.
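
The bail-out can be pictured as a simple partition of the condition's conjuncts. The sketch below is a plain-Java toy, not the PR's code: `Conjunct`, its `hasCorrelatedSubQuery` flag, and `pushable` are hypothetical names, and a real implementation would have to detect correlation by inspecting the expression rather than reading a precomputed flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the conservative bail-out; Conjunct is a hypothetical stand-in. */
final class BailOutSketch {
  static final class Conjunct {
    final String text;
    final boolean hasCorrelatedSubQuery;

    Conjunct(String text, boolean hasCorrelatedSubQuery) {
      this.text = text;
      this.hasCorrelatedSubQuery = hasCorrelatedSubQuery;
    }
  }

  /** Returns the conjuncts that may be pushed; the rest are appended to kept. */
  static List<Conjunct> pushable(List<Conjunct> conjuncts, List<Conjunct> kept) {
    List<Conjunct> result = new ArrayList<>();
    for (Conjunct c : conjuncts) {
      if (c.hasCorrelatedSubQuery) {
        kept.add(c); // bail out: leave it where it is, with indices untouched
      } else {
        result.add(c);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Conjunct> kept = new ArrayList<>();
    List<Conjunct> pushed = pushable(Arrays.asList(
        new Conjunct("$4 > 10", false),
        new Conjunct("EXISTS(... $cor0.$3 ...)", true)), kept);
    System.out.println("pushed=" + pushed.size() + " kept=" + kept.size());
  }
}
```

Because the kept conjuncts are never rewritten, their field references stay valid against the join row, which is the property the review discussion was asking for.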

sonarqubecloud · 2026-04-13T10:13:48Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
78.9% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

silundong

It appears that making the appropriate changes to RelOptUtil will satisfy our needs; other files do not seem to require additional modifications.

silundong · 2026-04-14T02:59:12Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

+    // binder for the variable above it, so the reference would be stranded.
+    // Pushing such a predicate to the RIGHT input is still safe (the right
+    // subtree is the natural consumer of the binding established by this
+    // join), and it can of course stay on the join itself.


Based on past community discussions and the current implementation, the CorrelationId in variablesSet of Join represents the concatenation of the left and right rows; the CorrelationId in Correlate represents rows that are produced by the left side and consumed by the right side.

To avoid confusion, the comment here may need to be adjusted. The core summary of our previous discussion is: pushing down correlated subqueries carries risks and involves extremely complex logic, and therefore pushing down is prohibited.

silundong · 2026-04-14T06:42:58Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

    private final @Nullable Set<RelDataTypeField> extraFields;
+    /** Correlation ids that are considered "local" when analysing
+     * sub-queries: only these contribute bits via {@link #visitSubQuery}.
+     * If null, all correlation ids encountered inside a sub-query


If null, visitSubQuery should return immediately.

silundong · 2026-04-14T08:05:57Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

      final ImmutableBitSet inputBits = inputFinder.build();

+      // Disable left-push only for filters that reference a CorrelationId
+      // bound by this join.


Disable pushing down to both, not just to the left side.

silundong · 2026-04-14T08:10:41Z

core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java

+    // conjunct lives. The buckets are needed to (a) plumb variablesSet onto the
+    // newly-created child Filters and (b) recompute the join's own variablesSet
+    // without dropping ids that the surviving join condition still references.
+    final Set<CorrelationId> leftVariablesSet = new LinkedHashSet<>();


Maybe the changes in the RelOptUtil are sufficient to meet our needs; no other files need to be modified.

silundong · 2026-04-14T08:55:13Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

    }
+
+    @Override public Void visitSubQuery(RexSubQuery subQuery) {
+      final Set<CorrelationId> variablesSet = RelOptUtil.getVariablesUsed(subQuery.rel);


We only care about localCorrelationIds, so if localCorrelationIds is null, return immediately; otherwise, iterate over localCorrelationIds and call RelOptUtil.correlationColumns.
RelOptUtil.getVariablesUsed(subQuery.rel) is unnecessary.
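
The control flow suggested here might look roughly like the following plain-Java sketch. It is an assumption-laden model: the `Map` from correlation id to column bits is a hypothetical stand-in for what a call such as the `RelOptUtil.correlationColumns` lookup mentioned above would return, and the ids are plain strings rather than `CorrelationId` objects.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model: only "local" correlation ids contribute column bits. */
final class LocalCorrelationSketch {
  /**
   * Collects the columns a sub-query reads through the given local ids.
   * columnsById is a hypothetical stand-in for a per-id column lookup.
   */
  static BitSet subQueryBits(Map<String, BitSet> columnsById, Set<String> localIds) {
    BitSet bits = new BitSet();
    if (localIds == null) {
      return bits; // no local ids configured: the sub-query contributes nothing
    }
    for (String id : localIds) {
      BitSet columns = columnsById.get(id);
      if (columns != null) {
        bits.or(columns);
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    Map<String, BitSet> columnsById = new HashMap<>();
    BitSet cor0 = new BitSet();
    cor0.set(1);
    cor0.set(3);
    columnsById.put("$cor0", cor0);

    System.out.println(subQueryBits(columnsById, null));
    System.out.println(subQueryBits(columnsById, Collections.singleton("$cor0")));
  }
}
```

Iterating over the local ids directly means there is no need to first compute every variable used inside the sub-query.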

silundong · 2026-04-14T09:14:37Z

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java

+   * reached through a {@link org.apache.calcite.rex.RexFieldAccess}) or
+   * transitively via the inner plan of a {@link RexSubQuery}.
+   */
+  private static boolean referencesAnyCorrelation(RexNode node,


The purpose of RexUtil.CorrelationFinder is similar to this. Would you consider extending it? That would reuse and strengthen the existing capability.

yashlimbad force-pushed the correlate_subquery_fix branch 4 times, most recently from 53f3132 to fb20a82 Compare March 18, 2026 16:08

yashlimbad force-pushed the correlate_subquery_fix branch from fb20a82 to e6ebfcc Compare March 19, 2026 02:58

xiedeyantu requested a review from Copilot March 19, 2026 07:10

Copilot AI reviewed Mar 19, 2026

yashlimbad force-pushed the correlate_subquery_fix branch 3 times, most recently from 2c9af44 to 815c57d Compare March 19, 2026 11:26

xiedeyantu added the request review request a review from committers/contributors label Mar 19, 2026

mihaibudiu reviewed Mar 19, 2026

core/src/main/java/org/apache/calcite/plan/RelOptUtil.java (outdated)

yashlimbad force-pushed the correlate_subquery_fix branch 2 times, most recently from e35441e to 55e4b1d Compare March 20, 2026 05:01

yashlimbad changed the title ~~[CALCITE-7442] fix correlated variable's index inside subquery~~ [CALCITE-7442] Getting Wrong index of Correlated variable inside Subquery after FilterJoinRule Mar 25, 2026

caicancai reviewed Mar 25, 2026

caicancai requested changes Mar 25, 2026

yashlimbad force-pushed the correlate_subquery_fix branch from 55e4b1d to 4f4341c Compare March 26, 2026 09:02

xiedeyantu approved these changes Mar 29, 2026

caicancai approved these changes Mar 29, 2026

silundong reviewed Mar 29, 2026

[CALCITE-7442] Correlated variable has wrong index inside subquery

75f3b7a

yashlimbad force-pushed the correlate_subquery_fix branch from 4f4341c to 75f3b7a Compare March 31, 2026 15:21

silundong reviewed Apr 6, 2026

core/src/main/java/org/apache/calcite/rel/core/Join.java (outdated)

core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java

yashlimbad added 2 commits April 13, 2026 15:17

Make join.copy method abstract and fix join variableSet collection

55fad53

Push filters with subquery into join if it doesn't contain correlation

1a2e7c3

yashlimbad force-pushed the correlate_subquery_fix branch from 8ea6009 to 1a2e7c3 Compare April 13, 2026 09:48

silundong reviewed Apr 14, 2026
