[CALCITE-7442] Getting Wrong index of Correlated variable inside Subquery after FilterJoinRule #4840

Open
yashlimbad wants to merge 3 commits into apache:main from yashlimbad:correlate_subquery_fix

Conversation

@yashlimbad yashlimbad commented Mar 18, 2026

Jira Link

CALCITE-7442

Changes

Adjust the offset of a correlated variable inside a subquery when FilterJoinRule pushes a filter down.

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch 4 times, most recently from 53f3132 to fb20a82 Compare March 18, 2026 16:08
@xiedeyantu
Member

There is an error in the CI that needs to be resolved.

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch from fb20a82 to e6ebfcc Compare March 19, 2026 02:58
@yashlimbad
Author

Updated the Code.

@xiedeyantu xiedeyantu requested a review from Copilot March 19, 2026 07:10

Copilot AI left a comment

Pull request overview

Fixes a decorrelation/planning correctness issue (CALCITE-7442) where a correlated variable’s field index inside a subquery can become incorrect after FilterJoinRule pushes filters.

Changes:

  • Add a regression test that exercises FILTER_INTO_JOIN + FILTER_SUB_QUERY_TO_CORRELATE and then decorrelates, asserting expected plans.
  • Extend FilterJoinRule to propagate correlation-variable sets when creating pushed-down filters (and when keeping an above-join filter).
  • Enhance RelOptUtil.classifyFilters shifting logic so correlated field accesses inside RexSubQuery.rel can be adjusted during filter pushdown.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| core/src/test/java/org/apache/calcite/sql2rel/RelDecorrelatorTest.java | Adds a regression test covering correlated variable index behavior through filter pushdown + decorrelation. |
| core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java | Tracks correlation variables when constructing new Filters after pushdown. |
| core/src/main/java/org/apache/calcite/plan/RelOptUtil.java | Extends filter-shifting to also adjust correlated field accesses inside subqueries. |
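
For readers unfamiliar with the index problem summarized above, the following is a minimal sketch of it in plain Java. It is a toy model under stated assumptions, not Calcite code: `ShiftSketch`, `FieldRef`, and `shiftForRightInput` are hypothetical names, and the field counts are invented. It assumes that join output fields are laid out as left fields followed by right fields, so a reference pushed to the right input has to be reduced by the left field count, and that a correlated access inside a subquery needs the same adjustment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of field-index shifting; none of these types are Calcite classes. */
final class ShiftSketch {
  /** A field reference; correlated=true models an access such as $cor0.f9 inside a sub-query. */
  static final class FieldRef {
    final int index;
    final boolean correlated;

    FieldRef(int index, boolean correlated) {
      this.index = index;
      this.correlated = correlated;
    }

    @Override public String toString() {
      return (correlated ? "$cor0.f" : "$") + index;
    }
  }

  /**
   * Rewrites references for a filter pushed to the right join input.
   * A right-side field at join index i becomes i - leftFieldCount below the join.
   * When shiftCorrelated is false, correlated accesses keep their old index,
   * which models the inconsistency discussed in this pull request.
   */
  static List<FieldRef> shiftForRightInput(List<FieldRef> refs, int leftFieldCount,
      boolean shiftCorrelated) {
    List<FieldRef> shifted = new ArrayList<>();
    for (FieldRef ref : refs) {
      boolean shift = !ref.correlated || shiftCorrelated;
      int newIndex = shift ? ref.index - leftFieldCount : ref.index;
      shifted.add(new FieldRef(newIndex, ref.correlated));
    }
    return shifted;
  }

  public static void main(String[] args) {
    // Hypothetical join: the left input has 8 fields, so join field 9 is right field 1.
    List<FieldRef> refs = Arrays.asList(new FieldRef(9, false), new FieldRef(9, true));
    System.out.println(shiftForRightInput(refs, 8, false)); // correlated access stays stale
    System.out.println(shiftForRightInput(refs, 8, true));  // both references agree
  }
}
```

In this sketch the two calls differ only in whether the correlated access is shifted along with the direct reference; the actual change in RelOptUtil may be structured quite differently.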

@xiedeyantu
Member

I'm not sure if these comments will be helpful, as I'm not very familiar with this specific area. If someone more knowledgeable doesn't review this PR within the next few days, I'll give it a try myself.

@yashlimbad
Author

Some of the comments looked helpful, so I updated the code. I will now run the build to check that it passes and push accordingly! 😄 It's fine if someone reviews the code in a few days, but I feel @julianhyde would be the best person to review this, because his is the only commit I see on the part of RelOptUtil that I updated, which is called from FilterJoinRule.

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch 3 times, most recently from 2c9af44 to 815c57d Compare March 19, 2026 11:26
@yashlimbad
Author

yashlimbad commented Mar 19, 2026

Sometimes the tests pass and sometimes they fail randomly in CI with the error below:

Execution failed for task ':arrow:test'.
> Could not resolve all dependencies for configuration ':arrow:jacocoAgent'.
   > Could not load module metadata from /home/jenkins/.gradle/caches/modules-2/metadata-2.106/descriptors/org.jacoco/org.jacoco.agent/0.8.11/26c913274550a0b2221f47a0fe2d2358/descriptor.bin

This happens even though the gradlew build passes.
It would be good if this CI run were consistent.

@xiedeyantu xiedeyantu added the request review label (request a review from committers/contributors) Mar 19, 2026
@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch 2 times, most recently from e35441e to 55e4b1d Compare March 20, 2026 05:01
@yashlimbad
Author

Hey @xiedeyantu,
I think no one has reviewed the PR yet.
Could you please review it?

@xiedeyantu
Member

Sorry, could I get back to you in a few days? I'm currently on vacation and don't have access to a computer for debugging.

@yashlimbad
Author

sure, no problem!

@caicancai
Member

@yashlimbad Your PR headline, "fix correlated variable's index inside subquery", doesn't seem to match the Jira headline.

@yashlimbad yashlimbad changed the title from "[CALCITE-7442] fix correlated variable's index inside subquery" to "[CALCITE-7442] Getting Wrong index of Correlated variable inside Subquery after FilterJoinRule" Mar 25, 2026
@yashlimbad
Author

My bad. Updated, @caicancai!

Member

@caicancai caicancai left a comment

Just two simple comments

* @param adjustments the amount to adjust each field by
* @param offset the amount to shift field accesses by when
* rewriting correlated subqueries
* @param correlateVariableChild the child relation providing the
Member

The code comment format here is strange.

Author

The variable name is long here, which is why it runs into its doc text. Any suggestions on formatting?

Member

Indeed, I'll take a look at other Calcite code later to see if there are any good solutions.

Member

Perhaps changing to a shorter, more concise variable name would solve the problem. 🤔

Author

got it, will get back on this tomorrow

Contributor

Making the comment formatting nicer is not a good reason to use a shorter variable name.

Member

Thanks for the reminder, I'll look into whether there's a better way to handle this tomorrow.

@Override public RexNode visitSubQuery(RexSubQuery subQuery) {
boolean[] update = {false};
List<RexNode> clonedOperands = visitList(subQuery.operands, update);
if (update[0]) {
Member

I suspect there might be an issue with the EXISTS subquery, but I'm not entirely sure. Could you add a similar test?

Author

checking

Author

EXISTS fails in the same way as the IN clause. Nice catch, thanks for this! I will work on it.

Author

fixed and added test

Member

Besides update[0], is there a better representation? Could you explain why it must be update[0]?

subQuery = subQuery.clone(subQuery.getType(), clonedOperands);
final Set<CorrelationId> variablesSet = RelOptUtil.getVariablesUsed(subQuery.rel);
if (!variablesSet.isEmpty() && correlateVariableChild != null) {
CorrelationId id = Iterables.getOnlyElement(variablesSet);
Member

Are you assuming there's only one correlation variable?

Author

my bad, updating whole variable set now

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch from 55e4b1d to 4f4341c Compare March 26, 2026 09:02
@caicancai
Member

I have some questions that I might need to confirm with debugging. I will try my best to complete the review this week.

@yashlimbad
Author

Great! thank you @caicancai

Member

@xiedeyantu xiedeyantu left a comment

LGTM! Please wait for @caicancai to finish the review. I have no further suggestions.

Member

@caicancai caicancai left a comment

overall LGTM


@Override public RexNode visitSubQuery(RexSubQuery subQuery) {
boolean[] update = {false};
List<RexNode> clonedOperands = visitList(subQuery.operands, update);
Member

boolean[] update = {false}; Is it necessary?

Author

yes,

visitList(
      List<? extends RexNode> exprs, boolean @Nullable [] update

It accepts a boolean array and sets the element at index 0 to true if anything was updated.
The same pattern is used everywhere.
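
The idiom described in this reply can be sketched in isolation. The block below is a toy analogue in plain Java, not the shuttle code from Calcite: `UpdateFlagSketch` and its methods are hypothetical, and strings stand in for `RexNode`. It only illustrates why a one-element `boolean[]` is used: Java passes arguments by value, so the array acts as a mutable flag that the callee can set and the caller can read afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy analogue of the update-flag idiom; strings stand in for RexNode. */
final class UpdateFlagSketch {
  /** Rewrites a single expression; only the hypothetical reference "$9" changes. */
  static String visit(String expr) {
    return expr.equals("$9") ? "$1" : expr;
  }

  /**
   * Visits every expression and reports through update[0] whether any changed.
   * A null array means the caller does not care about the flag.
   */
  static List<String> visitList(List<String> exprs, boolean[] update) {
    List<String> visited = new ArrayList<>(exprs.size());
    for (String expr : exprs) {
      String result = visit(expr);
      if (!result.equals(expr) && update != null) {
        update[0] = true;
      }
      visited.add(result);
    }
    return visited;
  }

  public static void main(String[] args) {
    boolean[] update = {false};
    List<String> operands = visitList(Arrays.asList("$0", "$9"), update);
    // Only rebuild the enclosing node when something actually changed.
    if (update[0]) {
      System.out.println("rebuilt with " + operands);
    }
  }
}
```

A caller typically checks `update[0]` after the call and clones the enclosing expression only when it is true, which avoids allocating a new node for unchanged input.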

Contributor

@silundong silundong left a comment

Pushing down correlated subqueries may carry risks; sorry I didn't provide concrete cases. The logic here is indeed complex. Perhaps following the logic used for the top-level Filter in FilterJoinRule (i.e., not pushing down correlated subqueries) would be a good option.

@yashlimbad
Author

> Pushing down correlated subqueries may carry risks; sorry I didn't provide concrete cases. The logic here is indeed complex. Perhaps follow the logic for handling the top-level Filter in FilterJoinRule (i.e., not pushing down correlated subqueries) would be a good option.

The subquery is inside the join condition, not inside a filter on top of the join. Decorrelating a join's condition is harder and can sometimes complicate the resulting tree. Decorrelating a filter, on the other hand, is logically cleaner.

@silundong
Contributor

> decorrelating join's condition is harder and can complicate resulting tree sometime. on the other hand decorrelating filter is logically better

Making FilterJoinRule handle correlated subqueries in the JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rules already provide an optimization, and their effect is essentially the same as pushing down correlated subqueries in the ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries.
That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review.

@yashlimbad
Author

> decorrelating join's condition is harder and can complicate resulting tree sometime. on the other hand decorrelating filter is logically better

> Making FilterJoinRule handle correlated subqueries in JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rule already provide a optimization, and their effect is essentially the same as pushing down correlated subqueries in ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries. That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review.

I agree that the subquery removal rule will do the same, but what if someone invokes FilterJoinRule without SubQueryRemoveRule? The index would remain broken at that point. The fact that SubQueryRemoveRule handles it doesn't mean we shouldn't fix the pushed filter's broken index. Moreover, SubQueryRemoveRule will still not get the correct variable set, because we failed to extract the variable set from the subquery. Give me some time; I will think through your outer-scope review and come back.

@asolimando
Member

> decorrelating join's condition is harder and can complicate resulting tree sometime. on the other hand decorrelating filter is logically better

> Making FilterJoinRule handle correlated subqueries in JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rule already provide a optimization, and their effect is essentially the same as pushing down correlated subqueries in ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries. That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review.

> I agree that subquery remove rule will do the same, but what if someone invokes FilterJoinRule but not SubqueryRemoveRule? the index would remain broken at that point. SubqueryRemoveRule handles it, it doesn't mean we shouldn't fix pushed filter's broken index. And still SubqueryRemoveRule will not get correct variable set because we failed to extract variable set from subquery. Give me sometime, I will brainstorm on your outer scope review and comeback.

Most of Calcite is based on the assumption that subquery removal is always a good first step, and further rules don't have to deal with subqueries.

The assumption looks reasonable to me, and I don't see a valid reason to do otherwise here.

More code and more complex support for rules make them harder to maintain, evolve, and understand. Without a compelling reason, we should strive to keep things simple, IMO.

@yashlimbad
Author

> decorrelating join's condition is harder and can complicate resulting tree sometime. on the other hand decorrelating filter is logically better

> Making FilterJoinRule handle correlated subqueries in JOIN ON condition would make the logic significantly more complex. In addition, the current subquery removal rule already provide a optimization, and their effect is essentially the same as pushing down correlated subqueries in ON condition. Considering the issues mentioned above, I would prefer to prohibit pushing down correlated subqueries. That said, if the issues above can all be addressed properly, that would also be good. But I suspect this would require more rounds of careful review.

> I agree that subquery remove rule will do the same, but what if someone invokes FilterJoinRule but not SubqueryRemoveRule? the index would remain broken at that point. SubqueryRemoveRule handles it, it doesn't mean we shouldn't fix pushed filter's broken index. And still SubqueryRemoveRule will not get correct variable set because we failed to extract variable set from subquery. Give me sometime, I will brainstorm on your outer scope review and comeback.

> Most of Calcite is based on the assumption that subquery removal is always a good first step, and further rules don't have to deal with subqueries.

> The assumption looks reasonable to me, and I don't see a valid reason to do otherwise here.

> More code and more complex support for rules makes them harder to maintain, evolve and understand, without a compelling reason we should strive to keep things simple IMO.

@asolimando
Thanks, that makes sense from a maintenance/simplicity perspective.

I think the concern here is a bit narrower than "every rule should support subqueries". Once FilterJoinRule chooses to rewrite a filter that contains a correlated subquery, it is already operating on that structure. At that point, I think the rule should do one of two things:

  1. preserve a valid expression after the rewrite, or
  2. explicitly decline that transformation

My concern with the current behavior is that it rewrites the predicate but can leave correlated field references inconsistent, which means correctness depends on a later SubQueryRemoveRule pass to repair the intermediate state. In other words, the issue is not only whether SubQueryRemoveRule can eventually handle it, but whether FilterJoinRule should emit an invalid expression for a shape it has already transformed. The PR description itself is fixing exactly that kind of planning/decorrelation correctness issue around correlated variable field indices after filter pushdown.
I agree that adding broad subquery support to FilterJoinRule would be too much. So if the preference is to keep subquery handling centralized, I think a conservative alternative would be to skip pushdown when the predicate contains correlated subqueries, rather than allowing a rewrite that can leave broken correlated references.

So from my perspective the acceptable outcomes are:

  • fix the correlated-reference shifting/variable propagation during pushdown, or
  • prohibit this specific pushdown for correlated subqueries.

But relying on a later rule to repair an intermediate invalid state seems brittle.

If the preference is to keep subquery handling centralized, I'm happy to revise this toward the conservative bail-out approach and avoid pushing conjuncts that contain correlated references.
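
The conservative bail-out described in this comment might look roughly like the sketch below. It is a toy model in plain Java, not the change made in this pull request: `BailOutSketch`, `Conjunct`, and `selectPushable` are hypothetical names, and the two boolean flags stand in for analysis that the real rule would have to perform on each conjunct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the bail-out; none of these types are Calcite classes. */
final class BailOutSketch {
  /** One conjunct of a join or filter condition, with precomputed (assumed) properties. */
  static final class Conjunct {
    final String text;
    final boolean singleInput;         // references fields of only one join input
    final boolean correlatedSubQuery;  // contains a sub-query that uses a correlation variable

    Conjunct(String text, boolean singleInput, boolean correlatedSubQuery) {
      this.text = text;
      this.singleInput = singleInput;
      this.correlatedSubQuery = correlatedSubQuery;
    }
  }

  /**
   * Returns the conjuncts that may be pushed below the join.
   * Anything containing a correlated sub-query is kept where it is,
   * so no correlated field reference ever needs to be re-indexed.
   */
  static List<Conjunct> selectPushable(List<Conjunct> conjuncts, List<Conjunct> keepOnJoin) {
    List<Conjunct> pushable = new ArrayList<>();
    for (Conjunct c : conjuncts) {
      if (c.singleInput && !c.correlatedSubQuery) {
        pushable.add(c);
      } else {
        keepOnJoin.add(c);
      }
    }
    return pushable;
  }

  public static void main(String[] args) {
    List<Conjunct> kept = new ArrayList<>();
    List<Conjunct> pushed = selectPushable(Arrays.asList(
        new Conjunct("e.sal > 100", true, false),
        new Conjunct("d.deptno IN (correlated sub-query)", true, true)), kept);
    System.out.println(pushed.size() + " pushed, " + kept.size() + " kept on the join");
  }
}
```

The trade-off in this sketch is that some legal pushdowns are skipped in exchange for never emitting an expression whose correlated references are inconsistent.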

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch from 4f4341c to 75f3b7a Compare March 31, 2026 15:21
@silundong
Contributor

@yashlimbad Unless absolutely necessary, please avoid force-pushing during review, as it makes it difficult for reviewers to compare incremental diffs. You can push as many commits as you need and squash them before the final merge.

@yashlimbad
Author

yashlimbad commented Apr 1, 2026

Oh, my bad, @silundong. I will push separate commits from now on. If you want the diff of the last push, here it is:
https://github.com/apache/calcite/compare/4f4341c7a921a1e8c777d3e7c8172cfaee443eed..75f3b7a7fb0d7c2bfe7568087a3a1ff03a4980f8

@silundong
Contributor

No worries. I will complete the review over the next few days.

@yashlimbad
Author

Thank you! @silundong

Contributor

@silundong silundong left a comment

I want to restate my concern that pushing down correlated subqueries may introduce risks and make the logic significantly more complex; therefore, avoiding such pushdowns is a good option.

@silundong
Contributor

Thank you for your effort. I'm sorry that your implementation hasn't fully convinced me. I've outlined my concerns in detail in the review. If other reviewers accept it, I will respect their decision and have no further objections.

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch from 8ea6009 to 1a2e7c3 Compare April 13, 2026 09:48
@yashlimbad
Author

@silundong I have addressed the concern you raised above. We no longer push down a subquery if it contains a correlation variable. I have updated the tests accordingly; please review now.

Contributor

@silundong silundong left a comment

It appears that making the appropriate changes to RelOptUtil will satisfy our needs; other files do not seem to require additional modifications.

// binder for the variable above it, so the reference would be stranded.
// Pushing such a predicate to the RIGHT input is still safe (the right
// subtree is the natural consumer of the binding established by this
// join), and it can of course stay on the join itself.
Contributor

Based on past community discussions and the current implementation, the CorrelationId in variablesSet of Join represents the concatenation of the left and right rows; the CorrelationId in Correlate represents rows that are produced by the left side and consumed by the right side.

To avoid confusion, the comment here may need to be adjusted. The core summary of our previous discussion is: pushing down correlated subqueries carries risks and involves extremely complex logic, and therefore pushing down is prohibited.

private final @Nullable Set<RelDataTypeField> extraFields;
/** Correlation ids that are considered "local" when analysing
* sub-queries: only these contribute bits via {@link #visitSubQuery}.
* If null, all correlation ids encountered inside a sub-query
Contributor

If null, visitSubQuery should return immediately.

final ImmutableBitSet inputBits = inputFinder.build();

// Disable left-push only for filters that reference a CorrelationId
// bound by this join.
Contributor

Disable pushing down to both, not just to the left side.

// conjunct lives. The buckets are needed to (a) plumb variablesSet onto the
// newly-created child Filters and (b) recompute the join's own variablesSet
// without dropping ids that the surviving join condition still references.
final Set<CorrelationId> leftVariablesSet = new LinkedHashSet<>();
Contributor

Maybe the changes in the RelOptUtil are sufficient to meet our needs; no other files need to be modified.

}

@Override public Void visitSubQuery(RexSubQuery subQuery) {
final Set<CorrelationId> variablesSet = RelOptUtil.getVariablesUsed(subQuery.rel);
Contributor

We only care about localCorrelationIds, so if localCorrelationIds is null, return immediately; otherwise, iterate over localCorrelationIds and call RelOptUtil.correlationColumns.
RelOptUtil.getVariablesUsed(subQuery.rel) is unnecessary.

* reached through a {@link org.apache.calcite.rex.RexFieldAccess}) or
* transitively via the inner plan of a {@link RexSubQuery}.
*/
private static boolean referencesAnyCorrelation(RexNode node,
Contributor

The purpose of RexUtil.CorrelationFinder is similar to this. Would you consider extending it? That would reuse and strengthen the existing capability.

Labels

request review: request a review from committers/contributors

7 participants