Skip to content

Support Expanding OR Conditions in INNER JOIN into Multiple Mutually Exclusive Branches #22369

@xiedeyantu

Description

@xiedeyantu

Is your feature request related to a problem or challenge?

Currently, DataFusion handles conjunctive join conditions well. For example, conditions like:

a.id = b.id AND a.v > 10

can already be optimized by extracting equijoin keys while preserving the remaining predicates as join filters.

However, for queries such as:

SELECT *
FROM t1
INNER JOIN t2
  ON t1.id = t2.id OR t1.age = t2.age

the presence of an OR condition prevents join key extraction. As a result, the optimizer typically falls back to a single inner join with a complex filter condition, making it difficult for later optimization stages and the execution engine to take advantage of equijoin-based optimizations. Therefore, this type of join will be executed using nested loop join.

This proposal introduces a new logical optimizer rule that rewrites:

t1 INNER JOIN t2 ON cond1 OR cond2 OR ... OR condN

into:

SELECT * FROM (
  (t1 INNER JOIN t2 ON cond1)
  UNION ALL
  (t1 INNER JOIN t2 ON cond2 AND NOT cond1)
  UNION ALL
  (t1 INNER JOIN t2 ON cond3 AND NOT cond1 AND NOT cond2)
  ...
)

Each branch becomes mutually exclusive, preserving the original semantics while exposing simpler join conditions that can be optimized independently.

Initial Scope

The first implementation only targets the following cases:

  • Only INNER JOIN
  • Only when the join condition contains a top-level splittable OR
  • Logical plan rewrite only; no execution-layer changes
  • No support for outer joins, semi/anti joins, or full joins
  • No join reordering across multi-level join trees; only rewrite the current join node

Rewrite Example

Input:

SELECT *
FROM t1
INNER JOIN t2
  ON t1.id = t2.id OR t1.age = t2.age

Rewritten as:

SELECT *
FROM (
  SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id
  UNION ALL
  SELECT * FROM t1 INNER JOIN t2
    ON t1.age = t2.age AND NOT (t1.id = t2.id)
)

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request
No fields configured for Feature.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions