[core] Introduce BucketSelector based on partition values to achieve bucket level predicate push down #7486

Open
JingsongLi wants to merge 1 commit into apache:master from JingsongLi:BucketSelector_by_partition

Conversation

@JingsongLi JingsongLi commented Mar 19, 2026

Purpose

Introduce a BucketSelector that is evaluated per partition value, so that bucket-level predicate push down also works for predicates that mix partition and bucket key conditions.

Case 1: bucket filtering with compound predicates on a single-field bucket key.

Table schema:

  • Partition key: column 'a' (INT)
  • Bucket key: column 'b' (INT)
  • Bucket count: 10

Data distribution: 5 partitions (a=1 to 5) × 20 b-values (b=1 to 20) = 100 rows.

Scenarios:

  • Predicate: (a < 3 AND b = 5) OR (a = 3 AND b = 7) - Tests a partition range filter with bucket key equality, combined through OR with a partition equality filter. Expected: the bucket for b=5 in partitions 1 and 2, and the bucket for b=7 in partition 3.
  • Predicate: (a < 3 AND b = 5) OR (a = 3 AND b < 100) - Tests a partition range filter with bucket key equality, combined through OR with a partition equality filter and a bucket key range. Expected: the buckets matching b < 100 in partition 3, and only the bucket for b=5 in partitions 1 and 2.
  • Predicate: (a = 2 AND b = 5) OR (a = 3 AND b = 7) - Tests partition equality with bucket key equality in both OR branches. Expected: exactly one bucket for each (a, b) combination.
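The expected results for Case 1 can be illustrated with a minimal Python sketch. It is not the implementation in this PR: `bucket_of` is a stand-in hash (not Paimon's real bucket function), and `Branch` and `select_buckets` are hypothetical names. The sketch assumes the predicate has already been split into OR branches, each holding a partition predicate on `a` and, where present, an equality on the bucket key `b`.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

NUM_BUCKETS = 10  # bucket count from the table schema above


def bucket_of(key: Tuple[int, ...], num_buckets: int = NUM_BUCKETS) -> int:
    """Stand-in hash for illustration only; NOT Paimon's real bucket function."""
    h = 0
    for value in key:
        h = h * 31 + value
    return h % num_buckets


# One OR branch: (predicate on partition column a, bucket key pinned by equality).
# A pinned key of None means the branch has no equality on b (e.g. b < 100).
Branch = Tuple[Callable[[int], bool], Optional[Tuple[int, ...]]]


def select_buckets(a: int, branches: Sequence[Branch]) -> Set[int]:
    """Return the buckets that must be read for partition value a."""
    selected: Set[int] = set()
    for partition_pred, pinned_key in branches:
        if not partition_pred(a):
            continue  # this branch cannot match any row in partition a
        if pinned_key is None:
            # No equality on the bucket key: every bucket may hold matches.
            return set(range(NUM_BUCKETS))
        selected.add(bucket_of(pinned_key))
    return selected  # an empty set means the partition can be skipped


# (a < 3 AND b = 5) OR (a = 3 AND b = 7)
scenario_1 = [(lambda a: a < 3, (5,)), (lambda a: a == 3, (7,))]
# (a < 3 AND b = 5) OR (a = 3 AND b < 100)
scenario_2 = [(lambda a: a < 3, (5,)), (lambda a: a == 3, None)]

for a in range(1, 6):
    print(a, sorted(select_buckets(a, scenario_1)), sorted(select_buckets(a, scenario_2)))
```

Under these assumptions, a selector that ignored the partition value would presumably have to keep the union of both branches for every partition, or give up entirely once one branch has no bucket key equality. Evaluating the partition predicates first lets each partition keep only the buckets of the branches that can still match.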

Case 2: bucket filtering with compound predicates on a composite (multi-field) bucket key.

Table schema:

  • Partition key: column 'a' (INT)
  • Bucket key: columns 'b' and 'c' (composite, INT)
  • Bucket count: 10

Data distribution: 5 partitions (a=1 to 5) × 20 b-values (b=1 to 20) × 10 c-values (c=0 to 9) = 1000 rows.

Scenarios:

  • Predicate: ((a < 3 AND b = 5) OR (a = 3 AND b = 7)) AND c = 5 - Tests an OR nested within an AND, combining a partition range, equality on one bucket field, and an additional filter on the other bucket field. Because 'c' is part of the composite bucket key, the 'c = 5' condition affects bucket selection.
  • Predicate: ((a < 3 AND b = 5) OR (a = 3 AND b < 100)) AND c = 5 - Tests a range predicate on one bucket field (b) combined with equality on the other (c). Validates handling of multiple bucket key fields with different predicate types.
  • Predicate: ((a = 2 AND b = 5) OR (a = 3 AND b = 7)) AND c = 5 - Tests exact matching on both partition and bucket fields. With both fields of the composite bucket key (b, c) fixed by equality, each matching partition maps to a single bucket.
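For the composite key, the outer `c = 5` has to be combined with each OR branch before a bucket can be computed. The sketch below shows one way to reason about it in Python. It is an illustration under assumptions, not the code in this PR: `bucket_of` is a stand-in hash rather than Paimon's real bucket function, `and_equalities` and `select_buckets` are hypothetical helpers, and conflicting equalities on the same field are not handled.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

BUCKET_KEY = ("b", "c")  # composite bucket key from the table schema above
NUM_BUCKETS = 10


def bucket_of(key: Tuple[int, ...], num_buckets: int = NUM_BUCKETS) -> int:
    """Stand-in hash for illustration only; NOT Paimon's real bucket function."""
    h = 0
    for value in key:
        h = h * 31 + value
    return h % num_buckets


# One OR branch: (predicate on partition column a, equalities on bucket fields).
Branch = Tuple[Callable[[int], bool], Dict[str, int]]


def and_equalities(branches: Sequence[Branch], extra: Dict[str, int]) -> List[Branch]:
    """Distribute an AND-ed equality (e.g. c = 5) over every OR branch."""
    return [(pred, {**eqs, **extra}) for pred, eqs in branches]


def select_buckets(a: int, branches: Sequence[Branch]) -> Set[int]:
    """Return the buckets that must be read for partition value a."""
    selected: Set[int] = set()
    for partition_pred, eqs in branches:
        if not partition_pred(a):
            continue
        if any(field not in eqs for field in BUCKET_KEY):
            # Some bucket field is not fixed by equality: keep all buckets.
            return set(range(NUM_BUCKETS))
        selected.add(bucket_of(tuple(eqs[field] for field in BUCKET_KEY)))
    return selected


# ((a < 3 AND b = 5) OR (a = 3 AND b = 7)) AND c = 5
scenario_1 = and_equalities(
    [(lambda a: a < 3, {"b": 5}), (lambda a: a == 3, {"b": 7})], {"c": 5}
)
# ((a < 3 AND b = 5) OR (a = 3 AND b < 100)) AND c = 5; b < 100 fixes no value
scenario_2 = and_equalities(
    [(lambda a: a < 3, {"b": 5}), (lambda a: a == 3, {})], {"c": 5}
)

for a in range(1, 6):
    print(a, sorted(select_buckets(a, scenario_1)), sorted(select_buckets(a, scenario_2)))
```

In this model the branch `a = 3 AND b < 100` cannot pin a bucket even after `c = 5` is added, because the hash needs a value for every field of the key, so partition 3 keeps all buckets in the second scenario.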

Tests

API and Format

Documentation

Generative AI tooling

@JingsongLi JingsongLi changed the title [core] Introduce BucketSelector by partition value [core] Introduce BucketSelector based on partition values to achieve bucket level predicate push down Mar 19, 2026
@JingsongLi JingsongLi force-pushed the BucketSelector_by_partition branch from 686a764 to 858c506 Compare March 20, 2026 00:40