test#63057

Draft
bobhan1 wants to merge 7 commits into apache:master from
bobhan1:test-refactor-segment-prefetcher

@bobhan1
Contributor

@bobhan1 bobhan1 commented May 7, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Contributor Author

bobhan1 commented May 7, 2026

run buildall

bobhan1 added 7 commits May 8, 2026 11:18

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Refactor segment file-cache prefetching around a cache-block-aware cached remote reader. This removes the old SegmentPrefetcher and moves prefetch ownership to CacheBlockAwarePrefetchRemoteReader, which inherits CachedRemoteFileReader and expands higher-level file access ranges to all covered file cache blocks. The new reader registers one move-only RAII ReadPatternHandle per caller, so multiple column iterators sharing the same underlying segment file reader keep independent prefetch progress and are unregistered automatically when the owner is destroyed or reset.
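
The move-only RAII handle described in this commit can be illustrated with a minimal sketch. It is not the PR's code: `PatternRegistry`, its `add`/`remove`/`live_count` methods, and the `shared_ptr` ownership are assumptions made for illustration, and only the name `ReadPatternHandle` comes from the description above. The sketch shows how deleting the copy operations and unregistering in `reset()` and the destructor keeps each caller's registration independent and self-cleaning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for the reader's table of registered read patterns.
class PatternRegistry {
public:
    uint64_t add() {
        std::lock_guard<std::mutex> lock(_mu);
        uint64_t id = _next_id++;
        _live.insert(id);
        return id;
    }
    void remove(uint64_t id) {
        std::lock_guard<std::mutex> lock(_mu);
        _live.erase(id);
    }
    size_t live_count() const {
        std::lock_guard<std::mutex> lock(_mu);
        return _live.size();
    }

private:
    mutable std::mutex _mu;
    uint64_t _next_id = 1;
    std::unordered_set<uint64_t> _live;
};

// Move-only handle: registers on construction, unregisters on reset() or
// destruction. A moved-from handle owns nothing and unregisters nothing.
class ReadPatternHandle {
public:
    ReadPatternHandle() = default;
    explicit ReadPatternHandle(std::shared_ptr<PatternRegistry> registry)
            : _registry(std::move(registry)), _id(_registry->add()) {}

    ReadPatternHandle(const ReadPatternHandle&) = delete;
    ReadPatternHandle& operator=(const ReadPatternHandle&) = delete;

    ReadPatternHandle(ReadPatternHandle&& other) noexcept
            : _registry(std::move(other._registry)), _id(std::exchange(other._id, 0)) {}
    ReadPatternHandle& operator=(ReadPatternHandle&& other) noexcept {
        if (this != &other) {
            reset(); // drop our own registration before taking over the other one
            _registry = std::move(other._registry);
            _id = std::exchange(other._id, 0);
        }
        return *this;
    }
    ~ReadPatternHandle() { reset(); }

    void reset() {
        if (_registry) {
            _registry->remove(_id);
            _registry.reset();
            _id = 0;
        }
    }
    bool valid() const { return _registry != nullptr; }

private:
    // Declared before _id so that the constructor can call _registry->add().
    std::shared_ptr<PatternRegistry> _registry;
    uint64_t _id = 0;
};
```

In this sketch, two handles created from the same registry hold separate ids, so resetting or destroying one leaves the other registered.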

The segment layer now builds file access ranges through SegmentFileAccessRangeBuilder. Rowids are converted through ordinal indexes only while building the access ranges; actual prefetch progress is triggered later by the current data page file offset. SegmentIterator initializes these patterns after segment open, index pruning, and iterator initialization, when the row bitmap is known and subsequent data page reads are monotonic in scan order. The implementation also handles data pages that cross file-cache-block boundaries or are larger than one cache block by prefetching every covered cache block instead of assuming page size is smaller than the cache block size.
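
As a rough illustration of prefetching every covered cache block, the sketch below maps a list of data-page byte ranges to cache-block indices. `FileAccessRange` and `covered_cache_blocks` are hypothetical names, not the PR's types, and the sketch assumes cache blocks are fixed-size and aligned at multiples of `block_size`, with ranges sorted by offset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration; not the PR's actual definition.
struct FileAccessRange {
    uint64_t offset; // first byte of the data page in the segment file
    uint64_t size;   // page length in bytes
};

// Returns the index of every cache block touched by `ranges`, ascending and
// without duplicates. Assumes `ranges` is sorted by offset, as in a monotonic
// scan. A page that straddles a block boundary, or that is larger than one
// block, contributes all of the blocks it covers.
std::vector<uint64_t> covered_cache_blocks(const std::vector<FileAccessRange>& ranges,
                                           uint64_t block_size) {
    assert(block_size > 0);
    std::vector<uint64_t> blocks;
    for (const auto& r : ranges) {
        if (r.size == 0) continue; // an empty range covers no block
        uint64_t first = r.offset / block_size;
        uint64_t last = (r.offset + r.size - 1) / block_size;
        for (uint64_t b = first; b <= last; ++b) {
            // Skip blocks already emitted by an earlier range.
            if (blocks.empty() || blocks.back() < b) blocks.push_back(b);
        }
    }
    return blocks;
}
```

With a 1 MiB block size, a 1000-byte page starting at offset 1048000 covers blocks 0 and 1, and a 2.5 MiB page starting at 3 MiB covers blocks 3, 4, and 5.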

A new FileReaderOptions::enable_cache_block_prefetch option controls whether FILE_BLOCK_CACHE readers are created as CacheBlockAwarePrefetchRemoteReader. Cloud segment data-file readers enable it when segment file-cache prefetch is enabled for query or compaction. Existing complex and variant column iterators forward cache-block prefetch setup to their nested file iterators.

### Release note

None

### Check List (For Author)

- Test:

    - Unit Test: ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_registers_independent_column_patterns:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader_prefetches_cache_blocks -j100

    - Format Check: build-support/check-format.sh

    - Static Check: git diff --check

- Behavior changed: No

- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: CacheBlockAwarePrefetchRemoteReader previously supported multiple read patterns because segment column iterators shared the same underlying file reader and had to manually trigger prefetch. This refactors cache-block prefetch so each physical FileColumnIterator owns an independent reader when prefetch is enabled, the reader holds at most one read pattern, and read_at() automatically advances prefetch by file offset. The code comments also document how Segment, ColumnReaderCache, ColumnReader, and FileColumnIterator cooperate to keep the cache-aware reader iterator-local.

### Release note

None

### Check List (For Author)

- Test: Unit Test / Manual test
    - Unit Test: ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_read_at_automatically_prefetches_single_pattern:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader_prefetches_cache_blocks -j100
    - Manual test: build-support/clang-format.sh; git diff --check
- Behavior changed: Yes. Cache-block prefetch is now iterator-local and is triggered automatically from read_at() when enabled.
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Refine the cache-block-aware prefetch reader after extracting it from segment prefetching. The previous reader state kept block planning and cursor progress tightly coupled, and tests did not fully cover component interaction paths. This commit splits immutable prefetch planning from mutable cursor state, keeps file access ranges as the trigger source so reads inside a page range still advance prefetch, renames the dry-run cache warming API to async_touch_local_cache, and expands unit tests for builder, plan, cursor, async cache touch, and reader integration behavior.
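
One way to picture the split between immutable planning and mutable cursor state is the sketch below. It is an assumption-laden illustration rather than the PR's implementation: `AccessRange`, `PrefetchPlan`, `PrefetchCursor`, `block_pos_for`, `advance`, and the block-count `window` parameter are all hypothetical, and the plan assumes sorted, non-overlapping, non-empty ranges. The plan is built once from the access ranges; the cursor only remembers how far prefetch has progressed, so a read that lands inside a range still moves the window forward.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the PR's code.
struct AccessRange {
    uint64_t offset; // first byte of a data page
    uint64_t size;   // page length in bytes, must be > 0
};

// Immutable plan. Assumes `ranges` is sorted by offset and non-overlapping.
class PrefetchPlan {
public:
    PrefetchPlan(std::vector<AccessRange> ranges, uint64_t block_size)
            : _ranges(std::move(ranges)), _block_size(block_size) {
        assert(block_size > 0);
        for (const auto& r : _ranges) {
            assert(r.size > 0);
            uint64_t first = r.offset / block_size;
            uint64_t last = (r.offset + r.size - 1) / block_size;
            // A range may start in the block where the previous range ended.
            bool shared = !_blocks.empty() && _blocks.back() == first;
            _first_pos.push_back(shared ? _blocks.size() - 1 : _blocks.size());
            for (uint64_t b = shared ? first + 1 : first; b <= last; ++b) _blocks.push_back(b);
        }
    }

    // Position in blocks() of the cache block a read at `offset` maps to, or
    // blocks().size() when `offset` lies past the last range. A read in a gap
    // maps to the first block of the next range.
    size_t block_pos_for(uint64_t offset) const {
        auto it = std::upper_bound(
                _ranges.begin(), _ranges.end(), offset,
                [](uint64_t off, const AccessRange& r) { return off < r.offset + r.size; });
        if (it == _ranges.end()) return _blocks.size();
        uint64_t cur = std::max(offset, it->offset) / _block_size;
        return _first_pos[it - _ranges.begin()] +
               static_cast<size_t>(cur - it->offset / _block_size);
    }
    const std::vector<uint64_t>& blocks() const { return _blocks; }

private:
    std::vector<AccessRange> _ranges;
    uint64_t _block_size;
    std::vector<uint64_t> _blocks;  // covered cache blocks, ascending, unique
    std::vector<size_t> _first_pos; // per range: position of its first block
};

// Mutable progress only. Returns the blocks to prefetch for a read at
// `offset`: up to `window` blocks starting at the block being read, minus
// anything already returned by an earlier call.
class PrefetchCursor {
public:
    std::vector<uint64_t> advance(const PrefetchPlan& plan, uint64_t offset, size_t window) {
        const auto& blocks = plan.blocks();
        size_t pos = plan.block_pos_for(offset);
        size_t begin = std::max(_next_pos, pos);
        size_t end = std::min(blocks.size(), pos + window);
        std::vector<uint64_t> out;
        for (size_t p = begin; p < end; ++p) out.push_back(blocks[p]);
        _next_pos = std::max(_next_pos, end);
        return out;
    }

private:
    size_t _next_pos = 0;
};
```

Because the window is anchored at the block currently being read rather than at the start of the range, a range that spans more blocks than the window is still fully covered as reads move through it.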

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_read_at_automatically_prefetches_single_pattern:BlockFileCacheTest.cached_remote_file_reader_async_touch_local_cache_downloads_range:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader_prefetches_cache_blocks -j100
    - build-support/clang-format.sh
    - git diff --check
- Behavior changed: No (query results are unchanged; only file-cache prefetch scheduling internals are refactored)
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add an explicit initial-window touch API for cache-block-aware prefetch readers. Segment predicate columns install patterns from row ranges that are guaranteed to be read, so SegmentIterator now touches their first prefetch window immediately after installing the pattern, while non-predicate and common-expression columns keep read_at-triggered prefetch because their exact rowids are batch-local after predicate evaluation.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - build-support/clang-format.sh
    - git diff --check
    - ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_read_at_automatically_prefetches_single_pattern:BlockFileCacheTest.cached_remote_file_reader_async_touch_local_cache_downloads_range:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader* -j100
- Behavior changed: No (query results unchanged; only file-cache prefetch timing changes for predicate columns)
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add BE unit test coverage for cache-block-aware segment prefetch with many sparse data pages, many cache blocks, large file offset spans, overlapping cache blocks, and large ranges that cross more cache blocks than the configured prefetch window.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - build-support/clang-format.sh
    - git diff --check
    - ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_read_at_automatically_prefetches_single_pattern:BlockFileCacheTest.cached_remote_file_reader_async_touch_local_cache_downloads_range:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader* -j100
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#63056

Problem Summary: Update ReaderOwnedColumnIterator to forward the cache-block prefetch interfaces introduced by the segment prefetcher refactor, removing stale SegmentPrefetcher API usage left after rebasing onto master.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - ./build.sh --be -j100
- Behavior changed: No
- Does this need documentation: No

@bobhan1 bobhan1 force-pushed the test-refactor-segment-prefetcher branch from 2b5c2a7 to cb26d59 Compare May 8, 2026 03:19
@bobhan1
Contributor Author

bobhan1 commented May 8, 2026

run buildall

@bobhan1
Contributor Author

bobhan1 commented May 8, 2026

run cloud_p0

2 participants