test #63057
Draft
bobhan1 wants to merge 7 commits into apache:master
bobhan1 (Contributor, Author) commented: run buildall
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Refactor segment file-cache prefetching around a cache-block-aware cached remote reader. This removes the old SegmentPrefetcher and moves prefetch ownership to CacheBlockAwarePrefetchRemoteReader, which inherits CachedRemoteFileReader and expands higher-level file access ranges to all covered file cache blocks. The new reader registers one move-only RAII ReadPatternHandle per caller, so multiple column iterators sharing the same underlying segment file reader keep independent prefetch progress and are unregistered automatically when the owner is destroyed or reset.
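The per-caller handle described above can be pictured with a minimal sketch. Only the name `ReadPatternHandle` comes from this PR; `PatternRegistry`, its methods, and the member layout are hypothetical stand-ins for the reader's internal bookkeeping, not the actual Doris code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the reader-side bookkeeping of registered patterns.
class PatternRegistry {
public:
    using PatternId = uint64_t;

    PatternId add(std::vector<uint64_t> block_offsets) {
        const PatternId id = _next_id++;
        _patterns.emplace(id, std::move(block_offsets));
        return id;
    }
    void remove(PatternId id) { _patterns.erase(id); }
    size_t size() const { return _patterns.size(); }

private:
    PatternId _next_id = 1;
    std::unordered_map<PatternId, std::vector<uint64_t>> _patterns;
};

// Move-only RAII handle: the pattern is unregistered on destruction or reset().
class ReadPatternHandle {
public:
    ReadPatternHandle() = default;
    ReadPatternHandle(PatternRegistry* registry, PatternRegistry::PatternId id)
            : _registry(registry), _id(id) {
        assert(registry != nullptr);
    }
    ReadPatternHandle(const ReadPatternHandle&) = delete;
    ReadPatternHandle& operator=(const ReadPatternHandle&) = delete;
    ReadPatternHandle(ReadPatternHandle&& other) noexcept
            : _registry(std::exchange(other._registry, nullptr)), _id(other._id) {}
    ReadPatternHandle& operator=(ReadPatternHandle&& other) noexcept {
        if (this != &other) {
            reset(); // drop our own registration before taking over the other one
            _registry = std::exchange(other._registry, nullptr);
            _id = other._id;
        }
        return *this;
    }
    ~ReadPatternHandle() { reset(); }

    void reset() {
        if (_registry != nullptr) {
            _registry->remove(_id);
            _registry = nullptr;
        }
    }
    bool valid() const { return _registry != nullptr; }

private:
    PatternRegistry* _registry = nullptr;
    PatternRegistry::PatternId _id = 0;
};
```

Deleting the copy operations is what keeps each registration owned by exactly one caller; a moved-from handle becomes empty, so its destructor does not unregister the pattern a second time.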
The segment layer now builds file access ranges through SegmentFileAccessRangeBuilder. Rowids are converted through ordinal indexes only while building the access ranges; actual prefetch progress is triggered later by the current data page file offset. SegmentIterator initializes these patterns after segment open, index pruning, and iterator initialization, when the row bitmap is known and subsequent data page reads are monotonic in scan order. The implementation also handles data pages that cross file-cache-block boundaries or are larger than one cache block by prefetching every covered cache block instead of assuming page size is smaller than the cache block size.
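As an illustration of the "prefetch every covered cache block" rule, the sketch below expands one file access range into the start offsets of all cache blocks it touches. `FileAccessRange` and `covered_cache_blocks` are hypothetical names, and the block size is a parameter rather than the real file-cache configuration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical range type: a byte span inside the segment file.
struct FileAccessRange {
    uint64_t offset;
    uint64_t size;
};

// Returns the start offset of every cache block that the range overlaps.
// Works for ranges that cross a block boundary and for ranges larger than a block.
std::vector<uint64_t> covered_cache_blocks(const FileAccessRange& range, uint64_t block_size) {
    assert(block_size > 0);
    std::vector<uint64_t> blocks;
    if (range.size == 0) {
        return blocks; // an empty range covers nothing
    }
    const uint64_t first = range.offset / block_size;
    const uint64_t last = (range.offset + range.size - 1) / block_size;
    for (uint64_t b = first; b <= last; ++b) {
        blocks.push_back(b * block_size);
    }
    return blocks;
}
```

With an assumed 1 MiB block size, a 1000-byte page starting at offset 1048000 straddles the first boundary and yields two blocks, while a 3 MiB range starting at offset 0 yields three.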
A new FileReaderOptions::enable_cache_block_prefetch option controls whether FILE_BLOCK_CACHE readers are created as CacheBlockAwarePrefetchRemoteReader. Cloud segment data-file readers enable it when segment file-cache prefetch is enabled for query or compaction. Existing complex and variant column iterators forward cache-block prefetch setup to their nested file iterators.
### Release note
None
### Check List (For Author)
- Test:
- Unit Test: ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_registers_independent_column_patterns:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader_prefetches_cache_blocks -j100
- Format Check: build-support/check-format.sh
- Static Check: git diff --check
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: CacheBlockAwarePrefetchRemoteReader previously supported multiple read patterns because segment column iterators shared the same underlying file reader and had to manually trigger prefetch. This refactors cache-block prefetch so each physical FileColumnIterator owns an independent reader when prefetch is enabled, the reader holds at most one read pattern, and read_at() automatically advances prefetch by file offset. The code comments also document how Segment, ColumnReaderCache, ColumnReader, and FileColumnIterator cooperate to keep the cache-aware reader iterator-local.
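A rough sketch of the "single pattern, advanced from read_at()" behavior follows. `AutoPrefetchReader`, `set_pattern`, `issued`, and the window parameter are hypothetical; the real reader performs actual reads and asynchronous downloads, which are only modeled here as a recorded list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical reader wrapper: holds at most one read pattern and advances
// prefetch from inside read_at(), keyed by the file offset being read.
class AutoPrefetchReader {
public:
    AutoPrefetchReader(uint64_t block_size, size_t window)
            : _block_size(block_size), _window(window) {
        assert(block_size > 0);
    }

    // Installs (or replaces) the single pattern: sorted cache-block start offsets.
    void set_pattern(std::vector<uint64_t> block_starts) {
        assert(std::is_sorted(block_starts.begin(), block_starts.end()));
        _pattern = std::move(block_starts);
        _next = 0;
    }

    // Stand-in for a real read: only the prefetch side effect is modeled.
    void read_at(uint64_t offset) {
        const uint64_t current_block = offset / _block_size * _block_size;
        // Index of the first pattern block strictly after the block being read.
        const size_t ahead = static_cast<size_t>(
                std::upper_bound(_pattern.begin(), _pattern.end(), current_block) -
                _pattern.begin());
        _next = std::max(_next, ahead); // never re-issue or go backwards
        const size_t end = std::min(ahead + _window, _pattern.size());
        for (; _next < end; ++_next) {
            _issued.push_back(_pattern[_next]); // a real reader would start a download here
        }
    }

    const std::vector<uint64_t>& issued() const { return _issued; }

private:
    uint64_t _block_size;
    size_t _window;
    std::vector<uint64_t> _pattern;
    size_t _next = 0; // index of the next pattern block to prefetch
    std::vector<uint64_t> _issued;
};
```

Because the cursor only moves forward, callers do not trigger prefetch manually: each read at a later offset tops the window up by however many blocks have been consumed.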
### Release note
None
### Check List (For Author)
- Test: Unit Test / Manual test
- Unit Test: ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_read_at_automatically_prefetches_single_pattern:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader_prefetches_cache_blocks -j100
- Manual test: build-support/clang-format.sh; git diff --check
- Behavior changed: Yes. Cache-block prefetch is now iterator-local and is triggered automatically from read_at() when enabled.
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Refine the cache-block-aware prefetch reader after extracting it from segment prefetching. The previous reader state kept block planning and cursor progress tightly coupled, and tests did not fully cover component interaction paths. This commit splits immutable prefetch planning from mutable cursor state, keeps file access ranges as the trigger source so reads inside a page range still advance prefetch, renames the dry-run cache warming API to async_touch_local_cache, and expands unit tests for builder, plan, cursor, async cache touch, and reader integration behavior.
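The plan/cursor split can be sketched as below, assuming sorted, non-overlapping access ranges. `PrefetchPlan`, `PlannedRange`, `build_plan`, and `PrefetchCursor` are hypothetical names chosen for illustration; the point is only that the plan is shared as `const` while each cursor carries its own progress, and that any read inside a planned range acts as the trigger.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

struct PlannedRange {
    uint64_t offset;
    uint64_t size;
    size_t last_block_idx; // index into PrefetchPlan::blocks of the last covered block
};

// Built once, then only read through a pointer-to-const.
struct PrefetchPlan {
    std::vector<PlannedRange> ranges; // sorted by offset, non-overlapping
    std::vector<uint64_t> blocks;     // deduplicated cache-block start offsets

    // Index of the planned range containing `offset`, if any.
    std::optional<size_t> find_range(uint64_t offset) const {
        auto it = std::upper_bound(
                ranges.begin(), ranges.end(), offset,
                [](uint64_t off, const PlannedRange& r) { return off < r.offset; });
        if (it == ranges.begin()) return std::nullopt;
        --it;
        if (offset >= it->offset + it->size) return std::nullopt;
        return static_cast<size_t>(it - ranges.begin());
    }
};

std::shared_ptr<const PrefetchPlan> build_plan(
        const std::vector<std::pair<uint64_t, uint64_t>>& sorted_ranges, uint64_t block_size) {
    assert(block_size > 0);
    auto plan = std::make_shared<PrefetchPlan>();
    for (const auto& [offset, size] : sorted_ranges) {
        assert(size > 0);
        const uint64_t first = offset / block_size;
        const uint64_t last = (offset + size - 1) / block_size;
        for (uint64_t b = first; b <= last; ++b) {
            const uint64_t start = b * block_size;
            // Skip a block already added by the previous range.
            if (plan->blocks.empty() || start > plan->blocks.back()) {
                plan->blocks.push_back(start);
            }
        }
        plan->ranges.push_back({offset, size, plan->blocks.size() - 1});
    }
    return plan;
}

// Mutable progress over an immutable plan.
class PrefetchCursor {
public:
    explicit PrefetchCursor(std::shared_ptr<const PrefetchPlan> plan) : _plan(std::move(plan)) {
        assert(_plan != nullptr);
    }

    // Returns the blocks to prefetch for a read at `offset`; empty if none are due.
    std::vector<uint64_t> advance(uint64_t offset, size_t window) {
        std::vector<uint64_t> to_prefetch;
        const auto range_idx = _plan->find_range(offset);
        if (!range_idx) return to_prefetch;
        const size_t ahead = _plan->ranges[*range_idx].last_block_idx + 1;
        const size_t end = std::min(ahead + window, _plan->blocks.size());
        for (_next = std::max(_next, ahead); _next < end; ++_next) {
            to_prefetch.push_back(_plan->blocks[_next]);
        }
        return to_prefetch;
    }

private:
    std::shared_ptr<const PrefetchPlan> _plan;
    size_t _next = 0;
};
```

Two cursors built from the same plan advance independently, and a second read inside a range that has already triggered its window returns nothing new.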
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_read_at_automatically_prefetches_single_pattern:BlockFileCacheTest.cached_remote_file_reader_async_touch_local_cache_downloads_range:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader_prefetches_cache_blocks -j100
- build-support/clang-format.sh
- git diff --check
- Behavior changed: No (query results are unchanged; only file-cache prefetch scheduling internals are refactored)
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add an explicit initial-window touch API for cache-block-aware prefetch readers. Segment predicate columns install patterns from row ranges that are guaranteed to be read, so SegmentIterator now touches their first prefetch window immediately after installing the pattern, while non-predicate and common-expression columns keep read_at-triggered prefetch because their exact rowids are batch-local after predicate evaluation.
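The eager-versus-lazy distinction can be illustrated with a small sketch. `ColumnRole`, `InstalledPattern`, `touch_initial_window`, and `install_pattern` are hypothetical names; the touch callback stands in for whatever asynchronous cache warm-up the reader performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

enum class ColumnRole { kPredicate, kNonPredicate, kCommonExpr };

// Hypothetical per-iterator state: planned blocks plus prefetch progress.
struct InstalledPattern {
    std::vector<uint64_t> blocks;
    size_t next_block = 0;
};

using TouchFn = std::function<void(uint64_t)>;

// Touches the first `window` planned blocks right away and records the progress,
// so later read-triggered prefetch continues after them instead of repeating them.
void touch_initial_window(InstalledPattern& pattern, size_t window, const TouchFn& touch) {
    assert(touch != nullptr);
    const size_t end = std::min(window, pattern.blocks.size());
    for (; pattern.next_block < end; ++pattern.next_block) {
        touch(pattern.blocks[pattern.next_block]);
    }
}

// Predicate columns are certain to read their planned ranges, so they warm up eagerly.
// Other columns wait for the first read, since their rowids depend on predicate results.
void install_pattern(ColumnRole role, InstalledPattern& pattern, size_t window,
                     const TouchFn& touch) {
    if (role == ColumnRole::kPredicate) {
        touch_initial_window(pattern, window, touch);
    }
}
```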
### Release note
None
### Check List (For Author)
- Test: Unit Test
- build-support/clang-format.sh
- git diff --check
- ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_read_at_automatically_prefetches_single_pattern:BlockFileCacheTest.cached_remote_file_reader_async_touch_local_cache_downloads_range:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader* -j100
- Behavior changed: No (query results unchanged; only file-cache prefetch timing changes for predicate columns)
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add BE unit test coverage for cache-block-aware segment prefetch with many sparse data pages, many cache blocks, large file offset spans, overlapping cache blocks, and large ranges that cross more cache blocks than the configured prefetch window.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- build-support/clang-format.sh
- git diff --check
- ./run-be-ut.sh --run --filter=CacheBlockAwarePrefetchRemoteReaderTest.*:BlockFileCacheTest.usage_example_read_at_automatically_prefetches_single_pattern:BlockFileCacheTest.cached_remote_file_reader_async_touch_local_cache_downloads_range:BlockFileCacheTest.cache_block_aware_prefetch_remote_reader* -j100
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: N/A
Related PR: apache#63056
Problem Summary: Update ReaderOwnedColumnIterator to forward the cache-block prefetch interfaces introduced by the segment prefetcher refactor, removing stale SegmentPrefetcher API usage left after rebasing onto master.
### Release note
None
### Check List (For Author)
- Test: Manual test
- ./build.sh --be -j100
- Behavior changed: No
- Does this need documentation: No
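The forwarding itself is a plain delegation pattern. In the sketch below, every class name and the `init_cache_block_prefetch` hook are hypothetical and only illustrate the shape of a wrapper that passes prefetch setup through to the iterator it owns; the real ReaderOwnedColumnIterator interface may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct FileAccessRange {
    uint64_t offset;
    uint64_t size;
};

class ColumnIteratorSketch {
public:
    virtual ~ColumnIteratorSketch() = default;
    // Hypothetical hook; the default is a no-op for iterators that read no file.
    virtual void init_cache_block_prefetch(const std::vector<FileAccessRange>& ranges) {
        (void)ranges;
    }
};

// Leaf iterator that actually installs the ranges.
class FileColumnIteratorSketch : public ColumnIteratorSketch {
public:
    void init_cache_block_prefetch(const std::vector<FileAccessRange>& ranges) override {
        installed = ranges;
    }
    std::vector<FileAccessRange> installed;
};

// Wrapper that owns an inner iterator and must forward the hook; without the
// override, the base-class no-op would silently drop the prefetch setup.
class ReaderOwnedIteratorSketch : public ColumnIteratorSketch {
public:
    explicit ReaderOwnedIteratorSketch(std::unique_ptr<ColumnIteratorSketch> inner)
            : _inner(std::move(inner)) {
        assert(_inner != nullptr);
    }
    void init_cache_block_prefetch(const std::vector<FileAccessRange>& ranges) override {
        _inner->init_cache_block_prefetch(ranges);
    }

private:
    std::unique_ptr<ColumnIteratorSketch> _inner;
};
```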
2b5c2a7 to cb26d59
bobhan1 (Contributor, Author) commented: run buildall
bobhan1 (Contributor, Author) commented: run cloud_p0