branch-4.1: [feature](cache) support file cache admission control #59065 #61547
Merged
yiguolei merged 1 commit into branch-4.1 on Mar 21, 2026
To fully understand the implementation of this PR, please refer to the following document (written in Chinese): https://www.notion.so/V3-1-2c31293e1081807ca476dd5c87efb28e

### 1. PR Function Overview

The core function of this PR is the implementation of **File Cache Admission Control**.

* **Background**: Doris supports using File Cache to accelerate data access for remote storage (e.g., HDFS, S3). However, in certain scenarios (such as large-scale ETL jobs or heavy ad-hoc queries), reading massive amounts of cold data can evict existing hot data from the File Cache. This leads to "cache pollution" and a significant drop in cache hit rates.
* **Goal**: Provide a mechanism to decide whether data scanned by a specific query is allowed to enter the File Cache, based on dimensions such as user identity and table information. If admission is denied, the query reads the data directly from remote storage without populating the cache.

### 2. Implementation Scheme Analysis

The implementation consists of the following key components:

#### 2.1. FE Side: Admission Decision

The primary logic is located in the `createScanRangeLocations` method of `FileQueryScanNode.java`.

* **New Configuration**: Introduced the `Config.enable_file_cache_admission_control` switch.
* **New Manager**: Introduced `FileCacheAdmissionManager` (singleton) to execute the admission judgment logic.
* **Decision Flow**:
    1. Before generating Scan Ranges, `FileQueryScanNode` retrieves the current User Identity (`userIdentity`), Catalog, Database, and Table information.
    2. It calls `FileCacheAdmissionManager.getInstance().isAllowed(...)` to obtain a boolean result `fileCacheAdmission`.
    3. It logs the decision result and the time cost in the query plan.

```
| 0:VHIVE_SCAN_NODE(74) |
| table: test_file_cache_features.tpch1_parquet.lineitem |
| inputSplitNum=10, totalFileSize=205792918, scanRanges=10 |
| partition=1/1 |
| cardinality=1469949, numNodes=1 |
| pushdown agg=NONE |
| file cache request ADMITTED: user_identity:root@%, reason:user table-level whitelist rule, cost:37996 ns |
| limit: 1
```

#### 2.2. FE Side: Decision Propagation

The decision result needs to be propagated from the `FileQueryScanNode` down to the underlying split assignment logic.

* **SplitAssignment Modification**:
    * The constructor of the `SplitAssignment` class (located in `org.apache.doris.datasource`) is modified to accept a new `boolean fileCacheAdmission` parameter.
* **SplitToScanRange Modification**:
    * The `splitToScanRange` method (or its corresponding lambda expression) is updated to receive the `fileCacheAdmission` parameter.
    * This method is responsible for setting this value on the Thrift object.

#### 2.3. Communication Protocol: Thrift Definition Update

To pass the FE's decision to the BE, the Thrift definition (likely `TFileRangeDesc` or `TFileScanRangeParams` in `PlanNodes.thrift`) requires a new field.

* **Inferred Change**: A new field, `optional bool file_cache_admission`, is added to the `TFileRangeDesc` struct.

#### 2.4. BE Side: Enforcement

Although the analysis focuses on the FE, the complete loop requires enforcement on the BE side:

* **FileReader**: The BE's `FileReader` (e.g., `HdfsFileReader` or `S3FileReader`) checks the `file_cache_admission` flag in the incoming `TFileRangeDesc` during initialization or reading.
* **Cache Policy**:
    * If `file_cache_admission` is **true** (default): It uses the standard `FileCachePolicy`, where data not found in the cache is written to the Block File Cache after reading.
    * If `file_cache_admission` is **false**: It sets the `FileCachePolicy` to `NO_CACHE`, skips the cache-write step, and reads directly from remote storage. This protects the existing cache from being polluted.

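The flow described in sections 2.1 through 2.4 can be illustrated with a small, self-contained Java sketch. It is not the code from this PR: every type and method here (`AdmissionManager`, `RangeDesc`, `planRanges`, `choosePolicy`, the `STANDARD` policy name, the `etl_user` account) is a hypothetical stand-in, the BE side is modeled in Java purely for illustration, and two behaviors are assumptions rather than facts from the description: requests are admitted when the switch is off, and denied when no whitelist rule matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the admission flow; not the actual Doris classes. */
final class FileCacheAdmissionSketch {

    /** Stand-in for the Thrift range descriptor; only the new flag is modeled. */
    static final class RangeDesc {
        final String path;
        // Mirrors `optional bool file_cache_admission`: null means "not set".
        Boolean fileCacheAdmission;

        RangeDesc(String path) {
            this.path = path;
        }
    }

    /** Stand-in for the BE cache policy; STANDARD is a made-up name. */
    enum CachePolicy { STANDARD, NO_CACHE }

    /** Toy admission manager backed by a whitelist of user/table keys. */
    static final class AdmissionManager {
        private final Set<String> whitelist = new HashSet<>();

        void allow(String user, String catalog, String db, String table) {
            whitelist.add(key(user, catalog, db, table));
        }

        // Assumption: a request with no matching rule is denied.
        boolean isAllowed(String user, String catalog, String db, String table) {
            return whitelist.contains(key(user, catalog, db, table));
        }

        private static String key(String user, String catalog, String db, String table) {
            return user + "|" + catalog + "." + db + "." + table;
        }
    }

    /** FE side: decide once per scan, then stamp every range with the result. */
    static List<RangeDesc> planRanges(AdmissionManager mgr, boolean controlEnabled,
            String user, String catalog, String db, String table, List<String> paths) {
        // Assumption: with the switch off, everything is admitted.
        boolean admitted = !controlEnabled || mgr.isAllowed(user, catalog, db, table);
        List<RangeDesc> ranges = new ArrayList<>();
        for (String path : paths) {
            RangeDesc range = new RangeDesc(path);
            range.fileCacheAdmission = admitted;
            ranges.add(range);
        }
        return ranges;
    }

    /** BE side: an unset or true flag keeps caching; false bypasses the cache. */
    static CachePolicy choosePolicy(RangeDesc range) {
        return Boolean.FALSE.equals(range.fileCacheAdmission)
                ? CachePolicy.NO_CACHE
                : CachePolicy.STANDARD;
    }

    public static void main(String[] args) {
        AdmissionManager mgr = new AdmissionManager();
        mgr.allow("root", "test_file_cache_features", "tpch1_parquet", "lineitem");
        List<String> paths = Arrays.asList("part-0.parquet", "part-1.parquet");
        for (String user : new String[] {"root", "etl_user"}) {
            List<RangeDesc> ranges = planRanges(mgr, true, user,
                    "test_file_cache_features", "tpch1_parquet", "lineitem", paths);
            System.out.println(user + " -> " + choosePolicy(ranges.get(0)));
        }
    }
}
```

Running `main` prints `root -> STANDARD` and `etl_user -> NO_CACHE`. Treating an unset flag the same as `true` matches the "true (default)" behavior described in 2.4, so a range descriptor that does not carry the field would keep caching as before.
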
### 3. Summary

This PR introduces an **Admission Control Manager** during the FE query planning phase and transparently passes this control signal through the **Split Assignment** and **Scan Range Generation** phases. This ultimately guides the BE side's file readers to **selectively** use the file cache.

Co-authored-by: xuchenhao <419062425@qq.com>
Co-authored-by: xuchenhao <48084123+xuchenhao@users.noreply.github.com>
Co-authored-by: morningman <yunyou@selectdb.com>
Contributor
|
run buildall |
yiguolei
approved these changes
Mar 21, 2026
Cherry-picked from #59065