
branch-4.1: [feature](cache) support file cache admission control #59065 #61547

Merged
yiguolei merged 1 commit into branch-4.1 from auto-pick-59065-branch-4.1
Mar 21, 2026

@github-actions
Contributor

Cherry-picked from #59065

For the full implementation details of this PR, see the following
design document (written in Chinese):
https://www.notion.so/V3-1-2c31293e1081807ca476dd5c87efb28e

### 1. PR Function Overview

The core function of this PR is the implementation of **File Cache
Admission Control**.

* **Background**: Doris supports using File Cache to accelerate data
access for remote storage (e.g., HDFS, S3). However, in certain
scenarios (such as large-scale ETL jobs or heavy ad-hoc queries),
reading massive amounts of cold data can evict existing hot data from
the File Cache. This leads to "cache pollution" and a significant drop
in cache hit rates.
* **Goal**: Provide a mechanism to decide whether data scanned by a
specific query is allowed to enter the File Cache, based on dimensions
such as user identity and table information. If admission is denied, the
query will read the data directly from remote storage without populating
the cache.
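
The pollution effect described above can be reproduced with a toy LRU cache. This is a minimal sketch, not Doris code: `ToyLruCache` and `CachePollutionDemo` are hypothetical names, the capacity of 4 blocks and the 100-block cold scan are arbitrary, and the real File Cache eviction policy may differ from plain LRU.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU cache keyed by block name; a stand-in for a file cache, not Doris code. */
final class ToyLruCache {
    private final int capacity;
    private final LinkedHashMap<String, Boolean> blocks;

    ToyLruCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.blocks = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > ToyLruCache.this.capacity;
            }
        };
    }

    /** Reads a block. On a miss, the block is cached only when admitted is true. Returns true on a hit. */
    boolean read(String block, boolean admitted) {
        if (blocks.get(block) != null) {
            return true; // hit; get() also refreshes the block's recency
        }
        if (admitted) {
            blocks.put(block, Boolean.TRUE);
        }
        return false;
    }

    /** Checks presence without changing the LRU order. */
    boolean contains(String block) {
        return blocks.containsKey(block);
    }
}

final class CachePollutionDemo {
    /** Warms the cache with hot blocks, runs a cold scan, and counts the hot blocks still cached. */
    static int hotBlocksSurviving(boolean admitColdScan) {
        ToyLruCache cache = new ToyLruCache(4);
        String[] hot = {"hot-0", "hot-1", "hot-2", "hot-3"};
        for (String block : hot) {
            cache.read(block, true);
        }
        for (int i = 0; i < 100; i++) {
            cache.read("cold-" + i, admitColdScan);
        }
        int survivors = 0;
        for (String block : hot) {
            if (cache.contains(block)) {
                survivors++;
            }
        }
        return survivors;
    }
}
```

In this sketch, admitting the cold scan evicts every hot block, while denying admission leaves all four in place. That is the behavior the admission decision is meant to protect.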

### 2. Implementation Scheme Analysis

The implementation consists of the following key components:

#### 2.1. FE Side: Admission Decision

The primary logic is located in the `createScanRangeLocations` method of
`FileQueryScanNode.java`.

* **New Configuration**: Introduced the
`Config.enable_file_cache_admission_control` switch.
* **New Manager**: Introduced `FileCacheAdmissionManager` (Singleton) to
execute the specific admission judgment logic.
* **Decision Flow**:
    1. Before generating Scan Ranges, `FileQueryScanNode` retrieves the
    current User Identity (`userIdentity`), Catalog, Database, and Table
    information.
    2. It calls `FileCacheAdmissionManager.getInstance().isAllowed(...)` to
    obtain a boolean result `fileCacheAdmission`.
    3. It logs the decision result and the time cost in the query plan:
```
|   0:VHIVE_SCAN_NODE(74)                                                                                       |
|      table: test_file_cache_features.tpch1_parquet.lineitem                                                   |
|      inputSplitNum=10, totalFileSize=205792918, scanRanges=10                                                 |
|      partition=1/1                                                                                            |
|      cardinality=1469949, numNodes=1                                                                          |
|      pushdown agg=NONE                                                                                        |
|      file cache request ADMITTED: user_identity:root@%, reason:user table-level whitelist rule, cost:37996 ns |
|      limit: 1 
```
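
The decision flow above could be sketched roughly as follows. This is an illustration only, not the actual `FileCacheAdmissionManager`: the class names `AdmissionSketch` and `AdmissionDecision`, the method names, the whitelist-only rule model, the deny-by-default behavior, and the `DENIED` and `no matching rule` strings are all assumptions. Only the `ADMITTED` line format and the `user table-level whitelist rule` reason are taken from the plan output shown above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical decision record: verdict, reason string, and time spent deciding. */
final class AdmissionDecision {
    final boolean admitted;
    final String reason;
    final long costNanos;

    AdmissionDecision(boolean admitted, String reason, long costNanos) {
        this.admitted = admitted;
        this.reason = reason;
        this.costNanos = costNanos;
    }
}

/** Hypothetical rule store; NOT the real FileCacheAdmissionManager. */
final class AdmissionSketch {
    // Each entry is (user, catalog, database, table).
    private final Set<List<String>> tableWhitelist = new HashSet<>();

    void allowTable(String user, String catalog, String db, String table) {
        tableWhitelist.add(Arrays.asList(user, catalog, db, table));
    }

    /** Assumption: a request with no matching whitelist entry is denied. */
    AdmissionDecision decide(String user, String catalog, String db, String table) {
        long start = System.nanoTime();
        boolean admitted = tableWhitelist.contains(Arrays.asList(user, catalog, db, table));
        String reason = admitted ? "user table-level whitelist rule" : "no matching rule";
        return new AdmissionDecision(admitted, reason, System.nanoTime() - start);
    }

    /** Formats a line shaped like the one in the plan output; "DENIED" is an assumed label. */
    static String describe(String user, AdmissionDecision d) {
        return "file cache request " + (d.admitted ? "ADMITTED" : "DENIED")
                + ": user_identity:" + user
                + ", reason:" + d.reason
                + ", cost:" + d.costNanos + " ns";
    }
}
```

Measuring the cost with `System.nanoTime()` around the lookup mirrors the `cost:... ns` field in the plan output, although how the PR actually measures it is not shown here.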

#### 2.2. FE Side: Decision Propagation

The decision result needs to be propagated from the `FileQueryScanNode`
down to the underlying split assignment logic.

* **SplitAssignment Modification**:
    * The constructor of the `SplitAssignment` class (located in
    `org.apache.doris.datasource`) is modified to accept a new `boolean
    fileCacheAdmission` parameter.
* **SplitToScanRange Modification**:
    * The `splitToScanRange` method (or its corresponding Lambda expression)
    is updated to receive the `fileCacheAdmission` parameter.
    * This method is responsible for setting this value into the Thrift
    object.
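
A minimal sketch of this propagation pattern is shown below. All types here (`RangeDescSketch`, `SplitToRangeSketch`, `SplitAssignmentSketch`, `PropagationDemo`) are hypothetical stand-ins; the real `SplitAssignment` constructor takes other parameters that are not described in this PR text, and splits are reduced to plain path strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the Thrift range descriptor; only the fields relevant here. */
final class RangeDescSketch {
    String path;
    boolean fileCacheAdmission;
}

/** Stand-in for the split-to-range conversion hook that now receives the decision. */
interface SplitToRangeSketch {
    RangeDescSketch convert(String splitPath, boolean fileCacheAdmission);
}

/** Stand-in for SplitAssignment: the decision is fixed at construction and applied to every split. */
final class SplitAssignmentSketch {
    private final SplitToRangeSketch converter;
    private final boolean fileCacheAdmission;

    SplitAssignmentSketch(SplitToRangeSketch converter, boolean fileCacheAdmission) {
        this.converter = converter;
        this.fileCacheAdmission = fileCacheAdmission;
    }

    List<RangeDescSketch> assign(List<String> splitPaths) {
        List<RangeDescSketch> ranges = new ArrayList<>();
        for (String path : splitPaths) {
            ranges.add(converter.convert(path, fileCacheAdmission));
        }
        return ranges;
    }
}

final class PropagationDemo {
    /** Plays the role of the scan node: decides once, then hands the result down. */
    static List<RangeDescSketch> plan(List<String> splitPaths, boolean fileCacheAdmission) {
        SplitToRangeSketch toRange = (path, admit) -> {
            RangeDescSketch desc = new RangeDescSketch();
            desc.path = path;
            desc.fileCacheAdmission = admit; // the value that would travel to the BE
            return desc;
        };
        return new SplitAssignmentSketch(toRange, fileCacheAdmission).assign(splitPaths);
    }
}
```

Because the decision is made once per scan node and stored at construction time, every range produced for that scan carries the same value.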

#### 2.3. Communication Protocol: Thrift Definition Update

To pass the FE's decision to the BE, the Thrift definition (likely
`TFileRangeDesc` or `TFileScanRangeParams` in `PlanNodes.thrift`)
requires a new field.

* **Inferred Change**: A new field, `optional bool file_cache_admission`,
is added to the `TFileRangeDesc` struct.

#### 2.4. BE Side: Enforcement

Although the analysis focuses on the FE, the complete loop requires
enforcement on the BE side:

* **FileReader**: The BE's `FileReader` (e.g., `HdfsFileReader` or
`S3FileReader`) checks the `file_cache_admission` flag in the incoming
`TFileRangeDesc` during initialization or reading.
* **Cache Policy**:
    * If `file_cache_admission` is **true** (default): It uses the standard
    `FileCachePolicy`, where data not found in the cache is written to the
    Block File Cache after reading.
    * If `file_cache_admission` is **false**: It sets the `FileCachePolicy`
    to `NO_CACHE`, skips the cache writing step, and reads directly from
    remote storage. This protects the existing cache from being polluted.
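
The policy choice can be summarized in a few lines. The real enforcement lives in the BE's C++ code; the Java below is only a sketch for consistency with the FE examples. `CachePolicySketch` and `EnforcementSketch` are hypothetical names, `FILE_BLOCK_CACHE` is an assumed label for the standard policy, and treating an unset optional field as admitted is an assumption based on the stated default of **true**.

```java
/** Hypothetical mirror of the two cache policies discussed above. */
enum CachePolicySketch {
    FILE_BLOCK_CACHE, // assumed name for the standard policy: misses are written to the cache
    NO_CACHE          // read from remote storage without populating the cache
}

final class EnforcementSketch {
    /**
     * Maps the admission flag to a cache policy.
     * A null argument models an optional Thrift field that was never set;
     * this sketch assumes that case falls back to the default (admitted).
     */
    static CachePolicySketch choosePolicy(Boolean fileCacheAdmission) {
        boolean admitted = fileCacheAdmission == null || fileCacheAdmission;
        return admitted ? CachePolicySketch.FILE_BLOCK_CACHE : CachePolicySketch.NO_CACHE;
    }
}
```

Defaulting to the caching policy when the field is absent would keep ranges produced without the flag behaving as they did before the feature existed.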

### 3. Summary

This PR introduces an **Admission Control Manager** during the FE query
planning phase and transparently passes this control signal through the
**Split Assignment** and **Scan Range Generation** phases. This
ultimately guides the BE side's file readers to **selectively** use the
file cache.


Co-authored-by: xuchenhao <419062425@qq.com>
Co-authored-by: xuchenhao <48084123+xuchenhao@users.noreply.github.com>
Co-authored-by: morningman <yunyou@selectdb.com>
@github-actions github-actions bot requested a review from yiguolei as a code owner March 20, 2026 04:13
@Thearas
Contributor

Thearas commented Mar 20, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Mar 20, 2026
@Thearas
Contributor

Thearas commented Mar 20, 2026

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 23.08% (3/13) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.63% (19446/36948) |
| Line Coverage | 36.06% (181354/502891) |
| Region Coverage | 32.57% (140490/431354) |
| Branch Coverage | 33.49% (61213/182793) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (13/13) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 70.99% (25674/36168) |
| Line Coverage | 53.87% (270023/501250) |
| Region Coverage | 51.42% (223895/435394) |
| Branch Coverage | 52.70% (96627/183361) |

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 78.54% (1786/2274) |
| Line Coverage | 64.34% (31938/49641) |
| Region Coverage | 65.16% (15986/24532) |
| Branch Coverage | 55.72% (8501/15258) |

@yiguolei yiguolei merged commit 3806e40 into branch-4.1 Mar 21, 2026
26 of 29 checks passed


6 participants