[KVCache] Add free_cpu_block_num gauge metric by liyonghua0910 · Pull Request #7856 · PaddlePaddle/FastDeploy

liyonghua0910 · 2026-05-19T09:18:36Z

Motivation

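A free_gpu_block_num metric already exists for monitoring the number of available GPU blocks, but there is no CPU-side counterpart, free_cpu_block_num. When swap or prefix caching is enabled, CPU block usage needs to be monitored so that bottlenecks can be spotted promptly.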

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

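fastdeploy/metrics/metrics.py: add the definition and registration of the free_cpu_block_num Gauge metric
fastdeploy/cache_manager/prefix_cache_manager.py: set free_cpu_block_num at the following points:
- __init__: set to num_cpu_blocks at initialization
- _setup_for_worker: set to num_cpu_blocks at initialization
- allocate_cpu_blocks: updated after allocation
- recycle_cpu_blocks: updated after recycling
- free_cpu_block_ids: updated after eviction
- reset: updated inside the method
docs/online_serving/metrics.md: add the English documentation
docs/zh/online_serving/metrics.md: add the Chinese documentation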
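
The update pattern listed above (set the gauge at initialization, then refresh it after every allocate, recycle, or reset) can be sketched as follows. This is a minimal, stdlib-only illustration and not the code in this PR: `SimpleGauge`, `CpuBlockPool`, and `_publish` are hypothetical names, and the stand-in gauge only imitates the `set()` call that a Prometheus-style gauge offers.

```python
import heapq


class SimpleGauge:
    """Hypothetical stand-in for a Prometheus-style gauge (stdlib only)."""

    def __init__(self, name):
        self.name = name
        self.value = 0

    def set(self, value):
        self.value = value


class CpuBlockPool:
    """Hypothetical pool of CPU block ids that mirrors its free count into a gauge."""

    def __init__(self, num_cpu_blocks, gauge):
        self.num_cpu_blocks = num_cpu_blocks
        self.gauge = gauge
        self.reset()  # fills the free list and publishes num_cpu_blocks

    def _publish(self):
        # Called after every mutation so the gauge cannot drift from the list.
        self.gauge.set(len(self.cpu_free_block_list))

    def reset(self):
        self.cpu_free_block_list = list(range(self.num_cpu_blocks))
        heapq.heapify(self.cpu_free_block_list)
        self._publish()

    def allocate_cpu_blocks(self, num_blocks):
        if num_blocks > len(self.cpu_free_block_list):
            raise ValueError("not enough free CPU blocks")
        allocated = [heapq.heappop(self.cpu_free_block_list) for _ in range(num_blocks)]
        self._publish()
        return allocated

    def recycle_cpu_blocks(self, block_ids):
        for block_id in block_ids:
            heapq.heappush(self.cpu_free_block_list, block_id)
        self._publish()


gauge = SimpleGauge("free_cpu_block_num")
pool = CpuBlockPool(num_cpu_blocks=8, gauge=gauge)
blocks = pool.allocate_cpu_blocks(3)
after_allocate = gauge.value  # free count once 3 of 8 blocks are taken
pool.recycle_cpu_blocks(blocks)
after_recycle = gauge.value  # free count once the blocks are returned
```

Routing every mutation through a single helper such as `_publish` is one way to keep the metric consistent; the PR itself is described as adding the update at each call site.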

Usage or Command

Query the /metrics endpoint to view the fastdeploy:free_cpu_block_num metric.
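
As a rough illustration of reading the metric programmatically, the sketch below fetches the /metrics text and extracts one gauge value from the Prometheus text exposition format. The helper names `fetch_metrics` and `parse_gauge`, the base URL passed by the caller, and the numbers in `SAMPLE` are assumptions made for this example; only the endpoint path and the metric names come from the PR.

```python
import urllib.request


def fetch_metrics(base_url, timeout=5.0):
    """Download the raw metrics text from <base_url>/metrics (base_url is caller-supplied)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_gauge(metrics_text, metric_name):
    """Return the first sample value for metric_name, or None if it is absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Sample with labels: name{label="x"} value [timestamp]
            name, _, rest = line.partition("{")
            rest = rest.rpartition("}")[2]
        else:
            # Sample without labels: name value [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if name == metric_name and fields:
            return float(fields[0])
    return None


# Illustrative payload; the values are invented for this example.
SAMPLE = """\
# TYPE fastdeploy:free_gpu_block_num gauge
fastdeploy:free_gpu_block_num 1024.0
# TYPE fastdeploy:free_cpu_block_num gauge
fastdeploy:free_cpu_block_num 256.0
"""

free_cpu = parse_gauge(SAMPLE, "fastdeploy:free_cpu_block_num")
```

Against a running server, `parse_gauge(fetch_metrics(base_url), "fastdeploy:free_cpu_block_num")` would apply the same parsing, with `base_url` set to wherever the service is actually listening.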

Accuracy Tests

This PR does not change model output, so no accuracy tests are needed.

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Add free_cpu_block_num to track available CPU blocks in cache, complementing the existing free_gpu_block_num metric. This enables monitoring CPU block usage for swap/prefix caching scenarios.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

paddle-bot · 2026-05-19T09:18:44Z

Thanks for your contribution!

codecov-commenter · 2026-05-19T10:38:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@a8ffcaa). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7856   +/-   ##
==========================================
  Coverage           ?   63.33%           
==========================================
  Files              ?      462           
  Lines              ?    64378           
  Branches           ?     9871           
==========================================
  Hits               ?    40776           
  Misses             ?    20828           
  Partials           ?     2774

Flag	Coverage Δ
GPU	`72.44% <100.00%> (?)`
XPU	`7.12% <33.33%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-19 18:43:53

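📋 Review Summary

PR overview: adds the free_cpu_block_num Gauge metric for monitoring CPU block usage in swap/prefix caching scenarios.
Scope of changes: fastdeploy/metrics/metrics.py, fastdeploy/cache_manager/prefix_cache_manager.py, docs/
Impact tags: [KVCache] [Docs]

Issues

Level	File	Summary
📝 PR convention	—	`Checklist` item "Add unit tests" is unchecked and no reason is given in the PR

📝 PR Convention Check

The PR title format is compliant ([KVCache] is an official tag), and every section of the description template is present. The only issue: Add unit tests is unchecked in the Checklist, and the PR body gives no reason for omitting unit tests (the template requires "Please write the reason in this PR if no unit tests").

Suggested PR description (can be copied directly; it adds the reasons to the Checklist below Accuracy Tests):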

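## Motivation
A `free_gpu_block_num` metric already exists for monitoring the number of available GPU blocks, but there is no CPU-side counterpart, `free_cpu_block_num`. When swap or prefix caching is enabled, CPU block usage needs to be monitored so that bottlenecks can be spotted promptly.

## Modifications
1. **fastdeploy/metrics/metrics.py**: add the definition and registration of the `free_cpu_block_num` Gauge metric
2. **fastdeploy/cache_manager/prefix_cache_manager.py**: set `free_cpu_block_num` at the following points:
   - `__init__`: set to `num_cpu_blocks` at initialization
   - `update_cache_config`: set to `num_cpu_blocks` at initialization
   - `allocate_cpu_blocks`: updated after allocation
   - `recycle_cpu_blocks`: updated after recycling
   - `free_cpu_block_ids`: updated after eviction
   - `reset`: updated inside the method
3. **docs/online_serving/metrics.md**: add the English documentation
4. **docs/zh/online_serving/metrics.md**: add the Chinese documentation

## Usage or Command
Query the `/metrics` endpoint to view the `fastdeploy:free_cpu_block_num` metric.

## Accuracy Tests
This PR does not change model output, so no accuracy tests are needed.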

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
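  > This PR only adds the registration of a Prometheus Gauge metric and the corresponding set calls. It changes no business logic and does not alter block allocation/recycling semantics, so no unit tests are needed.
- [ ] Provide accuracy results.
  > This PR does not change model output, so no accuracy tests are needed.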
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

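Overall Assessment

The code change is logically correct and covers every place in prefix_cache_manager.py that modifies cpu_free_block_list (__init__, update_cache_config, allocate_cpu_blocks, recycle_cpu_blocks, free_cpu_block_ids, reset), in a style symmetric with the existing free_gpu_block_num. The only remaining step is to add the reason for omitting unit tests to the Checklist.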
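
If a test were preferred over a written justification, one possible shape is to replace the gauge with a mock and check that each operation reports the current free-list length. The sketch below is only an assumption about how such a test could look: `TinyCpuCache` is a hypothetical stand-in rather than FastDeploy's cache manager, and `check_gauge_tracks_free_list` is an invented helper name.

```python
from unittest import mock


class TinyCpuCache:
    """Hypothetical stand-in for a cache manager; not FastDeploy's class."""

    def __init__(self, num_cpu_blocks, gauge):
        self.gauge = gauge
        self.cpu_free_block_list = list(range(num_cpu_blocks))
        self.gauge.set(len(self.cpu_free_block_list))

    def allocate_cpu_blocks(self, num_blocks):
        taken = self.cpu_free_block_list[:num_blocks]
        self.cpu_free_block_list = self.cpu_free_block_list[num_blocks:]
        self.gauge.set(len(self.cpu_free_block_list))
        return taken

    def recycle_cpu_blocks(self, block_ids):
        self.cpu_free_block_list.extend(block_ids)
        self.gauge.set(len(self.cpu_free_block_list))


def check_gauge_tracks_free_list():
    """Return every value passed to gauge.set(), in call order."""
    gauge = mock.MagicMock()
    cache = TinyCpuCache(4, gauge)
    taken = cache.allocate_cpu_blocks(3)
    cache.recycle_cpu_blocks(taken)
    return [call.args[0] for call in gauge.set.call_args_list]


reported = check_gauge_tracks_free_list()  # one entry per init/allocate/recycle
```

In a real test the mock would more likely be patched over the project's metrics object, which this sketch does not attempt to reproduce.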

PaddlePaddle-bot · 2026-05-19T10:51:35Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-19 23:48:30

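This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: 725feef
Merge base: a8ffcaa (branch: develop)
View full diff
CI details

1 Task Overview

✅ All Required tasks have passed; merging is recommended. 1 Optional task failed (an environment issue that does not block merging), and 1 Optional task is pending.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
59(18)	41	39	1	0	1	0

2 Task Status Summary

2.1 Required tasks: 10/10 passed

Required tasks block merging; failures should be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
✅	All 10 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 29/31 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	1m1s	Job	-
⏸️	`CI_HPU`	-	-	-
✅	The remaining 29 optional tasks passed	-	-	-

3 Failure Details (required only)

No required tasks failed.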

PaddlePaddle-bot · 2026-05-19T11:55:06Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-19 19:54:15

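This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: 725feef
Merge base: bda1756 (branch: develop)
View full diff
CI details

1 Task Overview

Not all Required tasks have finished (1 pending); please wait for the results.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
59(18)	41	38	1	0	2	0

2 Task Status Summary

2.1 Required tasks: 9/10 passed

Required tasks block merging; failures should be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
⏸️	`Run Four Cards Tests / run_4_cards_tests`	-	Pending	-	-	-
✅	The remaining 9 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 29/31 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	1m1s	Job	-
⏸️	`CI_HPU`	-	-	-
✅	The remaining 29 optional tasks passed	-	-	-

3 Failure Details (required only)

No required tasks failed.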

liyonghua0910 temporarily deployed to Metax_ci May 19, 2026 09:18 — with GitHub Actions Inactive

PaddlePaddle-bot reviewed May 19, 2026


liyonghua0910 wants to merge 1 commit into PaddlePaddle:develop from liyonghua0910:develop+20260518_free_cpu_block_num


3 participants
