[KVCache] Add free_cpu_block_num gauge metric #7856
Add free_cpu_block_num to track available CPU blocks in cache, complementing the existing free_gpu_block_num metric. This enables monitoring CPU block usage for swap/prefix caching scenarios.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for your contribution!
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7856 +/- ##
==========================================
Coverage ? 63.33%
==========================================
Files ? 462
Lines ? 64378
Branches ? 9871
==========================================
Hits ? 40776
Misses ? 20828
Partials ? 2774
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-19 18:43:53
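📋 Review summary
PR overview: adds a `free_cpu_block_num` Gauge metric for monitoring CPU block usage in swap/prefix caching scenarios.
Scope of changes: fastdeploy/metrics/metrics.py, fastdeploy/cache_manager/prefix_cache_manager.py, docs/
Impact tags: [KVCache] [Docs]
Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR conventions | - | "Add unit tests" in the Checklist is unchecked and the PR does not explain why |
📝 PR convention check
The PR title format is compliant ([KVCache] is an official tag), and every section of the description template is filled in. The only issue: "Add unit tests" in the Checklist is unchecked, and the PR body gives no reason for omitting unit tests (the template requires "Please write the reason in this PR if no unit tests").
Suggested PR description (can be copied directly; add the reason to the Checklist below Accuracy Tests):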
## Motivation
The `free_gpu_block_num` metric already monitors the number of available GPU blocks, but there is no CPU-side counterpart, `free_cpu_block_num`. With swap/prefix caching enabled, CPU block usage needs to be monitored so that bottlenecks can be spotted promptly.
## Modifications
1. **fastdeploy/metrics/metrics.py**: define and register the new `free_cpu_block_num` Gauge metric
2. **fastdeploy/cache_manager/prefix_cache_manager.py**: set `free_cpu_block_num` in the following places:
   - `__init__`: set to `num_cpu_blocks` at initialization
   - `update_cache_config`: set to `num_cpu_blocks` at initialization
   - `allocate_cpu_blocks`: updated after allocation
   - `recycle_cpu_blocks`: updated after recycling
   - `free_cpu_block_ids`: updated after eviction
   - `reset`: updated in the method
3. **docs/online_serving/metrics.md**: add the English documentation
4. **docs/zh/online_serving/metrics.md**: add the Chinese documentation
## Usage or Command
Visit the `/metrics` endpoint to see the `fastdeploy:free_cpu_block_num` metric.
## Accuracy Tests
This PR does not change model output, so no accuracy tests are needed.
## Checklist
- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
> This PR only adds the registration of a Prometheus Gauge metric and the corresponding set calls. It changes no business logic and does not alter block allocation/recycling semantics, so no additional unit tests are needed.
- [ ] Provide accuracy results.
> This PR does not change model output, so no accuracy tests are needed.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
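The code change is logically correct. It covers every place in prefix_cache_manager.py that modifies cpu_free_block_list (__init__, update_cache_config, allocate_cpu_blocks, recycle_cpu_blocks, free_cpu_block_ids, reset) and mirrors the style of the existing free_gpu_block_num. The only remaining step is to add the reason for omitting unit tests to the Checklist.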
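The pattern the review describes, refreshing the gauge after every mutation of the free list, can be sketched as follows. This is a minimal illustration and not FastDeploy's code: `StubGauge` is a stand-in that only mimics the `set()` method of a Prometheus gauge so the sketch runs without dependencies, and `CpuBlockPool` and `_publish` are hypothetical names. Only the method names `allocate_cpu_blocks`, `recycle_cpu_blocks`, `reset` and the attribute `cpu_free_block_list` come from the review above.

```python
class StubGauge:
    """Stand-in for a Prometheus gauge (assumption); only mimics set()."""

    def __init__(self, name):
        self.name = name
        self.value = 0

    def set(self, value):
        self.value = value


class CpuBlockPool:
    """Hypothetical, simplified pool; not FastDeploy's prefix cache manager."""

    def __init__(self, num_cpu_blocks, gauge):
        self.num_cpu_blocks = num_cpu_blocks
        self.gauge = gauge
        self.cpu_free_block_list = list(range(num_cpu_blocks))
        self._publish()  # initial value equals num_cpu_blocks

    def _publish(self):
        # Single place that reports the current free-list length.
        self.gauge.set(len(self.cpu_free_block_list))

    def allocate_cpu_blocks(self, n):
        if n > len(self.cpu_free_block_list):
            raise ValueError("not enough free CPU blocks")
        allocated = [self.cpu_free_block_list.pop() for _ in range(n)]
        self._publish()  # update after allocation
        return allocated

    def recycle_cpu_blocks(self, block_ids):
        self.cpu_free_block_list.extend(block_ids)
        self._publish()  # update after recycling

    def reset(self):
        self.cpu_free_block_list = list(range(self.num_cpu_blocks))
        self._publish()  # update after reset


gauge = StubGauge("free_cpu_block_num")
pool = CpuBlockPool(num_cpu_blocks=8, gauge=gauge)
blocks = pool.allocate_cpu_blocks(3)
after_allocate = gauge.value
pool.recycle_cpu_blocks(blocks)
after_recycle = gauge.value
```

Routing every update through one helper is a design choice of this sketch; it keeps the reported value tied to the actual list length rather than to a separately maintained counter.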
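CI report generated from the following code (updated every 30 minutes):
1 Task overview
✅ All Required tasks have passed; merging is recommended. 1 Optional task failed (environment issue, does not block merging) and 1 Optional task is pending.
2 Task status summary
2.1 Required tasks: 10/10 passed
2.2 Optional tasks: 29/31 passed
3 Failure details (required only)
No required tasks failed.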
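CI report generated from the following code (updated every 30 minutes):
1 Task overview
Not all Required tasks have finished (1 pending); please wait for the results.
2 Task status summary
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 29/31 passed
3 Failure details (required only)
No required tasks failed.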
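Motivation
The `free_gpu_block_num` metric already monitors the number of available GPU blocks, but there is no CPU-side counterpart, `free_cpu_block_num`. With swap/prefix caching enabled, CPU block usage needs to be monitored so that bottlenecks can be spotted promptly.
Modifications
- `free_cpu_block_num` Gauge metric definition and registration
- `free_cpu_block_num` is set in:
  - `__init__`: set to `num_cpu_blocks` at initialization
  - `_setup_for_worker`: set to `num_cpu_blocks` at initialization
  - `allocate_cpu_blocks`: updated after allocation
  - `recycle_cpu_blocks`: updated after recycling
  - `free_cpu_block_ids`: updated after eviction
  - `reset`: updated in the method
Usage or Command
Visit the `/metrics` endpoint to see the `fastdeploy:free_cpu_block_num` metric.
Accuracy Tests
This PR does not change model output, so no accuracy tests are needed.
Checklist
- Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.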
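As a rough illustration of the Usage or Command step, the sketch below pulls one gauge value out of a `/metrics` response body in Prometheus text format. The helper name `read_gauge`, the sample payload, and its numbers are made up for this example; only the metric name `fastdeploy:free_cpu_block_num` comes from the PR description. Fetching the body over HTTP is left out because the server address is not given here.

```python
def read_gauge(metrics_text, name):
    """Return the first sample value for `name`, or None if it is absent.

    Simplified parser (assumption): it expects no spaces inside label values.
    """
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like "<name>[{labels}] <value> [timestamp]".
        metric, _, rest = line.partition(" ")
        base_name = metric.split("{", 1)[0]
        if base_name == name and rest:
            return float(rest.split()[0])
    return None


# Made-up payload for illustration; real values depend on the deployment.
SAMPLE = """\
# TYPE fastdeploy:free_gpu_block_num gauge
fastdeploy:free_gpu_block_num 1024.0
# TYPE fastdeploy:free_cpu_block_num gauge
fastdeploy:free_cpu_block_num 256.0
"""

free_cpu = read_gauge(SAMPLE, "fastdeploy:free_cpu_block_num")
```

A value that keeps falling toward zero while swap or prefix caching is active would be the kind of bottleneck signal the Motivation section refers to.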