[Cherry-Pick][Feature][Log]console metrics log for pd disaggregation #7843 #7845
CSWYF3634076 wants to merge 1 commit into
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-18 20:42:55
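📋 Review summary
PR overview: Adds a dedicated console metrics log for Decode nodes in the PD disaggregation (Splitwise) scenario, fixing the problem of Decode nodes incorrectly using the Prefill log format.
Scope of changes: fastdeploy/engine/ (common_engine.py, sched/resource_manager_v1.py, sched/scheduler_metrics_logger.py) and the corresponding tests
Impact tags: [Engine] [Scheduler] [PD Disaggregation]
Issues
No blocking issues found.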
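📝 PR convention check
The title contains the unofficial tag [Log] and uses two tags ([Feature] and [Log]) after [Cherry-Pick], which does not follow the Cherry-Pick format convention (only one official tag is allowed).
Suggested title (can be copied directly):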
[Cherry-Pick][PD Disaggregation] add console metrics log for pd disaggregation (#7843)
Suggested PR description (can be copied directly; it must reproduce the complete structure of the checklist §D2 template):
## Motivation
Cherry-pick from https://github.com/PaddlePaddle/FastDeploy/pull/7843
Fix the issue where the Decode node (node D) prints prefill logs in PD disaggregation.
## Modifications
- `scheduler_metrics_logger.py`: Refactor `log_prefill_batch` into the private method `_log_prefill_like_batch`; add the public methods `log_prefill_batch` (a wrapper) and `log_decode_bootstrap_batch` (the bootstrap log for Decode nodes); add a `splitwise_role` field to the log message; add a `splitwise_role` parameter to `SchedulerMetricsLogger.__init__`, defaulting to `"mixed"`.
- `resource_manager_v1.py`: In `_log_console_scheduler_metrics`, call `log_decode_bootstrap_batch` when `splitwise_role == "decode"` and the existing `log_prefill_batch` otherwise, so that Decode-node logs are no longer mislabeled as "Prefill batch".
- `common_engine.py`: Pass `splitwise_role` when constructing `SchedulerMetricsLogger`.
- Unit tests: add `test_log_decode_bootstrap_batch_logs_expected_message`, `test_decode_role_prefill_task_logs_decode_bootstrap_batch`, and `test_default_splitwise_role_is_mixed`.
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
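- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The code logic is clear, the refactoring is reasonable, test coverage is comprehensive, and the cherry-pick source is clearly stated. The only issue is the unofficial tag [Log] in the title; fixing it before merging is recommended.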
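The refactor listed under Modifications can be pictured with a minimal Python sketch. It is not the FastDeploy implementation: the class name `SchedulerMetricsLoggerSketch`, the free function `log_console_scheduler_metrics`, the `num_requests`/`num_tokens` parameters, the message layout, and the "Decode bootstrap batch" label are all assumptions made for illustration. Only the method names, the `splitwise_role` parameter with its `"mixed"` default, and the "Prefill batch" label come from the PR description.

```python
import logging

logger = logging.getLogger("scheduler_metrics_sketch")


class SchedulerMetricsLoggerSketch:
    """Illustrative stand-in for SchedulerMetricsLogger (not the real class)."""

    def __init__(self, splitwise_role: str = "mixed") -> None:
        # The PR description states that the default role is "mixed".
        self.splitwise_role = splitwise_role

    def _log_prefill_like_batch(self, label: str, num_requests: int, num_tokens: int) -> str:
        # Shared formatting path; the fields and layout here are hypothetical.
        message = (
            f"{label}: splitwise_role={self.splitwise_role}, "
            f"requests={num_requests}, tokens={num_tokens}"
        )
        logger.info(message)
        return message

    def log_prefill_batch(self, num_requests: int, num_tokens: int) -> str:
        # Public wrapper that keeps the original entry point.
        return self._log_prefill_like_batch("Prefill batch", num_requests, num_tokens)

    def log_decode_bootstrap_batch(self, num_requests: int, num_tokens: int) -> str:
        # Separate entry point for Decode nodes; the label text is assumed.
        return self._log_prefill_like_batch("Decode bootstrap batch", num_requests, num_tokens)


def log_console_scheduler_metrics(
    metrics_logger: SchedulerMetricsLoggerSketch, num_requests: int, num_tokens: int
) -> str:
    # Mirrors the branch described for _log_console_scheduler_metrics.
    if metrics_logger.splitwise_role == "decode":
        return metrics_logger.log_decode_bootstrap_batch(num_requests, num_tokens)
    return metrics_logger.log_prefill_batch(num_requests, num_tokens)
```

Routing both public methods through one private helper keeps the message fields identical across roles, so only the label differs between Prefill and Decode nodes.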
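CI report generated from the following code (updated every 30 minutes):
1 Task overview
❌ 1 Required task has failed and is blocking the merge; it needs to be handled first.
2 Task status summary
2.1 Required tasks: 8/9 passed
2.2 Optional tasks: 23/26 passed
3 Failure details (required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test failure (confidence: high)
Failed cases:
Root cause details:
Key logs:
Fix suggestions:
Fix suggestion summary: in the import section of the test file, add
Related changes:
Link: View logs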
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## release/2.6 #7845 +/- ##
==============================================
Coverage ? 72.46%
==============================================
Files ? 381
Lines ? 54162
Branches ? 8461
==============================================
Hits ? 39246
Misses ? 12157
Partials ? 2759
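CI report generated from the following code (updated every 30 minutes):
1 Task overview
There are currently 2 failed Required tasks that need attention; please fix them first.
2 Task status summary
2.1 Required tasks: 8/10 passed
2.2 Optional tasks: 23/26 passed
3 Failure details (required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test failure (confidence: high)
Failed cases:
Root cause details:
Key logs:
Fix suggestions:
Related changes:
Link: View logs
Pre Commit: code style (confidence: high)
Root cause details:
Key logs:
Fix suggestions:
Link: View logs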
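Motivation
Cherry-pick from #7843
Fix the issue where the Decode node (node D) prints prefill logs in PD disaggregation.
Modifications
- `scheduler_metrics_logger.py`: Refactor `log_prefill_batch` into the private method `_log_prefill_like_batch`; add the public methods `log_prefill_batch` (a wrapper) and `log_decode_bootstrap_batch` (the bootstrap log for Decode nodes); add a `splitwise_role` field to the log message; add a `splitwise_role` parameter to `SchedulerMetricsLogger.__init__`, defaulting to `"mixed"`.
- `resource_manager_v1.py`: In `_log_console_scheduler_metrics`, call `log_decode_bootstrap_batch` when `splitwise_role == "decode"` and the existing `log_prefill_batch` otherwise, so that Decode-node logs are no longer mislabeled as "Prefill batch".
- `common_engine.py`: Pass `splitwise_role` when constructing `SchedulerMetricsLogger`.
- Unit tests: add `test_log_decode_bootstrap_batch_logs_expected_message`, `test_decode_role_prefill_task_logs_decode_bootstrap_batch`, and `test_default_splitwise_role_is_mixed`.
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
- Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run `pre-commit` before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.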
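
The role-dispatch tests named in the Modifications section could be approached roughly as below. This is a hedged sketch, not the tests from the PR: `log_console_scheduler_metrics`, its parameters, and the `num_requests` keyword are hypothetical stand-ins for the branch in `_log_console_scheduler_metrics`, and a `MagicMock` replaces the real `SchedulerMetricsLogger`.

```python
import unittest
from unittest.mock import MagicMock


def log_console_scheduler_metrics(metrics_logger, splitwise_role, **batch_stats):
    """Hypothetical stand-in for the role branch in the resource manager."""
    if splitwise_role == "decode":
        metrics_logger.log_decode_bootstrap_batch(**batch_stats)
    else:
        metrics_logger.log_prefill_batch(**batch_stats)


class RoleDispatchTest(unittest.TestCase):
    def test_decode_role_uses_decode_bootstrap_log(self):
        metrics_logger = MagicMock()
        log_console_scheduler_metrics(metrics_logger, "decode", num_requests=1)
        # A Decode node must not emit the prefill-labelled log.
        metrics_logger.log_decode_bootstrap_batch.assert_called_once_with(num_requests=1)
        metrics_logger.log_prefill_batch.assert_not_called()

    def test_other_roles_keep_prefill_log(self):
        for role in ("mixed", "prefill"):
            metrics_logger = MagicMock()
            log_console_scheduler_metrics(metrics_logger, role, num_requests=1)
            metrics_logger.log_prefill_batch.assert_called_once_with(num_requests=1)
            metrics_logger.log_decode_bootstrap_batch.assert_not_called()


def run_sketch_tests() -> unittest.TestResult:
    # Load and run only the sketch test case, returning the result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RoleDispatchTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Asserting on both methods in each test checks the negative case as well, which is the behavior this PR fixes: a Decode node printing a log labelled as Prefill.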