[Cherry-Pick][KVCache] Support request-level prefix cache disable(#7854)#7855
Conversation
|
Thanks for your contribution! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## release/2.6 #7855 +/- ##
==============================================
Coverage ? 72.95%
==============================================
Files ? 381
Lines ? 54231
Branches ? 8476
==============================================
Hits ? 39562
Misses ? 11891
Partials ? 2778
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PaddlePaddle-bot
left a comment
There was a problem hiding this comment.
🤖 Paddle-CI-Agent | pr_review |
2026-05-19 19:03:05
📋 Review 摘要
PR 概述:新增请求级 prefix caching 禁用开关,允许单个请求跳过全局 prefix cache 的匹配、写入和释放路径。
变更范围:engine/request.py、engine/resource_manager.py、engine/sched/resource_manager_v1.py、entrypoints/openai/protocol.py、文档
影响面 Tag:[KVCache] [APIServer] [Engine]
问题
未发现阻塞性问题。
📝 PR 规范检查
标题格式 [Cherry-Pick][KVCache] ... (#7854) 符合 Cherry-Pick 规范;描述包含 Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist 全部段落,结构合规。✓
总体评价
实现思路清晰,_enable_prefix_cache_for_request 辅助方法统一替换散落的 enable_prefix_cache 直接判断,代码改动具有一致性。序列化链路(protocol.py → request.py to_dict/from_dict/from_generic_request)完整;单测覆盖了默认值、传播、bypass 等关键路径。整体可合入。
CI报告基于以下代码生成(30分钟更新一次): 1 任务总览所有 required 任务均已通过 ✅,PR 可合并。有 2 个可选任务失败(不阻塞合并)。
2 任务状态汇总2.1 Required任务 : 10/10 通过
2.2 可选任务 — 23/26 通过
3 失败详情(仅 required)无 |
Motivation
将 develop PR #7854 同步到 release/2.6,支持请求级禁用 prefix caching。部分请求需要跳过 prefix cache 的匹配、写入和释放复用路径,以避免污染或复用全局缓存;默认值保持 False,继续遵循全局 prefix caching 配置。
Modifications
disable_prefix_caching参数,并贯通到内部Request序列化/反序列化。Usage or Command
# 单测 /root/paddlejob/inference-public/chengyanfu/.venv/py310/bin/python -m pytest \ tests/engine/test_request.py \ tests/engine/test_resource_manager.py \ tests/v1/test_resource_manager_v1.py -q请求示例:
Accuracy Tests
不涉及模型计算逻辑或算子变更,未执行精度测试。
Checklist
pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.