[Cherry-Pick][KVCache] Support request-level prefix cache disable (#7854) #7855

Open
kevincheng2 wants to merge 1 commit into PaddlePaddle:release/2.6 from kevincheng2:cp/disable-prefix-caching-release-2.6-20260519-v2

@kevincheng2
Collaborator

Motivation

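Syncs develop PR #7854 to release/2.6, adding support for disabling prefix caching per request. Some requests need to skip the prefix cache matching, write, and release/reuse paths to avoid polluting or reusing the global cache. The default is False, so requests continue to follow the global prefix caching configuration.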

Modifications

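  • Add a disable_prefix_caching parameter to OpenAI Completion/ChatCompletion requests and pass it through internal Request serialization/deserialization.
  • When prefix caching is disabled for a request, ResourceManager and ResourceManagerV1 skip prefix cache matching, cache block updates, output/storage writes, and the prefix tree release path.
  • Add unit tests for the legacy ResourceManager, ResourceManagerV1, and the Request/OpenAI protocol.
  • Update the Chinese and English online serving request parameter documentation.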
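
The request-level gate described in the list above can be sketched roughly as follows. This is a minimal illustration, not the FastDeploy implementation: the `Request` and `ResourceManagerSketch` classes and the `prefix_cache_enabled_for` method are hypothetical names, and only the `disable_prefix_caching` field and its default of False come from this PR.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical stand-in for the internal request object.
    request_id: str
    disable_prefix_caching: bool = False  # default keeps the global behavior


class ResourceManagerSketch:
    # Hypothetical stand-in; not the real ResourceManager API.
    def __init__(self, enable_prefix_caching: bool):
        self.enable_prefix_caching = enable_prefix_caching

    def prefix_cache_enabled_for(self, request: Request) -> bool:
        # The cache is used only if it is enabled globally AND the
        # request has not opted out.
        return self.enable_prefix_caching and not request.disable_prefix_caching


manager = ResourceManagerSketch(enable_prefix_caching=True)
default_request = Request("req-default")
opted_out_request = Request("req-opt-out", disable_prefix_caching=True)
```

Routing every match, write, and release decision through one predicate like this keeps the opt-out consistent: a request cannot disable caching where the global setting already has it off, and it cannot enable caching either.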

Usage or Command

# Unit tests
/root/paddlejob/inference-public/chengyanfu/.venv/py310/bin/python -m pytest \
  tests/engine/test_request.py \
  tests/engine/test_resource_manager.py \
  tests/v1/test_resource_manager_v1.py -q

Request example:

client.chat.completions.create(
    model="null",
    messages=[{"role": "user", "content": "hello"}],
    extra_body={"disable_prefix_caching": True},
)
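
For clients that do not use the OpenAI Python SDK, the same request can be sketched as a raw JSON body. This assumes the extra field travels as a top-level property of the request body, which is how the SDK's `extra_body` option sends it; the `build_chat_payload` helper is hypothetical and exists only for illustration.

```python
import json


def build_chat_payload(content: str, disable_prefix_caching: bool = False) -> dict:
    # Hypothetical helper: builds the JSON body for a chat completion request.
    payload = {
        "model": "null",
        "messages": [{"role": "user", "content": content}],
    }
    if disable_prefix_caching:
        # Only sent when opting out; omitting it leaves the default (False).
        payload["disable_prefix_caching"] = True
    return payload


body = json.dumps(build_chat_payload("hello", disable_prefix_caching=True))
```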

Accuracy Tests

No model computation logic or operator changes are involved, so accuracy tests were not run.

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented May 19, 2026

Thanks for your contribution!

@codecov-commenter

Codecov Report

❌ Patch coverage is 77.41935% with 7 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@41d44d6). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/engine/sched/resource_manager_v1.py 70.83% 0 Missing and 7 partials ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7855   +/-   ##
==============================================
  Coverage               ?   72.95%           
==============================================
  Files                  ?      381           
  Lines                  ?    54231           
  Branches               ?     8476           
==============================================
  Hits                   ?    39562           
  Misses                 ?    11891           
  Partials               ?     2778           
Flag Coverage Δ
GPU 72.95% <77.41%> (?)


@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-19 19:03:05

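📋 Review summary

PR overview: Adds a request-level switch to disable prefix caching, allowing a single request to skip the global prefix cache matching, write, and release paths.
Change scope: engine/request.py, engine/resource_manager.py, engine/sched/resource_manager_v1.py, entrypoints/openai/protocol.py, documentation
Impact tags: [KVCache] [APIServer] [Engine]

Issues

No blocking issues found.

📝 PR convention check

The title format [Cherry-Pick][KVCache] ... (#7854) follows the Cherry-Pick convention; the description contains all of the Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist sections and is structurally compliant. ✓

Overall assessment

The implementation approach is clear: the _enable_prefix_cache_for_request helper method uniformly replaces the scattered direct checks of enable_prefix_cache, so the code changes are consistent. The serialization chain (protocol.py → request.py to_dict/from_dict/from_generic_request) is complete, and the unit tests cover key paths such as default values, propagation, and bypass. Overall, ready to merge.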
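
The default-value and propagation behavior mentioned in the review can be illustrated with a small round-trip sketch. The `RequestSketch` class below is hypothetical and much simpler than the real request.py; it only assumes that the flag is stored as a boolean field and that a missing key falls back to False on deserialization.

```python
from dataclasses import asdict, dataclass


@dataclass
class RequestSketch:
    # Hypothetical, simplified request; not the real Request class.
    request_id: str
    disable_prefix_caching: bool = False

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "RequestSketch":
        return cls(
            request_id=data["request_id"],
            # A payload without the key keeps the default (False).
            disable_prefix_caching=bool(data.get("disable_prefix_caching", False)),
        )


restored = RequestSketch.from_dict(RequestSketch("r1", True).to_dict())
legacy = RequestSketch.from_dict({"request_id": "r2"})
```

Falling back to False for a missing key means payloads produced before the field existed still deserialize and keep following the global configuration.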

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-19 19:52:00

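The CI report is generated from the following code (refreshed every 30 minutes):


1 Task overview

All required tasks have passed ✅, so the PR can be merged. 2 optional tasks failed (non-blocking).

Total runs (reruns) Total tasks ✅ Passed ❌ Failed ⏳ Running ⏸️ Waiting Skipped
36(0) 36 33 2 0 1 0

2 Task status summary

2.1 Required tasks: 10/10 passed

Required tasks block merging; failures must be handled first.

Status Task Duration Root cause Suggested fix Log Rerun
The remaining 10 required tasks passed - - - - -

2.2 Optional tasks: 23/26 passed

Optional tasks do not block merging; failures are for reference only.

Status Task Duration Log Rerun
Run iluvatar Tests / run_iluvatar_cases 1m29s Job -
Trigger Jenkins for PR 50m10s Job -
⏸️ CI_HPU - - -
The remaining 23 optional tasks passed - - -

3 Failure details (required only)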


3 participants