[unitest] small change in test_deepgemm_precision.py by zhoutianzi666 · Pull Request #7834 · PaddlePaddle/FastDeploy

zhoutianzi666 · 2026-05-15T10:22:49Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

codecov-commenter · 2026-05-15T11:06:32Z

Codecov Report

❌ Patch coverage is 33.33333% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@79dd64a).

Files with missing lines	Patch %	Lines
...executor/layers/attention/mla_attention_backend.py	33.33%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7834   +/-   ##
==========================================
  Coverage           ?   63.33%           
==========================================
  Files              ?      462           
  Lines              ?    64372           
  Branches           ?     9872           
==========================================
  Hits               ?    40768           
  Misses             ?    20836           
  Partials           ?     2768

Flag	Coverage Δ
GPU	`72.44% <33.33%> (?)`
XPU	`7.12% <0.00%> (?)`

Flags with carried forward coverage won't be shown.

PaddlePaddle-bot · 2026-05-15T11:14:25Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-19 21:56:09

The CI report is generated from the following code (updated every 30 minutes):

PR commit: bb51fb0
Merge base: 79dd64a (branch: develop)

1 Task Overview

❌ 1 Required task failed; it must be fixed before the PR can be merged.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
29(0)	29	25	3	0	1	0

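2 Task Status Summary

2.1 Required tasks: 7/8 passed

Required tasks block merging; failures must be addressed first.

Status	Task	Duration	Root cause	Suggested fix	Logs	Rerun
❌	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	1h22m	PR issue: new code in `mla_attention_backend.py` has 33% coverage, below the 80% threshold	Add test cases for `mla_attention_backend.py` L800-L1061 or request an exemption	Job	-
✅	Remaining 7 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 18/21 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Logs	Rerun
❌	`Check PR Template`	14s	Job	-
❌	`Trigger Jenkins for PR`	7m19s	Job	-
⏸️	`CI_HPU`	-	-	-
✅	Remaining 18 optional tasks passed	-	-	-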

3 Failure Details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage below threshold (confidence: high)

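Status: ❌ Failed
Error type: test coverage below threshold
Confidence: high
Root cause summary: new code in mla_attention_backend.py has 33% coverage, below the 80% threshold
Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

Test	Error	Root cause
Coverage check (Verify Code Coverage Threshold 80%)	Coverage 33% < threshold 80%	New lines are not covered by tests

Root cause details:
This PR modifies fastdeploy/model_executor/layers/attention/mla_attention_backend.py, changing 112 lines in total, but diff coverage is only 33.33% (9 lines counted, only 3 of them covered by tests). The uncovered lines are L800, L801, L953, L954, L1060, and L1061. Note: the unit tests themselves all passed (TEST_EXIT_CODE=0); only the coverage threshold check failed (COVERAGE_EXIT_CODE=9).

Key logs: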

COVERAGE_EXIT_CODE: 9
Coverage generation failed (exit code 9)
{"fastdeploy/model_executor/layers/attention/mla_attention_backend.py":
  {"percent_covered": 33.33, "violation_lines": [800, 801, 953, 954, 1060, 1061],
   "covered_lines": [1067, 1069, 1070]},
 "total_percent_covered": 33, "num_changed_lines": 112}
##[error]Process completed with exit code 9.
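The 33.33% figure follows directly from the JSON in the log: 3 covered lines out of 9 counted lines (3 covered + 6 violation lines); the remaining changed lines are not counted. A minimal sketch that recomputes it, assuming only the field names shown in the log; `diff_coverage` is a hypothetical helper, not part of the CI tooling.

```python
import json

# Excerpt of the per-file coverage entry from the log above (only the fields used here).
REPORT = """
{"fastdeploy/model_executor/layers/attention/mla_attention_backend.py":
  {"percent_covered": 33.33, "violation_lines": [800, 801, 953, 954, 1060, 1061],
   "covered_lines": [1067, 1069, 1070]}}
"""

THRESHOLD = 80.0  # the CI job's "Verify Code Coverage Threshold 80%"


def diff_coverage(file_report: dict) -> float:
    """Percentage of counted changed lines that tests execute."""
    covered = len(file_report["covered_lines"])
    missed = len(file_report["violation_lines"])
    total = covered + missed
    # With no counted lines there is nothing to miss, so treat it as fully covered.
    return 100.0 * covered / total if total else 100.0


report = json.loads(REPORT)
for path, stats in report.items():
    pct = diff_coverage(stats)
    status = "OK" if pct >= THRESHOLD else "BELOW THRESHOLD"
    print(f"{path}: {pct:.2f}% ({status})")
```

Each additional violation line that a new test exercises moves one entry from the denominator-only side to the numerator, so covering all six listed lines would bring this file to 100%.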

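Suggested fix:

Add test cases under tests/ that exercise L800-801, L953-954, and L1060-1061 of fastdeploy/model_executor/layers/attention/mla_attention_backend.py so that diff coverage reaches 80% or higher
If these lines are hardware-specific code (e.g. GPU kernels) that is hard to test in the CI environment, request a coverage exemption from the repository administrators

Fix summary: add test cases for mla_attention_backend.py L800-L1061 or request an exemption

Related changes: fastdeploy/model_executor/layers/attention/mla_attention_backend.py (112 lines changed in this PR; coverage only 33%)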
PaddlePaddle-bot · 2026-05-15T18:48:09Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-16 02:47:05

The CI report is generated from the following code (updated every 30 minutes):

PR commit: 1f7549b
Merge base: 79dd64a (branch: develop)

1 Task Overview

❌ 1 Required task failed and needs attention; all other Required tasks passed.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
42(0)	42	38	3	0	1	0

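2 Task Status Summary

2.1 Required tasks: 9/10 passed

Required tasks block merging; failures must be addressed first.

Status	Task	Duration	Root cause	Suggested fix	Logs	Rerun
❌	`Pre Commit`	44s	PR issue: flake8 F841, unused variable `a` at test_flashmla_precision.py:60	Delete or rename the unused variable at test_flashmla_precision.py:60	Job	-
✅	Remaining 9 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 29/32 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Logs	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	34m40s	Job	-
❌	`Check PR Template`	18s	Job	-
⏸️	`CI_HPU`	-	-	-
✅	Remaining 29 optional tasks passed	-	-	-

3 Failure Details (required only)

Pre Commit: code style (confidence: high)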

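Status: ❌ Failed
Error type: code style
Confidence: high
Root cause summary: flake8 F841: unused variable a at test_flashmla_precision.py:60
Analyzer: generic analysis (fallback)

Root cause details:
Pre-commit detected flake8 error F841 at line 60 of tests/operators/test_flashmla_precision.py: the local variable a is assigned but never used. This file is one of the files modified in this PR. All other style checks (black, isort, ruff, etc.) passed.

Key logs: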

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/operators/test_flashmla_precision.py:60:21: F841 local variable 'a' is assigned to but never used

Suggested fix:

Delete the unused variable assignment on line 60 of tests/operators/test_flashmla_precision.py, or rename the variable to _ to mark it explicitly as ignored
Verify locally: pre-commit run --files tests/operators/test_flashmla_precision.py tests/operators/test_deepgemm_precision.py

Fix summary: delete or rename the unused variable a at test_flashmla_precision.py:60

Related changes: files modified by the PR: tests/operators/test_flashmla_precision.py, tests/operators/test_deepgemm_precision.py
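Flake8 reports F841 when a local name is bound but never read. The report does not show the code on line 60, so the functions below are hypothetical; they only illustrate the two fixes the bot suggests: drop the binding, or bind to `_`, which pyflakes (the checker behind flake8's F codes) deliberately does not flag.

```python
import math


def before(x: float) -> float:
    a = math.sqrt(x)  # F841: local variable 'a' is assigned to but never used
    return 2 * x


def fix_drop_binding(x: float) -> float:
    # Keep the bare call only if it is needed for its side effect (e.g. a warm-up run);
    # otherwise delete the whole statement.
    math.sqrt(x)
    return 2 * x


def fix_underscore(x: float) -> float:
    _ = math.sqrt(x)  # '_' marks the result as intentionally ignored
    return 2 * x
```

In a timing test, the first fix is usually the right one when the value is genuinely unused; the underscore form is mostly useful when unpacking tuples where only some elements matter.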

PaddlePaddle-bot · 2026-05-16T23:15:12Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-17 07:13:32

The CI report is generated from the following code (updated every 30 minutes):

PR commit: 4a7d2c2
Merge base: 79dd64a (branch: develop)

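1 Task Overview

⚠️ No Required checks are configured, so all tasks are optional. Among the tasks whose status could be retrieved, 3 optional tasks failed; the status of 4 more workflows could not be retrieved because of network errors (TLS timeout) and should be checked manually (including the main test task Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage).

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped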
9(0)	9	6	3	0	0	0

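⚠️ Note: job status for the following 4 workflows could not be retrieved because of TLS handshake timeouts, so they are not included in the table above: run_id: 25952268048, 25952268062, 25952267943, 25952267922. The main test task Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage may be among them; please check the CI details manually.

2 Task Status Summary

2.1 Required tasks: 0/0 passed

No required checks are currently configured (Branch Protection Rules contain no Required Status Checks), so all tasks are optional.

2.2 Optional tasks: 6/9 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Failure reason	Logs	Rerun
❌	`CI_HPU`	1h13m	Environment issue: exit code 1	Job	-
❌	`Check PR Template`	12s	PR template: exit code 7	Job	-
❌	`Run iluvatar Tests / run_iluvatar_cases`	10m35s	Environment issue: Pod unhealthy	Job	-
✅	Remaining 6 optional tasks passed	-	-	-	-

3 Failure Details (required only)

No required tasks failed.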

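Notes on optional task failures

Run iluvatar Tests / run_iluvatar_cases

Error type: environment issue (infrastructure)
Error summary: Pod iluvatar-gpu-2-nczzk-runner-dsqbn-workflow is unhealthy with phase status Pending
Recommended action: contact the self-hosted runner administrator, or rerun once the Pod is ready

Check PR Template

Error type: PR format check
Error summary: Process completed with exit code 7 (the PR description does not follow the template)
Recommended action: complete the PR description following the PR template

CI_HPU

Error type: to be confirmed
Error summary: Process completed with exit code 1 (ran for 1h13m; likely a test or environment issue)
Recommended action: check the full log to determine the root cause, and rerun if necessary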

EmmonsCurse · 2026-05-19T07:40:10Z (edited by zhoutianzi666)

we

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-19 19:53:47

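📋 Review Summary

PR overview: refactors the MLA KV cache write logic, replacing the max_seq_len parameter with slot_mapping, and unifies the cache-write calls of the prefill and decode paths in forward_mixed; also improves how the operator precision tests measure time.

Scope of changes: custom_ops/gpu_ops/append_attn/, fastdeploy/model_executor/layers/attention/mla_attention_backend.py, tests/operators/

Affected-area tags: [OP] [KVCache]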
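For context on the timing change: once a kernel has been timed with CUDA Events, the elapsed time (in milliseconds, as `cudaEventElapsedTime` reports it) converts to the TFLOPs/s and bandwidth figures the updated tests print with simple arithmetic. A minimal sketch; the helper names are hypothetical and not taken from test_deepgemm_precision.py or test_flashmla_precision.py, and the 2*M*N*K FLOP count is the standard convention for a dense GEMM (one multiply and one add per multiply-accumulate).

```python
def gemm_tflops(m: int, n: int, k: int, elapsed_ms: float, iters: int = 1) -> float:
    """TFLOP/s for `iters` runs of an (m x k) @ (k x n) GEMM timed together.

    A dense GEMM performs m*n*k multiply-accumulates, counted as 2*m*n*k FLOPs.
    """
    flops = 2.0 * m * n * k * iters
    seconds = elapsed_ms / 1e3  # CUDA Event timings are reported in milliseconds
    return flops / seconds / 1e12


def bandwidth_gb_per_s(bytes_moved: int, elapsed_ms: float, iters: int = 1) -> float:
    """Effective memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved * iters / (elapsed_ms / 1e3) / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: a 4096 x 4096 x 4096 GEMM taking 1.0 ms per call.
    print(f"{gemm_tflops(4096, 4096, 4096, elapsed_ms=1.0):.1f} TFLOP/s")
    # Hypothetical measurement: 1e9 bytes of KV cache read in 1.0 ms.
    print(f"{bandwidth_gb_per_s(10**9, elapsed_ms=1.0):.1f} GB/s")
```

Timing several iterations between one pair of events and passing `iters` amortizes launch overhead, which is one reason event-based timing tends to be more stable than host-side wall-clock timing around a single call.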

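Issues

Severity	File	Summary
🔴 Bug	`custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh:214`	Leftover debug code: `printf` + `asm volatile("trap;")`; if triggered in production it terminates the entire CUDA context
❓ Question	`fastdeploy/model_executor/layers/attention/mla_attention_backend.py:846`	The decode branch drops `decode_mla_write_cache` and uses `prefill_mla_write_cache` for all tokens; the actual cache-write semantics for decode tokens (batch_id=-1) need to be confirmed

📝 PR Conventions Check

The PR title lacks an official tag, and every section of the description (Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist) is empty.

Suggested title (ready to copy):

[OP][KVCache] Refactor MLA write cache to use slot_mapping and unify prefill/decode path in forward_mixed

Suggested PR description (ready to copy; it must reproduce the full structure of the checklist §D2 template):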

## Motivation
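Refactor the MLA KV cache write interface by replacing the `max_seq_len` parameter with `slot_mapping`, removing a legacy redundant parameter; unify the KV cache write path for prefill and decode tokens in `forward_mixed` to reduce code duplication. Also improve the timing in `test_deepgemm_precision.py` and `test_flashmla_precision.py` by switching to CUDA Events for more accurate performance measurement.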

## Modifications
- `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh`: add a `slot_mapping` parameter and a CT template parameter to `prefill_absorb_cache_kernel`; change the type of `batch_id_per_token` from `uint32_t` to `int32_t` so it can hold the -1 marker; add skip logic for `ori_bi == -1`
- `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cu`: add a `slot_mapping` parameter to `PrefillMLAWriteCache` / `PrefillMLAWriteCacheKernel` and remove `max_seq_len`; remove `max_seq_len` from `DecodeMLAWriteCache` / `DecodeMLAWriteCacheKernel`
- `custom_ops/gpu_ops/cpp_extensions.cc`: update the function signature declarations to match
- `fastdeploy/model_executor/layers/attention/mla_attention_backend.py`: update the call signatures in `forward_extend` / `forward_decode` / `forward_mixed`; in `forward_mixed`, hoist the cache write out of the branches so prefill and decode tokens are handled together; add float8 support to `mla_blackwell` / `flashmla_baseline`
- `tests/operators/test_deepgemm_precision.py`: switch to CUDA Event timing, add TFLOPs/s output, and extend the test cases
- `tests/operators/test_flashmla_precision.py`: switch to CUDA Event timing, add bandwidth output, and support the float8 dtype

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

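Overall Assessment

This PR delivers a worthwhile refactor of the MLA KV cache write interface: it introduces slot_mapping and unifies the handling path for mixed batches, and the overall direction is correct. However, the CUDA kernel still contains a leftover asm volatile("trap;") debug assertion that, if triggered in production, would crash the entire GPU context. This is a P0 blocking issue and must be removed before merging. In addition, the author needs to confirm the cache-write semantics for decode tokens on the new path.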

PaddlePaddle-bot · 2026-05-19T11:57:10Z

    const uint32_t block_offset = ori_seq_id % block_size;

+    const int32_t block_idx1 = slot_mapping[token_idx] / block_size;
+    if (block_idx1 != block_idx) {


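🔴 Bug: leftover debug code in the CUDA kernel (printf + asm volatile("trap;")) will crash the GPU in production.

asm volatile("trap;") is the GPU equivalent of abort(): once triggered, it terminates the entire CUDA context and makes the service unavailable. This code is clearly temporary debugging code for checking that the slot_mapping and block_tables addressing paths agree, and it should not be merged into the main branch.

Suggested fix:

Once verification passes, delete the entire block (lines 213-219)

If it should be kept as a debug-mode switch, wrap it in #ifdef DEBUG_MLA_CACHE ... #endif instead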
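The two addressing paths the debug block compares can also be modeled on the host, which is one way to keep the check without a device-side trap (for example in a unit test). A minimal sketch, assuming the common paged-KV convention slot = physical_block * block_size + offset; the diff only shows `ori_seq_id % block_size` and `slot_mapping[token_idx] / block_size`, so the exact FastDeploy layout is an assumption, and all function names here are hypothetical.

```python
from typing import List, Sequence

# Hypothetical host-side model (not FastDeploy code) of the two KV-cache addressing paths.
# Assumption: paged layout where slot = physical_block * block_size + offset_in_block.


def slot_from_block_table(block_table: Sequence[int], seq_pos: int, block_size: int) -> int:
    """Slot of position `seq_pos` in one sequence, looked up through its block table."""
    physical_block = block_table[seq_pos // block_size]
    return physical_block * block_size + seq_pos % block_size


def build_slot_mapping(
    block_tables: Sequence[Sequence[int]],
    batch_id_per_token: Sequence[int],
    seq_pos_per_token: Sequence[int],
    block_size: int,
) -> List[int]:
    """One slot per token; tokens whose batch id is -1 get slot -1 (nothing to write)."""
    return [
        -1 if bi == -1 else slot_from_block_table(block_tables[bi], pos, block_size)
        for bi, pos in zip(batch_id_per_token, seq_pos_per_token)
    ]


def mismatched_tokens(
    block_tables: Sequence[Sequence[int]],
    batch_id_per_token: Sequence[int],
    seq_pos_per_token: Sequence[int],
    slot_mapping: Sequence[int],
    block_size: int,
) -> List[int]:
    """Token indices whose slot_mapping block disagrees with the block-table lookup."""
    bad = []
    for token_idx, (bi, pos) in enumerate(zip(batch_id_per_token, seq_pos_per_token)):
        if bi == -1:
            continue  # mirrors `if (ori_bi == -1) continue;` in the kernel
        expected_block = block_tables[bi][pos // block_size]
        if slot_mapping[token_idx] // block_size != expected_block:
            bad.append(token_idx)
    return bad


if __name__ == "__main__":
    block_tables = [[3, 7], [5]]  # request 0 owns blocks 3 and 7, request 1 owns block 5
    batch_ids = [0, 0, 0, 0, 0, 1, -1]
    positions = [0, 1, 2, 3, 4, 0, 0]
    slots = build_slot_mapping(block_tables, batch_ids, positions, block_size=4)
    print(slots)  # [12, 13, 14, 15, 28, 20, -1]
    print(mismatched_tokens(block_tables, batch_ids, positions, slots, block_size=4))  # []
```

Returning the offending token indices, rather than aborting, lets a test report every mismatch at once.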

PaddlePaddle-bot · 2026-05-19T11:57:10Z

@@ -845,20 +845,6 @@ def forward_mixed(

        # Decode branch: k is None


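❓ Question: in forward_mixed, the decode branch no longer calls decode_mla_write_cache; instead, the KV cache write for decode tokens is handled by prefill_mla_write_cache, which has been hoisted ahead of the branches.

Please confirm: prefill_mla_write_cache correctly skips tokens with batch_id_per_token[token_idx] == -1 (decode tokens in a mixed batch are marked -1) via if (ori_bi == -1) continue;, but is the actual cache write for decode tokens still performed? The original decode_mla_write_cache took seq_lens_decoder/seq_lens_encoder in that order, whereas the new code passes seq_lens_this_time/seq_lens_decoder; are the semantics equivalent?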
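To make the question concrete: if decode tokens in a mixed batch really carry batch id -1, any write that skips -1 leaves their slots untouched, so some other path must write them. The toy model below (hypothetical names, a plain Python list standing in for the cache, not the CUDA kernel) shows the effect and could serve as a starting point for a test that inspects decode-token slots after the write.

```python
from typing import List, Optional, Sequence


def write_cache(
    kv_rows: Sequence[float],
    slot_mapping: Sequence[int],
    batch_id_per_token: Sequence[int],
    num_slots: int,
) -> List[Optional[float]]:
    """Toy slot-indexed cache write (hypothetical, not the CUDA kernel).

    Token t is stored at cache[slot_mapping[t]] unless its batch id is -1.
    """
    cache: List[Optional[float]] = [None] * num_slots
    for row, slot, bi in zip(kv_rows, slot_mapping, batch_id_per_token):
        if bi == -1:
            continue  # same skip as `if (ori_bi == -1) continue;`
        cache[slot] = row
    return cache


if __name__ == "__main__":
    # Mixed batch: tokens 0-2 are prefill tokens of request 0, token 3 is a decode token.
    kv = [0.1, 0.2, 0.3, 0.9]
    slots = [0, 1, 2, 5]
    # Decode token tagged with its request id: slot 5 receives 0.9.
    print(write_cache(kv, slots, [0, 0, 0, 1], num_slots=8))
    # Decode token tagged -1: it is skipped and slot 5 stays empty.
    print(write_cache(kv, slots, [0, 0, 0, -1], num_slots=8))
```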

commit

88d944f

zhoutianzi666 temporarily deployed to Metax_ci May 15, 2026 10:22 — with GitHub Actions Inactive

zhoutianzi666 changed the title ~~commit~~ [unitest] small change in test_deepgemm_precision.py May 15, 2026

commit

16f8f98

zhoutianzi666 temporarily deployed to Metax_ci May 15, 2026 13:51 — with GitHub Actions Inactive

commit

1f7549b

zhoutianzi666 temporarily deployed to Metax_ci May 15, 2026 14:12 — with GitHub Actions Inactive

commit

4a7d2c2

zhoutianzi666 temporarily deployed to Metax_ci May 16, 2026 04:03 — with GitHub Actions Inactive

commit

dcbcb46

zhoutianzi666 temporarily deployed to Metax_ci May 18, 2026 04:01 — with GitHub Actions Inactive

commit

f98f1a6

zhoutianzi666 temporarily deployed to Metax_ci May 18, 2026 04:03 — with GitHub Actions Inactive

commit

3f0849e

zhoutianzi666 had a problem deploying to Metax_ci May 18, 2026 08:03 — with GitHub Actions Failure

commit

129c510

zhoutianzi666 had a problem deploying to Metax_ci May 18, 2026 08:31 — with GitHub Actions Error

commit

67468be

zhoutianzi666 had a problem deploying to Metax_ci May 18, 2026 08:37 — with GitHub Actions Error

commit

28f0471

zhoutianzi666 temporarily deployed to Metax_ci May 18, 2026 08:38 — with GitHub Actions Inactive

commit

bb8c233

zhoutianzi666 temporarily deployed to Metax_ci May 19, 2026 04:22 — with GitHub Actions Inactive

commit

72fed03

zhoutianzi666 temporarily deployed to Metax_ci May 19, 2026 08:18 — with GitHub Actions Inactive

commit

e468263

zhoutianzi666 temporarily deployed to Metax_ci May 19, 2026 08:46 — with GitHub Actions Inactive

commit

5e5d7f0

zhoutianzi666 had a problem deploying to Metax_ci May 19, 2026 11:19 — with GitHub Actions Failure

commit

bb51fb0

zhoutianzi666 had a problem deploying to Metax_ci May 19, 2026 11:48 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes May 19, 2026


5 participants
