[unitest] small change in test_deepgemm_precision.py #7834

Open
zhoutianzi666 wants to merge 15 commits into PaddlePaddle:develop from zhoutianzi666:make_time_more_precisi

@zhoutianzi666
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

This comment was marked as outdated.

@codecov-commenter

codecov-commenter commented May 15, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@79dd64a). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...executor/layers/attention/mla_attention_backend.py | 33.33% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7834   +/-   ##
==========================================
  Coverage           ?   63.33%           
==========================================
  Files              ?      462           
  Lines              ?    64372           
  Branches           ?     9872           
==========================================
  Hits               ?    40768           
  Misses             ?    20836           
  Partials           ?     2768           
| Flag | Coverage Δ |
|---|---|
| GPU | 72.44% <33.33%> (?) |
| XPU | 7.12% <0.00%> (?) |

@PaddlePaddle-bot

PaddlePaddle-bot commented May 15, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-19 21:56:09

CI report generated from the following code (updated every 30 minutes):


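1 Task overview

❌ 1 Required task has failed and must be resolved before merging.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 29 (0) | 29 | 25 | 3 | 0 | 1 | 0 |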

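2 Task status summary

2.1 Required tasks: 7/8 passed

Required tasks block merging; fix their failures first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | 1h22m | PR issue: coverage of the new code in mla_attention_backend.py is 33%, below the 80% threshold | Add test cases for mla_attention_backend.py L800-L1061 or request an exemption | Job | - |
| ✅ | The other 7 required tasks passed | - | - | - | - | - |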

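2.2 Optional tasks: 18/21 passed

Optional tasks do not block merging; their failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| ❌ | Check PR Template | 14s | Job | - |
| ❌ | Trigger Jenkins for PR | 7m19s | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| ✅ | The other 18 optional tasks passed | - | - | - |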

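3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage below threshold (confidence: high)

  • Status: ❌ Failed
  • Error type: test coverage below threshold
  • Confidence: high
  • Root cause summary: coverage of the new code in mla_attention_backend.py is 33%, below the 80% threshold
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

| Test | Error | Root cause |
|---|---|---|
| Coverage check (Verify Code Coverage Threshold 80%) | Coverage 33% < threshold 80% | Newly added lines are not covered by tests |

Root cause details:
This PR modifies fastdeploy/model_executor/layers/attention/mla_attention_backend.py, changing 112 lines in total, but the diff coverage is only 33.33% (9 lines counted, only 3 covered by tests). The uncovered lines are L800, L801, L953, L954, L1060, and L1061. Note that the unit tests themselves all pass (TEST_EXIT_CODE=0); only the coverage threshold check fails (COVERAGE_EXIT_CODE=9).

Key log: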

COVERAGE_EXIT_CODE: 9
Coverage generation failed (exit code 9)
{"fastdeploy/model_executor/layers/attention/mla_attention_backend.py":
  {"percent_covered": 33.33, "violation_lines": [800, 801, 953, 954, 1060, 1061],
   "covered_lines": [1067, 1069, 1070]},
 "total_percent_covered": 33, "num_changed_lines": 112}
##[error]Process completed with exit code 9.
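The 33.33% figure can be reproduced from the JSON in the key log. The sketch below is only an illustration of that arithmetic: it assumes the report counts a line as either covered or a violation, and the helper name `diff_coverage` and the `THRESHOLD` constant are hypothetical, not part of the CI scripts.

```python
import json

# Report text copied from the key log above.
REPORT = """
{"fastdeploy/model_executor/layers/attention/mla_attention_backend.py":
  {"percent_covered": 33.33, "violation_lines": [800, 801, 953, 954, 1060, 1061],
   "covered_lines": [1067, 1069, 1070]},
 "total_percent_covered": 33, "num_changed_lines": 112}
"""

THRESHOLD = 80.0  # percent, the threshold named by the CI check


def diff_coverage(entry):
    """Return (covered, counted, percent) for one per-file entry."""
    covered = len(entry["covered_lines"])
    counted = covered + len(entry["violation_lines"])
    percent = 100.0 * covered / counted if counted else 100.0
    return covered, counted, percent


report = json.loads(REPORT)
results = {}
for path, entry in report.items():
    if not isinstance(entry, dict):
        continue  # skip the summary fields at the top level
    results[path] = diff_coverage(entry)
    covered, counted, percent = results[path]
    status = "ok" if percent >= THRESHOLD else "below threshold"
    print(f"{path}: {covered}/{counted} lines = {percent:.2f}% ({status})")
```

With 3 covered lines out of 9 counted, covering any five of the six violation lines would lift the file to 8/9, which is above 80%.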

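Suggested fixes:

  1. Add test cases under tests/ for fastdeploy/model_executor/layers/attention/mla_attention_backend.py L800-801, L953-954, and L1060-1061 so that diff coverage reaches at least 80%.
  2. If these lines are hardware-specific code (such as GPU kernels) that is hard to test in the CI environment, request a coverage exemption from the repository administrators.

Fix summary: add test cases for mla_attention_backend.py L800-L1061 or request an exemption

Related change: fastdeploy/model_executor/layers/attention/mla_attention_backend.py (the PR changes 112 lines; coverage is only 33%)
Link: view log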

@zhoutianzi666 changed the title from "commit" to "[unitest] small change in test_deepgemm_precision.py" on May 15, 2026
PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-16 02:47:05

CI report generated from the following code (updated every 30 minutes):


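1 Task overview

❌ 1 Required task has failed and needs attention; the remaining Required tasks have passed.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 42 (0) | 42 | 38 | 3 | 0 | 1 | 0 |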

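2 Task status summary

2.1 Required tasks: 9/10 passed

Required tasks block merging; fix their failures first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Pre Commit | 44s | PR issue: flake8 F841, unused variable a at test_flashmla_precision.py:60 | Delete or rename the unused variable at test_flashmla_precision.py:60 | Job | - |
| ✅ | The other 9 required tasks passed | - | - | - | - | - |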

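2.2 Optional tasks: 29/32 passed

Optional tasks do not block merging; their failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 34m40s | Job | - |
| ❌ | Check PR Template | 18s | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| ✅ | The other 29 optional tasks passed | - | - | - |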

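3 Failure details (required only)

Pre Commit: code style (confidence: high)

  • Status: ❌ Failed
  • Error type: code style
  • Confidence: high
  • Root cause summary: flake8 F841: unused variable a at test_flashmla_precision.py:60
  • Analyzer: generic analysis (fallback)

Root cause details:
Pre-commit detected flake8 error F841 at line 60 of tests/operators/test_flashmla_precision.py: the local variable a is assigned to but never used. This file is one of the files modified by this PR. All other code style checks (black, isort, ruff, etc.) passed.

Key log: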

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/operators/test_flashmla_precision.py:60:21: F841 local variable 'a' is assigned to but never used

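Suggested fixes:

  1. Delete the unused assignment at line 60 of tests/operators/test_flashmla_precision.py, or rename the variable to _ to make it explicit that the value is ignored.
  2. Verify locally: pre-commit run --files tests/operators/test_flashmla_precision.py tests/operators/test_deepgemm_precision.py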
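For readers unfamiliar with F841, the pattern and one fix look roughly like this. The actual line 60 of the test file is not shown in this thread, so the functions below are hypothetical stand-ins and not the PR's code.

```python
def run_flagged(fn, repeats):
    # Hypothetical stand-in for the flagged pattern: `a` is bound but never
    # read afterwards, which is what flake8 reports as F841.
    for _ in range(repeats):
        a = fn()
    return repeats


def run_fixed(fn, repeats):
    # Fix: call the function for its side effect only and do not bind
    # the result to a name.
    for _ in range(repeats):
        fn()
    return repeats


calls = []
run_flagged(lambda: calls.append("flagged"), 2)
run_fixed(lambda: calls.append("fixed"), 3)
print(calls.count("flagged"), calls.count("fixed"))
```

Both versions behave identically at runtime; the warning is about dead code, not about a functional bug.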

Fix summary: delete or rename the unused variable a at test_flashmla_precision.py:60

Related change: files modified by the PR: tests/operators/test_flashmla_precision.py, tests/operators/test_deepgemm_precision.py

Link: view log

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-17 07:13:32

CI report generated from the following code (updated every 30 minutes):


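1 Task overview

⚠️ No Required checks are configured, so all tasks are optional. Among the tasks whose status could be retrieved, 3 optional tasks failed. Another 4 workflows could not be queried because of network errors (TLS timeout), so manual confirmation is recommended (this includes the main test task, Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage).

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 9 (0) | 9 | 6 | 3 | 0 | 0 | 0 |

⚠️ Note: the following 4 workflows are not counted in the table above because their job status could not be fetched (TLS handshake timeout): run_id: 25952268048, 25952268062, 25952267943, 25952267922. The main test task, Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage, may be among them; please check the CI details manually.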


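2 Task status summary

2.1 Required tasks: 0/0 passed

No required checks are currently configured (Branch Protection Rules contain no Required Status Checks), so all tasks are optional.

2.2 Optional tasks: 6/9 passed

Optional tasks do not block merging; their failures are for reference only.

| Status | Task | Duration | Failure reason | Log | Rerun |
|---|---|---|---|---|---|
| ❌ | CI_HPU | 1h13m | Environment issue: exit code 1 | Job | - |
| ❌ | Check PR Template | 12s | PR template: exit code 7 | Job | - |
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 10m35s | Environment issue: Pod unhealthy | Job | - |
| ✅ | The other 6 optional tasks passed | - | - | - | - |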

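3 Failure details (required only)

No required tasks failed.


Notes on optional task failures

Run iluvatar Tests / run_iluvatar_cases

  • Error type: environment issue (infrastructure)
  • Error summary: Pod iluvatar-gpu-2-nczzk-runner-dsqbn-workflow is unhealthy with phase status Pending
  • Suggested action: contact the self-hosted runner administrator, or rerun once the Pod is ready

Check PR Template

  • Error type: PR format check
  • Error summary: Process completed with exit code 7 (the PR description does not meet the template requirements)
  • Suggested action: complete the PR description following the PR template

CI_HPU

  • Error type: to be confirmed
  • Error summary: Process completed with exit code 1 (ran for 1h13m; suspected test or environment issue)
  • Suggested action: check the full log to confirm the root cause, and rerun if necessary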

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

@EmmonsCurse
Collaborator

EmmonsCurse commented May 19, 2026

we

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-19 19:53:47

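📋 Review summary

PR overview: refactors the MLA KV Cache write logic, introducing slot_mapping in place of the max_seq_len parameter and unifying the cache-write call for the prefill and decode paths in forward_mixed; also improves how the operator precision tests measure time.

Change scope: custom_ops/gpu_ops/append_attn/, fastdeploy/model_executor/layers/attention/mla_attention_backend.py, tests/operators/

Impact tags: [OP] [KVCache]

Issues

| Level | File | Summary |
|---|---|---|
| 🔴 Bug | custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh:214 | Leftover debug code: printf + asm volatile("trap;") would terminate the entire CUDA context if triggered in production |
| ❓ Question | fastdeploy/model_executor/layers/attention/mla_attention_backend.py:846 | The decode branch drops decode_mla_write_cache and relies on prefill_mla_write_cache for everything, but the actual cache-write semantics for decode tokens (batch_id=-1) need to be confirmed |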

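📝 PR convention check

The PR title lacks an official tag, and every section of the description (Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist) is empty.

Suggested title (can be copied as-is):

  • [OP][KVCache] Refactor MLA write cache to use slot_mapping and unify prefill/decode path in forward_mixed

Suggested PR description (can be copied as-is; it must reproduce the full structure of the checklist §D2 template):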

## Motivation
Refactor the MLA KV Cache write interface to replace the `max_seq_len` parameter with `slot_mapping`, removing a legacy redundant parameter; in `forward_mixed`, unify the KV Cache write path for prefill and decode tokens to reduce code duplication. Also improve timing in `test_deepgemm_precision.py` and `test_flashmla_precision.py` by switching to CUDA Events for more precise performance measurement.

## Modifications
- `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh`: `prefill_absorb_cache_kernel` gains a `slot_mapping` parameter and a CT template parameter; the type of `batch_id_per_token` changes from `uint32_t` to `int32_t` to support the -1 marker; adds logic to skip tokens where `ori_bi == -1`
- `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cu`: `PrefillMLAWriteCache` / `PrefillMLAWriteCacheKernel` gain a `slot_mapping` parameter and drop `max_seq_len`; `DecodeMLAWriteCache` / `DecodeMLAWriteCacheKernel` drop `max_seq_len`
- `custom_ops/gpu_ops/cpp_extensions.cc`: updates the function signature declarations accordingly
- `fastdeploy/model_executor/layers/attention/mla_attention_backend.py`: `forward_extend` / `forward_decode` / `forward_mixed` update their call signatures accordingly; in `forward_mixed` the cache write is hoisted out of the branches so that prefill and decode tokens are handled together; `mla_blackwell` / `flashmla_baseline` add float8 support
- `tests/operators/test_deepgemm_precision.py`: switches to CUDA Event timing, adds TFLOPs/s output, and extends the test cases
- `tests/operators/test_flashmla_precision.py`: switches to CUDA Event timing, adds bandwidth output, and supports the float8 dtype

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
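The TFLOPs/s output mentioned for `test_deepgemm_precision.py` comes down to simple arithmetic once an elapsed time is available. The sketch below is an assumption about how such a figure is typically derived, not the test's actual code: it takes the usual 2·M·N·K operation count for a GEMM and an elapsed time in milliseconds (the unit CUDA event timers report). The function names and the sample numbers are hypothetical and are not measurements from this PR.

```python
def gemm_tflops(m, n, k, elapsed_ms):
    """Throughput in TFLOP/s for one M x N x K GEMM.

    Assumes 2*M*N*K floating-point operations (one multiply and one add
    per inner-product term) and an elapsed time given in milliseconds.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    flops = 2.0 * m * n * k
    return flops / (elapsed_ms * 1e-3) / 1e12


def mean_ms(total_ms, iterations):
    """Average per-iteration time when many launches are timed together."""
    return total_ms / iterations


# Illustrative numbers only: 100 launches timed as one 50 ms interval.
per_iter = mean_ms(total_ms=50.0, iterations=100)
tflops = gemm_tflops(4096, 4096, 4096, per_iter)
print(f"{per_iter:.3f} ms per GEMM, {tflops:.1f} TFLOP/s")
```

Timing many launches inside one interval and dividing by the iteration count keeps per-launch overhead from dominating short kernels.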

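Overall assessment

This PR is an effective refactor of the MLA KV Cache write interface: it introduces slot_mapping and unifies the handling of mixed batches, and the overall direction is sound. However, the CUDA kernel still contains an asm volatile("trap;") debug assertion that would crash the entire GPU context if triggered in production. This is a P0 blocking issue and must be removed before merging. The author also needs to confirm the cache-write semantics for decode tokens under the new path.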

const uint32_t block_offset = ori_seq_id % block_size;

const int32_t block_idx1 = slot_mapping[token_idx] / block_size;
if (block_idx1 != block_idx) {

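🔴 Bug: leftover debug code in the CUDA kernel. printf + asm volatile("trap;") would crash the GPU in production.

asm volatile("trap;") is the GPU equivalent of abort(): once triggered, it terminates the entire CUDA context and makes the service unavailable. This code is clearly temporary debugging code used to verify that the slot_mapping and block_tables addressing paths agree, and it should not be merged into the main branch.

Suggested fix:

  1. Once verification passes, delete the whole block (lines 213-219).
  2. If it needs to stay as a debug-mode switch, guard it with #ifdef DEBUG_MLA_CACHE ... #endif.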
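The consistency check that the debug block performs can also be run on the host, outside the kernel. The sketch below is a pure-Python model under stated assumptions: it assumes a flat slot is laid out as `block_idx * block_size + block_offset`, and the helper names and sample values are hypothetical rather than taken from the PR.

```python
def slot_from_block_table(block_table, seq_pos, block_size):
    """Flat cache slot for one token, derived from the per-sequence block table."""
    block_idx = block_table[seq_pos // block_size]
    block_offset = seq_pos % block_size
    return block_idx * block_size + block_offset


def find_mismatches(block_table, slot_mapping, start_pos, block_size):
    """Return token indices where the two addressing paths disagree."""
    mismatches = []
    for token_idx, slot in enumerate(slot_mapping):
        seq_pos = start_pos + token_idx
        block_idx = block_table[seq_pos // block_size]
        block_idx1 = slot // block_size  # same comparison as the kernel snippet
        if block_idx1 != block_idx:
            mismatches.append(token_idx)
    return mismatches


BLOCK_SIZE = 4
block_table = [7, 2]  # logical block 0 -> physical 7, logical block 1 -> physical 2
good = [slot_from_block_table(block_table, p, BLOCK_SIZE) for p in range(2, 6)]
bad = list(good)
bad[2] = 12  # corrupt one entry so that it points into physical block 3

print(good, find_mismatches(block_table, good, 2, BLOCK_SIZE))
print(bad, find_mismatches(block_table, bad, 2, BLOCK_SIZE))
```

Collecting mismatches and reporting them from a test leaves the production kernel free of `printf` and `trap`.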

@@ -845,20 +845,6 @@ def forward_mixed(

# Decode branch: k is None

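❓ Question: in forward_mixed, the decode branch no longer calls decode_mla_write_cache; the KV cache write for decode tokens is instead handled by the prefill_mla_write_cache call hoisted above the branches.

Please confirm: in prefill_mla_write_cache, tokens with batch_id_per_token[token_idx] == -1 (decode tokens are marked -1 in a mixed batch) are skipped by if (ori_bi == -1) continue;, so is the cache write for decode tokens still actually performed? Also, the original decode_mla_write_cache took its arguments in the order seq_lens_decoder/seq_lens_encoder, whereas the new code passes seq_lens_this_time/seq_lens_decoder; are the semantics the same?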
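The concern can be made concrete with a small host-side model of the write loop. This is a sketch of the skip logic as described in the question, not the kernel itself: `write_cache` and the sample batch are hypothetical, and whether decode tokens really carry -1 in `batch_id_per_token` is exactly what the author is being asked to confirm.

```python
def write_cache(cache, kv_rows, batch_id_per_token, slot_mapping):
    """Copy each token's KV row into cache[slot], skipping tokens marked -1."""
    written = []
    for token_idx, ori_bi in enumerate(batch_id_per_token):
        if ori_bi == -1:
            continue  # mirrors `if (ori_bi == -1) continue;`
        cache[slot_mapping[token_idx]] = kv_rows[token_idx]
        written.append(token_idx)
    return written


# Hypothetical mixed batch: two prefill tokens of request 0, then one token
# marked -1 (standing in for a decode token under the question's assumption).
cache = [None] * 8
kv_rows = ["prefill_0", "prefill_1", "decode_0"]
batch_id_per_token = [0, 0, -1]
slot_mapping = [4, 5, 1]

written = write_cache(cache, kv_rows, batch_id_per_token, slot_mapping)
print(written, cache)
```

Under that assumption slot 1 stays empty, so a single unified call would leave the decode token's KV unwritten unless something else writes it.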

5 participants