[https://nvbugs/6133201][fix] Bump GEN max_num_tokens in disagg perf YAMLs #14191
No actionable comments were generated in the recent review. 🎉

Walkthrough: five performance benchmark YAML configuration files update the worker generation token limits; three GB200 DeepSeek-R1 FP4 configurations increase …

Changes: performance benchmark worker token limit updates.

Estimated code review effort: 1 (Trivial), ~3 minutes
Pre-merge checks: 5 of 5 passed
/bot run

PR_Github #48628 [ run ] triggered by Bot. Commit:

PR_Github #48628 [ run ] completed with state

/bot run

PR_Github #48665 [ run ] triggered by Bot. Commit:

PR_Github #48665 [ run ] completed with state
…YAMLs

Under MTP/Eagle3 with max_draft_len=D, each scheduled request consumes up to (1+D) tokens per forward step, so the GEN engine's per-step token budget must satisfy max_num_tokens >= max_batch_size * (1 + max_draft_len). The qwen3-235b and deepseek-r1 disagg-perf YAMLs added by NVIDIA#13343 set this budget to exactly half of what max_batch_size * (1 + max_draft_len) requires.

When attention-DP routing seats more than (max_num_tokens / (1+D)) requests on a single rank (routine at concurrency=2048), the per-rank check in _prepare_tp_inputs trips:

    AssertionError: total_num_tokens (260) should be less than or equal to max_num_tokens (256)

The worker dies, the surviving ranks block forever in NIXL/UCX collectives waiting for it, and the benchmark client streams a "0/16384" progress bar until SLURM's job-step timeout fires. This is the failure captured by nvbugs/6133201.

Fix the arithmetic in all five mismatched configs (qwen3-235b: 256->512; deepseek-r1: 1536->3072).

Verified on lyris GB200 against the original artifact's SHA + image: with the bump, the test completes in 332s with non-zero throughput; without it, the assertion fires within seconds of the first NIXL transfer.

Signed-off-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>
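To make the budget arithmetic above concrete, here is a minimal sketch. The helper names are hypothetical (they are not functions from the codebase), `budget_ok` is a simplified worst-case stand-in for the per-rank assertion in `_prepare_tp_inputs` (it assumes every request uses its full 1 + D tokens), and max_batch_size=128 with max_draft_len=3 is assumed from the `bs128` and `mtp3` tags in the Qwen3 file names.

```python
# Hypothetical helpers illustrating the GEN token-budget arithmetic;
# these are not functions from the codebase.


def tokens_per_request(max_draft_len: int) -> int:
    """Upper bound on tokens one request uses per forward step under
    MTP/Eagle3: one token plus up to max_draft_len draft tokens."""
    return 1 + max_draft_len


def required_max_num_tokens(max_batch_size: int, max_draft_len: int) -> int:
    """Smallest max_num_tokens that fits a full batch in one forward step."""
    return max_batch_size * tokens_per_request(max_draft_len)


def max_seated_requests(max_num_tokens: int, max_draft_len: int) -> int:
    """Most worst-case requests one rank can hold without exceeding the budget."""
    return max_num_tokens // tokens_per_request(max_draft_len)


def budget_ok(num_requests: int, max_num_tokens: int, max_draft_len: int) -> bool:
    """Simplified worst-case version of the per-rank check: every request is
    assumed to consume its full 1 + max_draft_len tokens."""
    return num_requests * tokens_per_request(max_draft_len) <= max_num_tokens


if __name__ == "__main__":
    D = 3  # assumed from the "mtp3" tag in the file names
    print(required_max_num_tokens(128, D))  # 512: what the invariant requires for bs128
    print(max_seated_requests(256, D))      # 64: requests the old budget can seat
    print(65 * tokens_per_request(D), budget_ok(65, 256, D))  # 260 False
    print(budget_ok(65, 512, D))            # True
```

With D = 3, a 256-token budget seats at most 64 worst-case requests per rank, so a 65th request brings the step to 260 tokens, consistent with the assertion message; a 512-token budget leaves room for the full batch of 128.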
Branch updated from 9349aeb to 5d9b327.
/bot run

PR_Github #49269 [ run ] triggered by Bot. Commit:

PR_Github #49269 [ run ] completed with state

/bot skip --comment "config-only YAML bump for post-merge perf-sanity + QA disagg tests; no pre-merge coverage"

PR_Github #49457 [ skip ] triggered by Bot. Commit:

PR_Github #49457 [ skip ] completed with state
…YAMLs (NVIDIA#14191) Signed-off-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>
Summary
Fix nvbugs/6133201: five GEN-engine YAMLs added by #13343 set `max_num_tokens` to exactly half of what `max_batch_size × (1 + max_draft_len)` requires under MTP/Eagle3. At high concurrency with attention-DP routing, the per-rank check in `_prepare_tp_inputs` trips, the worker dies, and the rest of the disagg job blocks forever in NIXL/UCX collectives; the benchmark client sees a stuck `0/16384` progress bar until the job-step timeout fires.

This PR bumps `max_num_tokens` in the five mismatched files so that the invariant `max_num_tokens ≥ max_batch_size × (1 + max_draft_len)` holds:

| File | Old `max_num_tokens` | New `max_num_tokens` |
|---|---|---|
| `perf/disaggregated/gb200_qwen3-235b-fp4_…_dep16_bs128_…_mtp3_con2048_ccb-NIXL.yaml` | 256 | 512 |
| `perf/disaggregated/gb300_qwen3-235b-fp4_…_dep16_bs128_…_mtp3_con2048_ccb-NIXL.yaml` | 256 | 512 |
| `perf/disaggregated/gb200_deepseek-r1-fp4_…_con2048_…_dep16_eplb0_mtp3_ccb-NIXL.yaml` | 1536 | 3072 |
| `perf-sanity/disaggregated/gb200_deepseek-r1-fp4_…_con2048_…_dep16_eplb0_mtp3_ccb-NIXL.yaml` | 1536 | 3072 |
| `perf-sanity/disaggregated/gb200_deepseek-r1-fp4_…_con2048_…_dep16_eplb288_mtp3_ccb-NIXL.yaml` | 1536 | 3072 |

The bug report only covers the first row (`disagg-e2e-gb200_qwen3-235b-fp4_…_con2048_…_mtp3_ccb-NIXL`), but the GB300 sibling was already confirmed to reproduce in comment 2 of the nvbug, and the three DeepSeek-R1 entries carry the same latent arithmetic mismatch. Fixing them together prevents the same failure from firing on a different test in the next run.

This is a test-config fix only, with no source-code changes. A separate follow-up should enforce the invariant, either in `LlmArgs` validation or in the ADP router's per-rank scheduling, so that future YAMLs cannot silently land with this inconsistency.

Test plan

- Verified on lyris GB200 against the original artifact's SHA and image: with `max_num_tokens=512`, the failing test (`perf/test_perf_sanity.py::test_e2e[disagg-e2e-gb200_qwen3-235b-fp4_…_con2048_…_mtp3_ccb-NIXL]`) completes in 332 s with non-zero throughput; without the bump, the assertion fires within seconds of the first NIXL transfer.
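The follow-up suggested above could take the form of a load-time check that rejects an inconsistent config before any worker starts. A minimal sketch under stated assumptions: `GenBudget` is a hypothetical stand-in, not the real `LlmArgs` (whose field layout and validator hooks are not shown in this PR), and the example values are assumed from the Qwen3 file names (`bs128`, `mtp3`) and the 256 → 512 bump.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenBudget:
    """Hypothetical stand-in for a GEN worker's budget settings. Field names
    follow the YAML keys discussed in this PR; the class is illustrative."""

    max_batch_size: int
    max_num_tokens: int
    max_draft_len: int = 0  # 0 = no MTP/Eagle3 speculation

    def required_tokens(self) -> int:
        # Each scheduled request can consume up to 1 + max_draft_len tokens per step.
        return self.max_batch_size * (1 + self.max_draft_len)

    def validate(self) -> None:
        """Raise ValueError when a full batch cannot fit in one forward step."""
        need = self.required_tokens()
        if self.max_num_tokens < need:
            raise ValueError(
                f"max_num_tokens ({self.max_num_tokens}) must be >= "
                f"max_batch_size ({self.max_batch_size}) * "
                f"(1 + max_draft_len ({self.max_draft_len})) = {need}"
            )


if __name__ == "__main__":
    # The bumped Qwen3 config passes.
    GenBudget(max_batch_size=128, max_num_tokens=512, max_draft_len=3).validate()
    # The original Qwen3 config is rejected up front instead of hanging mid-run.
    try:
        GenBudget(max_batch_size=128, max_num_tokens=256, max_draft_len=3).validate()
    except ValueError as err:
        print(f"rejected: {err}")
```

Failing at config-load time turns a mid-run assertion (and the resulting NIXL/UCX hang) into an immediate, readable error; enforcing the limit in the ADP router instead would keep such configs runnable by capping per-rank admission, which is the trade-off the follow-up would need to settle.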