Add DSV4 GB300 wide-EP sweep configs (EP=12/16/24/32/40) #1586
yhyang201 wants to merge 2 commits into
Adds 5 new search-space entries and recipe files matching srt-slurm PR#173 wide-EP sweep topology (18 nodes total).
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
ep-num-redundant-experts: 16
enable-dp-attention: true
enable-dp-lm-head: true
max-running-requests: 18432
🟡 Nit: max-running-requests: 18432 is not divisible by data-parallel-size: 40 (18432/40 = 460.8), unlike the four sibling EP=12/16/24/32 files added in this same PR which all divide cleanly. Practical impact here is zero since the benchmark sets concurrencies: 2048 (far below the cap), but for consistency with the sibling files consider 18400 (= 460×40) or 18480 (= 462×40).
Extended reasoning...
Bug
In disagg-gb300-8p1d-dep4-dep40-18-c2048.yaml (decode block, line 148), max-running-requests: 18432 is set alongside data-parallel-size: 40. But 18432 / 40 = 460.8 — not an integer.
Why this looks like a copy/paste oversight
The value 18432 = 192 × 96 is the LCM-aligned choice that divides evenly across the other four sweep points added in this same PR:
| File (EP) | dp_size | 18432 / dp_size |
|---|---|---|
| dep12 (c12000) | 12 | 1536 |
| dep16 (c8192) | 16 | 1152 |
| dep24 (c3000) | 24 | 768 |
| dep32 (c2500) | 32 | 576 |
| dep40 (c2048) | 40 | 460.8 ← broken |
Every other GB300 recipe currently in the tree also has max-running-requests exactly divisible by its data-parallel-size. The EP=40 file is the only outlier, and 40 was clearly just missed when 18432 was selected for the sweep.
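The divisibility claim in the table can be reproduced with a few lines of Python. This is only a sketch: the `DP_SIZES` list is copied from the table above, and nothing here reads the actual recipe files.

```python
import math

MAX_RUNNING_REQUESTS = 18432
DP_SIZES = [12, 16, 24, 32, 40]  # decode data-parallel sizes from the table

# LCM of the four sizes that divide cleanly, and of all five sizes.
lcm_without_40 = math.lcm(12, 16, 24, 32)
lcm_with_40 = math.lcm(*DP_SIZES)

for dp in DP_SIZES:
    quotient, remainder = divmod(MAX_RUNNING_REQUESTS, dp)
    status = "ok" if remainder == 0 else f"remainder {remainder}"
    print(f"dp_size={dp:>2}: {quotient} per rank ({status})")

# 18432 is a multiple of the first LCM but not of the second.
print(lcm_without_40, MAX_RUNNING_REQUESTS % lcm_without_40)
print(lcm_with_40, MAX_RUNNING_REQUESTS % lcm_with_40)
```

A single cap that divides all five sweep points would have to be a multiple of `lcm_with_40`, which is why 18432 works for the first four files only.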
Runtime behavior
SGLang's DP-attention path floor-divides max_running_requests across DP ranks rather than erroring out, so this won't crash — it just results in 460 per rank × 40 = 18400 effective capacity (32 slots silently dropped, ~0.17%).
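The arithmetic behind that figure can be sketched as follows. This models only the floor-division behavior described above; `effective_capacity` is a hypothetical helper for illustration and is not taken from SGLang's source.

```python
def effective_capacity(max_running_requests: int, dp_size: int) -> tuple[int, int, int]:
    """Return (per_rank, effective_total, dropped) assuming floor division per DP rank."""
    per_rank = max_running_requests // dp_size
    effective_total = per_rank * dp_size
    dropped = max_running_requests - effective_total
    return per_rank, effective_total, dropped


per_rank, total, dropped = effective_capacity(18432, 40)
print(per_rank, total, dropped, f"{dropped / 18432:.2%}")
```

With the suggested value of 18400, the same function reports zero dropped slots.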
Why this is still worth flagging (but only as a nit)
Addressing the refutation: yes, the benchmark sets concurrencies: 2048, which is ~9× below either 18432 or 18400, so the cap is genuinely never reached and no benchmark number will change. There is no functional regression and no test will fail. That is exactly why this is filed as a nit, not as a blocking issue.
The reason it's still worth a one-line fix:
- The four sibling files in this same PR observe the divisibility invariant — the EP=40 file is internally inconsistent with its own cohort.
- Anyone copying this recipe to a different concurrency (e.g., raising concurrencies to actually exercise the cap, which is a natural follow-up sweep) will inherit the rounding inconsistency.
- The fix is a one-line change: 18432 → 18400 (or 18480) on the single line at file:148.
Step-by-step proof
- Open benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-8p1d-dep4-dep40-18-c2048.yaml.
- In the backend.sglang_config.decode block, observe:
  - data-parallel-size: 40 (line ~144)
  - max-running-requests: 18432 (line 148)
- Compute 18432 ÷ 40 = 460.8 → not an integer.
- Repeat the same check on the four sibling files added in this PR (dep12/16/24/32) — all evaluate to clean integers (1536 / 1152 / 768 / 576). The invariant holds for every other file in the PR and every pre-existing GB300 recipe in the tree.
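The manual steps above could also be run as a small check over the recipe text. The sketch below is an assumption-heavy illustration: `find_int` and `divides_cleanly` are hypothetical helpers, the regex only handles flat `key: <int>` lines, and `sample` is a hand-written stand-in for the decode block rather than the real file contents.

```python
import re


def find_int(key: str, text: str) -> int:
    """Return the first integer value of a `key: <int>` line in YAML-like text."""
    pattern = rf"^\s*{re.escape(key)}:\s*(\d+)\s*(?:#.*)?$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise KeyError(key)
    return int(match.group(1))


def divides_cleanly(decode_block: str) -> bool:
    """True if max-running-requests is an exact multiple of data-parallel-size."""
    dp = find_int("data-parallel-size", decode_block)
    cap = find_int("max-running-requests", decode_block)
    return cap % dp == 0


# Stand-in for the decode block; only the two values come from the review.
sample = """
data-parallel-size: 40
max-running-requests: 18432
"""
print(divides_cleanly(sample))
print(divides_cleanly(sample.replace("18432", "18400")))
```

A real check would more likely load the YAML with a proper parser and walk to the decode block; the regex version is used here only to keep the sketch dependency-free.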
Suggested fix
# decode block, line 148
max-running-requests: 18400  # = 460 × 40, matches the per-rank floor SGLang would compute anyway
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26642196095
Summary
Note
Low Risk
Benchmark and CI config only (YAML recipes and search-space entries); no runtime application or auth changes.
Overview
Extends dsv4-fp4-gb300-dynamo-sglang with five wide expert-parallel (EP) sweep points on a fixed 18-node disaggregated layout (srt-slurm PR#173 topology), each wired to a new CONFIG_FILE recipe under benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/.

The sweep varies decode EP/TP (12→40) and prefill worker count while keeping prefill at TP/EP 4 with dp-attn: EP=12 (15P+3D, conc 12000), EP=16 (14P+4D, 8192), EP=24 (12P+6D, 3000), EP=32 (10P+8D, 2500), EP=40 (8P+10D, 2048). New recipes use the nightly-20260520 SGLang image, Dynamo install, 8k/1k sa-bench, and tuned decode settings (swa-full-tokens-ratio=0.20, max-running-requests=18432, moe-dense-tp-size=1); EP=40 adds ep-num-redundant-experts: 16. perf-changelog.yaml documents the new config keys and parameter alignment.

Reviewed by Cursor Bugbot for commit d5728f5. Bugbot is set up for automated code reviews on this repo.
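The sweep points listed in the overview can be sanity-checked against the fixed 18-node layout. This sketch assumes that the "P" and "D" figures in each pair count nodes (prefill and decode respectively); that reading is an interpretation of the overview, not something the recipes are shown to state.

```python
TOTAL_NODES = 18

# (decode EP, "P" count, "D" count, benchmark concurrency), copied from the overview.
SWEEP = [
    (12, 15, 3, 12000),
    (16, 14, 4, 8192),
    (24, 12, 6, 3000),
    (32, 10, 8, 2500),
    (40, 8, 10, 2048),
]

for ep, prefill, decode, conc in SWEEP:
    # Assumption: P + D should account for every node in the layout.
    nodes_ok = prefill + decode == TOTAL_NODES
    # Decode EP ranks per "D" unit, if EP spreads evenly across them.
    ranks_per_decode_unit, leftover = divmod(ep, decode)
    print(f"EP={ep:>2}: P+D={prefill + decode} (ok={nodes_ok}), "
          f"{ranks_per_decode_unit} ranks per D (leftover {leftover}), conc={conc}")
```

Under that reading, every sweep point uses all 18 nodes and places the same number of decode ranks on each decode unit.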