Add DSV4 GB300 wide-EP sweep configs (EP=12/16/24/32/40) by yhyang201 · Pull Request #1586 · SemiAnalysisAI/InferenceX

yhyang201 · 2026-05-29T14:08:25Z

Summary

Add 5 new search-space entries and recipe files for DSV4 GB300 non-MTP wide-EP sweep, matching srt-slurm PR#173 topology (18 nodes total).
EP sizes: 12/16/24/32/40, decode nodes: 3/4/6/8/10, concurrencies: 12000/8192/3000/2500/2048.
Uses InferenceX env vars and sglang_config (megamoe, W4A4, nightly-20260520 image), with Weiliang's tuned decode params (swa-full-tokens-ratio=0.20, max-running-requests=18432).

Note

Low Risk
Benchmark and CI config only (YAML recipes and search-space entries); no runtime application or auth changes.

Overview
Extends dsv4-fp4-gb300-dynamo-sglang with five wide expert-parallel (EP) sweep points on a fixed 18-node disaggregated layout (srt-slurm PR#173 topology), each wired to a new CONFIG_FILE recipe under benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/.

The sweep varies decode EP/TP (12→40) and prefill worker count while keeping prefill at TP/EP 4 with dp-attn: EP=12 (15P+3D, conc 12000), EP=16 (14P+4D, 8192), EP=24 (12P+6D, 3000), EP=32 (10P+8D, 2500), EP=40 (8P+10D, 2048). New recipes use the nightly-20260520 SGLang image, Dynamo install, 8k/1k sa-bench, and tuned decode settings (swa-full-tokens-ratio=0.20, max-running-requests=18432, moe-dense-tp-size=1); EP=40 adds ep-num-redundant-experts: 16.
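The node arithmetic behind the sweep can be checked with a short sketch. It assumes four GPUs per node (inferred from the summary's EP-to-decode-node ratios, e.g. EP=12 on 3 nodes) and one node per prefill worker at TP/EP 4. The `SweepPoint` type and its field names are illustrative and do not come from the recipes.

```python
from dataclasses import dataclass

GPUS_PER_NODE = 4   # assumption: inferred from EP=12 -> 3 decode nodes, EP=40 -> 10
TOTAL_NODES = 18    # fixed layout stated in the PR


@dataclass(frozen=True)
class SweepPoint:
    decode_ep: int        # decode EP/TP size
    prefill_workers: int  # prefill workers at TP/EP 4 (assumed one node each)
    concurrency: int

    @property
    def decode_nodes(self) -> int:
        return self.decode_ep // GPUS_PER_NODE

    @property
    def total_nodes(self) -> int:
        return self.prefill_workers + self.decode_nodes


# Values copied from the PR overview.
SWEEP = [
    SweepPoint(12, 15, 12000),
    SweepPoint(16, 14, 8192),
    SweepPoint(24, 12, 3000),
    SweepPoint(32, 10, 2500),
    SweepPoint(40, 8, 2048),
]

for p in SWEEP:
    # Each point should fill whole nodes and add up to the fixed 18-node layout.
    assert p.decode_ep % GPUS_PER_NODE == 0
    assert p.total_nodes == TOTAL_NODES
    print(f"EP={p.decode_ep}: {p.prefill_workers}P + {p.decode_nodes}D, conc {p.concurrency}")
```

Under those assumptions every sweep point trades prefill nodes for decode nodes one-for-one, which is why the total stays at 18.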

perf-changelog.yaml documents the new config keys and parameter alignment.

Reviewed by Cursor Bugbot for commit d5728f5.

Adds 5 new search-space entries and recipe files matching srt-slurm PR#173 wide-EP sweep topology (18 nodes total).

github-actions · 2026-05-29T14:08:38Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-05-29T14:20:24Z

+      ep-num-redundant-experts: 16
+      enable-dp-attention: true
+      enable-dp-lm-head: true
+      max-running-requests: 18432


🟡 Nit: max-running-requests: 18432 is not divisible by data-parallel-size: 40 (18432/40 = 460.8), unlike the four sibling EP=12/16/24/32 files added in this same PR which all divide cleanly. Practical impact here is zero since the benchmark sets concurrencies: 2048 (far below the cap), but for consistency with the sibling files consider 18400 (= 460×40) or 18480 (= 462×40).

Extended reasoning...

Bug

In disagg-gb300-8p1d-dep4-dep40-18-c2048.yaml (decode block, line 148), max-running-requests: 18432 is set alongside data-parallel-size: 40. But 18432 / 40 = 460.8 — not an integer.

Why this looks like a copy/paste oversight

The value 18432 = 192 × 96 is the LCM-aligned choice that divides evenly across the other four sweep points added in this same PR:

| File (EP) | dp_size | 18432 / dp_size |
| --- | --- | --- |
| dep12 (c12000) | 12 | 1536 |
| dep16 (c8192) | 16 | 1152 |
| dep24 (c3000) | 24 | 768 |
| dep32 (c2500) | 32 | 576 |
| dep40 (c2048) | 40 | 460.8 ← broken |

Every other GB300 recipe currently in the tree also has max-running-requests exactly divisible by its data-parallel-size. The EP=40 file is the only outlier, and 40 was clearly just missed when 18432 was selected for the sweep.
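The divisibility claim is easy to reproduce. The sketch below uses only the numbers quoted in this review; the variable names are illustrative.

```python
import math

MAX_RUNNING_REQUESTS = 18432
DP_SIZES = [12, 16, 24, 32, 40]  # data-parallel sizes of the five new recipes

# Remainder of the shared cap for each data-parallel size.
remainders = {dp: MAX_RUNNING_REQUESTS % dp for dp in DP_SIZES}
for dp, rem in remainders.items():
    status = "ok" if rem == 0 else f"remainder {rem}"
    print(f"dp={dp:>2}: {MAX_RUNNING_REQUESTS / dp:g} per rank ({status})")

# 18432 is a multiple of the LCM of the first four sizes, but not of all five.
lcm_first_four = math.lcm(12, 16, 24, 32)
lcm_all_five = math.lcm(*DP_SIZES)
print(lcm_first_four, MAX_RUNNING_REQUESTS % lcm_first_four)
print(lcm_all_five, MAX_RUNNING_REQUESTS % lcm_all_five)
```

Adding 40 to the set brings in a factor of 5 that 18432 does not have, so any cap that divides cleanly across all five sizes would need to be a multiple of 480.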

Runtime behavior

SGLang's DP-attention path floor-divides max_running_requests across DP ranks rather than erroring out, so this won't crash — it just results in 460 per rank × 40 = 18400 effective capacity (32 slots silently dropped, ~0.17%).
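A minimal sketch of that arithmetic, assuming the per-rank cap is obtained by plain floor division as described above. The helper `effective_capacity` is hypothetical and is not an SGLang function.

```python
def effective_capacity(max_running_requests: int, dp_size: int) -> tuple[int, int, int]:
    """Return (per_rank, effective_total, dropped) under floor division.

    Assumption: each DP rank gets max_running_requests // dp_size slots,
    as the review describes; this is not taken from SGLang source.
    """
    per_rank = max_running_requests // dp_size
    effective = per_rank * dp_size
    dropped = max_running_requests - effective
    return per_rank, effective, dropped


per_rank, effective, dropped = effective_capacity(18432, 40)
print(per_rank, effective, dropped, f"{dropped / 18432:.2%}")
```

For the EP=40 recipe this gives 460 slots per rank, 18400 in total, and 32 unused slots.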

Why this is still worth flagging (but only as a nit)

Addressing the refutation: yes, the benchmark sets concurrencies: 2048, which is ~9× below either 18432 or 18400, so the cap is genuinely never reached and no benchmark number will change. There is no functional regression and no test will fail. That is exactly why this is filed as a nit, not as a blocking issue.

The reason it's still worth a one-line fix:

The four sibling files in this same PR observe the divisibility invariant — the EP=40 file is internally inconsistent with its own cohort.

Anyone copying this recipe to a different concurrency (e.g., raising concurrencies to actually exercise the cap, which is a natural follow-up sweep) will inherit the rounding inconsistency.

The fix is a single-line change: replace 18432 with 18400 (or 18480) on line 148 of the file.

Step-by-step proof

1. Open benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/8k1k/disagg-gb300-8p1d-dep4-dep40-18-c2048.yaml.
2. In the backend.sglang_config.decode block, observe:
   - data-parallel-size: 40 (line ~144)
   - max-running-requests: 18432 (line 148)
3. Compute 18432 ÷ 40 = 460.8, which is not an integer.
4. Repeat the same check on the four sibling files added in this PR (dep12/16/24/32); all evaluate to clean integers (1536 / 1152 / 768 / 576). The invariant holds for every other file in the PR and every pre-existing GB300 recipe in the tree.
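The manual steps above could be automated along these lines. This is a sketch only: `RECIPE_TEXT` is a hypothetical fragment in which just the two key/value pairs come from the review, and the regex reader picks up the first occurrence of each key, so a real recipe with separate prefill and decode blocks would need the decode block isolated first.

```python
import re

# Hypothetical fragment: only the two keys and their values are quoted from
# the review; the surrounding layout is an assumption for illustration.
RECIPE_TEXT = """
decode:
  data-parallel-size: 40
  max-running-requests: 18432
"""


def read_int(text: str, key: str) -> int:
    """Read the first `key: <int>` entry found in the text."""
    match = re.search(rf"^\s*{re.escape(key)}:\s*(\d+)", text, flags=re.MULTILINE)
    if match is None:
        raise KeyError(key)
    return int(match.group(1))


def check(text: str) -> dict:
    dp = read_int(text, "data-parallel-size")
    cap = read_int(text, "max-running-requests")
    lower = (cap // dp) * dp                       # nearest multiple at or below cap
    upper = lower if lower == cap else lower + dp  # nearest multiple at or above cap
    return {"dp": dp, "cap": cap, "divisible": cap % dp == 0,
            "lower": lower, "upper": upper}


result = check(RECIPE_TEXT)
print(result)
```

The nearest multiples of 40 around 18432 are 18400 and 18440; the 18480 mentioned in the review is also a multiple of 40, just not the closest one above.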

Suggested fix

# decode block, line 148
max-running-requests: 18400  # = 460 × 40, matches the per-rank floor SGLang would compute anyway

github-actions · 2026-05-30T02:43:50Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26642196095
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26642196095

Add DSV4 GB300 wide-EP sweep configs (EP=12/16/24/32/40)

03c20f7


yhyang201 requested a review from a team May 29, 2026 14:08

yhyang201 requested review from jgangani and kedarpotdar-nv as code owners May 29, 2026 14:08

github-project-automation Bot added this to InferenceMAX Board May 29, 2026

Append perf-changelog entry for PR #1586

d5728f5

yhyang201 added the full-sweep-enabled label May 29, 2026

claude Bot reviewed May 29, 2026


yhyang201 wants to merge 2 commits into main from add-dsv4-gb300-weiliang-wideep-sweep


