2 changes: 1 addition & 1 deletion .github/configs/nvidia-master.yaml
@@ -8270,7 +8270,7 @@ kimik2.5-fp4-gb200-dynamo-trt:
dp-attn: true

kimik2.5-fp4-gb200-dynamo-vllm:
-  image: vllm/vllm-openai:v0.18.0-cu130
+  image: vllm/vllm-openai:v0.21.0
Missing -ubuntu2404 suffix for GB200 vLLM image tag

High Severity

The image tag vllm/vllm-openai:v0.21.0 lacks the -ubuntu2404 suffix required for GB200 runners. All other GB200 dynamo-vllm configs in this file (dsv4-fp4-gb200-dynamo-vllm, dsv4-fp4-gb200-dynamo-vllm-mtp2, dsv4-fp4-gb300-dynamo-vllm) use the -ubuntu2404 variant which provides CUDA 13.0 and aarch64 support needed by Grace Blackwell hardware. The old image correctly used -cu130. The v0.21.0-ubuntu2404 tag exists and is the appropriate image for this runner.

Reviewed by Cursor Bugbot for commit 424a294.

🔴 The bumped image vllm/vllm-openai:v0.21.0 is the x86_64 tag, but the gb200 runner is Grace+Blackwell (ARM64). Every other gb200/gb300 vllm entry in this file uses the -ubuntu2404 suffix (e.g. dsv4-fp4-gb200-dynamo-vllm pins v0.20.0-ubuntu2404), and the srt-slurm recipe disagg-gb200-mid-curve-megamoe-mtp2.yaml already pins vllm/vllm-openai:v0.21.0-ubuntu2404 — proving the ARM64 tag exists. Change to vllm/vllm-openai:v0.21.0-ubuntu2404 to match the gb200 convention; the prior v0.18.0-cu130 value also had an arch/CUDA suffix, so dropping to the bare tag will almost certainly fail at container start with a manifest/exec-format error.

Extended reasoning...

What the bug is

The diff bumps kimik2.5-fp4-gb200-dynamo-vllm from vllm/vllm-openai:v0.18.0-cu130 to the bare tag vllm/vllm-openai:v0.21.0. The bare vX.Y.Z tag on vllm/vllm-openai is the x86_64 image. The runner: gb200 field three lines below the image line (line 8276) targets a Grace+Blackwell node, whose CPU is ARM64 (aarch64). Pulling the x86 image on an ARM64 host will either fail with no matching manifest for linux/arm64 or pull an image built for the wrong architecture, which then fails at start with exec format error.

Convention evidence in this file

Every other gb200/gb300 vllm entry in .github/configs/nvidia-master.yaml uses the -ubuntu2404 ARM-suitable suffix:

| Config | Runner | Image tag |
| --- | --- | --- |
| dsv4-fp4-gb200-dynamo-vllm | gb200 | v0.20.0-ubuntu2404 |
| dsv4-fp4-gb200-dynamo-vllm-mtp2 | gb200 | v0.20.1-ubuntu2404 |
| dsv4-fp4-gb300-dynamo-vllm | gb300-nv | v0.20.0-ubuntu2404 |
| kimik2.5-fp4-gb200-dynamo-vllm (this PR) | gb200 | v0.21.0 ← missing suffix |

Conversely, every bare v0.21.0 (no suffix) usage in this file is on an x86 runner (h200, b200, b300, h100). The bare tag is correct only for x86 hosts.
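
The convention above could be enforced mechanically. The following is a minimal sketch of such a lint, not something that exists in this repository: the function name, the runner-prefix list, and the inline CONFIGS dict are all hypothetical, and the entries simply mirror the table above rather than being parsed from nvidia-master.yaml.

```python
# Hypothetical lint: runner prefixes assumed to be ARM64 (Grace) hosts.
ARM64_RUNNER_PREFIXES = ("gb200", "gb300")
# Suffix convention described in this review for ARM64 vLLM images.
ARM64_TAG_SUFFIX = "-ubuntu2404"


def find_tag_violations(configs):
    """Return names of configs on ARM64 runners whose vLLM image lacks the suffix."""
    violations = []
    for name, entry in configs.items():
        image = entry["image"]
        runner = entry["runner"]
        # Only vLLM images follow this suffix convention; skip everything else.
        if not image.startswith("vllm/vllm-openai:"):
            continue
        if runner.startswith(ARM64_RUNNER_PREFIXES) and not image.endswith(ARM64_TAG_SUFFIX):
            violations.append(name)
    return sorted(violations)


# Entries copied from the table above; a real check would load the YAML file.
CONFIGS = {
    "dsv4-fp4-gb200-dynamo-vllm": {
        "image": "vllm/vllm-openai:v0.20.0-ubuntu2404", "runner": "gb200"},
    "dsv4-fp4-gb200-dynamo-vllm-mtp2": {
        "image": "vllm/vllm-openai:v0.20.1-ubuntu2404", "runner": "gb200"},
    "dsv4-fp4-gb300-dynamo-vllm": {
        "image": "vllm/vllm-openai:v0.20.0-ubuntu2404", "runner": "gb300-nv"},
    "kimik2.5-fp4-gb200-dynamo-vllm": {
        "image": "vllm/vllm-openai:v0.21.0", "runner": "gb200"},
}

print(find_tag_violations(CONFIGS))  # → ['kimik2.5-fp4-gb200-dynamo-vllm']
```

Matching on a runner-name prefix is a deliberate simplification so that gb300-nv is covered; a stricter check would keep an explicit runner-to-architecture map.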

Proof the correct tag exists upstream

The repository already pins the exact correct tag for gb200: benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/8k1k/disagg-gb200-mid-curve-megamoe-mtp2.yaml uses vllm/vllm-openai:v0.21.0-ubuntu2404 (lines 5 and 143). So this is not a new tag we need vLLM to publish — it is already published and already used elsewhere in the repo for gb200.

Why the PR description's justification doesn't apply

The PR description says it "matches the v0.21.0 tag convention from PR #1461 (dsv4-fp8-h200-vllm)" and "the recent kimik2.5-fp4-b200-vllm-agentic bump". Both of those configs are x86 (H200 host and B200-dgxc host), so plain v0.21.0 is correct for them. They are exactly the wrong precedent for an ARM64 GB200 entry. The author appears to have copied the tag from an x86 sibling without noticing the architecture difference. The prior value v0.18.0-cu130 had a -cu130 (CUDA 13) suffix, which is what provided the ARM64-compatible image — that signal was lost in the bump.

Step-by-step proof of failure

  1. CI dispatches the benchmark to a gb200 runner (line 8276: runner: gb200). GB200 = Grace ARM64 CPU + Blackwell GPU.
  2. The runner attempts docker pull vllm/vllm-openai:v0.21.0 (per line 8273).
  3. The bare v0.21.0 tag on Docker Hub for vllm/vllm-openai has only a linux/amd64 manifest (the ARM64 build is published under the -ubuntu2404 suffix — confirmed because the repo's own disagg-gb200-mid-curve-megamoe-mtp2.yaml pins v0.21.0-ubuntu2404 specifically because that's the ARM build).
  4. Docker on ARM64 will fail with no matching manifest for linux/arm64/v8 in the manifest list entries (or, if a single-arch x86 manifest is matched anyway, the container will fail to start with exec /usr/bin/python: exec format error).
  5. The benchmark never runs; the job fails at container start.
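
Steps 3 and 4 can be illustrated with a simplified model of platform selection over a manifest list. This is a sketch only, not Docker's actual implementation: the manifest contents below are assumptions that follow this review's claim about the two tags (they were not fetched from Docker Hub), the digests are placeholders, and select_manifest is a hypothetical helper.

```python
class NoMatchingManifest(Exception):
    """Raised when no manifest-list entry matches the host platform."""


def select_manifest(manifest_entries, host_os, host_arch):
    """Return the first entry whose platform matches the host (simplified model)."""
    for entry in manifest_entries:
        platform = entry["platform"]
        if platform["os"] == host_os and platform["architecture"] == host_arch:
            return entry
    raise NoMatchingManifest(f"no matching manifest for {host_os}/{host_arch}")


# Assumed contents, per the review: bare tag is amd64 only, suffixed tag has arm64.
BARE_TAG = [
    {"digest": "sha256:<amd64-placeholder>",
     "platform": {"os": "linux", "architecture": "amd64"}},
]
SUFFIXED_TAG = [
    {"digest": "sha256:<arm64-placeholder>",
     "platform": {"os": "linux", "architecture": "arm64"}},
]

# A GB200 host is linux/arm64: the suffixed tag resolves, the bare tag does not.
print(select_manifest(SUFFIXED_TAG, "linux", "arm64")["digest"])
try:
    select_manifest(BARE_TAG, "linux", "arm64")
except NoMatchingManifest as err:
    print(err)
```

The os and architecture keys follow the platform object used in image manifest lists; everything else in the sketch is illustrative.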

Fix

Change line 8273 from:

  image: vllm/vllm-openai:v0.21.0

to:

  image: vllm/vllm-openai:v0.21.0-ubuntu2404

This matches the established gb200 convention in nvidia-master.yaml and the existing disagg-gb200-mid-curve-megamoe-mtp2.yaml recipe. The perf-changelog entry should be updated to reference the suffixed tag as well.
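
To keep the changelog and the config from drifting apart, a small consistency check could compare the two. The sketch below is hypothetical (neither the helper nor the regular expression comes from this repository) and assumes the changelog description names the new image last, as the "from X to Y" phrasing in this PR does.

```python
import re

# Matches a vllm/vllm-openai image reference; the tag charset is an assumption.
IMAGE_PATTERN = re.compile(r"vllm/vllm-openai:[\w.\-]+")


def changelog_mentions_image(description_lines, configured_image):
    """True if the last image named in the description equals the configured image."""
    mentioned = []
    for line in description_lines:
        mentioned.extend(IMAGE_PATTERN.findall(line))
    return bool(mentioned) and mentioned[-1] == configured_image


# Description as written in this PR, checked against the tag proposed in the fix.
description = [
    "Bump vLLM image from vllm/vllm-openai:v0.18.0-cu130 to vllm/vllm-openai:v0.21.0",
]
print(changelog_mentions_image(description, "vllm/vllm-openai:v0.21.0-ubuntu2404"))  # → False
```

Note that the pattern treats a trailing period as part of the tag, so a description ending in a full stop would need the period stripped first.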

model: nvidia/Kimi-K2.5-NVFP4
model-prefix: kimik2.5
runner: gb200
7 changes: 7 additions & 0 deletions perf-changelog.yaml
@@ -3192,3 +3192,10 @@
     - "Add GLM-5-FP8 models.yaml flags, setup_deps.sh (aiter gluon + transformers glm_moe_dsa), GLM-5 env tuning in env.sh"
     - "Add multinode launch script glm5_fp8_mi355x_sglang-disagg.sh; server.sh sources setup_deps.sh"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1572
+
+- config-keys:
+    - kimik2.5-fp4-gb200-dynamo-vllm
+  description:
+    - "Bump vLLM image from vllm/vllm-openai:v0.18.0-cu130 to vllm/vllm-openai:v0.21.0"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1582