Skip to content

docs: add inference_guide with validated 7B+ models (Ascend NPU) #268

Open
EdisonSu768 wants to merge 5 commits into master from docs/inference-guide-validated-models

@EdisonSu768 EdisonSu768 commented Jun 18, 2026

Member

What

New docs/en/inference_guide/ section documenting already-validated open-weight LLM inference, mirroring the structure of training_guides/training-runtimes. All content is derived from verified deployments + benchmarks (no fabricated numbers).

Contents

Two validated models above the 7B/8B class:

  • Qwen3-14B (dense, BF16) — recommended engine vLLM-Ascend
  • Qwen3-30B-A3B (MoE, BF16) — recommended engine MindIE

Per the four asks:

  • Models (2, ≥7B/8B) ✅
  • Runtime images (GPU/NPU) — NPU vLLM-Ascend + MindIE catalog, with an NVIDIA GPU note ✅
  • Runtime YAML examples — namespace-scoped ServingRuntime + InferenceService (not ClusterServingRuntime), one per model/engine/TP combination ✅
  • Model + runtime + image benchmark results — measured guidellm open-loop per-replica numbers (4 workloads × rate 1–9), with the dense→vLLM / MoE→MindIE engine-selection finding ✅

Files

docs/en/inference_guide/
├── index.mdx                         # overview, runtime-image catalog, engine selection, methodology, deploy steps
├── qwen3-14b.mdx                     # dense model card + benchmark tables
├── qwen3-30b-a3b.mdx                 # MoE model card + benchmark tables
└── assets/
    ├── qwen3-14b/qwen3-14b-vllm-ascend-tp1.yaml
    ├── qwen3-14b/qwen3-14b-vllm-ascend-tp2.yaml
    ├── qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
    └── qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml

Scope note

Every validated 7B+ model currently runs on Ascend 910B NPU; there is no verified GPU benchmark at this size yet (only Qwen3.5-0.8B on A30). The doc therefore leads with NPU and lists the NVIDIA GPU runtime as the platform default with that gap flagged, rather than inventing GPU numbers.

Verification

  • yarn lint → 0 errors / 0 warnings
  • yarn build → all 3 pages render
  • All 4 asset YAMLs parse as valid multi-doc YAML

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added ready-to-deploy configurations for Qwen3-30B-A3B model inference on Ascend 910B3 hardware with MindIE and vLLM-Ascend inference engines.
    • Support for tensor parallelism (TP=2 and TP=4) configurations.
  • Documentation

    • Added comprehensive inference deployment guide with per-model benchmarking documentation and performance metrics across Chat, Code, and RAG workloads.

New `inference_guide/` section documenting already-validated open-weight
LLM inference, mirroring the structure of `training_guides/training-runtimes`:

- Two validated models above the 7B/8B class:
  - Qwen3-14B (dense, BF16) — recommended engine vLLM-Ascend
  - Qwen3-30B-A3B (MoE, BF16) — recommended engine MindIE
- Runtime images catalog (NPU vLLM-Ascend + MindIE; NVIDIA GPU note)
- Per-model namespace-scoped `ServingRuntime` + `InferenceService` assets
  (not ClusterServingRuntime), one per engine/TP combination
- Measured open-loop per-replica benchmark tables (guidellm, 4 workloads,
  rate 1-9) with the dense→vLLM / MoE→MindIE engine-selection finding

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented Jun 18, 2026

Walkthrough

Adds a new inference guide section for Qwen3-30B-A3B on Huawei Ascend 910B3: three KServe YAML asset files (MindIE TP=2, MindIE TP=4, vLLM-Ascend TP=4) and two MDX pages (top-level inference guide index, Qwen3-30B-A3B model page). Twelve custom dictionary terms are also appended to the cspell wordlist.

Changes

Inference Guide: Qwen3-30B-A3B on Ascend 910B3

| Layer / File(s) | Summary |
| --- | --- |
| Inference guide index page (`docs/en/inference_guide/index.mdx`, `.cspell/terms.txt`) | Adds the top-level inference guide MDX page with validated model listing, runtime image tag table (vLLM-Ascend v0.18, MindIE), guidellm benchmark methodology definition (open-loop, four workloads, saturation capacity), kubectl deploy instructions with curl test example, and caveats (namespace scoping, Ascend910B3 resource key, HCCL/Modelcar permission modes). Appends 12 inference-related terms to the cspell dictionary. |
| MindIE TP=2 and TP=4 KServe manifests (`docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp2.yaml`, `docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml`) | Adds two MindIE ServingRuntime + InferenceService YAML pairs. Both embed a bash startup script that uses npu-smi to detect NPU count, derives WORLD_SIZE/device IDs, validates the model config, rewrites MindIE config.json via extensive sed substitutions, and launches mindieservice_daemon. TP=2 allocates 2 Ascend910B3 cards; TP=4 allocates 4. Both use hostIPC: true and root securityContext. |
| vLLM-Ascend TP=4 KServe manifest (`docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml`) | Adds a ServingRuntime using vLLM-Ascend v0.18 (OpenAI-compatible server) and a matching InferenceService with TP=4, expert-parallel, eager mode, max sequence/length limits, chunked prefill, prefix caching, HCCL/OMP/PyTorch-NPU environment variables, 4 Ascend910B3 card resources, and root securityContext. |
| Qwen3-30B-A3B documentation page (`docs/en/inference_guide/qwen3-30b-a3b.mdx`) | Adds the model page covering model identity (30B/3B-active, BF16), validated hardware×stack matrix (MindIE and vLLM-Ascend at TP=2/TP=4), deployment instructions with asset YAML links and a MindIE root/writable-volume warning, benchmark overview saturation tables and rate-1 latency snapshot, and expandable full open-loop sweep tables (TTFT/E2E/ITL/TPS) for all four engine×TP combinations across Chat/Code/RAG/Long-RAG workloads. |
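
The MindIE manifests summarized above are described as rewriting `config.json` through `sed` substitutions. A minimal sketch of one such substitution is shown here for illustration only; the helper name `set_json_scalar` and the sample keys are assumptions, not the script shipped in this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: replace the value of a scalar JSON key, one match per line.
# It assumes the key and its value sit on the same line and that the value
# contains no comma or closing brace. String values must be passed with quotes.
set_json_scalar() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  sed -E "s|(\"${key}\"[[:space:]]*:[[:space:]]*)[^,}]*|\1${value}|" "$file" > "$tmp"
  mv "$tmp" "$file"
}

# Sample file standing in for a service config (keys are illustrative).
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
{
  "ServerConfig" : {
    "ipAddress" : "127.0.0.1",
    "port" : 1025
  }
}
EOF

set_json_scalar "$cfg" port 8080
set_json_scalar "$cfg" ipAddress '"0.0.0.0"'
cat "$cfg"
```

Line-oriented `sed` edits like this are fragile against reformatted JSON, which is one reason the strict-mode comment later in this review matters: a substitution that silently matches nothing leaves the default value in place.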

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • alauda/aml-docs#234: Both PRs update Ascend vLLM runtime deployment guidance around Modelcar permission modes and securityContext/UID-related settings (runAsUser: 0, runAsNonRoot: false), which this PR also references in its MindIE deployment warning and InferenceService security contexts.

Poem

🐇 Hoppity-hop, the rabbit has arrived,
With MindIE scripts and vLLM configs contrived!
Four Ascend cards hum in TP=4 delight,
sed patches flying, npu-smi burns bright. ✨
Benchmarks in tables, TTFT laid bare—
The bunny deploys Qwen3 with flair! 🚀

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a new inference guide documentation section with validated models for Ascend NPU, which aligns with the core deliverable of the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml`:
- Around line 60-107: The bash script in the diff lacks strict error handling
mode, which means failed commands like `source` will be silently ignored and the
script will continue executing with a potentially incomplete configuration. Add
strict mode directives (set -e and optionally set -u and set -o pipefail)
immediately after the shebang and before the help() function definition to
ensure the script fails immediately if any command fails, preventing
mindieservice_daemon from starting with partial or broken configuration.

In `@docs/en/inference_guide/index.mdx`:
- Around line 78-85: The documentation instructs users to edit the manifest file
but then applies the remote URL directly using kubectl apply, which bypasses any
local edits. This means the user's changes to metadata.namespace, image tags,
and storageUri are ignored, leaving the deployment with unintended defaults. To
fix this, modify the instructions to first download the remote YAML file to a
local location using curl or wget (storing it in a variable or file), then edit
that local file, and finally apply the local file path instead of the remote URL
in the kubectl apply command.

In `@docs/en/inference_guide/qwen3-14b.mdx`:
- Around line 43-47: The bash code snippet includes a comment stating to "edit
namespace / image tag / storageUri first" but then immediately applies the
remote file directly without demonstrating any editing step, creating a mismatch
between the instructions and the actual command. Either modify the bash commands
to show how to download the file first (using curl or wget), edit it locally,
and then apply the local copy, or update the introductory comment to accurately
reflect that the remote file is being applied directly without local
modifications.

In `@docs/en/inference_guide/qwen3-30b-a3b.mdx`:
- Around line 49-53: The bash snippet instructs users to edit namespace, image
tag, and storageUri values before applying, but then immediately applies from a
remote URL without incorporating those edits. Restructure the snippet to
download the manifest file first using curl or wget into a local variable, then
apply the local file after editing. Alternatively, show how to apply the remote
URL with kubectl set or sed to inject the edited values, ensuring the documented
edit steps actually take effect when kubectl apply is executed.
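
The download, edit, apply flow requested for the three MDX pages could be sketched as follows. The helper name `rewrite_manifest`, the `MANIFEST_URL` variable, and the example namespace and storage URI are hypothetical placeholders; the network and cluster commands are left as comments so the snippet stays runnable offline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rewrite namespace and storageUri in a LOCAL manifest copy.
# Values must not contain '|' or '&', which are special in this sed expression.
rewrite_manifest() {
  local file="$1" namespace="$2" storage_uri="$3" tmp
  tmp="$(mktemp)"
  sed -E \
    -e "s|^([[:space:]]*namespace:).*|\1 ${namespace}|" \
    -e "s|^([[:space:]]*storageUri:).*|\1 ${storage_uri}|" \
    "$file" > "$tmp"
  mv "$tmp" "$file"
}

# Intended flow (placeholders, not executed here):
#   curl -fsSL -o manifest.yaml "$MANIFEST_URL"    # 1. download a local copy
#   rewrite_manifest manifest.yaml my-namespace "pvc://models/qwen3-30b-a3b"
#   kubectl apply -f manifest.yaml                 # 2. apply the edited copy
```

Applying the local file rather than the remote URL is what makes the documented edits take effect.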
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ded490a7-b19e-4d10-bb06-8121804fb4c9

📥 Commits

Reviewing files that changed from the base of the PR and between 5cf3cff and 5aab0b6.

📒 Files selected for processing (7)
  • docs/en/inference_guide/assets/qwen3-14b/qwen3-14b-vllm-ascend-tp1.yaml
  • docs/en/inference_guide/assets/qwen3-14b/qwen3-14b-vllm-ascend-tp2.yaml
  • docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
  • docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
  • docs/en/inference_guide/index.mdx
  • docs/en/inference_guide/qwen3-14b.mdx
  • docs/en/inference_guide/qwen3-30b-a3b.mdx

Comment on lines +60 to +107
#!/bin/bash
# run_mindie.sh — start MindIE Service for a given model.
# Required: --model-name, --model-path. Optional: --ip, --max-seq-len,
# --max-iter-times, --world-size, ... (run with --help for the full list).
help() { awk -F'### ' '/^###/ { print $2 }' "$0"; }
if [[ $# == 0 ]] || [[ "$1" == "--help" ]]; then help; exit 1; fi

total_count=$(npu-smi info -l | grep "Total Count" | awk -F ':' '{print $2}' | xargs)
if [[ -z "$total_count" ]]; then
    echo "Error: unable to read device info (npu-smi). Check permissions/devices."
    exit 1
fi
echo "$total_count device(s) detected!"

echo "Setting toolkit envs..."
source /usr/local/Ascend/ascend-toolkit/set_env.sh
echo "Setting MindIE envs..."
source /usr/local/Ascend/mindie/set_env.sh

MF_SCRIPTS_ROOT=$(realpath "$(dirname "$0")")
export PYTHONPATH=$MF_SCRIPTS_ROOT/../:$PYTHONPATH

export MIES_INSTALL_PATH=/usr/local/Ascend/mindie/latest/mindie-service
CONFIG_FILE=${MIES_INSTALL_PATH}/conf/config.json

# defaults
BACKEND_TYPE="atb"; MAX_SEQ_LEN=16384; MAX_PREFILL_TOKENS=16384
MAX_ITER_TIMES=1536; MAX_INPUT_TOKEN_LEN=12288; TRUNCATION=false
HTTPS_ENABLED=false; MULTI_NODES_INFER_ENABLED=false; NPU_MEM_SIZE=-1
MAX_PREFILL_BATCH_SIZE=50; TEMPLATE_TYPE="Standard"; MAX_PREEMPT_COUNT=0
SUPPORT_SELECT_BATCH=false; IP_ADDRESS="0.0.0.0"; PORT=8080
MANAGEMENT_IP_ADDRESS="127.0.0.2"; MANAGEMENT_PORT=1026; METRICS_PORT=1027

while [[ "$#" -gt 0 ]]; do
    case $1 in
        --model-path) MODEL_WEIGHT_PATH="$2"; shift ;;
        --model-name) MODEL_NAME="$2"; shift ;;
        --max-seq-len) MAX_SEQ_LEN="$2"; shift ;;
        --max-iter-times) MAX_ITER_TIMES="$2"; shift ;;
        --max-input-token-len) MAX_INPUT_TOKEN_LEN="$2"; shift ;;
        --max-prefill-tokens) MAX_PREFILL_TOKENS="$2"; shift ;;
        --world-size) WORLD_SIZE="$2"; shift ;;
        --ip) IP_ADDRESS="$2"; shift ;;
        --port) PORT="$2"; shift ;;
        *) echo "Unknown parameter: $1"; exit 1 ;;
    esac
    shift
done

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Startup script should fail fast on command errors.

Without strict mode, failed source/chmod/sed steps can be ignored and mindieservice_daemon may start with partial config.

🔧 Suggested fix
             #!/bin/bash
+            set -euo pipefail
             # run_mindie.sh — start MindIE Service for a given model.
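
One caveat when adopting `set -euo pipefail` in this script: `set -u` aborts on any unset variable, so the existing `export PYTHONPATH=...:$PYTHONPATH` line would fail when `PYTHONPATH` is not already set, and sourced vendor scripts may also reference unset variables. The sketch below shows one way to guard both cases; the helper names `prepend_pythonpath` and `source_relaxed` are illustrative assumptions and are not part of the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prepend a directory to PYTHONPATH without tripping `set -u` when it is unset.
# ${VAR:+...} expands to nothing for an unset or empty variable, so no stray ':' is added.
prepend_pythonpath() {
  local dir="$1"
  export PYTHONPATH="${dir}${PYTHONPATH:+:${PYTHONPATH}}"
}

# Source an environment script with nounset temporarily relaxed, then restore it.
# errexit stays on, so a missing file still stops the script.
source_relaxed() {
  set +u
  # shellcheck disable=SC1090
  source "$1"
  set -u
}
```

In the startup script this would correspond to calling `source_relaxed /usr/local/Ascend/ascend-toolkit/set_env.sh` and `prepend_pythonpath "$MF_SCRIPTS_ROOT/.."`; whether the Ascend environment scripts actually need the relaxed mode has not been verified here.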
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#!/bin/bash
set -euo pipefail
# run_mindie.sh — start MindIE Service for a given model.
# Required: --model-name, --model-path. Optional: --ip, --max-seq-len,
# --max-iter-times, --world-size, ... (run with --help for the full list).
help() { awk -F'### ' '/^###/ { print $2 }' "$0"; }
if [[ $# == 0 ]] || [[ "$1" == "--help" ]]; then help; exit 1; fi
total_count=$(npu-smi info -l | grep "Total Count" | awk -F ':' '{print $2}' | xargs)
if [[ -z "$total_count" ]]; then
    echo "Error: unable to read device info (npu-smi). Check permissions/devices."
    exit 1
fi
echo "$total_count device(s) detected!"
echo "Setting toolkit envs..."
source /usr/local/Ascend/ascend-toolkit/set_env.sh
echo "Setting MindIE envs..."
source /usr/local/Ascend/mindie/set_env.sh
MF_SCRIPTS_ROOT=$(realpath "$(dirname "$0")")
export PYTHONPATH=$MF_SCRIPTS_ROOT/../:$PYTHONPATH
export MIES_INSTALL_PATH=/usr/local/Ascend/mindie/latest/mindie-service
CONFIG_FILE=${MIES_INSTALL_PATH}/conf/config.json
# defaults
BACKEND_TYPE="atb"; MAX_SEQ_LEN=16384; MAX_PREFILL_TOKENS=16384
MAX_ITER_TIMES=1536; MAX_INPUT_TOKEN_LEN=12288; TRUNCATION=false
HTTPS_ENABLED=false; MULTI_NODES_INFER_ENABLED=false; NPU_MEM_SIZE=-1
MAX_PREFILL_BATCH_SIZE=50; TEMPLATE_TYPE="Standard"; MAX_PREEMPT_COUNT=0
SUPPORT_SELECT_BATCH=false; IP_ADDRESS="0.0.0.0"; PORT=8080
MANAGEMENT_IP_ADDRESS="127.0.0.2"; MANAGEMENT_PORT=1026; METRICS_PORT=1027
while [[ "$#" -gt 0 ]]; do
    case $1 in
        --model-path) MODEL_WEIGHT_PATH="$2"; shift ;;
        --model-name) MODEL_NAME="$2"; shift ;;
        --max-seq-len) MAX_SEQ_LEN="$2"; shift ;;
        --max-iter-times) MAX_ITER_TIMES="$2"; shift ;;
        --max-input-token-len) MAX_INPUT_TOKEN_LEN="$2"; shift ;;
        --max-prefill-tokens) MAX_PREFILL_TOKENS="$2"; shift ;;
        --world-size) WORLD_SIZE="$2"; shift ;;
        --ip) IP_ADDRESS="$2"; shift ;;
        --port) PORT="$2"; shift ;;
        *) echo "Unknown parameter: $1"; exit 1 ;;
    esac
    shift
done
Comment thread docs/en/inference_guide/index.mdx
Comment thread docs/en/inference_guide/qwen3-14b.mdx Outdated
Comment thread docs/en/inference_guide/qwen3-30b-a3b.mdx
cloudflare-workers-and-pages Bot commented Jun 18, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: 46b1d6d
Status: ✅  Deploy successful!
Preview URL: https://a16cec19.alauda-ai.pages.dev
Branch Preview URL: https://docs-inference-guide-validat.alauda-ai.pages.dev

zgsu and others added 4 commits June 18, 2026 07:30
… +3 models

- Host all YAML assets + HTML reports under docs/public/ so customers download
  from the docs site (site-absolute /inference_guide/... links), not GitHub.
- Show the complete benchmark data: full 22-column open-loop sweeps (rate 1-9 x
  4 workloads x both engines x TP, TTFT/E2E/ITL/TPS at p90/p95/p99/mean) in
  collapsible <details>, plus the rendered HTML reports as downloadable artifacts.
  Tables generated faithfully from the source reports (no hand-transcription).
- Add three more validated models (5 total):
  - DeepSeek-R1-Distill-Llama-8B (dense, mature Llama path anchor)
  - DeepSeek-R1-Distill-Llama-70B (dense, TP=8; accuracy openllm 6-task mean 0.722)
  - GLM-5.1-W4A8 (MoE, W4A8 quantized, TP=8; Partner-Guide chatbot sweep)
  Each with a namespace-scoped ServingRuntime + InferenceService asset.
- Add domain terms to the cspell dictionary.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the YAML assets and HTML reports from docs/public/ back under
docs/en/inference_guide/{assets,reports}/ and link them via GitHub
(tree/raw URLs for YAML, blob URL for reports) — matching the existing
training_guides/training-runtimes convention. Reverts the docs-site
public-hosting approach.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove the copied model-auto HTML benchmark reports (and their links) — do
  not ship them in our docs.
- Keep all benchmark *results* (saturation-capacity tables, rate-1 snapshots,
  the full 22-column open-loop sweeps inline, accuracy table, GLM chatbot
  table) but remove the *analysis*: Tuning notes / Insights sections, the
  "Picking an engine" recommendations, and interpretive prose / "recommended"
  labels. Pages now present verified facts, configs, and data only.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Qwen3-30B-A3B

Apply the rate=1 chatbot ITL P90 ≈ 30ms SLO. Only Qwen3-30B-A3B (MindIE TP=2,
ITL P90 30.8ms / mean 29.0) meets it; remove the models that do not:
- Qwen3-14B (44.6ms), DeepSeek-R1-Distill-Llama-8B (~38ms),
  DeepSeek-R1-Distill-Llama-70B (56ms), GLM-5.1-W4A8 (218ms) — pages + assets.

Add the SLO-compliant MindIE TP=2 asset (the TP=4 asset is 39.8ms, over SLO) and
lead the deploy section with it. Trim the index runtime catalog and analysis text
left over from the removed models.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/en/inference_guide/qwen3-30b-a3b.mdx (1)

29-31: ⚡ Quick win

Clarify TP=2 availability for vLLM deployment assets.

The validation matrix states vLLM TP=2/TP=4, but the deploy table links only vLLM TP=4. Add a one-line note clarifying whether TP=2 is benchmark-only or provide the TP=2 asset link to avoid reader confusion.

Also applies to: 44-47

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/inference_guide/qwen3-30b-a3b.mdx` around lines 29 - 31, The
validation matrix for vLLM-Ascend indicates support for both TP=2 and TP=4
configurations, but the corresponding deployment table link only references
TP=4, creating ambiguity about TP=2 availability. Add a one-line clarifying note
in or near the vLLM-Ascend row entries that explicitly states whether TP=2 is
benchmark-only or provide the actual deployment asset link for TP=2 to resolve
the discrepancy. Apply the same clarification to the other affected rows
mentioned in the "Also applies to" section (lines 44-47).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp2.yaml`:
- Around line 65-69: The validation for the total_count variable only checks if
it is empty using the -z test, but does not verify that it is a positive
integer. If total_count is zero or contains non-numeric characters, the device
ID generation logic downstream will produce invalid topology configurations.
Enhance the validation condition to check not only that total_count is non-empty
but also that it contains only digits and is greater than zero, rejecting any
non-numeric or zero values with an appropriate error message before the value is
used in device ID generation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ad4c58bd-11d7-4bac-804b-ff593ac0fe27

📥 Commits

Reviewing files that changed from the base of the PR and between 5aab0b6 and 46b1d6d.

📒 Files selected for processing (6)
  • .cspell/terms.txt
  • docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp2.yaml
  • docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
  • docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
  • docs/en/inference_guide/index.mdx
  • docs/en/inference_guide/qwen3-30b-a3b.mdx
✅ Files skipped from review due to trivial changes (1)
  • .cspell/terms.txt
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
  • docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml

Comment on lines +65 to +69
total_count=$(npu-smi info -l | grep "Total Count" | awk -F ':' '{print $2}' | xargs)
if [[ -z "$total_count" ]]; then
echo "Error: unable to read device info (npu-smi). Check permissions/devices."
exit 1
fi

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate total_count as a positive integer before building device IDs.

Line 66 only rejects empty output. If total_count is 0 or non-numeric, Line 113 can generate invalid topology and fail later with less actionable errors.

Suggested patch
-            if [[ -z "$total_count" ]]; then
-                echo "Error: unable to read device info (npu-smi). Check permissions/devices."
+            if [[ -z "$total_count" ]] || ! [[ "$total_count" =~ ^[0-9]+$ ]] || [[ "$total_count" -lt 1 ]]; then
+                echo "Error: invalid device count from npu-smi: '$total_count'. Check permissions/devices."
                 exit 1
             fi
             echo "$total_count device(s) detected!"
@@
             # TP follows the allocated device count.
             WORLD_SIZE=$total_count
             NPU_DEVICE_IDS=$(seq -s, 0 $(($WORLD_SIZE - 1)))

Also applies to: 112-114
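
The check suggested above can also be factored into small functions, which makes the zero and non-numeric cases easy to exercise on their own. This is a sketch under assumptions: the function names `is_positive_int` and `device_ids` are illustrative, and the ID list is built with a loop instead of `seq -s,` only to keep the output identical across `seq` implementations.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds only for a non-empty, all-digit string whose value is greater than zero.
# The 10# prefix forces base 10 so inputs such as "08" are not parsed as octal.
is_positive_int() {
  [[ "$1" =~ ^[0-9]+$ ]] && (( 10#$1 > 0 ))
}

# Print a comma-separated device ID list 0..n-1 for a validated count n.
device_ids() {
  local n="$1" ids="" i
  for (( i = 0; i < n; i++ )); do
    ids+="${ids:+,}${i}"
  done
  printf '%s\n' "$ids"
}

# Example with a stand-in value; in the manifest this would come from npu-smi.
total_count="4"
if ! is_positive_int "$total_count"; then
  echo "Error: invalid device count from npu-smi: '$total_count'." >&2
  exit 1
fi
device_ids "$total_count"
```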
