docs: add inference_guide with validated 7B+ models (Ascend NPU) by EdisonSu768 · Pull Request #268 · alauda/aml-docs

EdisonSu768 · 2026-06-18T06:52:08Z

What

New docs/en/inference_guide/ section documenting already-validated open-weight LLM inference, mirroring the structure of training_guides/training-runtimes. All content is derived from verified deployments + benchmarks (no fabricated numbers).

Models (2, ≥7B/8B) ✅
Runtime images (GPU/NPU) — NPU vLLM-Ascend + MindIE catalog, with an NVIDIA GPU note ✅
Runtime YAML examples — namespace-scoped ServingRuntime + InferenceService (not ClusterServingRuntime), one per model/engine/TP combination ✅
Model + runtime + image benchmark results — measured guidellm open-loop per-replica numbers (4 workloads × rate 1–9), with the dense→vLLM / MoE→MindIE engine-selection finding ✅

Files

docs/en/inference_guide/
├── index.mdx                         # overview, runtime-image catalog, engine selection, methodology, deploy steps
├── qwen3-14b.mdx                     # dense model card + benchmark tables
├── qwen3-30b-a3b.mdx                 # MoE model card + benchmark tables
└── assets/
    ├── qwen3-14b/qwen3-14b-vllm-ascend-tp1.yaml
    ├── qwen3-14b/qwen3-14b-vllm-ascend-tp2.yaml
    ├── qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
    └── qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml

Scope note

Every validated 7B+ model currently runs on Ascend 910B NPU; there is no verified GPU benchmark at this size yet (only Qwen3.5-0.8B on A30). The doc therefore leads with NPU and lists the NVIDIA GPU runtime as the platform default with that gap flagged, rather than inventing GPU numbers.

Verification

yarn lint → 0 errors / 0 warnings
yarn build → all 3 pages render
All 4 asset YAMLs parse as valid multi-doc YAML

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added ready-to-deploy configurations for Qwen3-30B-A3B model inference on Ascend 910B3 hardware with MindIE and vLLM-Ascend inference engines.
- Support for tensor parallelism (TP=2 and TP=4) configurations.
Documentation
- Added comprehensive inference deployment guide with per-model benchmarking documentation and performance metrics across Chat, Code, and RAG workloads.

New `inference_guide/` section documenting already-validated open-weight LLM inference, mirroring the structure of `training_guides/training-runtimes`:

- Two validated models above the 7B/8B class:
  - Qwen3-14B (dense, BF16): recommended engine vLLM-Ascend
  - Qwen3-30B-A3B (MoE, BF16): recommended engine MindIE
- Runtime images catalog (NPU vLLM-Ascend + MindIE; NVIDIA GPU note)
- Per-model namespace-scoped `ServingRuntime` + `InferenceService` assets (not ClusterServingRuntime), one per engine/TP combination
- Measured open-loop per-replica benchmark tables (guidellm, 4 workloads, rate 1-9) with the dense→vLLM / MoE→MindIE engine-selection finding

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-18T06:52:25Z

Walkthrough

Adds a new inference guide section for Qwen3-30B-A3B on Huawei Ascend 910B3: three KServe YAML asset files (MindIE TP=2, MindIE TP=4, vLLM-Ascend TP=4) and two MDX pages (top-level inference guide index, Qwen3-30B-A3B model page). Twelve custom dictionary terms are also appended to the cspell wordlist.

Changes

Inference Guide: Qwen3-30B-A3B on Ascend 910B3

Layer / File(s)	Summary
Inference guide index page `docs/en/inference_guide/index.mdx`, `.cspell/terms.txt`	Adds the top-level inference guide MDX page with validated model listing, runtime image tag table (vLLM-Ascend v0.18, MindIE), guidellm benchmark methodology definition (open-loop, four workloads, saturation capacity), kubectl deploy instructions with curl test example, and caveats (namespace scoping, Ascend910B3 resource key, HCCL/Modelcar permission modes). Appends 12 inference-related terms to the cspell dictionary.
MindIE TP=2 and TP=4 KServe manifests `docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp2.yaml`, `docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml`	Adds two MindIE ServingRuntime + InferenceService YAML pairs. Both embed a bash startup script that uses `npu-smi` to detect NPU count, derives `WORLD_SIZE`/device IDs, validates the model config, rewrites MindIE `config.json` via extensive `sed` substitutions, and launches `mindieservice_daemon`. TP=2 allocates 2 Ascend910B3 cards; TP=4 allocates 4. Both use `hostIPC: true` and root securityContext.
vLLM-Ascend TP=4 KServe manifest `docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml`	Adds a ServingRuntime using vLLM-Ascend v0.18 (OpenAI-compatible server) and a matching InferenceService with TP=4, expert-parallel, eager mode, max sequence/length limits, chunked prefill, prefix caching, HCCL/OMP/PyTorch-NPU environment variables, 4 Ascend910B3 card resources, and root securityContext.
Qwen3-30B-A3B documentation page `docs/en/inference_guide/qwen3-30b-a3b.mdx`	Adds the model page covering model identity (30B/3B-active, BF16), validated hardware×stack matrix (MindIE and vLLM-Ascend at TP=2/TP=4), deployment instructions with asset YAML links and a MindIE root/writable-volume warning, benchmark overview saturation tables and rate-1 latency snapshot, and expandable full open-loop sweep tables (TTFT/E2E/ITL/TPS) for all four engine×TP combinations across Chat/Code/RAG/Long-RAG workloads.
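The walkthrough notes that the MindIE scripts rewrite `config.json` through many `sed` substitutions. Below is a minimal sketch of one guarded substitution, assuming GNU sed and a config that keeps each scalar key on its own line; `set_json_scalar` is a hypothetical helper, not code taken from the assets. It checks for the key before editing because `sed` exits 0 even when nothing matches, so `set -e` alone would not catch a misspelled key.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: replace the value of a *scalar* JSON key that sits on
# its own line, e.g.   "port" : 1025,
# Array values such as [[0,1,2,3]] and duplicated keys are out of scope.
set_json_scalar() {
  local file="$1" key="$2" value="$3"
  # sed exits 0 even when nothing matched, so check for the key explicitly;
  # otherwise a typo in the key name would pass silently under `set -e`.
  if ! grep -q "\"${key}\"[[:space:]]*:" "$file"; then
    echo "Error: key '${key}' not found in ${file}" >&2
    return 1
  fi
  # Keep `"key" : ` (the quotes stop "port" from matching "managementPort")
  # and replace the value up to the next `,` or `}` on that line.
  # `-i` without a backup suffix is GNU sed syntax.
  sed -i -E "s|(\"${key}\"[[:space:]]*:[[:space:]]*)[^[:space:],}][^,}]*|\1${value}|" "$file"
}
```

For example, `set_json_scalar "$CONFIG_FILE" maxSeqLen "$MAX_SEQ_LEN"`, where the key name `maxSeqLen` is an assumption about MindIE's `config.json` layout. String values must be passed with their JSON quotes, e.g. `'"0.0.0.0"'`; array values such as device ID lists need a different approach.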

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

alauda/aml-docs#234: Both PRs update Ascend vLLM runtime deployment guidance around Modelcar permission modes and securityContext/UID-related settings (runAsUser: 0, runAsNonRoot: false), which this PR also references in its MindIE deployment warning and InferenceService security contexts.

Poem

🐇 Hoppity-hop, the rabbit has arrived,
With MindIE scripts and vLLM configs contrived!
Four Ascend cards hum in TP=4 delight,
sed patches flying, npu-smi burns bright. ✨
Benchmarks in tables, TTFT laid bare—
The bunny deploys Qwen3 with flair! 🚀

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding a new inference guide documentation section with validated models for Ascend NPU, which aligns with the core deliverable of the PR.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml`:
- Around lines 60-107: The bash script in the diff lacks strict error handling
mode, which means failed commands like `source` will be silently ignored and the
script will continue executing with a potentially incomplete configuration. Add
strict mode directives (set -e and optionally set -u and set -o pipefail)
immediately after the shebang and before the help() function definition to
ensure the script fails immediately if any command fails, preventing
mindieservice_daemon from starting with partial or broken configuration.

In `@docs/en/inference_guide/index.mdx`:
- Around lines 78-85: The documentation instructs users to edit the manifest file
but then applies the remote URL directly using kubectl apply, which bypasses any
local edits. This means the user's changes to metadata.namespace, image tags,
and storageUri are ignored, leaving the deployment with unintended defaults. To
fix this, modify the instructions to first download the remote YAML file to a
local location using curl or wget (storing it in a variable or file), then edit
that local file, and finally apply the local file path instead of the remote URL
in the kubectl apply command.

In `@docs/en/inference_guide/qwen3-14b.mdx`:
- Around lines 43-47: The bash code snippet includes a comment stating to "edit
namespace / image tag / storageUri first" but then immediately applies the
remote file directly without demonstrating any editing step, creating a mismatch
between the instructions and the actual command. Either modify the bash commands
to show how to download the file first (using curl or wget), edit it locally,
and then apply the local copy, or update the introductory comment to accurately
reflect that the remote file is being applied directly without local
modifications.

In `@docs/en/inference_guide/qwen3-30b-a3b.mdx`:
- Around lines 49-53: The bash snippet instructs users to edit namespace, image
tag, and storageUri values before applying, but then immediately applies from a
remote URL without incorporating those edits. Restructure the snippet to
download the manifest file first using curl or wget into a local variable, then
apply the local file after editing. Alternatively, show how to apply the remote
URL with kubectl set or sed to inject the edited values, ensuring the documented
edit steps actually take effect when kubectl apply is executed.
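The three deploy-snippet comments above ask for the same download, edit, apply flow. A minimal sketch, assuming GNU sed, `curl`, and `kubectl` on the PATH, and that each edited field sits on its own block-style YAML line; `set_namespace`, `set_storage_uri`, and `deploy_asset` are hypothetical helpers, not part of the PR. `set_namespace` rewrites every line whose key is `namespace:`, which suits a ServingRuntime + InferenceService pair but would also touch any other `namespace:` key in the file.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helpers: edit a *local* copy of an asset YAML before applying.
# Assumes block-style lines such as "  namespace: demo" and
# "      storageUri: pvc://...". `sed -i` without a suffix is GNU syntax.

set_namespace() {
  local file="$1" ns="$2"
  # Rewrites every `namespace:` line (all documents in a multi-doc file).
  sed -i -E "s|^([[:space:]]*namespace:)[[:space:]]*.*|\1 ${ns}|" "$file"
}

set_storage_uri() {
  local file="$1" uri="$2"
  # `|` is the sed delimiter because URIs contain `/`; a URI containing
  # `|` or `&` would need escaping first.
  sed -i -E "s|^([[:space:]]*storageUri:)[[:space:]]*.*|\1 ${uri}|" "$file"
}

# Download -> edit -> apply: only the edited local copy reaches kubectl.
deploy_asset() {
  local url="$1" ns="$2" uri="$3"
  local file
  file="$(mktemp -d)/manifest.yaml"
  curl -fsSL "$url" -o "$file"
  set_namespace "$file" "$ns"
  set_storage_uri "$file" "$uri"
  kubectl apply -f "$file"
}
```

Usage, with `RAW_URL` standing for the raw GitHub URL of the chosen asset (not reproduced here): `deploy_asset "$RAW_URL" my-namespace "<storageUri>"`. The image tag could be edited the same way on the `image:` line.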


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ded490a7-b19e-4d10-bb06-8121804fb4c9

📥 Commits

Reviewing files that changed from the base of the PR and between 5cf3cff and 5aab0b6.

📒 Files selected for processing (7)

docs/en/inference_guide/assets/qwen3-14b/qwen3-14b-vllm-ascend-tp1.yaml
docs/en/inference_guide/assets/qwen3-14b/qwen3-14b-vllm-ascend-tp2.yaml
docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
docs/en/inference_guide/index.mdx
docs/en/inference_guide/qwen3-14b.mdx
docs/en/inference_guide/qwen3-30b-a3b.mdx

coderabbitai · 2026-06-18T06:59:02Z

+            #!/bin/bash
+            # run_mindie.sh — start MindIE Service for a given model.
+            # Required: --model-name, --model-path. Optional: --ip, --max-seq-len,
+            # --max-iter-times, --world-size, ... (run with --help for the full list).
+            help() { awk -F'### ' '/^###/ { print $2 }' "$0"; }
+            if [[ $# == 0 ]] || [[ "$1" == "--help" ]]; then help; exit 1; fi
+
+            total_count=$(npu-smi info -l | grep "Total Count" | awk -F ':' '{print $2}' | xargs)
+            if [[ -z "$total_count" ]]; then
+                echo "Error: unable to read device info (npu-smi). Check permissions/devices."
+                exit 1
+            fi
+            echo "$total_count device(s) detected!"
+
+            echo "Setting toolkit envs..."
+            source /usr/local/Ascend/ascend-toolkit/set_env.sh
+            echo "Setting MindIE envs..."
+            source /usr/local/Ascend/mindie/set_env.sh
+
+            MF_SCRIPTS_ROOT=$(realpath "$(dirname "$0")")
+            export PYTHONPATH=$MF_SCRIPTS_ROOT/../:$PYTHONPATH
+
+            export MIES_INSTALL_PATH=/usr/local/Ascend/mindie/latest/mindie-service
+            CONFIG_FILE=${MIES_INSTALL_PATH}/conf/config.json
+
+            # defaults
+            BACKEND_TYPE="atb"; MAX_SEQ_LEN=16384; MAX_PREFILL_TOKENS=16384
+            MAX_ITER_TIMES=1536; MAX_INPUT_TOKEN_LEN=12288; TRUNCATION=false
+            HTTPS_ENABLED=false; MULTI_NODES_INFER_ENABLED=false; NPU_MEM_SIZE=-1
+            MAX_PREFILL_BATCH_SIZE=50; TEMPLATE_TYPE="Standard"; MAX_PREEMPT_COUNT=0
+            SUPPORT_SELECT_BATCH=false; IP_ADDRESS="0.0.0.0"; PORT=8080
+            MANAGEMENT_IP_ADDRESS="127.0.0.2"; MANAGEMENT_PORT=1026; METRICS_PORT=1027
+
+            while [[ "$#" -gt 0 ]]; do
+                case $1 in
+                    --model-path) MODEL_WEIGHT_PATH="$2"; shift ;;
+                    --model-name) MODEL_NAME="$2"; shift ;;
+                    --max-seq-len) MAX_SEQ_LEN="$2"; shift ;;
+                    --max-iter-times) MAX_ITER_TIMES="$2"; shift ;;
+                    --max-input-token-len) MAX_INPUT_TOKEN_LEN="$2"; shift ;;
+                    --max-prefill-tokens) MAX_PREFILL_TOKENS="$2"; shift ;;
+                    --world-size) WORLD_SIZE="$2"; shift ;;
+                    --ip) IP_ADDRESS="$2"; shift ;;
+                    --port) PORT="$2"; shift ;;
+                    *) echo "Unknown parameter: $1"; exit 1 ;;
+                esac
+                shift
+            done


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Startup script should fail fast on command errors.

Without strict mode, failed source/chmod/sed steps can be ignored and mindieservice_daemon may start with partial config.

🔧 Suggested fix

 #!/bin/bash
+set -euo pipefail
 # run_mindie.sh — start MindIE Service for a given model.
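A minimal sketch, assuming only bash, of what the suggested `set -euo pipefail` changes: each helper runs a short snippet in a child shell. `/nonexistent/set_env.sh` is a deliberately missing, hypothetical path standing in for a failed `source`, and the helper names are illustrative.

```bash
#!/bin/bash
# Hypothetical stand-in for a missing Ascend/MindIE env script.
missing_env='source /nonexistent/set_env.sh; echo "daemon would start here"'

# Without strict mode the failed `source` is ignored and execution continues.
run_lenient() { bash -c "$missing_env" 2>/dev/null; }

# With strict mode the failed `source` aborts before the "daemon" line.
run_strict() { bash -c "set -euo pipefail; $missing_env" 2>/dev/null; }

# `set -u` caveat: appending to an unset PYTHONPATH (as the script does)
# aborts unless a default expansion is used.
append_bare()    { env -u PYTHONPATH bash -c 'set -u; export PYTHONPATH=/opt/x:$PYTHONPATH; echo ok' 2>/dev/null; }
append_default() { env -u PYTHONPATH bash -c 'set -u; export PYTHONPATH=/opt/x:${PYTHONPATH:-}; echo ok' 2>/dev/null; }
```

One side effect worth checking before committing the suggestion: the script's `export PYTHONPATH=$MF_SCRIPTS_ROOT/../:$PYTHONPATH` aborts under `set -u` when `PYTHONPATH` is unset, as `append_bare` shows, and `${PYTHONPATH:-}` avoids that. The vendor `set_env.sh` scripts may also expand unset variables; if they do, `set -u` could be relaxed around the two `source` lines.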



cloudflare-workers-and-pages · 2026-06-18T07:01:54Z

Deploying alauda-ai with Cloudflare Pages

Latest commit:	`46b1d6d`
Status:	✅ Deploy successful!
Preview URL:	https://a16cec19.alauda-ai.pages.dev
Branch Preview URL:	https://docs-inference-guide-validat.alauda-ai.pages.dev


… +3 models

- Host all YAML assets + HTML reports under docs/public/ so customers download from the docs site (site-absolute /inference_guide/... links), not GitHub.
- Show the complete benchmark data: full 22-column open-loop sweeps (rate 1-9 x 4 workloads x both engines x TP, TTFT/E2E/ITL/TPS at p90/p95/p99/mean) in collapsible <details>, plus the rendered HTML reports as downloadable artifacts. Tables generated faithfully from the source reports (no hand-transcription).
- Add three more validated models (5 total), each with a namespace-scoped ServingRuntime + InferenceService asset:
  - DeepSeek-R1-Distill-Llama-8B (dense, mature Llama path anchor)
  - DeepSeek-R1-Distill-Llama-70B (dense, TP=8; accuracy openllm 6-task mean 0.722)
  - GLM-5.1-W4A8 (MoE, W4A8 quantized, TP=8; Partner-Guide chatbot sweep)
- Add domain terms to the cspell dictionary.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move the YAML assets and HTML reports from docs/public/ back under docs/en/inference_guide/{assets,reports}/ and link them via GitHub (tree/raw URLs for YAML, blob URL for reports), matching the existing training_guides/training-runtimes convention. Reverts the docs-site public-hosting approach.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Remove the copied model-auto HTML benchmark reports (and their links); do not ship them in our docs.
- Keep all benchmark *results* (saturation-capacity tables, rate-1 snapshots, the full 22-column open-loop sweeps inline, accuracy table, GLM chatbot table) but remove the *analysis*: Tuning notes / Insights sections, the "Picking an engine" recommendations, and interpretive prose / "recommended" labels.

Pages now present verified facts, configs, and data only.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Qwen3-30B-A3B

Apply the rate=1 chatbot ITL P90 ≈ 30ms SLO. Only Qwen3-30B-A3B (MindIE TP=2, ITL P90 30.8ms / mean 29.0) meets it; remove the models that do not (pages + assets):

- Qwen3-14B (44.6ms)
- DeepSeek-R1-Distill-Llama-8B (~38ms)
- DeepSeek-R1-Distill-Llama-70B (56ms)
- GLM-5.1-W4A8 (218ms)

Add the SLO-compliant MindIE TP=2 asset (the TP=4 asset is 39.8ms, over SLO) and lead the deploy section with it. Trim the index runtime catalog and analysis text left over from the removed models.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

docs/en/inference_guide/qwen3-30b-a3b.mdx (1)
29-31: ⚡ Quick win

Clarify TP=2 availability for vLLM deployment assets.

The validation matrix states vLLM TP=2/TP=4, but the deploy table links only vLLM TP=4. Add a one-line note clarifying whether TP=2 is benchmark-only or provide the TP=2 asset link to avoid reader confusion.

Also applies to: 44-47

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp2.yaml`:
- Around lines 65-69: The validation for the total_count variable only checks if
it is empty using the -z test, but does not verify that it is a positive
integer. If total_count is zero or contains non-numeric characters, the device
ID generation logic downstream will produce invalid topology configurations.
Enhance the validation condition to check not only that total_count is non-empty
but also that it contains only digits and is greater than zero, rejecting any
non-numeric or zero values with an appropriate error message before the value is
used in device ID generation.

---

Nitpick comments:
In `@docs/en/inference_guide/qwen3-30b-a3b.mdx`:
- Around lines 29-31: The validation matrix for vLLM-Ascend indicates support for
both TP=2 and TP=4 configurations, but the corresponding deployment table link
only references TP=4, creating ambiguity about TP=2 availability. Add a one-line
clarifying note in or near the vLLM-Ascend row entries that explicitly states
whether TP=2 is benchmark-only or provide the actual deployment asset link for
TP=2 to resolve the discrepancy. Apply the same clarification to the other
affected rows mentioned in the "Also applies to" section (lines 44-47).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ad4c58bd-11d7-4bac-804b-ff593ac0fe27

📥 Commits

Reviewing files that changed from the base of the PR and between 5aab0b6 and 46b1d6d.

📒 Files selected for processing (6)

.cspell/terms.txt
docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp2.yaml
docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
docs/en/inference_guide/index.mdx
docs/en/inference_guide/qwen3-30b-a3b.mdx

✅ Files skipped from review due to trivial changes (1)

.cspell/terms.txt

🚧 Files skipped from review as they are similar to previous changes (2)

docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml

coderabbitai · 2026-06-18T08:40:31Z

+            total_count=$(npu-smi info -l | grep "Total Count" | awk -F ':' '{print $2}' | xargs)
+            if [[ -z "$total_count" ]]; then
+                echo "Error: unable to read device info (npu-smi). Check permissions/devices."
+                exit 1
+            fi


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate total_count as a positive integer before building device IDs.

Line 66 only rejects empty output. If total_count is 0 or non-numeric, line 113 can generate an invalid topology and fail later with less actionable errors.

Suggested patch

-if [[ -z "$total_count" ]]; then
-    echo "Error: unable to read device info (npu-smi). Check permissions/devices."
+if [[ -z "$total_count" ]] || ! [[ "$total_count" =~ ^[0-9]+$ ]] || [[ "$total_count" -lt 1 ]]; then
+    echo "Error: invalid device count from npu-smi: '$total_count'. Check permissions/devices."
     exit 1
 fi
 echo "$total_count device(s) detected!"
@@
 # TP follows the allocated device count.
 WORLD_SIZE=$total_count
 NPU_DEVICE_IDS=$(seq -s, 0 $(($WORLD_SIZE - 1)))

Also applies to: 112-114
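A sketch of the reviewer's positive-integer check folded into one helper, assuming bash (`[[ =~ ]]`) and coreutils `seq`. `device_ids` is a hypothetical name, not from the asset. It swaps the suggested `^[0-9]+$` plus `-lt 1` pair for a single regex, `^[1-9][0-9]*$`, which also rejects zero-padded input such as `08` that bash arithmetic would try to read as octal.

```bash
#!/bin/bash
# Hypothetical helper: validate the device count reported by npu-smi and
# print the comma-separated device IDs 0..N-1 (e.g. 4 -> 0,1,2,3).
device_ids() {
  local count="$1"
  # One regex rejects empty, non-numeric, negative, zero and zero-padded values.
  if ! [[ "$count" =~ ^[1-9][0-9]*$ ]]; then
    echo "Error: invalid device count from npu-smi: '$count'. Check permissions/devices." >&2
    return 1
  fi
  seq -s, 0 $((count - 1))
}
```

In the script this could replace the `seq` line as `NPU_DEVICE_IDS=$(device_ids "$total_count")`; under `set -e`, a failing command substitution in a plain assignment makes the assignment fail, so the script exits before writing any topology.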



zgsu and others added 4 commits June 18, 2026 07:30

EdisonSu768 wants to merge 5 commits into master from docs/inference-guide-validated-models
