Document missing hf_ptq.py features in PTQ README #1185
Conversation
Add documentation for several hf_ptq.py functionalities that were missing from the PTQ README:

- Recipe-based quantization (`--recipe`): declarative YAML alternative to `--qformat` for specifying the full quantization configuration
- KV cache quantization (`--kv_cache_qformat`): all 8 format choices with usage examples and descriptions
- AutoQuantize advanced options: `--auto_quantize_method` (gradient/kl_div), `--auto_quantize_score_size`, and `--auto_quantize_checkpoint`
- Expanded the supported QFORMAT list from 8 to all 19 available formats
- Added simple_eval to the valid tasks list
- Clarified that the kl_div AutoQuantize method does not require backpropagation

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
**Codecov Report** ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #1185      +/-   ##
===========================================
+ Coverage   59.45%   77.08%   +17.62%
===========================================
  Files         352      352
  Lines       40343    40343
===========================================
+ Hits        23987    31099    +7112
+ Misses      16356     9244    -7112
```
Flags with carried forward coverage won't be shown.
**📝 Walkthrough**

Documentation expansion for the LLM PTQ example README that introduces new sections for recipe-based quantization, KV cache quantization parameters, and AutoQuantize advanced options. Updates include clarified quantization format placeholders, method-specific behavioral notes, and extended task list support.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_ptq/README.md`:
- Line 149: The README lists QFORMAT values but omits fp16 and bf16 accepted by
the --quant flag in scripts/huggingface_example.sh, causing a mismatch with
hf_ptq.py; update the README entry to either add "fp16" and "bf16" to the
Supported `QFORMAT` list or explicitly label the line as "PTQ quantization
formats in hf_ptq.py" to indicate the scope, and mention that
scripts/huggingface_example.sh also accepts fp16 and bf16 for non-PTQ runs so
users know the difference (reference hf_ptq.py, scripts/huggingface_example.sh
and the --quant flag).
- Line 329: Update the README text to use the hyphenated compound adjective
"comma-separated" in both occurrences ("The tasks combo can be specified with a
comma-separated task list" and "please also specify the `--lm_eval_tasks` flag
with comma-separated lm_eval tasks") to match style and improve readability;
locate the sentence referencing the script parser.sh and replace "comma
separated" with "comma-separated".
ℹ️ Review info

- Configuration used: `.coderabbit.yaml`
- Review profile: CHILL
- Plan: Pro
- Run ID: ae703d73-e4e0-49d2-a078-0a028261c4e6
📒 Files selected for processing (1): `examples/llm_ptq/README.md`
```
scripts/huggingface_example.sh --model $HF_PATH --quant <QFORMAT> --tp [1|2|4|8]
```

Supported `QFORMAT` values: `fp8`, `fp8_pc_pt`, `fp8_pb_wo`, `int8`, `int8_sq`, `int8_wo`, `int4_awq`, `w4a8_awq`, `nvfp4`, `nvfp4_awq`, `nvfp4_mse`, `nvfp4_mlp_only`, `nvfp4_experts_only`, `nvfp4_omlp_only`, `nvfp4_svdquant`, `nvfp4_local_hessian`, `w4a8_nvfp4_fp8`, `w4a8_mxfp4_fp8`, `mxfp8`.
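As an illustrative sketch only (not the actual validation logic in `scripts/huggingface_example.sh` or `hf_ptq.py`), a wrapper script could check a `--quant` value against the PTQ format list above with a shell `case` statement; the function name `is_supported_qformat` is hypothetical:

```shell
# Hypothetical helper: check a --quant value against the PTQ formats
# documented above. Not the actual logic from huggingface_example.sh.
is_supported_qformat() {
    case "$1" in
        fp8|fp8_pc_pt|fp8_pb_wo|int8|int8_sq|int8_wo|int4_awq|w4a8_awq|\
        nvfp4|nvfp4_awq|nvfp4_mse|nvfp4_mlp_only|nvfp4_experts_only|\
        nvfp4_omlp_only|nvfp4_svdquant|nvfp4_local_hessian|\
        w4a8_nvfp4_fp8|w4a8_mxfp4_fp8|mxfp8)
            echo "supported" ;;
        *)
            echo "unsupported" ;;
    esac
}

is_supported_qformat nvfp4   # prints "supported"
is_supported_qformat fp16    # prints "unsupported" (not in the PTQ list)
```

Note how this sketch also illustrates the review comment below: `fp16` is not in the PTQ format list, even though the script itself accepts it for non-PTQ runs.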
Clarify --quant scope vs script-accepted values.
This list matches hf_ptq.py quant configs, but scripts/huggingface_example.sh also accepts fp16 and bf16. Please either include them here or explicitly label this as “PTQ quantization formats in hf_ptq.py” to avoid contradiction for script users.
```
--auto_quantize_bits 4.75 --auto_quantize_method kl_div --auto_quantize_score_size 64
```

The example scripts above also have an additional flag `--tasks`, which customizes the actual tasks run by the script. The allowed tasks are `quant,mmlu,lm_eval,livecodebench,simple_eval`, specified in the script [parser](./scripts/parser.sh). The task combo can be specified as a comma-separated task list. Some tasks, like mmlu, can take a long time to run. To run lm_eval tasks, please also specify the `--lm_eval_tasks` flag with comma-separated lm_eval tasks [here](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks).
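To illustrate how a comma-separated `--tasks` value can be split in shell, here is a minimal sketch; it mirrors the idea only, and `split_tasks` is a hypothetical name, not the actual code from `scripts/parser.sh`:

```shell
# Hypothetical helper mirroring comma-separated task parsing;
# not the actual code from scripts/parser.sh.
split_tasks() (
    IFS=','              # split only on commas (subshell keeps IFS local)
    for task in $1; do   # $1 deliberately unquoted so IFS splitting applies
        echo "$task"
    done
)

split_tasks "quant,mmlu,lm_eval"   # prints each task on its own line
```

Using a subshell body `( ... )` instead of `{ ... }` keeps the `IFS` change from leaking into the caller's environment.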
Use hyphenated compound adjective.
Change “comma separated” to “comma-separated” for consistency/readability.
🧰 Tools
🪛 LanguageTool
[grammar] ~329-~329: Use a hyphen to join words.
Context: ...fy the --lm_eval_tasks flag with comma separated lm_eval tasks [here](https://g...
(QB_NEW_EN_HYPHEN)
### What does this PR do?

Add documentation for several hf_ptq.py functionalities that were missing from the PTQ README:

- Recipe-based quantization (`--recipe`): declarative YAML alternative to `--qformat` for specifying the full quantization configuration
- KV cache quantization (`--kv_cache_qformat`): all 8 format choices with usage examples and descriptions
- AutoQuantize advanced options: `--auto_quantize_method` (gradient/kl_div), `--auto_quantize_score_size`, and `--auto_quantize_checkpoint`
- Expanded the supported QFORMAT list from 8 to all 19 available formats
- Added simple_eval to the valid tasks list
- Clarified that the kl_div AutoQuantize method does not require backpropagation

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **Documentation**
  - Updated quantization format documentation with supported options and usage examples.
  - Added recipe-based quantization guidance with YAML configuration and precedence details.
  - Expanded KV cache quantization documentation with available formats and configuration options.
  - Enhanced AutoQuantize documentation with advanced options and clarified method-specific requirements.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>