
Document missing hf_ptq.py features in PTQ README #1185

Merged
shengliangxu merged 4 commits into main from shengliangx/recipe-readme on Apr 8, 2026

Conversation

@shengliangxu (Collaborator) commented Apr 7, 2026

What does this PR do?

Add documentation for several hf_ptq.py functionalities that were missing from the PTQ README:

  • Recipe-based quantization (--recipe): declarative YAML alternative to --qformat for specifying full quantization configuration
  • KV cache quantization (--kv_cache_qformat): all 8 format choices with usage examples and descriptions
  • AutoQuantize advanced options: --auto_quantize_method (gradient/kl_div), --auto_quantize_score_size, and --auto_quantize_checkpoint
  • Expanded supported QFORMAT list from 8 to all 19 available formats
  • Added simple_eval to the valid tasks list
  • Clarified that kl_div AutoQuantize method does not require backpropagation
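The flags listed above can be combined on the command line. The sketch below only composes and prints the invocations rather than running them: the recipe filename is a placeholder, `fp8` is assumed (not confirmed by this PR) to be one of the 8 supported KV-cache formats, and the flag names come from this PR description, not from re-checking hf_ptq.py's argument parser.

```shell
# Sketch: compose two hypothetical hf_ptq.py invocations using the flags
# documented by this PR. Nothing is executed against a real model; the
# recipe file is a placeholder, and fp8 is assumed to be among the 8
# supported KV-cache formats.

# Mode 1: explicit flags (--qformat plus KV-cache and AutoQuantize options).
FLAG_CMD="python hf_ptq.py --qformat nvfp4 --kv_cache_qformat fp8 --auto_quantize_bits 4.75 --auto_quantize_method kl_div --auto_quantize_score_size 64"

# Mode 2: recipe-based quantization, the declarative YAML alternative to
# --qformat described above (the YAML schema itself is not shown here).
RECIPE_CMD="python hf_ptq.py --recipe quant_recipe.yaml"

echo "$FLAG_CMD"
echo "$RECIPE_CMD"
```

Per the PR description, `kl_div` is the AutoQuantize method that avoids backpropagation, which is why it is paired with the scoring options here.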

Summary by CodeRabbit

  • Documentation
    • Updated quantization format documentation with supported options and usage examples.
    • Added recipe-based quantization guidance with YAML configuration and precedence details.
    • Expanded KV cache quantization documentation with available formats and configuration options.
    • Enhanced AutoQuantize documentation with advanced options and clarified method-specific requirements.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu requested a review from a team as a code owner April 7, 2026 01:03
@shengliangxu shengliangxu requested a review from realAsma April 7, 2026 01:03
github-actions bot (Contributor) commented Apr 7, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-08 07:25 UTC

codecov bot commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.08%. Comparing base (0246041) to head (f407af6).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
```
@@             Coverage Diff             @@
##             main    #1185       +/-   ##
===========================================
+ Coverage   59.45%   77.08%   +17.62%
===========================================
  Files         352      352
  Lines       40343    40343
===========================================
+ Hits        23987    31099     +7112
+ Misses      16356     9244     -7112
```

| Flag | Coverage | Δ |
| --- | --- | --- |
| examples | 44.33% `<ø>` | +12.23% ⬆️ |
| unit | 55.03% `<ø>` | ø |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@shengliangxu shengliangxu requested review from a team and removed request for meenchen April 7, 2026 21:43
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
coderabbitai bot (Contributor) commented Apr 7, 2026

📝 Walkthrough

This PR expands the LLM PTQ example README, introducing new sections for recipe-based quantization, KV cache quantization parameters, and AutoQuantize advanced options. Updates include clarified quantization format placeholders, method-specific behavioral notes, and an extended task list.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation** `examples/llm_ptq/README.md` | Updated Hugging Face invocation examples with placeholder syntax; added sections for recipe-based quantization (`--recipe` YAML), KV cache quantization (`--kv_cache_qformat` with format table), and AutoQuantize advanced options. Clarified that the backpropagation requirement applies only to the gradient method, not kl_div. Expanded the `--tasks` list to include simple_eval. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and specifically summarizes the main change: documenting previously missing hf_ptq.py features in the PTQ README. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; check skipped. |
| Security Anti-Patterns | ✅ Passed | PR contains only documentation changes to README.md, with no Python code modifications or security anti-patterns introduced. |


coderabbitai bot left a comment
Actionable comments posted: 2

📥 Commits

Reviewing files that changed from the base of the PR and between af2fe24 and 0374344.

📒 Files selected for processing (1)
  • examples/llm_ptq/README.md

```
scripts/huggingface_example.sh --model $HF_PATH --quant <QFORMAT> --tp [1|2|4|8]
```

Supported `QFORMAT` values: `fp8`, `fp8_pc_pt`, `fp8_pb_wo`, `int8`, `int8_sq`, `int8_wo`, `int4_awq`, `w4a8_awq`, `nvfp4`, `nvfp4_awq`, `nvfp4_mse`, `nvfp4_mlp_only`, `nvfp4_experts_only`, `nvfp4_omlp_only`, `nvfp4_svdquant`, `nvfp4_local_hessian`, `w4a8_nvfp4_fp8`, `w4a8_mxfp4_fp8`, `mxfp8`.
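Instantiating the template above with one concrete value from the supported list might look like the sketch below; it only composes and prints the command, and the model path and TP degree are placeholders, not values from this PR.

```shell
# Fill in the <QFORMAT> and --tp placeholders from the template above.
# $HF_PATH is a placeholder for a local or hub Hugging Face model path;
# nvfp4 is one of the supported QFORMAT values listed in the README.
HF_PATH="/path/to/hf/model"   # placeholder
CMD="scripts/huggingface_example.sh --model $HF_PATH --quant nvfp4 --tp 2"
echo "$CMD"
```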
⚠️ Potential issue | 🟡 Minor

Clarify --quant scope vs script-accepted values.

This list matches hf_ptq.py quant configs, but scripts/huggingface_example.sh also accepts fp16 and bf16. Please either include them here or explicitly label this as “PTQ quantization formats in hf_ptq.py” to avoid contradiction for script users.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/README.md` at line 149, The README lists QFORMAT values but
omits fp16 and bf16 accepted by the --quant flag in
scripts/huggingface_example.sh, causing a mismatch with hf_ptq.py; update the
README entry to either add "fp16" and "bf16" to the Supported `QFORMAT` list or
explicitly label the line as "PTQ quantization formats in hf_ptq.py" to indicate
the scope, and mention that scripts/huggingface_example.sh also accepts fp16 and
bf16 for non-PTQ runs so users know the difference (reference hf_ptq.py,
scripts/huggingface_example.sh and the --quant flag).

```
--auto_quantize_bits 4.75 --auto_quantize_method kl_div --auto_quantize_score_size 64
```

The example scripts above also have an additional flag `--tasks`, where the actual tasks run in the script can be customized. The allowed tasks are `quant,mmlu,lm_eval,livecodebench,simple_eval` specified in the script [parser](./scripts/parser.sh). The tasks combo can be specified with a comma-separated task list. Some tasks like mmlu can take a long time to run. To run lm_eval tasks, please also specify the `--lm_eval_tasks` flag with comma separated lm_eval tasks [here](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks).
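A comma-separated task selection, as described in the paragraph above, might be composed like this; the command is only printed, and the chosen lm_eval task name is an illustrative placeholder from the lm-evaluation-harness task list rather than a value this PR prescribes.

```shell
# Pick a combo from the allowed tasks (quant,mmlu,lm_eval,livecodebench,
# simple_eval) as a comma-separated list; when lm_eval is included, the
# README says --lm_eval_tasks must also be given. "mmlu" here is a
# placeholder lm_eval task name.
TASKS="quant,lm_eval,simple_eval"
LM_EVAL_TASKS="mmlu"   # placeholder lm_eval task
CMD="scripts/huggingface_example.sh --tasks $TASKS --lm_eval_tasks $LM_EVAL_TASKS"
echo "$CMD"
```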
⚠️ Potential issue | 🟡 Minor

Use hyphenated compound adjective.

Change “comma separated” to “comma-separated” for consistency/readability.


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/README.md` at line 329, Update the README text to use the
hyphenated compound adjective "comma-separated" in both occurrences ("The tasks
combo can be specified with a comma-separated task list" and "please also
specify the `--lm_eval_tasks` flag with comma-separated lm_eval tasks") to match
style and improve readability; locate the sentence referencing the script
parser.sh and replace "comma separated" with "comma-separated".

@shengliangxu shengliangxu merged commit 7144482 into main Apr 8, 2026
79 of 88 checks passed
@shengliangxu shengliangxu deleted the shengliangx/recipe-readme branch April 8, 2026 07:24
Edwardf0t1 pushed a commit that referenced this pull request Apr 9, 2026
kinjalpatel27 pushed a commit that referenced this pull request Apr 13, 2026