Document missing hf_ptq.py features in PTQ README #1185
Conversation
Add documentation for several hf_ptq.py functionalities that were missing from the PTQ README:

- Recipe-based quantization (`--recipe`): declarative YAML alternative to `--qformat` for specifying the full quantization configuration
- KV cache quantization (`--kv_cache_qformat`): all 8 format choices with usage examples and descriptions
- AutoQuantize advanced options: `--auto_quantize_method` (gradient/kl_div), `--auto_quantize_score_size`, and `--auto_quantize_checkpoint`
- Expanded the supported QFORMAT list from 8 to all 19 available formats
- Added simple_eval to the valid tasks list
- Clarified that the kl_div AutoQuantize method does not require backpropagation

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
**Codecov Report** ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #1185      +/-   ##
===========================================
+ Coverage   59.45%   77.08%   +17.62%
===========================================
  Files         352      352
  Lines       40343    40343
===========================================
+ Hits        23987    31099    +7112
+ Misses      16356     9244    -7112
```
Flags with carried forward coverage won't be shown.
**📝 Walkthrough**

Documentation expansion for the LLM PTQ example README that introduces new sections for recipe-based quantization, KV cache quantization parameters, and AutoQuantize advanced options. Updates include clarified quantization format placeholders, method-specific behavioral notes, and extended task list support.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_ptq/README.md`:
- Line 149: The README lists QFORMAT values but omits fp16 and bf16 accepted by
the --quant flag in scripts/huggingface_example.sh, causing a mismatch with
hf_ptq.py; update the README entry to either add "fp16" and "bf16" to the
Supported `QFORMAT` list or explicitly label the line as "PTQ quantization
formats in hf_ptq.py" to indicate the scope, and mention that
scripts/huggingface_example.sh also accepts fp16 and bf16 for non-PTQ runs so
users know the difference (reference hf_ptq.py, scripts/huggingface_example.sh
and the --quant flag).
- Line 329: Update the README text to use the hyphenated compound adjective
"comma-separated" in both occurrences ("The tasks combo can be specified with a
comma-separated task list" and "please also specify the `--lm_eval_tasks` flag
with comma-separated lm_eval tasks") to match style and improve readability;
locate the sentence referencing the script parser.sh and replace "comma
separated" with "comma-separated".
ℹ️ Review info

- Configuration used: `.coderabbit.yaml`
- Review profile: CHILL
- Plan: Pro
- Run ID: ae703d73-e4e0-49d2-a078-0a028261c4e6
📒 Files selected for processing (1): `examples/llm_ptq/README.md`
```
scripts/huggingface_example.sh --model $HF_PATH --quant <QFORMAT> --tp [1|2|4|8]
```

Supported `QFORMAT` values: `fp8`, `fp8_pc_pt`, `fp8_pb_wo`, `int8`, `int8_sq`, `int8_wo`, `int4_awq`, `w4a8_awq`, `nvfp4`, `nvfp4_awq`, `nvfp4_mse`, `nvfp4_mlp_only`, `nvfp4_experts_only`, `nvfp4_omlp_only`, `nvfp4_svdquant`, `nvfp4_local_hessian`, `w4a8_nvfp4_fp8`, `w4a8_mxfp4_fp8`, `mxfp8`.
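As an illustrative sketch only (not the actual validation logic in `scripts/huggingface_example.sh` or `hf_ptq.py`), a wrapper script could check a `--quant` value against the PTQ format list above with a shell `case` statement; the function name `is_supported_qformat` is hypothetical:

```shell
# Hypothetical helper: check a --quant value against the PTQ formats
# documented above. Not the actual logic from huggingface_example.sh.
is_supported_qformat() {
    case "$1" in
        fp8|fp8_pc_pt|fp8_pb_wo|int8|int8_sq|int8_wo|int4_awq|w4a8_awq|\
        nvfp4|nvfp4_awq|nvfp4_mse|nvfp4_mlp_only|nvfp4_experts_only|\
        nvfp4_omlp_only|nvfp4_svdquant|nvfp4_local_hessian|\
        w4a8_nvfp4_fp8|w4a8_mxfp4_fp8|mxfp8)
            echo "supported" ;;
        *)
            echo "unsupported" ;;
    esac
}

is_supported_qformat nvfp4   # prints "supported"
is_supported_qformat fp16    # prints "unsupported" (not in the PTQ list)
```

Note how this sketch also illustrates the review comment below: `fp16` is not in the PTQ format list, even though the script itself accepts it for non-PTQ runs.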
Clarify --quant scope vs script-accepted values.
This list matches hf_ptq.py quant configs, but scripts/huggingface_example.sh also accepts fp16 and bf16. Please either include them here or explicitly label this as “PTQ quantization formats in hf_ptq.py” to avoid contradiction for script users.
```
--auto_quantize_bits 4.75 --auto_quantize_method kl_div --auto_quantize_score_size 64
```

The example scripts above also have an additional flag `--tasks`, which customizes the actual tasks run by the script. The allowed tasks are `quant,mmlu,lm_eval,livecodebench,simple_eval`, specified in the script [parser](./scripts/parser.sh). The task combo can be specified as a comma-separated task list. Some tasks, like mmlu, can take a long time to run. To run lm_eval tasks, please also specify the `--lm_eval_tasks` flag with comma-separated lm_eval tasks [here](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks).
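To illustrate how a comma-separated `--tasks` value can be split in shell, here is a minimal sketch; it mirrors the idea only, and `split_tasks` is a hypothetical name, not the actual code from `scripts/parser.sh`:

```shell
# Hypothetical helper mirroring comma-separated task parsing;
# not the actual code from scripts/parser.sh.
split_tasks() (
    IFS=','              # split only on commas (subshell keeps IFS local)
    for task in $1; do   # $1 deliberately unquoted so IFS splitting applies
        echo "$task"
    done
)

split_tasks "quant,mmlu,lm_eval"   # prints each task on its own line
```

Using a subshell body `( ... )` instead of `{ ... }` keeps the `IFS` change from leaking into the caller's environment.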
Use hyphenated compound adjective.
Change “comma separated” to “comma-separated” for consistency/readability.
🧰 Tools
🪛 LanguageTool
[grammar] ~329-~329: Use a hyphen to join words.
Context: ...fy the --lm_eval_tasks flag with comma separated lm_eval tasks [here](https://g...
(QB_NEW_EN_HYPHEN)
### What does this PR do?

Add documentation for several hf_ptq.py functionalities that were missing from the PTQ README:

- Recipe-based quantization (`--recipe`): declarative YAML alternative to `--qformat` for specifying the full quantization configuration
- KV cache quantization (`--kv_cache_qformat`): all 8 format choices with usage examples and descriptions
- AutoQuantize advanced options: `--auto_quantize_method` (gradient/kl_div), `--auto_quantize_score_size`, and `--auto_quantize_checkpoint`
- Expanded the supported QFORMAT list from 8 to all 19 available formats
- Added simple_eval to the valid tasks list
- Clarified that the kl_div AutoQuantize method does not require backpropagation

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **Documentation**
  - Updated quantization format documentation with supported options and usage examples.
  - Added recipe-based quantization guidance with YAML configuration and precedence details.
  - Expanded KV cache quantization documentation with available formats and configuration options.
  - Enhanced AutoQuantize documentation with advanced options and clarified method-specific requirements.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>