Add calib_include/exclude_modules to calibration algorithms #1043
Conversation
Note: Reviews paused. This branch appears to be under active development. To avoid overwhelming the author with review comments during an influx of new commits, CodeRabbit has automatically paused this review; this behavior is configurable in the CodeRabbit settings.
📝 Walkthrough: Renamed the base calibration config to CalibrationConfig and added include/exclude module-pattern filtering to calibration.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Config as CalibrationConfig
    participant Mode as wrapped_calib_func
    participant Filter as filter_calib_modules
    participant Model as QuantizedModel
    participant Quant as TensorQuantizers
    User->>Config: provide include_modules / exclude_modules
    Config-->>Mode: pass config (include/exclude)
    Mode->>Filter: enter context with patterns
    Filter->>Model: scan modules (fnmatch)
    Filter->>Quant: disable quantizers in non-matching modules
    Mode->>Model: run calibration routine (collect stats)
    rect rgba(100, 150, 200, 0.5)
        Model->>Quant: update amax/buffers for matching modules
    end
    Mode->>Filter: exit context
    Filter->>Quant: re-enable previously disabled quantizers
    Filter-->>Mode: context ends
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1043      +/-   ##
===========================================
- Coverage   70.18%   54.89%   -15.30%
===========================================
  Files         228      350      +122
  Lines       25952    40256    +14304
===========================================
+ Hits        18215    22097     +3882
- Misses       7737    18159    +10422
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/unit/torch/quantization/test_calib.py (1)
469-473: Strengthen the no-op assertion to validate actual no-op behavior. This currently only checks that `amax` is present; it should compare against the baseline snapshot to verify no values changed.

💡 Proposed test tightening

```diff
 # Amaxes should be consistent with standard max calibration (not None)
 for name in amaxes_before:
     amax_after = _get_weight_amax(model, name)
-    assert amax_after is not None, f"{name} should have a valid amax after calibration"
+    assert torch.allclose(amaxes_before[name], amax_after), (
+        f"{name} changed under no-op filter context"
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/quantization/test_calib.py` around lines 469 - 473, The test currently only asserts presence of amax after calibration; strengthen it by asserting the amax values did not change: for each name in amaxes_before, fetch amax_after using _get_weight_amax(model, name) and assert equality (or approximate equality if floats) against amaxes_before[name] instead of just checking not None; update the loop that references amaxes_before and _get_weight_amax to perform the comparison (use math.isclose or pytest.approx for float comparisons).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 252-266: The code currently mutates quant_cfg["algorithm"] in
place (when it's a dict or converted from a string), which can leak changes into
shared presets; update the logic to create a copy of the algorithm dict before
adding calib_exclude_modules/calib_include_modules (e.g., if
isinstance(quant_cfg["algorithm"], str) set quant_cfg["algorithm"] = {"method":
...} as a new dict, and if it's a dict replace it with a shallow or deep copy
like new_alg = dict(quant_cfg["algorithm"]) or copy.deepcopy(...) and assign
quant_cfg["algorithm"] = new_alg) then add the calib keys to that copy,
referencing quant_cfg["algorithm"], calib_exclude_modules, and
calib_include_modules when applying the changes.
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 1249-1258: The parsed module-pattern lists
args.calib_exclude_modules and args.calib_include_modules must drop
empty/whitespace-only entries; update the list comprehensions to filter out
items where p.strip() is empty (e.g., use [p.strip() for p in
args.calib_exclude_modules.split(",") if p.strip()] and similarly for
calib_include_modules) and keep the existing conditional that yields None when
the original arg is falsy.
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 107-113: The current loop only inspects modules where
is_quantized_linear(module) is true, so TensorQuantizer instances in non-linear
modules are not disabled when _should_calibrate(name) is false; change the logic
to iterate all modules from model.named_modules(), and for any module whose name
fails _should_calibrate(name) traverse its children to find TensorQuantizer
instances and call disable() on those not already _disabled, appending them to
disabled (i.e., replace the is_quantized_linear(module) guard with a direct
check of _should_calibrate(name) and disable all TensorQuantizer children
accordingly).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 244f9095-d6a1-4b28-b47a-e77c51d699d9
📒 Files selected for processing (6)
- examples/llm_ptq/example_utils.py
- examples/llm_ptq/hf_ptq.py
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/mode.py
- modelopt/torch/quantization/model_calib.py
- tests/unit/torch/quantization/test_calib.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 251-265: The Gemma/smoothquant override replaces
quant_cfg["algorithm"] with a new dict and inadvertently drops
calib_include_modules/calib_exclude_modules; when you set the Gemma override
(the code path that assigns quant_cfg["algorithm"] = {"method": "int8_sq", ...}
or similar), merge or copy any existing
calib_include_modules/calib_exclude_modules from the previous
quant_cfg["algorithm"] (or from the local
calib_include_modules/calib_exclude_modules variables) into the new dict instead
of overwriting them — i.e., build the override dict then set
override_dict["calib_include_modules"]=calib_include_modules (if present) and
override_dict["calib_exclude_modules"]=calib_exclude_modules (if present) before
assigning back to quant_cfg["algorithm"] so the filters are preserved for
Gemma/smoothquant.
📒 Files selected for processing (5)
- examples/llm_ptq/example_utils.py
- examples/llm_ptq/hf_ptq.py
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/model_calib.py
- tests/unit/torch/quantization/test_calib.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/quantization/config.py
- examples/llm_ptq/hf_ptq.py
♻️ Duplicate comments (1)
examples/llm_ptq/example_utils.py (1)
250-251: ⚠️ Potential issue | 🟠 Major: Gemma override still drops calibration filters and other algorithm fields.

Replacing `quant_cfg["algorithm"]` here discards previously set keys (e.g., `calib_include_modules`, `calib_exclude_modules`, and `moe_calib_experts_ratio`), so filtering silently stops working on this path.

Suggested fix

```diff
-    if model_type == "gemma" and "int8_sq" in qformat:
-        quant_cfg["algorithm"] = {"method": "smoothquant", "alpha": 0.5}
+    if model_type == "gemma" and "int8_sq" in qformat:
+        if isinstance(quant_cfg.get("algorithm"), dict):
+            quant_cfg["algorithm"]["method"] = "smoothquant"
+            quant_cfg["algorithm"]["alpha"] = 0.5
+        else:
+            quant_cfg["algorithm"] = {"method": "smoothquant", "alpha": 0.5}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_ptq/example_utils.py` around lines 250 - 251, In the branch where model_type == "gemma" and "int8_sq" in qformat, don't replace quant_cfg["algorithm"] outright (which discards existing keys like calib_include_modules, calib_exclude_modules, and moe_calib_experts_ratio); instead merge the new algorithm entries into the existing dict (e.g., ensure quant_cfg.get("algorithm", {}) is updated with {"method": "smoothquant", "alpha": 0.5}) so existing calibration/filtering fields are preserved while setting/overriding only the needed algorithm keys.
📒 Files selected for processing (3)
- examples/llm_ptq/example_utils.py
- examples/llm_ptq/hf_ptq.py
- tests/unit/torch/quantization/test_calib.py
💤 Files with no reviewable changes (1)
- examples/llm_ptq/hf_ptq.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/torch/quantization/test_calib.py
realAsma
left a comment
Looks great! Left some comments - please address them.
Questions:
```python
calib_include_modules: list[str] | None = ModeloptField(
    default=None,
    title="Patterns of modules to include in calibration.",
    description=(
        "If provided, only modules whose names match at least one of the fnmatch patterns are "
        "calibrated. Modules that do not match any pattern are skipped and retain their "
        "pre-existing calibration state. "
        "Note: filtering applies only to quantized linear modules; TensorQuantizers in "
        "non-linear modules (e.g. layer norms, embeddings) are unaffected."
    ),
)

calib_exclude_modules: list[str] | None = ModeloptField(
    default=None,
    title="Patterns of modules to exclude from calibration.",
    description=(
        "If provided, modules whose names match at least one of the fnmatch patterns are "
        "skipped during calibration and retain their pre-existing calibration state. "
        "Note: filtering applies only to quantized linear modules; TensorQuantizers in "
        "non-linear modules (e.g. layer norms, embeddings) are unaffected."
    ),
)
```
If a user passes both calib_include_modules and calib_exclude_modules, the behavior is implicitly "include first, then exclude"? Do you think we need to either:
- Documented explicitly (what happens if a module matches both?), or
- Validated to raise an error if both are set simultaneously.
I think the ordering does not actually matter, but we do need to document a clear semantic. These are actually 2 exclude module lists:
- exclude those that are not in the include_modules
- exclude those that are in the exclude_modules
Exclude takes precedence: _should_calibrate checks include_modules first (if the module is not in the include list it returns False), then checks exclude_modules (if matched it returns False). So a module in both lists is excluded. Added a doc note to make this explicit. Thanks for pointing it out.
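The predicate described in this thread can be sketched as a stand-alone function (a hypothetical re-implementation for illustration; the real `_should_calibrate` lives in `model_calib.py` and may differ in details):

```python
from fnmatch import fnmatch


def should_calibrate(name, include_modules=None, exclude_modules=None):
    """Return True if the module named `name` should participate in calibration."""
    # Not in the include list -> excluded.
    if include_modules is not None and not any(fnmatch(name, p) for p in include_modules):
        return False
    # Matched the exclude list -> excluded.
    if exclude_modules is not None and any(fnmatch(name, p) for p in exclude_modules):
        return False
    return True
```

A module matching both lists is excluded, and swapping the two checks gives the same answer: the effective exclusion set is the union of "not included" and "explicitly excluded", which is the semantic argued for above.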
♻️ Duplicate comments (1)
modelopt/torch/quantization/model_calib.py (1)
91-93: ⚠️ Potential issue | 🟠 Major: Filtering still skips non-linear module quantizers.

Line 115 gates filtering behind `is_quantized_linear(module)`, so `TensorQuantizer` instances attached to non-linear modules are never disabled even when their module name should be excluded. This leaves calibration partially unfiltered.

💡 Proposed fix
```diff
-    Note:
-        Only quantized linear modules (as identified by :func:`is_quantized_linear`) are filtered.
-        ``TensorQuantizer`` instances inside non-linear quantized modules (e.g. layer norms,
-        embeddings) are not disabled even if their module name matches a pattern.
+    Note:
+        Filters are evaluated against each ``TensorQuantizer`` parent module name.

-    disabled = []
-    for name, module in model.named_modules():
-        if is_quantized_linear(module) and not _should_calibrate(name):
-            for _, child in module.named_modules():
-                if isinstance(child, TensorQuantizer) and not child._disabled:
-                    child.disable()
-                    disabled.append(child)
+    disabled: list[TensorQuantizer] = []
+    for quantizer_name, quantizer in model.named_modules():
+        if not isinstance(quantizer, TensorQuantizer):
+            continue
+        parent_name = quantizer_name.rsplit(".", 1)[0] if "." in quantizer_name else ""
+        if _should_calibrate(parent_name) or quantizer._disabled:
+            continue
+        quantizer.disable()
+        disabled.append(quantizer)
```

Also applies to: 113-119
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/model_calib.py` around lines 91 - 93, The current filtering only runs when is_quantized_linear(module) is true, so TensorQuantizer instances inside non-linear modules are never disabled; remove that gate and apply the name-pattern exclusion to all modules that contain TensorQuantizer instances. Locate the block in model_calib.py where is_quantized_linear(module) is checked and instead iterate over module._tensor_quantizers (or however TensorQuantizer instances are accessed) for every module, and if the module name matches the exclusion pattern disable each TensorQuantizer (e.g., call the existing disable/enable API or set the same flag used for linear modules); make the same change for the duplicate logic region covering the other occurrence noted (lines referenced in the review).
📒 Files selected for processing (2)
modelopt/torch/quantization/config.pymodelopt/torch/quantization/model_calib.py
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/torch/quantization/config.py
This is a good concern. Should we make these two exclusive, i.e. only one can be specified at a time?
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unit/torch/quantization/test_mode.py (1)
46-47: Annotate `config_class` as returning the class, not an instance. Line 46 returns `TestConfig` itself, so the annotation should be `type[CalibrationConfig]`, not `CalibrationConfig`. That keeps the test aligned with `BaseCalibrateModeDescriptor.config_class` and avoids noisy mypy failures. As per coding guidelines, "Use mypy for type checking on Python code (configured in `pyproject.toml`)".

🛠️ Proposed typing fix
```diff
-    def config_class(self) -> CalibrationConfig:
+    def config_class(self) -> type[CalibrationConfig]:
         return TestConfig
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/quantization/test_mode.py` around lines 46 - 47, The test's config_class method currently annotated to return a CalibrationConfig instance but actually returns the class TestConfig; change the return type to a class type (use type[CalibrationConfig] or Type[CalibrationConfig]) so it matches BaseCalibrateModeDescriptor.config_class and avoids mypy errors—update the signature of def config_class(self) -> type[CalibrationConfig]: (or Type[CalibrationConfig]) and ensure TestConfig remains the returned symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1048-1049: Existing public base class name QuantizeAlgorithmConfig
was renamed to ModeloptBaseConfig and that breaks imports; restore a deprecated
compatibility alias by defining QuantizeAlgorithmConfig = ModeloptBaseConfig in
the module and emit a DeprecationWarning using the warnings module so users get
a notice (keep the alias in modelopt.torch.quantization.config alongside the new
ModeloptBaseConfig); ensure the alias is added near the class definition so
future imports work while guiding users to migrate.
In `@modelopt/torch/quantization/mode.py`:
- Around line 249-262: The current code applies filter_calib_modules around the
entire sequential_calibrate call, which disables excluded modules during the
recomputation of layer inputs; instead, remove the outer with
filter_calib_modules(...) wrapper and apply filtering only around the per-layer
calibration invocation after inputs are captured. Concretely, stop wrapping
sequential_calibrate(model, ...) with filter_calib_modules; either update
sequential_calibrate to call calib_func(layer, ...) inside a with
filter_calib_modules(model, include_modules, exclude_modules) block for each
layer after inputs are recomputed, or wrap the calib_func argument (func) with a
small wrapper that enters filter_calib_modules only when calling calib_func for
that layer so include_modules/exclude_modules only affect the calibration call,
not the input recomputation. Ensure symbols referenced are forward_loop,
sequential_calibrate, func/calib_func, filter_calib_modules, include_modules,
exclude_modules, and model.
📒 Files selected for processing (4)
modelopt/torch/quantization/config.pymodelopt/torch/quantization/mode.pytests/unit/torch/quantization/test_calib.pytests/unit/torch/quantization/test_mode.py
A module in both lists is excluded. I added a line in the config doc-string to state it explicitly.
If a module is not in either of the lists, it keeps its previous enable state from quantization.
I'd keep them independent. Currently exclude takes precedence if both lists include a module. One use case is include_modules=["mlp"], exclude_modules=["mlpgate*"] -- include a broad pattern, then carve out exceptions. (Though I don't have a very complicated use case in mind yet.) Wdyt?
I would like to reiterate that there is really no precedence here. These are actually 2 exclude module lists: exclude those that are not in the include_modules, and exclude those that are in the exclude_modules. So the ordering of applying them does not matter; it's exclusion of the union of the 2.
realAsma
left a comment
Note:
Only quantized linear modules (as identified by :func:`is_quantized_linear`) are filtered.
What is the reason for this? Can we remove this limitation? That way we can easily turn calibration on and off for various quantizers such as bmm_quantizers.
Adds calib_include_modules and calib_exclude_modules fields to QuantizeAlgorithmConfig so users can restrict any calibration algorithm (max, mse, smoothquant, awq, ...) to a subset of the model's layers. Filtering is applied via the new filter_calib_modules context manager, which temporarily disables TensorQuantizer instances in non-matching modules while preserving their pre-existing _amax values. Also exposes --calib_include_modules / --calib_exclude_modules CLI args in the hf_ptq.py example and wires them through build_quant_cfg in example_utils.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
- Fix shared preset mutation in build_quant_cfg by always deep-copying the preset dict before modification (previously only the awq path did this)
- Document the linear-only filtering limitation in the filter_calib_modules docstring and the calib_include/exclude_modules field descriptions
- Filter empty strings from CLI pattern parsing in hf_ptq.py to handle trailing commas gracefully
- Strengthen test_filter_no_op_when_none to assert amax value equality rather than just non-None presence

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Users should set calib_include_modules / calib_exclude_modules directly in the algorithm dict of their quantization config rather than via dedicated CLI flags. Remove --calib_exclude_modules / --calib_include_modules from hf_ptq.py and the corresponding parameters from build_quant_cfg. Update test_filter_via_config_api to exercise the intended usage path: embedding both fields in the algorithm dict and calling mtq.quantize, covering exclude and include variants and asserting that uncalibrated module _amax buffers are absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
calib_include/exclude_modules is a core library feature accessed via the algorithm config dict; example scripts should not be modified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Force-pushed 1b3f356 to df40b6f (compare)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
cjluo-nv
left a comment
Review
Overall the implementation is in good shape. The filter_calib_modules context manager is well-designed, the integration into wrapped_calib_func gives all algorithms filtering for free, and the test coverage is thorough. A few items to address:
Issues
1. Config field naming inconsistency (docs only)
The PR description and usage examples reference calib_include_modules / calib_exclude_modules, but the actual config fields are include_modules / exclude_modules. The code is fine — just the PR description is misleading.
2. filter_calib_modules only handles linear and BMM modules
Any other quantized module types (conv layers, custom quantized modules) silently pass through unfiltered. The docstring documents this, which is good, but users could be surprised if their pattern matches a module name that isn't linear or BMM — it would silently do nothing for that module.
3. Missing __all__ update in config.py?
CalibrationConfig is the new canonical name. If config.py has an __all__, it should export CalibrationConfig.
4. No deprecation warning on QuantizeAlgorithmConfig alias
The comment says "deprecated, will be removed in a future release" but no DeprecationWarning is emitted. If the intent is to actually deprecate it, consider adding a warning. If it's just a soft rename for now, the alias-only approach is fine.
Minor nits
- `test_filter_no_op_when_none` re-runs `max_calibrate` on an already-calibrated model with the same data/seed, so the amax-equality assertion is trivially true. It proves the context manager is a no-op but would also pass if calibration was swallowed entirely.
- 11 commits — consider squashing before merge.
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
What does this PR do?
Type of change: New feature
Adds `calib_include_modules` and `calib_exclude_modules` fields to `QuantizeAlgorithmConfig` so users can restrict any calibration algorithm (max, mse, smoothquant, awq, …) to a subset of the model's layers. Patterns are fnmatch wildcards matched against module names (e.g. `"*lm_head*"`, `"*self_attn*"`).

Implementation:

- The `filter_calib_modules` context manager in `model_calib.py` temporarily disables `TensorQuantizer` instances in non-matching modules. `TensorQuantizer.disable()` does not clear `_amax`, so excluded modules retain their pre-existing calibration state.
- The fields live on the base `QuantizeAlgorithmConfig`, so filtering applies uniformly to all algorithms with no per-algorithm changes.
- `wrapped_calib_func` in `mode.py` pops these fields and wraps every calibration call in `filter_calib_modules` automatically.

Interaction with `"enable": false` in `quant_cfg`: quantizers disabled via `quant_cfg` (i.e. `_disabled=True`) are skipped by `filter_calib_modules` — they are never added to the restore list and are never re-enabled. Their disabled state is fully preserved regardless of `calib_exclude/include_modules`.

Lower-level API: when calling calibration functions directly (outside `mtq.calibrate()`), wrap manually with the context manager:
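The actual snippet is collapsed in this view, but the disable/restore semantics described above can be modeled with a toy stand-in (hypothetical; the real `filter_calib_modules` walks `model.named_modules()` and operates on `TensorQuantizer` objects rather than a plain dict):

```python
from contextlib import contextmanager
from fnmatch import fnmatch


class ToyQuantizer:
    """Stand-in for TensorQuantizer: disabling flips a flag but never clears amax."""

    def __init__(self):
        self._disabled = False
        self.amax = None


@contextmanager
def filter_calib_modules(quantizers, include_modules=None, exclude_modules=None):
    # quantizers: dict of module name -> ToyQuantizer
    def keep(name):
        if include_modules is not None and not any(fnmatch(name, p) for p in include_modules):
            return False
        if exclude_modules is not None and any(fnmatch(name, p) for p in exclude_modules):
            return False
        return True

    # Only quantizers disabled *here* go on the restore list; quantizers already
    # disabled via quant_cfg ("enable": false) are left untouched on exit.
    disabled = [q for n, q in quantizers.items() if not keep(n) and not q._disabled]
    for q in disabled:
        q._disabled = True
    try:
        yield
    finally:
        for q in disabled:
            q._disabled = False
```

On exit, only the quantizers the context manager itself disabled are re-enabled, which is how pre-existing `"enable": false` state survives calibration.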
Usage
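The usage snippet is collapsed here; a hedged sketch of the config-dict path follows. Field names match the config diff shown earlier in the thread (a later review note says the final fields may be `include_modules`/`exclude_modules`), and the preset name and patterns are illustrative assumptions:

```python
import modelopt.torch.quantization as mtq

# Assumed preset; any quant_cfg dict with an "algorithm" entry works the same way.
quant_cfg = dict(mtq.INT8_DEFAULT_CFG)
quant_cfg["algorithm"] = {
    "method": "max",
    "calib_include_modules": ["*self_attn*"],  # calibrate only attention modules
    "calib_exclude_modules": ["*lm_head*"],    # carve out lm_head even if included
}
# model = mtq.quantize(model, quant_cfg, forward_loop)
```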
Testing
Added 6 new unit tests in tests/unit/torch/quantization/test_calib.py:
Before your PR is "Ready for review"
Additional Information