
remove str option for quantization config in torchao #13291

Open
howardzhang-cv wants to merge 7 commits into huggingface:main from howardzhang-cv:update_torchao_test

Conversation

@howardzhang-cv
Contributor

What does this PR do?

Remove the deprecated string-based quant_type path from TorchAoConfig, requiring AOBaseConfig instances instead.

  • `TorchAoConfig.__init__` now only accepts `AOBaseConfig` subclass instances (e.g. `Int8WeightOnlyConfig()`) and raises `TypeError` for strings
  • Deleted ~200 lines of dead code: `_get_torchao_quant_type_to_method`, `_is_xpu_or_cuda_capability_atleast_8_9`, `TorchAoJSONEncoder`, and all string-parsing branches in `__post_init__`, `to_dict`, `from_dict`, and `get_apply_tensor_subclass`
  • Simplified `torchao_quantizer.py`: removed string-based branches in `update_torch_dtype`, `adjust_target_dtype`, and `get_cuda_warm_up_factor`; fixed `is_trainable`, which would crash on `AOBaseConfig` objects
  • Converted all test cases from string quant types to their `AOBaseConfig` equivalents; removed `test_floatx_quantization` (no replacement for `fpx_weight_only`)
  • Updated the docs to show only `AOBaseConfig`-based usage
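In practice, the tightened constructor behaves like the following stdlib-only sketch. The classes below are stand-ins for the real torchao/diffusers types, shown only to illustrate the new accept/reject behavior:

```python
class AOBaseConfig:
    """Stand-in for torchao.core.config.AOBaseConfig (illustration only)."""


class Int8WeightOnlyConfig(AOBaseConfig):
    """Stand-in for torchao's Int8WeightOnlyConfig (illustration only)."""


def validate_quant_type(quant_type):
    # Mirrors the new check: config objects pass through, strings raise TypeError
    if not isinstance(quant_type, AOBaseConfig):
        raise TypeError(
            f"quant_type must be an AOBaseConfig instance, got {type(quant_type).__name__}"
        )
    return quant_type


validate_quant_type(Int8WeightOnlyConfig())  # accepted
try:
    validate_quant_type("int8wo")  # the old string path is now rejected
except TypeError as exc:
    print(exc)  # -> quant_type must be an AOBaseConfig instance, got str
```

With the real library, the equivalent would be `TorchAoConfig(Int8WeightOnlyConfig())` instead of `TorchAoConfig("int8wo")`.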

Testing

python -m pytest tests/quantization/torchao/test_torchao.py -xvs

Who can review?

@sayakpaul

Comment on lines +217 to +227
            if isinstance(quant_type, AOBaseConfig):
                # Extract size digit using fuzzy match on the class name
                config_name = quant_type.__class__.__name__
                size_digit = fuzzy_match_size(config_name)

                # Map the extracted digit to appropriate dtype
                if size_digit == "4":
                    return CustomDtype.INT4
            elif quant_type == "uintx_weight_only":
                return self.quantization_config.quant_type_kwargs.get("dtype", torch.uint8)
            elif quant_type.startswith("uint"):
                return {
                    1: torch.uint1,
                    2: torch.uint2,
                    3: torch.uint3,
                    4: torch.uint4,
                    5: torch.uint5,
                    6: torch.uint6,
                    7: torch.uint7,
                }[int(quant_type[4])]
            elif quant_type.startswith("float") or quant_type.startswith("fp"):
                return torch.bfloat16

        elif is_torchao_version(">", "0.9.0"):
            from torchao.core.config import AOBaseConfig

            quant_type = self.quantization_config.quant_type
            if isinstance(quant_type, AOBaseConfig):
                # Extract size digit using fuzzy match on the class name
                config_name = quant_type.__class__.__name__
                size_digit = fuzzy_match_size(config_name)

                # Map the extracted digit to appropriate dtype
                if size_digit == "4":
                    return CustomDtype.INT4
                else:
                    # Default to int8
                    return torch.int8
            else:
                # Default to int8
                return torch.int8
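The snippet above relies on a `fuzzy_match_size` helper that is not shown in this thread. As a rough illustration of what such a helper might do (a hypothetical stand-in, not the actual diffusers implementation), it could pull the bit-width digit out of a config class name:

```python
import re


def fuzzy_match_size(config_name: str):
    """Hypothetical stand-in: extract the bit-width digit following 'int'/'Int'
    in a config class name, e.g. 'Int4WeightOnlyConfig' -> '4'; None if absent."""
    match = re.search(r"int(\d+)", config_name, flags=re.IGNORECASE)
    return match.group(1) if match else None


print(fuzzy_match_size("Int4WeightOnlyConfig"))  # -> 4
print(fuzzy_match_size("Int8WeightOnlyConfig"))  # -> 8
```

Matching on class names like this is exactly the fragility the reviewer points out below.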


This seems a bit fragile. I think it's from transformers originally; not sure if it's still needed in transformers though, it might have been refactored after the 5.0 update.

Contributor Author


Sorry, might be a dumb question: what part are you referring to that's from transformers originally? Is it the entire `adjust_target_dtype` function?

                f"Requested quantization type: {self.quant_type} is not supported or is an incorrect `quant_type` name. If you think the "
                f"provided quantization type should be supported, please open an issue at https://github.com/huggingface/diffusers/issues."
            )
        if is_torchao_version("<=", "0.9.0"):

@jerryzh168 Mar 19, 2026


Separate PR: I feel we should just have a single assertion that torchao is a relatively recent version (e.g. 0.15) and remove all these version checks.

Contributor Author


Actually, yeah, I was going to ask you about that as well. There are a couple of version checks scattered around right now; it would be cleaner to just remove all of them.

Member


Yeah we should mandate a minimum version requirement here.

Contributor Author


Do we want to do this in a separate PR? I changed it and set it to 0.9.0 because that's when AOBaseConfig was supported. Moving to 0.15.0 might be cleaner in a separate PR, in case we need to revert for whatever reason.

@howardzhang-cv howardzhang-cv marked this pull request as ready for review March 19, 2026 21:43
Member

@sayakpaul left a comment


Thanks a lot for starting this work!

I think we can merge this fairly soon.


Comment on lines +478 to +479
        if not isinstance(self.quant_type, AOBaseConfig):
            raise TypeError(f"quant_type must be an AOBaseConfig instance, got {type(self.quant_type).__name__}")
Member


Yes cool!

@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Mar 20, 2026

Style bot fixed some files and pushed the changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sayakpaul left a comment


Thanks for working on this. I will run the tests on my end to see if we didn't miss anything.

We need to update the test suite here as well:

class TorchAoConfigMixin:

                logger.warning(
                    f"You are trying to set torch_dtype to {torch_dtype} for int4/int8/uintx quantization, but "
                    f"only bfloat16 is supported right now. Please set `torch_dtype=torch.bfloat16`."
                )
Member


Should we not implement an equivalent of

        if isinstance(quant_type, str) and (quant_type.startswith("int") or quant_type.startswith("uint")):
            if torch_dtype is not None and torch_dtype != torch.bfloat16:
                logger.warning(
                    f"You are trying to set torch_dtype to {torch_dtype} for int4/int8/uintx quantization, but "
                    f"only bfloat16 is supported right now. Please set `torch_dtype=torch.bfloat16`."
                )

?
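One hedged way an equivalent could look for config instances — a stdlib-only sketch where `Int8WeightOnlyConfig` is a dummy stand-in, not the real torchao class — is to match on the config's class name instead of a string prefix:

```python
import re


class Int8WeightOnlyConfig:
    """Dummy stand-in for torchao's Int8WeightOnlyConfig (illustration only)."""


def dtype_warning(quant_type, torch_dtype_name):
    """Return the warning text when an int/uint weight config is combined with a
    torch_dtype other than bfloat16; return None otherwise."""
    config_name = type(quant_type).__name__
    # Int4/Int8/UIntX config classes currently expect bfloat16 activations
    if re.match(r"U?Int\d", config_name) and torch_dtype_name != "bfloat16":
        return (
            f"You are trying to set torch_dtype to {torch_dtype_name} for int4/int8/uintx "
            f"quantization, but only bfloat16 is supported right now. "
            f"Please set `torch_dtype=torch.bfloat16`."
        )
    return None


print(dtype_warning(Int8WeightOnlyConfig(), "float16"))
```

This inherits the same class-name fragility discussed earlier in the thread, so it is only a sketch of the idea.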

Contributor Author

@howardzhang-cv Mar 23, 2026


From what I understand, this is outdated and not needed anymore? @jerryzh168, can you confirm?


@howardzhang-cv Oh, I meant this is not needed for transformers (after their 5.0 update), but for diffusers I think we should preserve the behavior until they do a similar update like transformers. Wondering if this is planned? @sayakpaul

Member


but for diffusers I think we should preserve the behavior, until they do a similar update like transformers,

What is this similar update?

Member


Yes, I agree with Jerry. Let's preserve the behaviour.

@sayakpaul
Member

@howardzhang-cv thanks for the updates but I guess the following are remaining to be updated?

class TorchAoConfigMixin:

Member

@sayakpaul left a comment


Thanks for the further updates.

Will wait for @jerryzh168 to provide an update on https://github.com/huggingface/diffusers/pull/13291/changes#r2977963964

| **Unsigned Integer quantization** | `uintx_weight_only` | `uint1wo`, `uint2wo`, `uint3wo`, `uint4wo`, `uint5wo`, `uint6wo`, `uint7wo` |

Some quantization methods are aliases (for example, `int8wo` is the commonly used shorthand for `int8_weight_only`). This allows using the quantization methods described in the torchao docs as-is, while also making it convenient to remember their shorthand notations.
| **Category** | **Configuration Classes** |
Member


There's practically nothing preventing users from using the configs supported through TorchAO, and they might not be limited to the ones we're including in the following table. For example, we can use the more recent NVFP4 and MXFP8 schemes (their respective config classes) here as well.

So, how about we provide examples of the popular config classes like Int8DynamicActivationInt4WeightConfig, Int8WeightOnlyConfig, and Float8DynamicActivationFloat8WeightConfig (with hyperlinks), and then provide a link to the available config options (a TorchAO doc link) for users to explore?

Contributor Author


Changed!


@howardzhang-cv
Contributor Author

@howardzhang-cv thanks for the updates but I guess the following are remaining to be updated?

class TorchAoConfigMixin:

Nice catch, I updated it here to fit the new option

@howardzhang-cv
Contributor Author

@sayakpaul I made some more minor changes:

  1. Fixed the documentation as per your comment
  2. Fixed some of the broken links in the example popular configs and removed some no longer supported ones
  3. Fixed some documentation in the cogvideox markdown file
  4. Fixed the quantization.py file you were mentioning in the comment above

@jerryzh168 when you get a chance, can you take a look at the src/diffusers/quantizers/torchao/torchao_quantizer.py comment above?

Member

@sayakpaul left a comment


Thanks! Left some further minor comments. The major one is the use of `version=2` where possible.

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="torchao",
-   quant_kwargs={"quant_type": "int8wo"},
+   quant_kwargs={"quant_type": Int8WeightOnlyConfig()},
Member


Perhaps we promote the use of version=2?


Refer to the [official torchao documentation](https://docs.pytorch.org/ao/stable/index.html) for a better understanding of the available quantization methods and the exhaustive list of configuration options available.
| **Category** | **Configuration Classes** |
|---|---|
Member


Yes this is cool!


Comment on lines -66 to -67
if is_torchao_version(">=", "0.9.0"):
pass
Member


I am guessing we're already fixing the minimum version to be 0.9.0?

@sayakpaul
Member

I ran the tests and I am getting (pytest tests/models/transformers/test_models_transformer_flux.py::TestFluxTransformerTorchAo):

FAILED tests/models/transformers/test_models_transformer_flux.py::TestFluxTransformerTorchAo::test_torchao_quantization_num_parameters[int4wo] - ImportError: Requires mslk >= 1.0.0
FAILED tests/models/transformers/test_models_transformer_flux.py::TestFluxTransformerTorchAo::test_torchao_quantization_memory_footprint[int4wo] - ImportError: Requires mslk >= 1.0.0
FAILED tests/models/transformers/test_models_transformer_flux.py::TestFluxTransformerTorchAo::test_torchao_quantization_inference[int4wo] - ImportError: Requires mslk >= 1.0.0
FAILED tests/models/transformers/test_models_transformer_flux.py::TestFluxTransformerTorchAo::test_torchao_quantization_inference[int8dq] - RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16
FAILED tests/models/transformers/test_models_transformer_flux.py::TestFluxTransformerTorchAo::test_torchao_dequantize - NotImplementedError: QuantizationMethod.TORCHAO has no implementation of `dequantize`, please raise an issue on GitHub.

But these are failing on main as well. The mslk error seems unexpected. We didn't face it previously. I am on torchao 0.17.0.dev20260320+cu128 and torch 2.12.0.dev20260319+cu128.

I will look into the last two separately.

@sayakpaul
Member

Some of the failing tests are also relevant to this PR; let's fix them. Example:
https://github.com/huggingface/diffusers/actions/runs/23516839193/job/68468358140?pr=13291
