huggingface · amd-inechakr · May 8, 2026 · May 8, 2026 · May 20, 2026 · Jun 23, 2026
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -182,6 +182,8 @@
     title: NVIDIA ModelOpt
   - local: quantization/autoround
     title: AutoRound
+  - local: quantization/quark
+    title: Quark
   title: Quantization
 - isExpanded: false
   sections:

diff --git a/docs/source/en/quantization/overview.md b/docs/source/en/quantization/overview.md
@@ -28,7 +28,7 @@ There are two ways to use [`~quantizers.PipelineQuantizationConfig`] depending o
 
 Initialize [`~quantizers.PipelineQuantizationConfig`] with the following parameters.
 
-- `quant_backend` specifies which quantization backend to use. Currently supported backends include: `bitsandbytes_4bit`, `bitsandbytes_8bit`, `gguf`, `quanto`, and `torchao`.
+- `quant_backend` specifies which quantization backend to use. Currently supported backends include: `bitsandbytes_4bit`, `bitsandbytes_8bit`, `gguf`, `quanto`, `torchao`, and `quark`.
 - `quant_kwargs` specifies the quantization arguments to use.
 
 > [!TIP]

diff --git a/docs/source/en/quantization/quark.md b/docs/source/en/quantization/quark.md
@@ -0,0 +1,112 @@
+<!--Copyright 2025 - 2026 Advanced Micro Devices, Inc. and The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Quark
+
+[Quark](https://quark.docs.amd.com/latest/) is AMD's deep-learning quantization toolkit. It is not tied to a particular data type, algorithm, or hardware target, although it primarily targets AMD CPUs and GPUs. Quark supports a broad range of data types and algorithms (INT8, INT4, FP8, MX, FP4, SVDQuant, SmoothQuant, AWQ, GPTQ, QuaRot, SpinQuant) that can be combined across the submodules of a diffusion pipeline (UNet, transformer, VAE).
+
+The Diffusers integration mirrors the [Transformers integration](https://huggingface.co/docs/transformers/quantization/quark): models exported with `quark.torch.export_safetensors` can be loaded back through `DiffusionPipeline.from_pretrained` / `ModelMixin.from_pretrained` without per-layer setup code.
+
+To use Quark with Diffusers, install Quark:
+
+```bash
+pip install amd-quark
+```
+
+## Loading a pre-quantized model
+
+If a model on the Hub already carries a `quantization_config` block in `config.json`, no extra setup is needed:
+
+```python
+import torch
+from diffusers import DiffusionPipeline
+
+pipe = DiffusionPipeline.from_pretrained(
+    "amd/sd3-quark-int8",
+    torch_dtype=torch.float16,
+).to("cuda")
+
+image = pipe("A cat on a windowsill", num_inference_steps=30).images[0]
+```
+
+Dispatch is automatic: the loader reads `quant_method: "quark"` from the `quantization_config` block and instantiates `QuarkDiffusersQuantizer`.
+
+## On-the-fly weight-only quantization
+
+Pass a `QuarkConfig` when loading an unquantized fp16/bf16 checkpoint to quantize its weights at load time:
+
+```python
+import torch
+from diffusers import StableDiffusion3Pipeline, QuarkConfig
+
+# A QConfig that produces INT8 weight-only quantization (no activation quantizers).
+# Build with quark.torch.quantization.config.config.QConfig and pass its dict.
+quark_config_dict = ...  # see https://quark.docs.amd.com/latest/
+
+quantization_config = QuarkConfig(quant_method="quark", **quark_config_dict)
+pipe = StableDiffusion3Pipeline.from_pretrained(
+    "stabilityai/stable-diffusion-3-medium-diffusers",
+    quantization_config=quantization_config,
+    torch_dtype=torch.float16,
+).to("cuda")
+```
+
+This works for any `QConfig` that declares no activation quantizers, that is, no input or output `QTensorConfig`. INT8 weight-only and MXFP4 weight-only configurations are two examples.
+
+For activation-quantized configurations (SmoothQuant, SVDQuant w4a4, FP8 with calibrated activations, etc.), `from_pretrained` will raise a `NotImplementedError` directing you to the offline path.
+
+## Producing a quantized checkpoint
+
+For configurations that need calibration data, use the offline workflow:
+
+```python
+import torch
+from diffusers import StableDiffusion3Pipeline
+from quark.torch import ModelQuantizer, export_safetensors
+from quark.torch.utils.diffusers import get_calib_dataloader
+
+pipe = StableDiffusion3Pipeline.from_pretrained(
+    "stabilityai/stable-diffusion-3-medium-diffusers",
+    torch_dtype=torch.float16,
+).to("cuda")
+
+prompts = [
+    "A serene lake reflecting mountains at sunset",
+    "A futuristic city with flying cars at night",
+]
+dataloader = get_calib_dataloader(pipe, pipe.transformer, prompts, n_steps=20)
+
+qconfig = ...  # SVDQuant / SmoothQuant / FP8 + activation calibration
+pipe.transformer = ModelQuantizer(qconfig).quantize_model(pipe.transformer, dataloader)
+
+export_safetensors(pipe.transformer, "sd3-quark-svdquant/transformer")
+```
+
+The exported directory can then be reloaded through `from_pretrained`, as described in [Loading a pre-quantized model](#loading-a-pre-quantized-model).
+
+## Support matrix
+
+| Feature | Supported |
+| --- | --- |
+| Data types | INT8, INT4, INT2, BFloat16, Float16, FP8 (E4M3/E5M2), FP6, FP4, OCP MX, MX6, MX9, BFP16 |
+| Pre-quantization transforms | SmoothQuant, QuaRot, SpinQuant, AWQ |
+| Quantization algorithms | GPTQ, SVDQuant |
+| Operators | `nn.Linear`, `nn.Conv2d`, `nn.ConvTranspose2d`, `nn.Embedding`, `nn.EmbeddingBag` |
+| Granularity | per-tensor, per-channel, per-group, per-block, per-layer, per-layer-type |
+| Activation calibration | min/max, percentile, histogram, MSE |
+| Quantization strategy | weight-only, static, dynamic, with or without output quantization |
+| `torch.compile` | yes (after `ModelQuantizer.freeze`) |
+
+## Resources
+
+- Quark documentation: <https://quark.docs.amd.com/latest/>
+- Quark-quantized models on the Hub: <https://huggingface.co/models?other=quark>
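
The documentation above restricts on-the-fly loading to configurations without activation quantizers. A minimal sketch of such a check, operating on a plain dict, might look like the following. The key names (`global_quant_config`, `layer_quant_config`, `input_tensors`, `output_tensors`) and both helper names are assumptions chosen for illustration; the check this PR performs lives in `quark_quantizer.py`, which is not shown in this diff.

```python
from __future__ import annotations

from typing import Any


def declares_activation_quantizers(layer_cfg: dict[str, Any] | None) -> bool:
    """Return True if a layer-level config sets an input or output quantizer."""
    if not layer_cfg:
        return False
    # Assumed key names; a missing or None entry is treated as "not quantized".
    return any(layer_cfg.get(key) is not None for key in ("input_tensors", "output_tensors"))


def ensure_weight_only(qconfig: dict[str, Any]) -> None:
    """Raise NotImplementedError if any layer config quantizes activations."""
    layer_cfgs = [qconfig.get("global_quant_config")]
    layer_cfgs += list((qconfig.get("layer_quant_config") or {}).values())
    if any(declares_activation_quantizers(cfg) for cfg in layer_cfgs):
        raise NotImplementedError(
            "Activation quantization needs calibration data; quantize offline "
            "and reload the exported checkpoint instead."
        )


# Hypothetical config dicts used only to exercise the check.
weight_only = {"global_quant_config": {"weight": {"dtype": "int8"}, "input_tensors": None}}
w8a8 = {"global_quant_config": {"weight": {"dtype": "int8"}, "input_tensors": {"dtype": "int8"}}}

ensure_weight_only(weight_only)  # weight-only config is accepted
try:
    ensure_weight_only(w8a8)
    rejected = False
except NotImplementedError:
    rejected = True
```

Failing early, before any weights are loaded, keeps the error close to the user's configuration rather than surfacing later during inference.
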
diff --git a/src/diffusers/__init__.py b/src/diffusers/__init__.py
@@ -17,6 +17,7 @@
     is_onnx_available,
     is_opencv_available,
     is_optimum_quanto_available,
+    is_quark_available,
     is_scipy_available,
     is_sentencepiece_available,
     is_torch_available,
@@ -136,6 +137,18 @@
 else:
     _import_structure["quantizers.quantization_config"].append("AutoRoundConfig")
 
+try:
+    if not (is_torch_available() and is_accelerate_available() and is_quark_available()):
+        raise OptionalDependencyNotAvailable()
+except OptionalDependencyNotAvailable:
+    from .utils import dummy_quark_objects
+
+    _import_structure["utils.dummy_quark_objects"] = [
+        name for name in dir(dummy_quark_objects) if not name.startswith("_")
+    ]
+else:
+    _import_structure["quantizers.quantization_config"].append("QuarkConfig")
+
 try:
     if not is_onnx_available():
         raise OptionalDependencyNotAvailable()
@@ -1017,6 +1030,14 @@
     else:
         from .quantizers.quantization_config import AutoRoundConfig
 
+    try:
+        if not is_quark_available():
+            raise OptionalDependencyNotAvailable()
+    except OptionalDependencyNotAvailable:
+        from .utils.dummy_quark_objects import *
+    else:
+        from .quantizers.quantization_config import QuarkConfig
+
     try:
         if not is_onnx_available():
             raise OptionalDependencyNotAvailable()
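
The guards added to `__init__.py` follow the optional-dependency pattern used for the other backends: probe for the required packages, then register either the real names or the dummy placeholders. A standalone sketch of that pattern is shown here, assuming the dummies should be used whenever any required package is missing. `is_available` and `names_to_export` are hypothetical helpers, not Diffusers APIs, and `OptionalDependencyNotAvailable` is redefined locally so the snippet runs on its own.

```python
from __future__ import annotations

import importlib.util


class OptionalDependencyNotAvailable(Exception):
    """Local stand-in for the exception of the same name in Diffusers."""


def is_available(module_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


def names_to_export(required: list[str], real: list[str], dummy: list[str]) -> list[str]:
    """Pick the real names only if every required module is importable."""
    try:
        if not all(is_available(name) for name in required):
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        return dummy
    else:
        return real


# "json" ships with Python; the second module name is deliberately bogus.
with_deps = names_to_export(["json"], ["QuarkConfig"], ["DummyQuarkConfig"])
without_deps = names_to_export(["json", "no_such_module_xyz"], ["QuarkConfig"], ["DummyQuarkConfig"])
```

Note that `not a and not b and not c` is true only when all three packages are missing, whereas `not (a and b and c)` is true when at least one is missing; the sketch uses the latter reading.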

diff --git a/src/diffusers/quantizers/auto.py b/src/diffusers/quantizers/auto.py
@@ -30,9 +30,11 @@
     QuantizationConfigMixin,
     QuantizationMethod,
     QuantoConfig,
+    QuarkConfig,
     TorchAoConfig,
 )
 from .quanto import QuantoQuantizer
+from .quark import QuarkDiffusersQuantizer
 from .torchao import TorchAoHfQuantizer
 
 
@@ -44,6 +46,7 @@
     "torchao": TorchAoHfQuantizer,
     "modelopt": NVIDIAModelOptQuantizer,
     "auto-round": AutoRoundQuantizer,
+    "quark": QuarkDiffusersQuantizer,
 }
 
 AUTO_QUANTIZATION_CONFIG_MAPPING = {
@@ -54,6 +57,7 @@
     "torchao": TorchAoConfig,
     "modelopt": NVIDIAModelOptConfig,
     "auto-round": AutoRoundConfig,
+    "quark": QuarkConfig,
 }
 
 

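
The two one-line additions above register Quark in a pair of string-keyed lookups, one for quantizer classes and one for config classes. The sketch below illustrates that kind of dispatch with stand-in classes. `StubQuarkConfig`, `StubQuarkQuantizer`, `CONFIG_BY_METHOD`, `QUANTIZER_BY_METHOD`, and `resolve_quantizer` are hypothetical names, and the real resolution logic in `auto.py` is not part of this diff.

```python
from __future__ import annotations

from typing import Any


class StubQuarkConfig:
    """Stand-in config: keeps the method tag and the remaining options."""

    def __init__(self, **kwargs: Any) -> None:
        self.quant_method = kwargs.pop("quant_method", "quark")
        self.options = kwargs


class StubQuarkQuantizer:
    """Stand-in quantizer that only stores its config."""

    def __init__(self, config: StubQuarkConfig) -> None:
        self.config = config


# Illustrative equivalents of the two mappings extended in the diff.
CONFIG_BY_METHOD = {"quark": StubQuarkConfig}
QUANTIZER_BY_METHOD = {"quark": StubQuarkQuantizer}


def resolve_quantizer(quantization_config: dict[str, Any]) -> StubQuarkQuantizer:
    """Build the config, then the quantizer, from the `quant_method` tag."""
    method = quantization_config.get("quant_method")
    if method not in QUANTIZER_BY_METHOD:
        raise ValueError(f"Unknown quantization method: {method!r}")
    config = CONFIG_BY_METHOD[method](**quantization_config)
    return QUANTIZER_BY_METHOD[method](config)


quantizer = resolve_quantizer({"quant_method": "quark", "weight_dtype": "int8"})
```

Keeping both lookups keyed by the same string means a new backend needs only one entry in each mapping, which matches the shape of the change above.
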
diff --git a/src/diffusers/quantizers/quantization_config.py b/src/diffusers/quantizers/quantization_config.py
@@ -23,6 +23,7 @@
 from __future__ import annotations
 
 import copy
+import dataclasses
 import importlib.metadata
 import json
 import os
@@ -33,7 +34,7 @@
 
 from packaging import version
 
-from ..utils import deprecate, is_torch_available, is_torchao_version, logging
+from ..utils import deprecate, is_quark_available, is_torch_available, is_torchao_version, logging
 
 
 if is_torch_available():
@@ -49,6 +50,7 @@ class QuantizationMethod(str, Enum):
     QUANTO = "quanto"
     MODELOPT = "modelopt"
     AUTOROUND = "auto-round"
+    QUARK = "quark"
 
 
 @dataclass
@@ -828,3 +830,82 @@ def from_dict(cls, config_dict: dict, return_unused_kwargs: bool = False, **kwar
         # (e.g. quant_method is set automatically)
         config_dict = {k: v for k, v in config_dict.items() if k != "quant_method"}
         return super().from_dict(config_dict, return_unused_kwargs=return_unused_kwargs, **kwargs)
+
+
+class QuarkConfig(QuantizationConfigMixin):
+    """Configuration for AMD [Quark](https://quark.docs.amd.com/latest/) quantized diffusion models.
+
+    Mirrors ``transformers.utils.quantization_config.QuarkConfig`` so that a model serialized by
+    ``quark.torch.export_safetensors`` reloads with the same schema in either library.
+
+    The ``quantization_config`` section of ``config.json`` is forwarded to this constructor as keyword arguments. Two
+    on-disk layouts are accepted:
+
+    * **Native.** Produced by ``custom_mode='quark'``. Contains a flat dump of ``QConfig.to_dict()`` together with a
+      top-level ``"export"`` block holding the ``JsonExporterConfig`` fields.
+    * **Custom mode (legacy).** Produced by ``custom_mode='awq'`` or ``custom_mode='fp8'``. ``quant_method`` carries
+      the custom mode tag and the rest of the body matches the AutoAWQ / native-FP8 schemas.
+    """
+
+    def __init__(self, quant_config_dict: dict[str, Any] | None = None, **kwargs):
+        if quant_config_dict is not None:
+            kwargs = {**quant_config_dict, **kwargs}
+
+        if not (is_torch_available() and is_quark_available()):
+            raise ImportError(
+                "Quark is not installed. Install it with `pip install amd-quark` or "
+                "refer to https://quark.docs.amd.com/latest/install.html."
+            )
+
+        from quark import __version__ as quark_version
+        from quark.torch.export.config.config import JsonExporterConfig
+        from quark.torch.export.main_export.quant_config_parser import QuantConfigParser
+        from quark.torch.quantization.config.config import QConfig
+
+        self.custom_mode = kwargs.get("quant_method", QuantizationMethod.QUARK.value)
+        self.legacy = "export" not in kwargs
+
+        if self.custom_mode in ["awq", "fp8"]:
+            self.quant_config = QuantConfigParser.from_custom_config(kwargs, is_bias_quantized=False)
+            self.json_export_config = JsonExporterConfig()
+        else:
+            self.quant_config = QConfig.from_dict(kwargs)
+
+        if "export" in kwargs:
+            export_kwargs = dict(kwargs["export"])
+            # ``min_kv_scale`` is amd-quark>=0.8 only.  Drop with a warning on older versions.
+            if "min_kv_scale" in export_kwargs and version.parse(quark_version) < version.parse("0.8"):
+                min_kv_scale = export_kwargs.pop("min_kv_scale")
+                logger.warning(
+                    "Found `min_kv_scale=%s` in the model config.json's `quantization_config.export` block, but "
+                    "this parameter is supported only for amd-quark>=0.8. Ignoring. Please upgrade `amd-quark`.",
+                    min_kv_scale,
+                )
+            self.json_export_config = JsonExporterConfig(**export_kwargs)
+        elif self.custom_mode not in ["awq", "fp8"]:
+            self.json_export_config = JsonExporterConfig()
+
+        self.quant_method = QuantizationMethod.QUARK
+
+    def to_dict(self) -> dict[str, Any]:
+        """Serialize to the JSON-friendly kwargs form accepted by ``__init__``.
+
+        The default ``QuantizationConfigMixin.to_dict`` does
+        ``copy.deepcopy(self.__dict__)``, which would embed the live Quark
+        ``QConfig`` and ``JsonExporterConfig`` dataclasses (not JSON-serializable
+        through ``json.dumps``).  Mirror what
+        ``quark.torch.export.api.QuarkSafetensorsExporter`` writes into
+        ``config.json``: a flat dump of ``QConfig.to_dict()`` plus a top-level
+        ``"export"`` block holding the ``JsonExporterConfig`` fields.
+        """
+        config_dict: dict[str, Any] = {}
+        if self.quant_config is not None:
+            config_dict.update(self.quant_config.to_dict())
+        config_dict["quant_method"] = self.custom_mode
+        if self.json_export_config is not None:
+            config_dict["export"] = dataclasses.asdict(self.json_export_config)
+        return config_dict
+
+    def to_diff_dict(self) -> dict[str, Any]:
+        """No meaningful "default" QuarkConfig to diff against — return ``to_dict``."""
+        return self.to_dict()
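
The `to_dict` docstring above explains why the inherited `copy.deepcopy(self.__dict__)` approach does not produce JSON-serializable output. The effect can be reproduced with stand-in dataclasses, as in this sketch. `StubQConfig`, `StubExporterConfig`, `StubQuarkConfig`, and their fields are placeholders invented for the example, not Quark's real schema.

```python
from __future__ import annotations

import copy
import dataclasses
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class StubQConfig:
    weight_dtype: str = "int8"  # placeholder field

    def to_dict(self) -> dict[str, Any]:
        return dataclasses.asdict(self)


@dataclass
class StubExporterConfig:
    pack_method: str = "reorder"  # placeholder field


class StubQuarkConfig:
    def __init__(self) -> None:
        self.quant_config = StubQConfig()
        self.json_export_config = StubExporterConfig()
        self.custom_mode = "quark"

    def naive_to_dict(self) -> dict[str, Any]:
        # Copies the live dataclass instances into the result as-is.
        return copy.deepcopy(self.__dict__)

    def to_dict(self) -> dict[str, Any]:
        # Flat dump of the quantization config plus a nested "export" block.
        config_dict: dict[str, Any] = dict(self.quant_config.to_dict())
        config_dict["quant_method"] = self.custom_mode
        config_dict["export"] = dataclasses.asdict(self.json_export_config)
        return config_dict


cfg = StubQuarkConfig()
try:
    json.dumps(cfg.naive_to_dict())
    naive_serializable = True
except TypeError:
    # json.dumps has no encoder for arbitrary dataclass instances.
    naive_serializable = False

flat = cfg.to_dict()
round_tripped = json.loads(json.dumps(flat))
```

Because the flat form contains only dicts and strings, it survives a `json.dumps` / `json.loads` round trip unchanged, which is what allows it to be written into `config.json` and passed back to the constructor as keyword arguments.
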
diff --git a/src/diffusers/quantizers/quark/__init__.py b/src/diffusers/quantizers/quark/__init__.py
@@ -0,0 +1,14 @@
+# Copyright 2025 - 2026 Advanced Micro Devices, Inc. and The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from .quark_quantizer import QuarkDiffusersQuantizer