Add Nunchaku Lite single-file quantization #14100
sayakpaul
left a comment
Thanks for getting started! Just did a first pass and left high-level reviews.
| def __init__(self, compute_dtype: "torch.dtype" | None = None): | ||
| self.quant_method = QuantizationMethod.NUNCHAKU_LITE | ||
| self.compute_dtype = compute_dtype | ||
| self.pre_quantized = True |
There was a problem hiding this comment.
Can we also guide the readers on how to obtain the checkpoints?
Also, can we ensure torch.compile compatibility?
There was a problem hiding this comment.
The kernels are compatible with torch.compile, and so are SVDQLinear and AWQLinear. I will add a test to ensure that this compatibility is preserved once they are integrated into diffusers.
Can we also guide the readers on how to obtain the checkpoints?
I'm a little confused here. Could you help provide more context?
There was a problem hiding this comment.
I'm a little confused here. Could you help provide more context
How are the example checkpoints obtained? I think we're only dealing with pre-quantized checkpoints in this PR?
There was a problem hiding this comment.
Yes, we are only dealing with pre-quantized checkpoints here. Perhaps we can leave a comment saying that the checkpoints are quantized with diffuse-compressor and then passed through the diffusers format converter?
| @@ -0,0 +1,161 @@ | |||
| import json | |||
There was a problem hiding this comment.
For tests, WDYT of adding a mixin to https://github.com/huggingface/diffusers/blob/main/tests/models/testing_utils/quantization.py and then extending a popular model like Flux to use that mixin?
There was a problem hiding this comment.
Yes, let's do it that way
I have just implemented the native loading feature. A checkpoint can now be loaded with:
import torch
from diffusers import ErnieImagePipeline
pipe = ErnieImagePipeline.from_pretrained(
"rootonchair/ERNIE-Image-Turbo-nunchaku-lite-int4",
torch_dtype=torch.bfloat16,
).to("cuda")
image = pipe(
prompt="A modern red armchair in a quiet studio, soft window light, realistic product photography",
height=1024,
width=1024,
num_inference_steps=8,
guidance_scale=1.0,
use_pe=False,
).images[0]
image.save("ernie-image-turbo-nunchaku-lite-int4.png")
The quantization config now changes to:
If we agree to use this schema, I will remove the old metadata/from_single_file approach.
sayakpaul
left a comment
Looking good. I think we can remove all metadata related code?
| def is_serializable(self): | ||
| return False | ||
|
|
||
| @property |
There was a problem hiding this comment.
We should set the is_compileable property too:
diffusers/src/diffusers/quantizers/base.py
Line 263 in 9159a58
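For context, here is a minimal sketch of what setting this property might look like. `BaseQuantizerSketch` and `NunchakuLiteQuantizerSketch` are stand-in names for illustration, not the actual diffusers classes, and returning `True` assumes the torch.compile compatibility discussed earlier is confirmed by tests.

```python
class BaseQuantizerSketch:
    """Stand-in for the quantizer base class; not the real diffusers class."""

    @property
    def is_compileable(self) -> bool:
        # Conservative default: assume the quantized model cannot be compiled.
        return False


class NunchakuLiteQuantizerSketch(BaseQuantizerSketch):
    """Hypothetical subclass that opts in to torch.compile."""

    @property
    def is_compileable(self) -> bool:
        # Assumption: the SVDQ/AWQ runtime layers work under torch.compile.
        return True
```

Because it is a property, callers would read `quantizer.is_compileable` without parentheses.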
| self.svdq_w4a4 = svdq_w4a4 | ||
| self.awq_w4a16 = awq_w4a16 |
There was a problem hiding this comment.
Do we need any validation around these two?
There was a problem hiding this comment.
Sure, we do need it.
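One possible shape for that validation is sketched below. The thread does not show what `svdq_w4a4` and `awq_w4a16` contain, so this assumes each is an optional list of module-name strings and that a module should not appear in both. The helper names are hypothetical and are not part of the PR.

```python
def _as_name_list(field, value):
    """Normalize an optional list of module names (assumed type, see lead-in)."""
    if value is None:
        return []
    # A bare string is rejected explicitly so it is not treated as a sequence of characters.
    if isinstance(value, str) or not isinstance(value, (list, tuple)):
        raise TypeError(f"{field} must be a list of module names, got {type(value).__name__}")
    for name in value:
        if not isinstance(name, str) or not name:
            raise TypeError(f"{field} entries must be non-empty strings, got {name!r}")
    return list(value)


def validate_module_lists(svdq_w4a4=None, awq_w4a16=None):
    """Hypothetical check: both fields are well-formed and do not overlap."""
    svdq = _as_name_list("svdq_w4a4", svdq_w4a4)
    awq = _as_name_list("awq_w4a16", awq_w4a16)
    overlap = sorted(set(svdq) & set(awq))
    if overlap:
        raise ValueError(f"modules listed under both svdq_w4a4 and awq_w4a16: {overlap}")
    return svdq, awq
```

If the fields turn out to be dicts or patterns rather than plain name lists, the same structure applies with a different per-entry check.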


What does this PR do?
Adds Nunchaku Lite single-file checkpoint loading for Diffusers models.
This introduces NunchakuLiteQuantizationConfig and a new Nunchaku Lite quantizer that can patch supported nn.Linear modules into runtime SVDQ/AWQ linear layers before strict checkpoint loading. The loader reads safetensors metadata during from_single_file so Nunchaku Lite checkpoints can use their embedded runtime manifest to decide which modules to replace.
Deprecated API
New API for from_single_file use
Fixes # (issue)
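The safetensors metadata read mentioned in the description can be sketched with the standard library alone. A safetensors file begins with an 8-byte little-endian header length followed by a JSON header, and any free-form string metadata lives under the `__metadata__` key. The function names below are illustrative and are not the loader code from this PR.

```python
import json
import struct


def read_metadata_from_bytes(data: bytes) -> dict:
    """Return the `__metadata__` dict from safetensors bytes (empty if absent)."""
    # The first 8 bytes hold the JSON header length as an unsigned 64-bit little-endian integer.
    (header_len,) = struct.unpack("<Q", data[:8])
    header = json.loads(data[8 : 8 + header_len].decode("utf-8"))
    return header.get("__metadata__", {})


def read_safetensors_metadata(path: str) -> dict:
    """Read only the header of a file on disk, without loading tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return read_metadata_from_bytes(prefix + f.read(header_len))
```

The key under which a Nunchaku Lite checkpoint stores its runtime manifest is not shown in this thread, so the sketch stops at returning the raw metadata dict.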
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.