Fix group offloading for quanto-quantized models #14038
…ath for quantized tensor subclasses
Group offloading should have been fixed with #13276, though. Can you check again?
Hi @sayakpaul, thanks. Yes, I rechecked against #13276 before opening this. #13276 makes group offloading work for torchao by swapping the subclass (
Both #12610 and #13281 are still open. I confirmed on current main (with #13276) vs this PR
On approach: I deliberately mirrored the existing
Can we focus on one issue at a time? I would suggest splitting the PR into two.
Thanks @sayakpaul. I split the TorchAO
What does this PR do?
Fixes #12610
Group offloading moves a group's parameters between CPU and the accelerator by reassigning `param.data`.

This is correct for plain tensors but wrong for quanto tensor subclasses. A quanto `WeightQBytesTensor` stores the real payload in internal tensors such as `_data` and `_scale`; replacing `.data` only swaps the outer wrapper and leaves those internal tensors on the source device. The next matmul then fails with `mat2 is on cpu, different from cuda:0`.

#13276 fixed the same subclass-storage issue for TorchAO tensors by swapping the full tensor subclass instead of assigning `.data`, but quanto tensors still fall through to the plain tensor path. This PR adds the corresponding quanto path and keeps the TorchAO stream fix split out in #14112.

Changes
- `QTensor` parameters without importing optimum-quanto unless it is installed.
- `torch.utils.swap_tensors` for quanto onload/offload instead of assigning `.data`.
- `__tensor_flatten__()`; `.pin_memory()` does not preserve the quanto subclass.

Tests
Environment: NVIDIA RTX 4090, `torch==2.8.0+cu128`, `optimum-quanto==0.2.7`.

Reproduction and before/after
Minimal standalone repro for #12610:
On main, this fails with:
With this PR, quanto group offload matches the fully-on-accelerator quantized baseline across `leaf_level`, `block_level`, non-stream, `use_stream`, and `record_stream` configs. The maximum absolute difference is `0.0`.

Regression tests:
Both tests fail on main with the device mismatch and pass with this PR.
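
For readers who want the failure mode in isolation, the wrapper-versus-payload problem described under "What does this PR do?" can be illustrated with a torch-free toy model. This is only a sketch, not the code in this PR: `FakeTensor`, `FakeQuantizedWeight`, `onload_by_data_assignment`, and `onload_by_swap` are hypothetical names invented here, devices are plain string labels, and the `__dict__` exchange merely stands in for what `torch.utils.swap_tensors` does for real tensors.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a plain tensor: values plus a device label."""
    values: list
    device: str = "cpu"

    def to(self, device):
        # Return a copy that lives on the requested (pretend) device.
        return FakeTensor(list(self.values), device)


class FakeQuantizedWeight:
    """Hypothetical stand-in for a quantized tensor subclass.

    The real payload lives in the inner tensors `_data` and `_scale`;
    `data` is only the outer wrapper.
    """

    def __init__(self, data, scale):
        self._data = data
        self._scale = scale
        self.data = FakeTensor([], data.device)  # outer wrapper only

    def to(self, device):
        # Moving the whole object moves every inner tensor with it.
        return FakeQuantizedWeight(self._data.to(device), self._scale.to(device))

    def payload_devices(self):
        return {self._data.device, self._scale.device}


def onload_by_data_assignment(param, device):
    # Plain-tensor path: only the outer wrapper is replaced.
    param.data = param.data.to(device)


def onload_by_swap(param, device):
    # Whole-object path: build a moved copy, then exchange state so that
    # `param` keeps its identity but now owns the moved inner tensors.
    moved = param.to(device)
    param.__dict__, moved.__dict__ = moved.__dict__, param.__dict__


stale = FakeQuantizedWeight(FakeTensor([1, 2]), FakeTensor([0.5]))
onload_by_data_assignment(stale, "cuda:0")
print(stale.data.device, stale.payload_devices())  # wrapper moved, payload did not

fixed = FakeQuantizedWeight(FakeTensor([1, 2]), FakeTensor([0.5]))
onload_by_swap(fixed, "cuda:0")
print(fixed.data.device, fixed.payload_devices())  # wrapper and payload both moved
```

In the toy model the first path leaves `_data` and `_scale` labelled `cpu` while the wrapper claims `cuda:0`, which is the same shape of inconsistency that surfaces as the device-mismatch error on main. The second path keeps the object identity stable, which matters because modules and optimizers hold references to the parameter object itself.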
Before submitting
`.ai/review-rules.md`?

Who can review?
cc @sayakpaul