feat(mm): add Qwen Image single-file checkpoint loader with fp8 support #9253
Pfannkuchensack wants to merge 11 commits into
…h fp8 support

Adds Main_Checkpoint_QwenImage_Config and QwenImageCheckpointModel so that single-file safetensors checkpoints (e.g. Qwen-Image-Edit 2511 fp8_scaled from Civitai) can be imported. ComfyUI-style fp8 weights are dequantized to bf16 at load time; the existing default_settings.fp8_storage toggle then optionally re-casts to fp8 for VRAM savings.

Also wires _apply_fp8_layerwise_casting into the Qwen Image diffusers loader so the fp8 storage option works across all three formats (diffusers, single-file checkpoint; GGUF stays untouched as it carries its own quantization).

Shared variant inference (marker tensor → filename heuristic) and transformer architecture auto-detection are extracted into module-level helpers so the GGUF and checkpoint loaders stay in sync.
lstein
left a comment
Code review
Solid, well-documented change. Good refactor of the duplicated GGUF logic into _strip_comfyui_prefix / _build_qwen_image_transformer_config / _infer_qwen_image_variant, tests pass, and ruff is clean. A few issues worth addressing.
1. [Medium] Detection and loading disagree about the ComfyUI prefix
The loader strips model.diffusion_model. / diffusion_model. prefixes via _strip_comfyui_prefix (qwen_image.py:240, :292), but the config probe never strips them. Main_Checkpoint_QwenImage_Config.from_model_on_disk (main.py:1383) calls _has_qwen_image_keys(sd) on the raw state dict, and that check uses strict startswith("txt_in.") / startswith("txt_norm.") / startswith("img_in.") tests (main.py:1338-1340). ModelOnDisk.load_state_dict (model_on_disk.py:81) does no prefix normalization.
So a ComfyUI checkpoint whose keys are actually prefixed (model.diffusion_model.txt_in.weight) will fail identification and never reach the new loader — even though the loader was specifically built to strip that prefix. The config docstring explicitly claims "Covers… ComfyUI-style fp8_scaled checkpoints" (main.py:1361-1363), so this is a real gap.
The same inconsistency is pre-existing in Main_GGUF_QwenImage_Config, which suggests the files tested so far have bare keys and the prefix-stripping is defensive. Two ways to resolve:
- If prefixed files are a real input → strip the prefix in _has_qwen_image_keys (or before calling it) so detection and loading agree.
- If they're not → the _strip_comfyui_prefix calls are effectively dead and the docstring overstates coverage.
Worth confirming which, since right now the two paths can't both be right.
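A minimal sketch of the first option, assuming the two prefixes and three key prefixes named above. The function names echo the PR's helpers, but the bodies are illustrative stand-ins, and treating any single matching key as sufficient is an assumption:

```python
from typing import Any

# Prefixes named in the review comment.
COMFYUI_PREFIXES = ("model.diffusion_model.", "diffusion_model.")
# Key prefixes the probe is described as checking.
QWEN_IMAGE_KEY_PREFIXES = ("txt_in.", "txt_norm.", "img_in.")


def strip_comfyui_prefix(sd: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of sd with a leading ComfyUI prefix removed from each key."""
    out: dict[str, Any] = {}
    for key, value in sd.items():
        for prefix in COMFYUI_PREFIXES:
            if key.startswith(prefix):
                key = key[len(prefix):]
                break
        out[key] = value
    return out


def has_qwen_image_keys(sd: dict[str, Any]) -> bool:
    """Probe the normalized keys so detection sees what the loader sees."""
    normalized = strip_comfyui_prefix(sd)
    # Assumption: one matching key is enough to identify the architecture.
    return any(k.startswith(QWEN_IMAGE_KEY_PREFIXES) for k in normalized)
```

With this shape, a key such as model.diffusion_model.txt_in.weight is identified the same way as a bare txt_in.weight.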
2. [Low] QwenVLEncoderCheckpointLoader still inlines the now-extracted helpers
The PR extracted _dequantize_comfyui_fp8 (qwen_image.py:51) and _strip_quantization_metadata (qwen_image.py:84), but QwenVLEncoderCheckpointLoader._load_text_encoder_from_singlefile (qwen_image.py:429-472) still carries verbatim copies of both blocks — ~45 lines, identical down to the comments. Since these are now module-level helpers in the same file, the encoder loader should call them. Leaving two copies means a future fix to the dequant logic has to be applied twice.
3. [Low] make_room estimate is taken before the float32→bf16 cast
In _load_from_singlefile (qwen_image.py:305-310), new_sd_size is computed with model_dtype.itemsize (bf16 = 2 bytes) and make_room is called before the cast loop. But at that moment the dequantized weights in sd are float32 (weight_float * scale_float → fp32, qwen_image.py:79), so the actual transient footprint is ~2× the estimate. For a model this size that's a non-trivial undercount feeding the cache eviction logic. The QwenVL loader (qwen_image.py:517) avoids this by computing the size after casting with actual element_size(). Consider reordering (cast, then size, then make_room) for consistency, or estimating with fp32 width.
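The size of the undercount follows from element widths alone. A small arithmetic sketch, using hypothetical element counts rather than the real model's tensor shapes:

```python
FP32_BYTES = 4  # width of the dequantized intermediates
BF16_BYTES = 2  # width of the final model dtype


def state_dict_bytes(numels: list[int], itemsize: int) -> int:
    """Total bytes for tensors with the given element counts at one width."""
    return sum(n * itemsize for n in numels)


# Hypothetical element counts, for illustration only.
numels = [4096 * 4096, 4096 * 11008, 4096]

estimated = state_dict_bytes(numels, BF16_BYTES)  # what make_room is told
transient = state_dict_bytes(numels, FP32_BYTES)  # what is resident before the cast
undercount_ratio = transient / estimated
```

Whatever the shapes, the ratio is the fp32 width over the bf16 width, which is why sizing after the cast (or sizing with the fp32 width) closes the gap.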
4. [Nit] Missing type hint on _infer_qwen_image_variant
def _infer_qwen_image_variant(sd: ..., path) (main.py:1346) — path is untyped; it should be Path. The function relies on path.stem.
5. [Nit] Filename "edit" substring heuristic is broad
_infer_qwen_image_variant treats any "edit" substring in the stem as the Edit variant (main.py:1355). Names like credited, edited, or unedited would false-positive. This is moved-not-new logic, and the marker-tensor check takes precedence, so it's low risk — but a word-boundary match would be safer if you touch it.
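One way to sketch a token match, as an illustration rather than the PR's actual code. A plain \b boundary would miss names like qwen_image_edit, because the underscore counts as a word character, so this version splits the stem into tokens first. Splitting on camelCase and letter/digit boundaries is an assumption, added so that a name like qwenImageEdit2511_fp8 still matches:

```python
import re
from pathlib import Path

# Tokens: an uppercase run not followed by lowercase, a (capitalized) word, or digits.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def filename_suggests_edit(path: Path) -> bool:
    """True only when "edit" appears as a whole token in the filename stem."""
    tokens = _TOKEN.findall(path.stem)
    return any(token.lower() == "edit" for token in tokens)
```

Under this tokenization, credited, edited, and unedited each form a single token and no longer match.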
Checked, not issues
- The override_fields.pop("variant", None) or _infer_... pattern (main.py:1387) is safe: QwenImageVariantType is a str Enum, so Generate is truthy (covered by test_explicit_variant_override_not_overwritten).
- _dequantize_comfyui_fp8's weight_key is always bound, since weight_scale_keys is pre-filtered to keys ending in one of scale_suffixes.
- _strip_quantization_metadata correctly removes the .scale_input keys that _dequantize_comfyui_fp8 leaves behind.
- The new config correctly rejects GGUF and non-Qwen state dicts, with tests covering both, so it won't double-match with Main_GGUF_QwenImage_Config.
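The first point rests on how Python evaluates the truthiness of Enum members that mix in str: they follow str rules, so any member with a non-empty value is truthy. A small sketch, where VariantType and infer_variant are hypothetical stand-ins for QwenImageVariantType and the inference helper:

```python
from enum import Enum


class VariantType(str, Enum):
    """Stand-in enum; the member values are assumed, not taken from the PR."""

    Generate = "generate"
    Edit = "edit"


def infer_variant() -> VariantType:
    """Stand-in for the inference helper; always answers Edit here."""
    return VariantType.Edit


def resolve_variant(override_fields: dict) -> VariantType:
    # An explicit override wins because non-empty str members are truthy;
    # only a missing (None) override falls through to inference.
    return override_fields.pop("variant", None) or infer_variant()
```

The pattern would only misbehave if a member had an empty-string value, which is not the case for the values assumed here.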
Recommendation: address #1 (verify/fix the prefix detection gap) before merge; #2–#5 are cleanups that can ride along or follow up.
🤖 Generated with Claude Code
lstein
left a comment
So far I haven't been able to run generations with Qwen Image Edit 2511 fp8. Generation gets to the text encoder loading message and then the whole InvokeAI process dies with "Killed". Sometimes it brings the shell down with it, and once it locked up my machine and I had to cold reboot.
I get the same behavior regardless of whether fp8 storage is active or not.
…edit heuristic, dedupe fp8 helpers
- strip ComfyUI key prefixes in _has_qwen_image_keys so prefixed checkpoints are identified and reach the loader
- match "edit" as a filename token instead of any substring (no credited/edited/unedited false positives)
- reuse _dequantize_comfyui_fp8 / _strip_quantization_metadata in the QwenVL encoder loader
- size make_room reservation after the bf16 cast to avoid fp32 undercount
- add Path type hint on _infer_qwen_image_variant
Attempts to generate using qwenImageEdit2511_fp8.safetensors from Civitai reproducibly have a hard crash. Stack trace appended. Also note the log messages indicating that the system tries to load the text encoder twice.
…ilence int8 warning
- qwen_image: dequantize ComfyUI fp8_scaled weights directly to compute_dtype instead of a full-precision float32 intermediate. The previous path materialised a 4-byte/param copy of the whole model before downcasting, spiking peak RAM to ~2x the final bf16 size (~80GB for the 20B transformer). bf16 shares float32's exponent range and fp8 has only 3 mantissa bits, so no meaningful precision loss.
- qwen3_encoder: reject checkpoints that bundle a Qwen-VL visual tower (visual.blocks.* / visual.patch_embed.*). A Qwen2.5-VL file satisfies the Qwen3 key heuristic too, so it matched both configs and the tiebreak misrouted it to Qwen3Encoder, hiding it from the Qwen Image loader's encoder field. Qwen3 (text) and QwenVLEncoder (vision+language) are now mutually exclusive.
- bnb_llm_int8: silence the per-matmul "inputs will be cast from bfloat16 to float16" UserWarning. LLM.int8 only supports fp16 activations; the bf16->fp16 cast is correct and intended, so the warning is pure log spam on every layer.
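The Qwen3 / Qwen-VL exclusion in the commit above can be sketched as a key check. The visual-tower prefixes come from the commit message; the text-encoder heuristic (model.layers.* plus model.embed_tokens.weight, both required) and the function names are assumptions made for illustration:

```python
from typing import Iterable

# Prefixes that mark a bundled Qwen-VL visual tower, per the commit message.
VISUAL_TOWER_PREFIXES = ("visual.blocks.", "visual.patch_embed.")


def has_visual_tower(keys: Iterable[str]) -> bool:
    """True when any key belongs to a Qwen-VL visual tower."""
    return any(k.startswith(VISUAL_TOWER_PREFIXES) for k in keys)


def looks_like_qwen3_text_encoder(keys: Iterable[str]) -> bool:
    """Assumed text-encoder heuristic, now rejecting vision+language files."""
    key_set = set(keys)
    has_text_stack = (
        any(k.startswith("model.layers.") for k in key_set)
        and "model.embed_tokens.weight" in key_set
    )
    return has_text_stack and not has_visual_tower(key_set)
```

A file carrying both the text stack and a visual tower then fails the Qwen3 probe, which leaves the Qwen-VL encoder config as the only match.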

Summary
Adds Main_Checkpoint_QwenImage_Config and QwenImageCheckpointModel so that single-file safetensors checkpoints (e.g. Qwen-Image-Edit 2511 fp8_scaled from Civitai) can be imported. ComfyUI-style fp8 weights are dequantized at load time; the existing default_settings.fp8_storage toggle then optionally re-casts to fp8 for VRAM savings.

Also wires _apply_fp8_layerwise_casting into the Qwen Image diffusers loader so the fp8 storage option works across all three formats (diffusers, single-file checkpoint; GGUF stays untouched as it carries its own quantization).

Shared variant inference (marker tensor → filename heuristic) and transformer architecture auto-detection are extracted into module-level helpers so the GGUF and checkpoint loaders stay in sync.
Additional fixes in this PR:
- A Qwen2.5-VL encoder checkpoint also satisfies the Qwen3 key heuristic (model.layers.* / model.embed_tokens.weight), so it matched both Qwen3Encoder_Checkpoint_Config and QwenVLEncoder_Checkpoint_Config; the tiebreak misrouted it to Qwen3Encoder, hiding it from the Qwen Image loader's encoder field. The Qwen3 single-file/GGUF configs now reject state dicts carrying a Qwen-VL visual tower (visual.blocks.* / visual.patch_embed.*), making the two mutually exclusive. Text-only Qwen3 encoders (Z-Image, FLUX.2 Klein) are unaffected.
- The LLM.int8 path emitted a MatMul8bitLt: inputs will be cast from bfloat16 to float16 UserWarning on every matmul of every layer (LLM.int8 only supports fp16 activations; the bf16→fp16 cast is correct and intended). Suppressed once at import.

Related Issues / Discussions
Qwen-Image-Edit 2511 fp8_scaled.
QA Instructions
Running a quantized transformer (GGUF or fp8 single-file) together with a standalone VAE + standalone Qwen2.5-VL encoder avoids ever downloading the full ~40 GB diffusers pipeline.
- … qwen-image-edit-2511-Q4_K_M.gguf
- … qwen_2.5_vl_7b_fp8_scaled.safetensors
- … Main/QwenImage/Checkpoint (not Diffusers, not GGUFQuantized), with the variant (edit vs generate) inferred correctly:
  - … edit
  - … __index_timestep_zero__ → edit
  - … generate
- … QwenVLEncoder/Checkpoint (not Qwen3Encoder) and must be selectable in the Qwen2.5-VL Encoder field of the Main Model – Qwen Image loader node.
- … FP8 layerwise casting enabled for <model> ... should appear.
- … Qwen3Encoder.

Testing Status
Tested locally with:
- qwen_2.5_vl_7b_fp8_scaled.safetensors (standalone Qwen2.5-VL encoder)
- qwen-image-edit-2511-Q4_K_M.gguf (GGUF transformer)

Merge Plan
Standard merge: no DB schema changes, no migrations needed. The new config class registers in the discriminator union but only matches files that are explicitly Qwen Image single-file checkpoints (not GGUF, not diffusers), so it cannot accidentally re-classify existing models. Note: the Qwen2.5-VL/Qwen3 disambiguation only affects new classifications; an encoder imported before this PR stays Qwen3Encoder until re-imported.

Checklist
- What's New copy (if doing a release after this PR)