feat(mm): add Qwen Image single-file checkpoint loader with fp8 support by Pfannkuchensack · Pull Request #9253 · invoke-ai/InvokeAI

Pfannkuchensack · 2026-05-30T05:48:05Z

Summary

Adds Main_Checkpoint_QwenImage_Config and QwenImageCheckpointModel so that single-file safetensors checkpoints (e.g. Qwen-Image-Edit 2511 fp8_scaled from Civitai) can be imported. ComfyUI-style fp8 weights are dequantized at load time; the existing default_settings.fp8_storage toggle then optionally re-casts to fp8 for VRAM savings.

Also wires _apply_fp8_layerwise_casting into the Qwen Image diffusers loader, so the fp8 storage option now covers both the diffusers and single-file checkpoint formats. GGUF stays untouched because it carries its own quantization.

Shared variant inference (marker tensor → filename heuristic) and transformer architecture auto-detection are extracted into module-level helpers so the GGUF and checkpoint loaders stay in sync.

Additional fixes in this PR:

- Memory-efficient dequantization. ComfyUI fp8_scaled weights are now dequantized directly to the compute dtype (bf16) instead of via a full-precision float32 intermediate. The previous path materialised a 4-byte/param copy of the entire model before downcasting, spiking peak RAM to ~2× the final bf16 size (~80 GB for the 20B transformer). bf16 shares float32's exponent range and has 7 mantissa bits, so every fp8 weight (3 mantissa bits) converts to bf16 exactly and there is no meaningful precision loss. Applies to both the transformer and the single-file Qwen2.5-VL encoder loaders.
- Qwen2.5-VL vs. Qwen3 encoder disambiguation. A single-file Qwen2.5-VL encoder satisfies the Qwen3 key heuristic (model.layers.* / model.embed_tokens.weight), so it matched both Qwen3Encoder_Checkpoint_Config and QwenVLEncoder_Checkpoint_Config; the tiebreak misrouted it to Qwen3Encoder, hiding it from the Qwen Image loader's encoder field. The Qwen3 single-file/GGUF configs now reject state dicts carrying a Qwen-VL visual tower (visual.blocks.* / visual.patch_embed.*), making the two mutually exclusive. Text-only Qwen3 encoders (Z-Image, FLUX.2 Klein) are unaffected.
- Silenced bitsandbytes log spam. The LLM.int8 path emitted a `MatMul8bitLt: inputs will be cast from bfloat16 to float16` UserWarning on every matmul of every layer (LLM.int8 only supports fp16 activations; the bf16→fp16 cast is correct and intended). The warning is now suppressed once, at import time.
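The peak-RAM figures quoted for the memory-efficient dequantization fix are plain per-parameter arithmetic. The sketch below assumes exactly 20 billion parameters and decimal gigabytes, and counts one full copy of the weights at a time (it ignores the fp8 source tensors and any allocator overhead), so treat the numbers as lower bounds rather than measurements.

```python
# Back-of-envelope sizing for one full copy of a model's weights.
# Assumptions: exactly 20e9 parameters, decimal GB (1 GB = 10**9 bytes),
# no overhead from source tensors or the allocator.
BYTES_PER_PARAM = {
    "float8_e4m3fn": 1,  # ComfyUI fp8_scaled storage
    "bfloat16": 2,       # compute dtype after dequantization
    "float32": 4,        # the old full-precision intermediate
}

N_PARAMS = 20_000_000_000  # ~20B-parameter Qwen Image transformer


def full_copy_gb(n_params: int, dtype: str) -> float:
    """Size in GB of one full copy of the weights stored as `dtype`."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


fp8_gb = full_copy_gb(N_PARAMS, "float8_e4m3fn")
bf16_gb = full_copy_gb(N_PARAMS, "bfloat16")
fp32_gb = full_copy_gb(N_PARAMS, "float32")

print(f"fp8 source:        {fp8_gb:.0f} GB")   # 20 GB
print(f"bf16 result:       {bf16_gb:.0f} GB")  # 40 GB
print(f"fp32 intermediate: {fp32_gb:.0f} GB ({fp32_gb / bf16_gb:.0f}x the bf16 result)")
```

The same 2-bytes-to-1-byte ratio is why the QA steps expect roughly a 50% drop in transformer VRAM when fp8 storage re-casts the bf16 weights.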

Related Issues / Discussions

Qwen-Image-Edit 2511 fp8_scaled.

QA Instructions

Running a quantized transformer (GGUF or fp8 single-file) together with a standalone VAE + standalone Qwen2.5-VL encoder avoids ever downloading the full ~40 GB diffusers pipeline.

Import a Qwen Image single-file checkpoint via the Model Manager. Tested files:
- GGUF transformer: qwen-image-edit-2511-Q4_K_M.gguf
- fp8_scaled Qwen2.5-VL encoder: qwen_2.5_vl_7b_fp8_scaled.safetensors
- (Optional) a Qwen-Image-Edit 2511 fp8_scaled transformer safetensors from Civitai, and a plain bf16/fp16 safetensors if available.
Confirm classification:
- The transformer checkpoint → Main / QwenImage / Checkpoint (not Diffusers, not GGUFQuantized), with the variant (edit vs generate) inferred correctly:
  - filename containing "edit" (case-insensitive) → edit
  - state dict containing __index_timestep_zero__ → edit
  - otherwise → generate
  - explicit override in import options must win.
- The Qwen2.5-VL encoder → QwenVLEncoder / Checkpoint (not Qwen3Encoder) and must be selectable in the Qwen2.5-VL Encoder field of the Main Model – Qwen Image loader node.
In the loader node, mix and match: GGUF/checkpoint Transformer + standalone Qwen Image VAE + standalone Qwen2.5-VL Encoder, leaving Component Source empty. Generate end-to-end and confirm a sensible image. For an Edit variant, verify the reference image actually conditions the output (dual modulation works).
Toggle FP8 Storage in the model's default settings and re-generate:
- The log line `FP8 layerwise casting enabled for <model> ...` should appear.
- Transformer VRAM should drop ~50%; output should remain visually equivalent.
- Repeat the toggle test for a diffusers-format Qwen Image model (previously fp8_storage was a no-op there).
Regression check — re-import a GGUF Qwen Image model and a diffusers folder Qwen Image model; both must still load and infer correctly (loader helpers were extracted, behavior should be identical). Confirm a standalone text-only Qwen3 encoder (Z-Image / FLUX.2 Klein) still classifies as Qwen3Encoder.
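The classification rule these checks exercise (the Qwen3 key heuristic, plus rejection of a bundled visual tower on the Qwen3 side) can be sketched as follows. The helper names and the exact form of the heuristic (both `model.layers.*` and `model.embed_tokens.weight` required) are assumptions for illustration, not the PR's config code.

```python
from typing import Iterable, Optional, Set

# Hypothetical helpers for illustration only; the real checks live in
# InvokeAI's Qwen3 / QwenVL encoder config classes and may differ.
VISUAL_TOWER_PREFIXES = ("visual.blocks.", "visual.patch_embed.")


def looks_like_qwen_text_encoder(keys: Set[str]) -> bool:
    """Qwen3-style key heuristic (assumed: both patterns must be present)."""
    has_layers = any(k.startswith("model.layers.") for k in keys)
    return has_layers and "model.embed_tokens.weight" in keys


def has_visual_tower(keys: Set[str]) -> bool:
    """True if the state dict bundles a Qwen-VL vision tower."""
    return any(k.startswith(VISUAL_TOWER_PREFIXES) for k in keys)


def classify_encoder(state_dict_keys: Iterable[str]) -> Optional[str]:
    keys = set(state_dict_keys)
    if not looks_like_qwen_text_encoder(keys):
        return None
    # A Qwen2.5-VL file passes the text heuristic too; checking for the visual
    # tower makes the two outcomes mutually exclusive.
    return "QwenVLEncoder" if has_visual_tower(keys) else "Qwen3Encoder"


text_only = ["model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.weight"]
vl = text_only + ["visual.patch_embed.proj.weight", "visual.blocks.0.attn.qkv.weight"]
print(classify_encoder(text_only))  # Qwen3Encoder
print(classify_encoder(vl))         # QwenVLEncoder
```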

Run the relevant tests:

```shell
uv run --extra cuda pytest \
  tests/backend/model_manager/configs/test_qwen_image_checkpoint_variant_detection.py \
  tests/backend/model_manager/configs/ \
  tests/backend/model_manager/load/test_load_default_fp8.py
```

Testing Status

Tested locally with:

- qwen_2.5_vl_7b_fp8_scaled.safetensors (standalone Qwen2.5-VL encoder)
- qwen-image-edit-2511-Q4_K_M.gguf (GGUF transformer)

Merge Plan

Standard merge — no DB schema changes, no migrations needed. The new config class registers in the discriminator union but only matches files that are explicitly Qwen Image single-file checkpoints (not GGUF, not diffusers), so it cannot accidentally re-classify existing models. Note: the Qwen2.5-VL/Qwen3 disambiguation only affects new classifications — an encoder imported before this PR stays Qwen3Encoder until re-imported.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration — n/a, backend only
Documentation added / updated (if applicable) — n/a, no user-facing config changes
Updated What's New copy (if doing a release after this PR)


lstein

Code review

Solid, well-documented change. Good refactor of the duplicated GGUF logic into _strip_comfyui_prefix / _build_qwen_image_transformer_config / _infer_qwen_image_variant, tests pass, and ruff is clean. A few issues worth addressing.

1. [Medium] Detection and loading disagree about the ComfyUI prefix

The loader strips model.diffusion_model. / diffusion_model. prefixes via _strip_comfyui_prefix (qwen_image.py:240, :292), but the config probe never strips them. Main_Checkpoint_QwenImage_Config.from_model_on_disk (main.py:1383) calls _has_qwen_image_keys(sd) on the raw state dict, and that check uses strict startswith matches against "txt_in.", "txt_norm.", and "img_in." (main.py:1338-1340). ModelOnDisk.load_state_dict (model_on_disk.py:81) does no prefix normalization.

So a ComfyUI checkpoint whose keys are actually prefixed (model.diffusion_model.txt_in.weight) will fail identification and never reach the new loader — even though the loader was specifically built to strip that prefix. The config docstring explicitly claims "Covers… ComfyUI-style fp8_scaled checkpoints" (main.py:1361-1363), so this is a real gap.

The same inconsistency is pre-existing in Main_GGUF_QwenImage_Config, which suggests the files tested so far have bare keys and the prefix-stripping is defensive. Two ways to resolve:

- If prefixed files are a real input → strip the prefix in _has_qwen_image_keys (or before calling it) so detection and loading agree.
- If they're not → the _strip_comfyui_prefix calls are effectively dead and the docstring overstates coverage.

Worth confirming which, since right now the two paths can't both be right.
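The first resolution can be sketched by running detection through the same prefix normalization the loader uses. The function names and the rule that all three prefixes must be present are assumptions for illustration, not the PR's actual _has_qwen_image_keys.

```python
from typing import Iterable

# Hypothetical helpers; names and the "all three prefixes present" rule are assumptions.
COMFYUI_PREFIXES = ("model.diffusion_model.", "diffusion_model.")
QWEN_IMAGE_KEY_PREFIXES = ("txt_in.", "txt_norm.", "img_in.")


def strip_comfyui_prefix(key: str) -> str:
    """Drop a leading ComfyUI wrapper prefix, if present."""
    for prefix in COMFYUI_PREFIXES:
        if key.startswith(prefix):
            return key[len(prefix):]
    return key


def has_qwen_image_keys(keys: Iterable[str]) -> bool:
    """Detect Qwen Image transformer keys after the normalization the loader applies."""
    normalized = [strip_comfyui_prefix(k) for k in keys]
    return all(
        any(k.startswith(prefix) for k in normalized)
        for prefix in QWEN_IMAGE_KEY_PREFIXES
    )


bare = ["txt_in.weight", "txt_norm.weight", "img_in.weight"]
print(has_qwen_image_keys(bare))                                          # True
print(has_qwen_image_keys(["model.diffusion_model." + k for k in bare]))  # True
```

Because detection and loading share one normalization function in this sketch, the two paths cannot disagree about which prefixes are valid.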

2. [Low] `QwenVLEncoderCheckpointLoader` still inlines the now-extracted helpers

The PR extracted _dequantize_comfyui_fp8 (qwen_image.py:51) and _strip_quantization_metadata (qwen_image.py:84), but QwenVLEncoderCheckpointLoader._load_text_encoder_from_singlefile (qwen_image.py:429-472) still carries verbatim copies of both blocks — ~45 lines, identical down to the comments. Since these are now module-level helpers in the same file, the encoder loader should call them. Leaving two copies means a future fix to the dequant logic has to be applied twice.
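What the two helpers do to the state dict is mostly key bookkeeping, which is why a single shared copy is enough for both loaders. The sketch below is tensor-free: plain floats stand in for fp8 tensors, the function names are hypothetical, and the scale suffixes (`.scale_weight`, `.weight_scale`) are assumed ComfyUI naming rather than values taken from the PR; only the `.scale_input` leftover is mentioned in the review.

```python
from typing import Dict

# Hypothetical, tensor-free sketch. Plain floats stand in for tensors; in a real
# loader the multiply would run on fp8 tensors cast to the compute dtype.
SCALE_SUFFIXES = (".scale_weight", ".weight_scale")  # assumed naming
METADATA_SUFFIXES = (".scale_input",)                # left behind by dequantization


def dequantize_scaled(sd: Dict[str, float]) -> Dict[str, float]:
    """Fold each per-tensor scale into its weight and drop the scale key."""
    out = dict(sd)
    # Pre-filtering guarantees every key below ends in a known suffix, so the
    # suffix lookup (and therefore weight_key) is always defined.
    scale_keys = [k for k in out if k.endswith(SCALE_SUFFIXES)]
    for scale_key in scale_keys:
        suffix = next(s for s in SCALE_SUFFIXES if scale_key.endswith(s))
        weight_key = scale_key[: -len(suffix)] + ".weight"
        scale = out.pop(scale_key)
        if weight_key in out:
            out[weight_key] = out[weight_key] * scale
    return out


def strip_quantization_metadata(sd: Dict[str, float]) -> Dict[str, float]:
    """Remove leftover quantization metadata such as input scales."""
    return {k: v for k, v in sd.items() if not k.endswith(METADATA_SUFFIXES)}


sd = {"blk.weight": 3.0, "blk.scale_weight": 0.5, "blk.scale_input": 1.0, "blk.bias": 0.25}
clean = strip_quantization_metadata(dequantize_scaled(sd))
print(clean)  # {'blk.weight': 1.5, 'blk.bias': 0.25}
```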

3. [Low] `make_room` estimate is taken before the float32→bf16 cast

In _load_from_singlefile (qwen_image.py:305-310), new_sd_size is computed with model_dtype.itemsize (bf16 = 2 bytes) and make_room is called before the cast loop. But at that moment the dequantized weights in sd are float32 (weight_float * scale_float → fp32, qwen_image.py:79), so the actual transient footprint is ~2× the estimate. For a model this size that's a non-trivial undercount feeding the cache eviction logic. The QwenVL loader (qwen_image.py:517) avoids this by computing the size after casting with actual element_size(). Consider reordering (cast, then size, then make_room) for consistency, or estimating with fp32 width.
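The suggested reordering (cast, then size, then make_room) can be illustrated with a toy tensor type. `ToyTensor`, `estimate_before_cast`, and `cast_then_reserve` are hypothetical; real torch tensors expose `numel()` and `element_size()` methods for the same purpose, and `make_room` is modeled as a plain callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Toy stand-in for a tensor (hypothetical): only element count and dtype matter.
ITEMSIZE: Dict[str, int] = {"float32": 4, "bfloat16": 2}


@dataclass(frozen=True)
class ToyTensor:
    n_elements: int
    dtype: str

    def element_size(self) -> int:
        return ITEMSIZE[self.dtype]

    def to(self, dtype: str) -> "ToyTensor":
        return ToyTensor(self.n_elements, dtype)


def nbytes(sd: Dict[str, ToyTensor]) -> int:
    """Bytes actually occupied, using each tensor's real element width."""
    return sum(t.n_elements * t.element_size() for t in sd.values())


def estimate_before_cast(sd: Dict[str, ToyTensor], model_dtype: str = "bfloat16") -> int:
    """The flagged ordering: assumes model_dtype width while tensors are still fp32."""
    return sum(t.n_elements * ITEMSIZE[model_dtype] for t in sd.values())


def cast_then_reserve(
    sd: Dict[str, ToyTensor],
    make_room: Callable[[int], None],
    model_dtype: str = "bfloat16",
) -> Dict[str, ToyTensor]:
    """Suggested ordering: cast, then size from real widths, then reserve cache room."""
    cast = {k: t.to(model_dtype) for k, t in sd.items()}
    make_room(nbytes(cast))
    return cast


sd = {"a": ToyTensor(1_000, "float32"), "b": ToyTensor(500, "float32")}
print(nbytes(sd), estimate_before_cast(sd))  # 6000 3000
```

The printed pair shows the 2× gap the review describes: the reservation is computed at bf16 width while fp32 data is still resident.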

4. [Nit] Missing type hint on `_infer_qwen_image_variant`

`def _infer_qwen_image_variant(sd: ..., path)` (main.py:1346): `path` is untyped. It should be annotated as `Path`, since the function relies on `path.stem`.

5. [Nit] Filename `"edit"` substring heuristic is broad

_infer_qwen_image_variant treats any "edit" substring in the stem as the Edit variant (main.py:1355), so names like credited, edited, or unedited would produce false positives. This logic was moved rather than introduced here, and the marker-tensor check takes precedence, so the risk is low; still, a word-boundary match would be safer if you touch it.
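A token-based version of the heuristic, keeping the marker-tensor check first, might look like the sketch below. The enum and function names are hypothetical stand-ins. This tokenizer also splits camelCase humps and digit runs, which matters for a name like qwenImageEdit2511_fp8.safetensors (the Civitai file reported later in this thread); a plain split on `-`/`_` would leave `qwenimageedit2511` as one token. Whether the PR's tokenizer handles camelCase is not stated here.

```python
import re
from enum import Enum
from pathlib import Path
from typing import Collection, List


class QwenImageVariant(str, Enum):
    """Hypothetical stand-in for InvokeAI's QwenImageVariantType."""
    Generate = "generate"
    Edit = "edit"


EDIT_MARKER_KEY = "__index_timestep_zero__"
# Lowercase words, camelCase humps, ALL-CAPS runs, and digit runs become separate tokens.
_TOKEN_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def filename_tokens(path: Path) -> List[str]:
    return [t.lower() for t in _TOKEN_RE.findall(path.stem)]


def infer_variant(state_dict_keys: Collection[str], path: Path) -> QwenImageVariant:
    # 1. A marker tensor in the state dict takes precedence over the filename.
    if EDIT_MARKER_KEY in state_dict_keys:
        return QwenImageVariant.Edit
    # 2. Filename heuristic: "edit" must be a whole token, not a substring.
    if "edit" in filename_tokens(path):
        return QwenImageVariant.Edit
    # 3. Otherwise assume the text-to-image variant.
    return QwenImageVariant.Generate


print(filename_tokens(Path("qwenImageEdit2511_fp8.safetensors")))
# ['qwen', 'image', 'edit', '2511', 'fp', '8']
print(infer_variant(set(), Path("unedited-model.safetensors")).value)  # generate
```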

Checked, not issues

- The override_fields.pop("variant", None) or _infer_... pattern (main.py:1387) is safe: QwenImageVariantType is a str Enum, so Generate is truthy (covered by test_explicit_variant_override_not_overwritten).
- _dequantize_comfyui_fp8's weight_key is always bound, since weight_scale_keys is pre-filtered to keys ending in one of scale_suffixes.
- _strip_quantization_metadata correctly removes the .scale_input keys that _dequantize_comfyui_fp8 leaves behind.
- The new config correctly rejects GGUF and non-Qwen state dicts, with tests covering both, so it won't double-match with Main_GGUF_QwenImage_Config.
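The override pattern from the first point can be checked in isolation. The enum and the inference placeholder below are hypothetical stand-ins; the point is only that `pop(..., None) or fallback` keeps an explicit override as long as the override value is truthy, which a non-empty str-mixin Enum member always is.

```python
from enum import Enum
from typing import Any, Dict


class QwenImageVariant(str, Enum):
    """Hypothetical stand-in for InvokeAI's QwenImageVariantType."""
    Generate = "generate"
    Edit = "edit"


def infer_variant_from_file() -> QwenImageVariant:
    """Placeholder for the marker-tensor / filename heuristic."""
    return QwenImageVariant.Edit


def resolve_variant(override_fields: Dict[str, Any]) -> QwenImageVariant:
    # pop() also removes the key from the overrides dict. `or` only falls back
    # to inference when the popped value is falsy (e.g. missing -> None); a
    # str-mixin Enum member is truthy because its value is a non-empty string.
    return override_fields.pop("variant", None) or infer_variant_from_file()


overrides = {"variant": QwenImageVariant.Generate}
print(resolve_variant(overrides).value)  # generate
print(resolve_variant({}).value)         # edit
```

The pattern would break for a member whose mixed-in value is falsy, such as an IntEnum member equal to 0 or a str member equal to ""; an explicit `is None` check avoids relying on truthiness altogether.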

Recommendation: address #1 (verify/fix the prefix detection gap) before merge; #2–#5 are cleanups that can ride along or follow up.

🤖 Generated with Claude Code

lstein

So far I haven't been able to run generations with Qwen Image Edit 2511 fp8. Generation gets to the text encoder loading message, and then the whole InvokeAI process dies with "Killed". Sometimes it brings the shell down with it, and once it locked up my machine and I had to cold reboot.

I get the same behavior regardless of whether fp8 storage is active or not.

…edit heuristic, dedupe fp8 helpers

- strip ComfyUI key prefixes in _has_qwen_image_keys so prefixed checkpoints are identified and reach the loader
- match "edit" as a filename token instead of any substring (no credited/edited/unedited false positives)
- reuse _dequantize_comfyui_fp8 / _strip_quantization_metadata in the QwenVL encoder loader
- size make_room reservation after the bf16 cast to avoid fp32 undercount
- add Path type hint on _infer_qwen_image_variant

lstein · 2026-06-06T16:29:51Z

Attempts to generate using qwenImageEdit2511_fp8.safetensors from Civitai reproducibly have a hard crash. Stack trace appended. Also note the log messages indicating that the system tries to load the text encoder twice.

```
[2026-06-06 12:25:02,934]::[InvokeAI]::INFO --> Executing queue item 20, session 519d17e7-633d-41db-b8c2-3364a06afd36
[2026-06-06 12:25:03,243]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '53272211-ed62-4a29-b2fd-0611f25be722:vae' (AutoencoderKLQwenImage) onto cuda device in 0.06s. Total model size: 242.03MB, VRAM: 242.03MB (100.0%)
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 14.58it/s]
[2026-06-06 12:25:16,667]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '3804fea9-d0a3-4fff-8fd0-d062fbe4d068:text_encoder' (Qwen2_5_VLForConditionalGeneration) onto cuda device in 11.89s. Total model size: 15816.05MB, VRAM: 9439.05MB (59.7%)
[2026-06-06 12:25:24,566]::[InvokeAI]::WARNING --> Loading 0.0 MB into VRAM, but only -38.1875 MB were requested. This is the minimum set of weights in VRAM required to run the model.
[2026-06-06 12:25:24,569]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '3804fea9-d0a3-4fff-8fd0-d062fbe4d068:text_encoder' (Qwen2_5_VLForConditionalGeneration) onto cuda device in 0.02s. Total model size: 15816.05MB, VRAM: 9392.10MB (59.4%)
Killed
```

Pfannkuchensack · 2026-06-15T23:59:02Z

I need to dig a bit deeper. It is running, but it needs a lot of VRAM/RAM.

…ilence int8 warning

- qwen_image: dequantize ComfyUI fp8_scaled weights directly to compute_dtype instead of a full-precision float32 intermediate. The previous path materialised a 4-byte/param copy of the whole model before downcasting, spiking peak RAM to ~2x the final bf16 size (~80GB for the 20B transformer). bf16 shares float32's exponent range and fp8 has only 3 mantissa bits, so no meaningful precision loss.
- qwen3_encoder: reject checkpoints that bundle a Qwen-VL visual tower (visual.blocks.* / visual.patch_embed.*). A Qwen2.5-VL file satisfies the Qwen3 key heuristic too, so it matched both configs and the tiebreak misrouted it to Qwen3Encoder, hiding it from the Qwen Image loader's encoder field. Qwen3 (text) and QwenVLEncoder (vision+language) are now mutually exclusive.
- bnb_llm_int8: silence the per-matmul "inputs will be cast from bfloat16 to float16" UserWarning. LLM.int8 only supports fp16 activations; the bf16->fp16 cast is correct and intended, so the warning is pure log spam on every layer.
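Silencing a single known warning with the standard library can be sketched as below. The function name is hypothetical, and the message pattern is an assumption based on the text quoted in this PR: the emitted wording may include a `torch.` prefix on the dtype, so the pattern allows for text before `bfloat16`. `warnings.filterwarnings` treats `message` as a regular expression matched (case-insensitively) against the start of the warning text.

```python
import warnings

# Assumed pattern, based on the warning text quoted in this PR.
_INT8_CAST_WARNING = r"MatMul8bitLt: inputs will be cast from .*bfloat16 to float16"


def silence_int8_cast_warning() -> None:
    """Ignore only the expected bf16->fp16 cast warning; other UserWarnings still surface."""
    warnings.filterwarnings("ignore", message=_INT8_CAST_WARNING, category=UserWarning)


silence_int8_cast_warning()  # run once at import time
```

Matching on the message rather than ignoring every UserWarning keeps unrelated warnings visible; passing `module=` as well would further restrict the filter to warnings raised from a given module.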

Pfannkuchensack requested review from JPPhoto, blessedcoolant, dunkeroni and lstein as code owners May 30, 2026 05:48

github-actions bot added the python, backend, and python-tests labels May 30, 2026

Pfannkuchensack added 2 commits May 30, 2026 07:55

Merge branch 'main' into feat/qwen-image-checkpoint-loader

6e97d89

Merge branch 'main' into feat/qwen-image-checkpoint-loader

61cbc8d

lstein self-assigned this Jun 3, 2026

lstein added the 6.13.5 Library Updates label Jun 3, 2026

lstein added this to Invoke - Community Roadmap Jun 3, 2026

lstein moved this to 6.13.5 LIBRARY UPDATES in Invoke - Community Roadmap Jun 3, 2026

Pfannkuchensack and others added 2 commits June 3, 2026 21:36

Merge branch 'main' into feat/qwen-image-checkpoint-loader

e77cd22

chore(frontend): openapi & typegen

833e487

github-actions bot added the frontend label Jun 5, 2026

Merge branch 'main' into feat/qwen-image-checkpoint-loader

e13a666

lstein reviewed Jun 5, 2026


lstein requested changes Jun 5, 2026


Pfannkuchensack added 2 commits June 8, 2026 22:27

Merge branch 'main' into feat/qwen-image-checkpoint-loader

2dbd00c

Merge branch 'main' into feat/qwen-image-checkpoint-loader

4ed37db

Pfannkuchensack added 2 commits June 16, 2026 03:36

Chore Ruff

8b8e034

Pfannkuchensack wants to merge 11 commits into invoke-ai:main from Pfannkuchensack:feat/qwen-image-checkpoint-loader
