
[Bug] GlmImagePipeline silently corrupts weights on MPS accelerator #13227

@yingding

Describe the bug

When zai-org/GLM-Image is loaded in diffusers with device_map="mps", some model parameters are silently corrupted during the GlmImagePipeline.from_pretrained call.

The corruption:

  • Happens only when tensors are placed directly on MPS during loading
  • Is inconsistent across dtypes:
      • float32 + MPS: weights corrupted, bias OK
      • float16 + MPS: bias corrupted, weights OK
  • Does not occur when loading on CPU first and then moving to MPS

This results in extreme values (~1e37), LayerNorm overflow, and NaN / zero outputs (all-black images).
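One way to check for the symptom described above is to scan every parameter for non-finite entries or implausibly large magnitudes. The following is a minimal sketch, not code from the original report: the helper name `find_suspicious_parameters` and the `1e30` threshold are assumptions chosen for illustration, and the demonstration uses a small CPU layer that is corrupted by hand rather than the real pipeline.

```python
import torch
from torch import nn


def find_suspicious_parameters(module: nn.Module, threshold: float = 1e30):
    """Return (name, max_abs, num_nonfinite) for parameters that look corrupted.

    `threshold` is an illustrative cutoff, not a value taken from diffusers.
    """
    suspicious = []
    for name, param in module.named_parameters():
        # Copy to CPU in float32 so the check does not depend on the accelerator.
        values = param.detach().to("cpu", torch.float32)
        finite = torch.isfinite(values)
        num_nonfinite = int((~finite).sum())
        finite_values = values[finite]
        max_abs = float(finite_values.abs().max()) if finite_values.numel() else 0.0
        if num_nonfinite > 0 or max_abs > threshold:
            suspicious.append((name, max_abs, num_nonfinite))
    return suspicious


# Toy demonstration on CPU: corrupt one weight entry by hand.
layer = nn.Linear(4, 2)
with torch.no_grad():
    layer.weight[0, 0] = 1e37
report = find_suspicious_parameters(layer)
for name, max_abs, num_nonfinite in report:
    print(f"{name}: max_abs={max_abs:.3e} nonfinite={num_nonfinite}")
```

Applied to a loaded pipeline, the same helper could be called on each component that is an `nn.Module`; which components to inspect is left to the reader.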

Reproduction

❌ Corrupted

from diffusers.pipelines.glm_image import GlmImagePipeline
import torch

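# device_map="mps" places tensors directly on MPS while loading,
# which is the path that produces corrupted parameters.
pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image",
    torch_dtype=torch.float32,
    device_map="mps",
)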

✅ Correct workaround

from diffusers.pipelines.glm_image import GlmImagePipeline
import torch

# Load on CPU first (no device_map), then move the whole pipeline to MPS.
pipe = GlmImagePipeline.from_pretrained(
    "zai-org/GLM-Image",
    torch_dtype=torch.float32,
)
pipe.to("mps")
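To confirm that the two loading paths really produce different parameters, the state dict from a CPU load could be compared tensor by tensor with the one from the `device_map="mps"` load. This is a hedged sketch: `mismatched_tensors` is a hypothetical helper, and the demonstration uses two small CPU layers in place of the real model.

```python
import torch
from torch import nn


def mismatched_tensors(reference: dict, candidate: dict, atol: float = 0.0):
    """Names of tensors in `candidate` that are missing or differ from `reference`."""
    mismatched = []
    for name, ref in reference.items():
        if name not in candidate:
            mismatched.append(name)
            continue
        a = ref.detach().to("cpu", torch.float32)
        b = candidate[name].detach().to("cpu", torch.float32)
        # allclose treats NaN as unequal by default, so a NaN counts as a mismatch.
        if a.shape != b.shape or not torch.allclose(a, b, rtol=0.0, atol=atol):
            mismatched.append(name)
    return mismatched


# Toy demonstration: identical layers, then one bias entry is overwritten.
good = nn.Linear(4, 2)
bad = nn.Linear(4, 2)
bad.load_state_dict(good.state_dict())
with torch.no_grad():
    bad.bias[0] = float("nan")
diff = mismatched_tensors(good.state_dict(), bad.state_dict())
print(diff)
```

With the real pipeline this would mean holding two copies of a component in memory at once, for example the transformer from each load (assuming the pipeline exposes it as `pipe.transformer`, which the report does not state explicitly).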

Logs
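The per-module "OK" and "NaN" lines in this log are the kind of output that forward hooks produce. The reporter's diagnostic script is not included in the issue, so the following is only a sketch of the general technique; `attach_nan_hooks` and the report tuple layout are assumptions made for illustration.

```python
import torch
from torch import nn


def attach_nan_hooks(model: nn.Module, report: list):
    """Register forward hooks that record non-finite outputs per named submodule."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for i, out in enumerate(outputs):
                if not torch.is_tensor(out) or not out.is_floating_point():
                    continue
                bad = int((~torch.isfinite(out)).sum())
                if bad:
                    # (module name, output index, non-finite count, total elements)
                    report.append((name, i, bad, out.numel()))
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles


# Toy demonstration: a single NaN bias entry spreads through LayerNorm.
model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
with torch.no_grad():
    model[0].bias[0] = float("nan")
report = []
handles = attach_nan_hooks(model, report)
with torch.no_grad():
    model(torch.ones(1, 4))
for handle in handles:
    handle.remove()
for entry in report:
    print(entry)
```

The toy model shows the same pattern as the log: a few non-finite values entering a LayerNorm turn its entire output into NaN, because the mean and variance of the row become NaN.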

Device: mps, dtype: torch.float32
Keyword arguments {'trust_remote_code': True} are not expected by GlmImagePipeline and will be ignored.

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]
Loading weights: 100%|##########| 1011/1011 [00:10<00:00, 100.58it/s]
Loading pipeline components...:  29%|##8       | 2/7 [00:11<00:23,  4.78s/it]
Loading weights: 100%|##########| 111/111 [00:00<00:00, 314.30it/s]
Loading pipeline components...:  57%|#####7    | 4/7 [00:11<00:05,  1.72s/it]
Loading checkpoint shards: 100%|##########| 3/3 [00:06<00:00,  2.17s/it]
Loading pipeline components...: 100%|##########| 7/7 [00:18<00:00,  2.67s/it]

=== Transformer top-level children ===
  rope: GlmImageRotaryPosEmbed
  image_projector: GlmImageImageProjector
  glyph_projector: FeedForward
  prior_token_embedding: Embedding
  prior_projector: FeedForward
  time_condition_embed: GlmImageCombinedTimestepSizeEmbeddings
  transformer_blocks: ModuleList
  norm_out: GlmImageAdaLayerNormContinuous
  proj_out: Linear
  Hooking 30 transformer_blocks individually...

=== Block[0] sub-modules ===
  block0.norm1: GlmImageAdaLayerNormZero
    block0.norm1.norm: LayerNorm
    block0.norm1.norm_context: LayerNorm
    block0.norm1.linear: Linear
  block0.attn1: Attention
    block0.attn1.norm_q: LayerNorm
    block0.attn1.norm_k: LayerNorm
    block0.attn1.to_q: Linear
    block0.attn1.to_k: Linear
    block0.attn1.to_v: Linear
    block0.attn1.to_out: ModuleList
  block0.norm2: LayerNorm
  block0.norm2_context: LayerNorm
  block0.ff: FeedForward
    block0.ff.net: ModuleList

=== Running 1-step inference ===

  0%|          | 0/1 [00:00<?, ?it/s]
  OK  rope output[0]: shape=[4608, 128] min=-1 max=1
  OK  rope output[1]: shape=[4608, 128] min=-1 max=1
  OK  rope INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
  *** NaN in image_projector output[0]: shape=[1, 4608, 4096] NaN=41472/18874368 clean_min=-2.985e+30 clean_max=3.325e+30
  OK  image_projector INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
  OK  glyph_projector output[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  OK  glyph_projector INPUT[0]: shape=[1, 1, 1472] min=-0.3495 max=0.4175
  OK  prior_token_embedding output[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
  OK  prior_token_embedding INPUT[0]: shape=[1, 4608] min=149 max=1.632e+04
  OK  prior_projector output[0]: shape=[1, 4608, 4096] min=-0.2291 max=0.1654
  OK  prior_projector INPUT[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
  *** NaN in time_condition_embed output[0]: shape=[1, 512] NaN=512/512
  OK  time_condition_embed INPUT[0]: shape=[1] min=999 max=999
  OK  time_condition_embed INPUT[1]: shape=[1, 2] min=1024 max=1152
  OK  time_condition_embed INPUT[2]: shape=[1, 2] min=0 max=0
  *** NaN in block0.norm1.norm output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm1.norm INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  block0.norm1.norm_context output[0]: shape=[1, 1, 4096] min=-6.959 max=4.698
  OK  block0.norm1.norm_context INPUT[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in block0.norm1.linear output[0]: shape=[1, 49152] NaN=49152/49152
  *** NaN in block0.norm1.linear INPUT[0]: shape=[1, 512] NaN=512/512
  *** NaN in block0.norm1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm1 output[1]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[2]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[3]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[4]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[5]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[6]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[7]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[8]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[9]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  block0.norm1 INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in block0.norm1 INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in block0.attn1.to_q output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_q INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_k output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_k INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_v output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_v INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_q output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_q INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_k output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_k INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.attn1 output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm2 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm2 INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm2_context output[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm2_context INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.ff output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.ff INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.ff output[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.ff INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[0] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[0] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[0] INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  transformer_blocks[0] INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in transformer_blocks[0] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[1] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[1] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[1] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[1] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[1] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[2] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[2] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[2] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[2] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[2] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[3] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[3] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[3] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[3] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[3] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[4] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[4] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[4] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[4] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[4] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[5] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[5] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[5] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[5] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[5] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[6] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[6] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[6] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[6] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[6] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[7] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[7] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[7] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[7] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[7] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[8] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[8] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[8] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[8] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[8] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[9] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[9] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[9] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[9] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[9] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[10] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[10] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[10] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[10] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[10] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[11] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[11] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[11] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[11] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[11] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[12] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[12] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[12] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[12] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[12] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[13] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[13] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[13] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[13] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[13] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[14] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[14] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[14] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[14] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[14] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[15] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[15] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[15] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[15] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[15] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[16] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[16] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[16] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[16] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[16] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[17] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[17] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[17] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[17] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[17] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[18] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[18] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[18] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[18] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[18] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[19] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[19] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[19] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[19] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[19] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[20] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[20] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[20] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[20] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[20] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[21] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[21] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[21] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[21] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[21] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[22] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[22] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[22] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[22] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[22] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[23] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[23] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[23] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[23] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[23] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[24] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[24] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[24] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[24] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[24] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[25] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[25] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[25] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[25] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[25] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[26] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[26] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[26] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[26] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[26] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[27] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[27] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[27] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[27] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[27] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[28] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[28] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[28] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[28] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[28] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[29] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[29] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[29] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[29] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[29] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in norm_out output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in norm_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in norm_out INPUT[1]: shape=[1, 512] NaN=512/512
  *** NaN in proj_out output[0]: shape=[1, 4608, 64] NaN=294912/294912
  *** NaN in proj_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  OK  rope output[0]: shape=[4608, 128] min=-1 max=1
  OK  rope output[1]: shape=[4608, 128] min=-1 max=1
  OK  rope INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
  *** NaN in image_projector output[0]: shape=[1, 4608, 4096] NaN=41472/18874368 clean_min=-2.985e+30 clean_max=3.325e+30
  OK  image_projector INPUT[0]: shape=[1, 16, 128, 144] min=-4.407 max=4.957
  OK  glyph_projector output[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  OK  glyph_projector INPUT[0]: shape=[1, 1, 1472] min=-0.3495 max=0.4175
  OK  prior_token_embedding output[0]: shape=[1, 4608, 4096] min=-0.1943 max=0.1709
  OK  prior_token_embedding INPUT[0]: shape=[1, 4608] min=149 max=1.632e+04
  OK  prior_projector output[0]: shape=[1, 4608, 4096] min=-0.2224 max=0.1475
  OK  prior_projector INPUT[0]: shape=[1, 4608, 4096] min=-0 max=0
  *** NaN in time_condition_embed output[0]: shape=[1, 512] NaN=512/512
  OK  time_condition_embed INPUT[0]: shape=[1] min=999 max=999
  OK  time_condition_embed INPUT[1]: shape=[1, 2] min=1024 max=1152
  OK  time_condition_embed INPUT[2]: shape=[1, 2] min=0 max=0
  *** NaN in block0.norm1.norm output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm1.norm INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  block0.norm1.norm_context output[0]: shape=[1, 1, 4096] min=-6.959 max=4.698
  OK  block0.norm1.norm_context INPUT[0]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in block0.norm1.linear output[0]: shape=[1, 49152] NaN=49152/49152
  *** NaN in block0.norm1.linear INPUT[0]: shape=[1, 512] NaN=512/512
  *** NaN in block0.norm1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm1 output[1]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[2]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[3]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[4]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[5]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[6]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[7]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[8]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 output[9]: shape=[1, 4096] NaN=4096/4096
  *** NaN in block0.norm1 INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  block0.norm1 INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in block0.norm1 INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in block0.attn1.to_q output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_q INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_k output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_k INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_v output[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.to_v INPUT[0]: shape=[1, 4609, 4096] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_q output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_q INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_k output[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1.norm_k INPUT[0]: shape=[1, 4609, 32, 128] NaN=18878464/18878464
  *** NaN in block0.attn1 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.attn1 output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm2 output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm2 INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.norm2_context output[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.norm2_context INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.ff output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.ff INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in block0.ff output[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in block0.ff INPUT[0]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[0] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[0] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[0] INPUT[0]: shape=[1, 4608, 4096] NaN=41472/18874368
  OK  transformer_blocks[0] INPUT[1]: shape=[1, 1, 4096] min=-0.2323 max=0.1581
  *** NaN in transformer_blocks[0] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[1] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[1] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[1] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[1] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[1] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[2] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[2] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[2] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[2] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[2] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[3] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[3] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[3] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[3] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[3] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[4] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[4] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[4] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[4] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[4] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[5] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[5] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[5] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[5] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[5] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[6] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[6] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[6] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[6] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[6] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[7] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[7] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[7] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[7] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[7] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[8] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[8] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[8] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[8] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[8] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[9] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[9] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[9] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[9] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[9] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[10] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[10] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[10] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[10] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[10] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[11] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[11] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[11] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[11] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[11] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[12] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[12] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[12] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[12] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[12] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[13] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[13] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[13] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[13] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[13] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[14] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[14] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[14] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[14] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[14] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[15] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[15] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[15] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[15] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[15] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[16] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[16] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[16] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[16] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[16] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[17] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[17] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[17] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[17] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[17] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[18] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[18] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[18] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[18] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[18] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[19] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[19] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[19] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[19] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[19] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[20] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[20] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[20] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[20] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[20] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[21] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[21] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[21] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[21] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[21] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[22] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[22] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[22] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[22] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[22] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[23] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[23] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[23] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[23] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[23] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[24] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[24] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[24] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[24] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[24] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[25] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[25] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[25] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[25] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[25] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[26] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[26] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[26] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[26] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[26] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[27] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[27] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[27] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[27] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[27] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[28] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[28] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[28] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[28] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[28] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in transformer_blocks[29] output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[29] output[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[29] INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in transformer_blocks[29] INPUT[1]: shape=[1, 1, 4096] NaN=4096/4096
  *** NaN in transformer_blocks[29] INPUT[2]: shape=[1, 512] NaN=512/512
  *** NaN in norm_out output[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in norm_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368
  *** NaN in norm_out INPUT[1]: shape=[1, 512] NaN=512/512
  *** NaN in proj_out output[0]: shape=[1, 4608, 64] NaN=294912/294912
  *** NaN in proj_out INPUT[0]: shape=[1, 4608, 4096] NaN=18874368/18874368

100%|##########| 1/1 [01:09<00:00, 69.09s/it]
100%|##########| 1/1 [01:09<00:00, 69.09s/it]

Final latents NaN: True
Layers with NaN: ['image_projector', 'time_condition_embed', 'block0.norm1.norm', 'block0.norm1.linear', 'block0.norm1', 'block0.attn1.to_q', 'block0.attn1.to_k', 'block0.attn1.to_v', 'block0.attn1.norm_q', 'block0.attn1.norm_k', 'block0.attn1', 'block0.norm2', 'block0.norm2_context', 'block0.ff', 'transformer_blocks[0]', 'transformer_blocks[1]', 'transformer_blocks[2]', 'transformer_blocks[3]', 'transformer_blocks[4]', 'transformer_blocks[5]', 'transformer_blocks[6]', 'transformer_blocks[7]', 'transformer_blocks[8]', 'transformer_blocks[9]', 'transformer_blocks[10]', 'transformer_blocks[11]', 'transformer_blocks[12]', 'transformer_blocks[13]', 'transformer_blocks[14]', 'transformer_blocks[15]', 'transformer_blocks[16]', 'transformer_blocks[17]', 'transformer_blocks[18]', 'transformer_blocks[19]', 'transformer_blocks[20]', 'transformer_blocks[21]', 'transformer_blocks[22]', 'transformer_blocks[23]', 'transformer_blocks[24]', 'transformer_blocks[25]', 'transformer_blocks[26]', 'transformer_blocks[27]', 'transformer_blocks[28]', 'transformer_blocks[29]', 'norm_out', 'proj_out']
Done.
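
The per-module NaN lines above look like the output of forward hooks. The script that produced them is not included in this report, so the following is only a minimal sketch of how such a trace could be collected, assuming plain PyTorch forward hooks. `attach_nan_hooks` and the two-layer toy model are hypothetical names used for illustration, not part of diffusers.

```python
import torch
from torch import nn


def attach_nan_hooks(model: nn.Module, found: list) -> list:
    """Register forward hooks that report submodules whose outputs contain NaN.

    Hypothetical helper: appends offending module names to `found` and
    returns the hook handles so the caller can remove them afterwards.
    """

    def make_hook(name):
        def hook(module, inputs, output):
            # Modules may return a single tensor or a tuple/list of tensors.
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for i, t in enumerate(outputs):
                if torch.is_tensor(t) and torch.isnan(t).any():
                    n_nan = int(torch.isnan(t).sum())
                    print(
                        f"*** NaN in {name} output[{i}]: "
                        f"shape={list(t.shape)} NaN={n_nan}/{t.numel()}"
                    )
                    if name not in found:
                        found.append(name)

        return hook

    # Skip the root module, whose name is the empty string.
    return [
        m.register_forward_hook(make_hook(n))
        for n, m in model.named_modules()
        if n
    ]


# Toy demonstration: poison one weight and watch the NaN propagate.
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
with torch.no_grad():
    toy[0].weight[0, 0] = float("nan")

layers_with_nan = []
handles = attach_nan_hooks(toy, layers_with_nan)
toy(torch.ones(1, 4))
for h in handles:
    h.remove()
```

In the real pipeline the hooks would presumably be attached to `pipe.transformer`; the first module reported is the one to inspect for corrupted parameters, since every later module only inherits the NaN.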

System Info

macOS (Apple Silicon), MPS backend
Python 3.12
diffusers 0.37.0 (GlmImagePipeline)
torch 2.10.0 (with MPS support)
Model: zai-org/GLM-Image (bf16 safetensors)
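
To confirm on a setup like this that the parameters themselves are damaged before any forward pass runs, one option is to scan them right after loading. This is a sketch only: `find_suspect_params` and the `max_abs` threshold are assumptions chosen for illustration, and the right threshold depends on the model.

```python
import torch
from torch import nn


def find_suspect_params(model: nn.Module, max_abs: float = 1e4) -> list:
    """Return names of parameters that hold non-finite or implausibly large values.

    Hypothetical helper; `max_abs` is an arbitrary illustrative threshold.
    """
    suspects = []
    for name, p in model.named_parameters():
        if p.numel() == 0:
            continue
        # Copy to CPU float32 so the check itself does not depend on the accelerator.
        t = p.detach().to("cpu", torch.float32)
        if not torch.isfinite(t).all() or t.abs().max().item() > max_abs:
            suspects.append(name)
    return suspects


# Toy demonstration with one value of the magnitude described in this report.
layer = nn.Linear(4, 4)
with torch.no_grad():
    layer.weight[0, 0] = 1e37
suspects = find_suspect_params(layer)
print(suspects)
```

Running the same scan on a pipeline component loaded with `device_map="mps"` and on one loaded on CPU and then moved with `.to("mps")` would show whether the two load paths disagree.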

Who can help?

No response

Labels: bug
