Conversation
| @@ -0,0 +1,284 @@ | |||
| """ | |||
There was a problem hiding this comment.
can we add a test for end to end encode/decode correctness? Like this one https://github.com/AI-Hypercomputer/maxdiffusion/blob/main/src/maxdiffusion/tests/wan_vae_test.py#L488
It doesn't need to run in the GitHub runner, but at least it can be run manually for validation.
There was a problem hiding this comment.
Ran it and achieved an ssim_score of 0.9985
I will add this test in the incoming t2v pipeline PR, as this test requires loading VAE weights.
| self.tile_sample_stride_width = tile_sample_stride_width or self.tile_sample_stride_width | ||
| self.tile_sample_stride_num_frames = tile_sample_stride_num_frames or self.tile_sample_stride_num_frames | ||
|
|
||
| def blend_v(self, a: jax.Array, b: jax.Array, blend_extent: int) -> jax.Array: |
There was a problem hiding this comment.
These for-loops are making JAX tracing time huge; Gemini suggests a more JAX-compatible solution for these three methods:
def blend_v(self, a: jax.Array, b: jax.Array, blend_extent: int) -> jax.Array:
blend_extent = min(a.shape[2], b.shape[2], blend_extent)
if blend_extent <= 0:
return b
# Create broadcastable blending weights: (1, 1, blend_extent, 1, 1)
y = jnp.arange(blend_extent, dtype=a.dtype).reshape(1, 1, -1, 1, 1)
val = a[:, :, -blend_extent:, :, :] * (1.0 - y / blend_extent) + \
b[:, :, :blend_extent, :, :] * (y / blend_extent)
return b.at[:, :, :blend_extent, :, :].set(val)
def blend_h(self, a: jax.Array, b: jax.Array, blend_extent: int) -> jax.Array:
blend_extent = min(a.shape[3], b.shape[3], blend_extent)
if blend_extent <= 0:
return b
# Create broadcastable blending weights: (1, 1, 1, blend_extent, 1)
x = jnp.arange(blend_extent, dtype=a.dtype).reshape(1, 1, 1, -1, 1)
val = a[:, :, :, -blend_extent:, :] * (1.0 - x / blend_extent) + \
b[:, :, :, :blend_extent, :] * (x / blend_extent)
return b.at[:, :, :, :blend_extent, :].set(val)
def blend_t(self, a: jax.Array, b: jax.Array, blend_extent: int) -> jax.Array:
blend_extent = min(a.shape[1], b.shape[1], blend_extent)
if blend_extent <= 0:
return b
# Create broadcastable blending weights: (1, blend_extent, 1, 1, 1)
x = jnp.arange(blend_extent, dtype=a.dtype).reshape(1, -1, 1, 1, 1)
val = a[:, -blend_extent:, :, :, :] * (1.0 - x / blend_extent) + \
b[:, :blend_extent, :, :, :] * (x / blend_extent)
return b.at[:, :blend_extent, :, :, :].set(val)
| self.per_channel_scale2 = None | ||
|
|
||
| if timestep_conditioning: | ||
| self.scale_shift_table = nnx.Param(jax.random.normal(rngs.params(), (4, in_channels)) / (in_channels**0.5)) |
There was a problem hiding this comment.
Please add dtype to the jax.random.normal call, or it will default to float32 (or float64, depending on the x64 setting).
| # Compute mean of squared values along channel dimension. | ||
| mean_sq = jnp.mean(jnp.square(x), axis=channel_dim, keepdims=True) | ||
| rms = jnp.sqrt(mean_sq + self.eps) | ||
| return x / rms |
There was a problem hiding this comment.
better to use jax.lax.rsqrt:
return x * jax.lax.rsqrt(mean_sq + self.eps)
| ): | ||
| self.stride = _canonicalize_tuple(stride, 3, "stride") | ||
| self.group_size = (in_channels * self.stride[0] * self.stride[1] * self.stride[2]) // out_channels | ||
|
|
There was a problem hiding this comment.
nit: add assert self.group_size > 0?
d1d4bf6 to
8971606
Compare
This PR adds the Video VAE component for LTX-2. This implementation ensures numerical and shape parity with the reference PyTorch/Diffusers logic.
New files added:
autoencoder_kl_ltx2.py: Video VAE component for LTX-2
test_video_vae_ltx2.py: unit tests for the Video VAE