feat: Add Motif-Video model and pipelines#13748
Conversation
…dance support

Add complete Motif Video implementation to diffusers:

New Models:
- Add MotifVideoTransformer3DModel with T5Gemma2Encoder for multimodal conditioning
- Supports text-to-video and image-to-video generation with vision tower integration

New Pipelines:
- Add MotifVideoPipeline for text-to-video generation
  - Default resolution: 736x1280, 121 frames, 25 fps
  - Supports classifier-free guidance and AdaptiveProjectedGuidance
- Add MotifVideoImage2VideoPipeline for image-to-video generation
  - First frame conditioning with vision encoder
  - Same defaults as T2V pipeline

Enhanced Guidance:
- Update AdaptiveProjectedGuidance with normalization_dims parameter
- Support "spatial" normalization for 5D tensors (per-frame spatial normalization)
- Support custom dimension lists for flexible normalization
- Update AdaptiveProjectedMixGuidance with the same parameter

Documentation & Tests:
- Add comprehensive API documentation for transformer and pipelines
- Add test suites for both T2V and I2V pipelines
- Register all new components in __init__ files
- Add dummy objects for torch and transformers backends

Total: 18 files changed, 3416 insertions(+), 2 deletions(-)
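The "spatial" normalization described for `AdaptiveProjectedGuidance` can be illustrated with a short sketch. This is an illustrative example only, not the diffusers implementation; `spatial_guidance_norm` is a hypothetical helper name. The idea is that a 5D `(batch, channels, frames, height, width)` tensor is normalized over its spatial dims only, so each frame is rescaled independently:

```python
import torch


def spatial_guidance_norm(guidance: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Hypothetical sketch of "spatial" normalization for 5D video tensors:
    # take the norm over the spatial dims (H, W) only, keeping the batch,
    # channel, and frame dims, so every frame is normalized independently.
    norm = guidance.norm(dim=(-2, -1), keepdim=True)  # shape (B, C, F, 1, 1)
    return guidance / (norm + eps)
```

A custom `normalization_dims` list would generalize this by replacing the hard-coded `(-2, -1)` with whatever dims the caller requests.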
… into feat/motif-video
Co-authored-by: Beomgyu Kim <beomgyu.kim@motiftech.io>
- Use default Attention class with custom MotifVideoAttnProcessor2_0
- Inline cross-attention in transformer blocks
- Use dispatch_attention_fn for backend support
- Inherit AttentionMixin for attn_processors/set_attn_processor
- Move TransformerBlockRegistry to _helpers.py
- Add _repeated_blocks for regional compilation
… into feat/motif-video
… into feat/motif-video
… into feat/motif-video
@claude-2-serge can you do a final review? |
🤗 Serge says:
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
🤗 Serge says:
Overall this is a solid addition of the Motif-Video model and pipelines. The architecture, registration, tests, and docs are well-structured. I found one correctness bug (dead code from an unreachable branch that masks a missing negative-prompt expansion path), a few incorrect return-type annotations, and a missing newline at end of file.
Correctness

- Dead code / unreachable branch in `encode_prompt` (both pipelines): the outer guard `if negative_prompt_embeds is None and negative_prompt is not None:` guarantees `negative_prompt is not None` inside the block, so the inner `if negative_prompt is None:` on line 328 (T2V) / line 340 (I2V) is dead code. More importantly, when a user passes pre-computed `negative_prompt_embeds` (not `None`), the duplication for `num_videos_per_prompt` is skipped entirely: the positive embeddings are always duplicated (lines 317-323), but negative embeddings are only duplicated inside the freshly-computed branch. With `num_videos_per_prompt > 1` and pre-computed negative embeds, shapes will mismatch.
- Incorrect return-type annotations: `MotifVideoSingleTransformerBlock.forward` is annotated `-> torch.Tensor` but returns a `Tuple[torch.Tensor, torch.Tensor]`. Same for `MotifVideoRotaryPosEmbed.forward`.
Style / Minor
`docs/source/en/api/pipelines/motif_video.md` is missing a trailing newline.
Dead code analysis (advisory)

Under the default config (`enable_text_cross_attention_dual=False`, `enable_text_cross_attention_single=False`, `num_decoder_layers=0`, `image_embed_dim=None`):

- `MotifVideoCrossAttention` and `MotifVideoCrossAttnProcessor2_0` are instantiated, but `self.cross_attn` is always `None` in both block types, so the cross-attention forward path is never exercised.
- `MotifVideoImageProjection` (`self.image_embedder`) is not created when `image_embed_dim` is `None`.
- The decoder path (`num_decoder_layers > 0`), including `decoder_hidden_states = hidden_states.clone()`, is never reached.

These are likely exercised by specific checkpoint configs (e.g. the published 2B model may set these), so they are advisory only.
```python
# Compute negative embeddings if needed
if negative_prompt_embeds is None and negative_prompt is not None:
    # Prepare negative_prompt to match batch_size
    if negative_prompt is None:
```
Bug: this `if negative_prompt is None:` branch is unreachable; the outer guard on line 326 already ensures `negative_prompt is not None`.

More importantly, when a user passes pre-computed `negative_prompt_embeds` (skipping this entire block), the negative embeddings are not duplicated for `num_videos_per_prompt`, while the positive embeddings always are (lines 317-323). This will cause a shape mismatch when `num_videos_per_prompt > 1` and `negative_prompt_embeds` is pre-provided.

Suggested fix: move the negative-embed duplication (`repeat` / `repeat_interleave`) outside the `if negative_prompt_embeds is None` block so it applies to both freshly-computed and pre-provided negative embeddings, and remove the dead `if negative_prompt is None:` branch.
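A minimal sketch of the suggested shape handling (`duplicate_embeds` is a hypothetical helper name; the real `encode_prompt` carries more state such as attention masks): both positive and negative embeddings are duplicated for `num_videos_per_prompt` outside any freshly-computed branch, so pre-provided `negative_prompt_embeds` get the same treatment.

```python
import torch
from typing import Optional, Tuple


def duplicate_embeds(
    prompt_embeds: torch.Tensor,
    negative_prompt_embeds: Optional[torch.Tensor],
    num_videos_per_prompt: int,
) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    # Duplicate positive embeddings for each generated video per prompt.
    bs, seq_len, dim = prompt_embeds.shape
    prompt_embeds = prompt_embeds.repeat(1, num_videos_per_prompt, 1)
    prompt_embeds = prompt_embeds.view(bs * num_videos_per_prompt, seq_len, dim)

    if negative_prompt_embeds is not None:
        # Runs for BOTH freshly-computed and caller-provided negative embeds,
        # so shapes stay aligned with the positive embeddings.
        negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_videos_per_prompt, 1)
        negative_prompt_embeds = negative_prompt_embeds.view(bs * num_videos_per_prompt, seq_len, dim)
    return prompt_embeds, negative_prompt_embeds
```

With this ordering, a classifier-free-guidance concat such as `torch.cat([negative_prompt_embeds, prompt_embeds])` lines up for any `num_videos_per_prompt`.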
```python
# Compute negative embeddings if needed
if negative_prompt_embeds is None and negative_prompt is not None:
    # Prepare negative_prompt to match batch_size
    if negative_prompt is None:
```
Same dead-code / missing-duplication bug as in the T2V pipeline's `encode_prompt`. The inner `if negative_prompt is None:` is unreachable, and pre-provided `negative_prompt_embeds` won't be duplicated for `num_videos_per_prompt`.
```python
attention_mask: Optional[torch.Tensor] = None,
image_rotary_emb: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
image_embed_seq_len: int = 0,
) -> torch.Tensor:
```
Incorrect return-type annotation: this method returns `(hidden_states, encoder_hidden_states)` at line 617, which is a `Tuple[torch.Tensor, torch.Tensor]`, not `torch.Tensor`.
```diff
-    ) -> torch.Tensor:
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
```
```python
self.rope_dim = rope_dim
self.theta = theta

def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
```
Incorrect return-type annotation: this method returns `(freqs_cos, freqs_sin)` at line 496, which is a `Tuple[torch.Tensor, torch.Tensor]`, not `torch.Tensor`.
```diff
-    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+    def forward(self, hidden_states: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
```
```md
## MotifVideoPipelineOutput

[[autodoc]] pipelines.motif_video.pipeline_output.MotifVideoPipelineOutput
```

(No newline at end of file.)
Nit: missing trailing newline at end of file.
What does this PR do?
This PR adds support for Motif-Video, a text-to-video (T2V) and image-to-video (I2V) diffusion model from Motif Technologies. The implementation includes the transformer architecture, both pipeline variants, guidance configurations, and comprehensive documentation.
Changes
New Files
- `src/diffusers/models/transformers/transformer_motif_video.py` - MotifVideoTransformer3DModel
- `src/diffusers/pipelines/motif_video/pipeline_motif_video.py` - Text-to-Video
- `src/diffusers/pipelines/motif_video/pipeline_motif_video_image2video.py` - Image-to-Video
- `src/diffusers/pipelines/motif_video/pipeline_output.py`
- `tests/pipelines/motif_video/test_motif_video.py`
- `tests/pipelines/motif_video/test_motif_video_image2video.py`
- `docs/source/en/api/models/motif_video_transformer_3d.md`
- `docs/source/en/api/pipelines/motif_video.md`

Key Features
Version Requirements
Before submitting
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.