17 commits
- `507fd9b` [Pipelines] AnyFlow: scaffold pipelines/anyflow + register all top-le… (Enderfga, May 6, 2026)
- `29229d7` [Schedulers] AnyFlow: add FlowMapEulerDiscreteScheduler (Enderfga, May 6, 2026)
- `2d1e39c` [Models] AnyFlow: add AnyFlowTransformer3DModel (Enderfga, May 6, 2026)
- `c0e8b12` [Pipelines] AnyFlow: add AnyFlowPipeline and AnyFlowCausalPipeline (Enderfga, May 6, 2026)
- `c650f70` [Docs] AnyFlow: add main pipeline documentation page (Enderfga, May 6, 2026)
- `3276d0a` [Auto/Scripts] AnyFlow: register AutoPipelineForText2Video + add conv… (Enderfga, May 6, 2026)
- `41b2d9e` [Quality] AnyFlow: ruff-format + regenerated dummy stubs (Enderfga, May 6, 2026)
- `74a89ae` [AnyFlow] address review feedback: bug fixes + DMD wording + EN/ZH tu… (Enderfga, May 6, 2026)
- `641ae61` [AnyFlow] rename Causal->FAR + explicit forward signature + dataclass… (Enderfga, May 6, 2026)
- `8710c4d` [AnyFlow] wire callback_on_step_end through inference_range + add chu… (Enderfga, May 6, 2026)
- `3bc25d1` [AnyFlow] Phase 2: split transformer + drop chunk_partition from conf… (Enderfga, May 6, 2026)
- `4f11943` [AnyFlow] Phase 3: convention compliance against .ai/AGENTS.md + .ai/… (Enderfga, May 6, 2026)
- `3d38c0c` [AnyFlow] FAR fast-test compat: rope 0-dim guard + flex_attention CPU… (Enderfga, May 6, 2026)
- `0aea63f` [AnyFlow] docs/code: paper-release tidy-up (Enderfga, May 14, 2026)
- `093cf75` [AnyFlow] docs: drop in official BibTeX (full author list) (Enderfga, May 14, 2026)
- `8da3679` Merge branch 'main' into add-anyflow-pipeline (Enderfga, May 14, 2026)
- `0df6c05` [AnyFlow] align with diffusers conventions + drop training-only code (Enderfga, May 14, 2026)
10 changes: 10 additions & 0 deletions docs/source/en/_toctree.yml
@@ -198,6 +198,8 @@
title: Model accelerators and hardware
- isExpanded: false
sections:
- local: using-diffusers/anyflow
title: AnyFlow
- local: using-diffusers/helios
title: Helios
- local: using-diffusers/consisid
@@ -328,6 +330,10 @@
title: AceStepTransformer1DModel
- local: api/models/allegro_transformer3d
title: AllegroTransformer3DModel
- local: api/models/anyflow_transformer3d
title: AnyFlowTransformer3DModel
- local: api/models/anyflow_far_transformer3d
title: AnyFlowFARTransformer3DModel
- local: api/models/aura_flow_transformer2d
title: AuraFlowTransformer2DModel
- local: api/models/transformer_bria_fibo
@@ -504,6 +510,8 @@
- sections:
- local: api/pipelines/animatediff
title: AnimateDiff
- local: api/pipelines/anyflow
title: AnyFlow
- local: api/pipelines/aura_flow
title: AuraFlow
- local: api/pipelines/bria_3_2
@@ -731,6 +739,8 @@
title: EulerAncestralDiscreteScheduler
- local: api/schedulers/euler
title: EulerDiscreteScheduler
- local: api/schedulers/flow_map_euler_discrete
title: FlowMapEulerDiscreteScheduler
- local: api/schedulers/flow_match_euler_discrete
title: FlowMatchEulerDiscreteScheduler
- local: api/schedulers/flow_match_heun_discrete
45 changes: 45 additions & 0 deletions docs/source/en/api/models/anyflow_far_transformer3d.md
@@ -0,0 +1,45 @@
<!-- Copyright 2026 The AnyFlow Team, NVIDIA Corp., and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AnyFlowFARTransformer3DModel

The causal (FAR) 3D Transformer used by [`AnyFlowFARPipeline`](../pipelines/anyflow#anyflowfarpipeline) —
the FAR variant of [AnyFlow](https://huggingface.co/papers/2605.13724) (Yuchao Gu, Guian Fang et al., NUS
ShowLab × NVIDIA). It extends the v0.35.1 Wan2.1 backbone with three additions:

1. **FAR causal block-mask** via `torch.nn.attention.flex_attention`, supporting frame-level autoregressive
generation as introduced in [FAR (Gu et al., 2025)](https://arxiv.org/abs/2503.19325) (see the mask sketch after this list).
2. **Compressed-frame patch embedding** (`far_patch_embedding`) for context (already-generated) frames,
warm-started from the full-resolution `patch_embedding` at construction time via trilinear interpolation.
3. **Dual-timestep flow-map embedding** (same as
[`AnyFlowTransformer3DModel`](anyflow_transformer3d)): every forward call conditions on both the source
timestep `t` and the target timestep `r`.
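
Item 1's frame-causal masking can be illustrated with `torch.nn.attention.flex_attention` directly. The snippet below is a minimal sketch of the mask *pattern*, not the model's internal implementation; the sizes are made up, and CPU execution of `flex_attention` requires a recent PyTorch (otherwise move the tensors to CUDA):

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# Illustrative sizes only -- the real model derives these from its patch embedding.
num_frames, tokens_per_frame = 4, 64
seq_len = num_frames * tokens_per_frame

def frame_causal(b, h, q_idx, kv_idx):
    # A query token may attend to its own frame and all earlier frames, never to future frames.
    return q_idx // tokens_per_frame >= kv_idx // tokens_per_frame

# B=None / H=None broadcast the mask over batch and heads.
block_mask = create_block_mask(frame_causal, B=None, H=None, Q_LEN=seq_len, KV_LEN=seq_len, device="cpu")

q = k = v = torch.randn(1, 8, seq_len, 32)  # (batch, heads, seq, head_dim)
out = flex_attention(q, k, v, block_mask=block_mask)  # frame-level autoregressive attention
```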

The chunk schedule (`chunk_partition`) is **not** baked into the model config. It is a per-call argument to
`forward`, so the same checkpoint handles different `num_frames` configurations without retraining.

```python
from diffusers import AnyFlowFARTransformer3DModel

# Causal AnyFlow checkpoint (FAR):
transformer = AnyFlowFARTransformer3DModel.from_pretrained(
"nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers", subfolder="transformer"
)
```
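
Building on the snippet above, here is a sketch of a single denoising call. Apart from `chunk_partition`, which is documented above as a per-call `forward` argument, the keyword names and tensor shapes are assumptions in the style of the Wan2.1 backbone, not the verbatim signature:

```python
import torch

# Assumed Wan2.1-1.3B-style shapes for 21 latent frames (for illustration only).
latents = torch.randn(1, 16, 21, 60, 104, dtype=torch.bfloat16)
prompt_embeds = torch.randn(1, 512, 4096, dtype=torch.bfloat16)

out = transformer(
    hidden_states=latents,
    encoder_hidden_states=prompt_embeds,
    timestep=torch.tensor([1000.0]),           # source timestep t
    target_timestep=torch.tensor([500.0]),     # target timestep r (keyword name is illustrative)
    chunk_partition=[1, 3, 3, 3, 3, 3, 3, 2],  # per-call schedule, sums to 21 latent frames
)
```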

## AnyFlowFARTransformer3DModel

[[autodoc]] AnyFlowFARTransformer3DModel

## AnyFlowFARTransformerOutput

[[autodoc]] models.transformers.transformer_anyflow.AnyFlowFARTransformerOutput
36 changes: 36 additions & 0 deletions docs/source/en/api/models/anyflow_transformer3d.md
@@ -0,0 +1,36 @@
<!-- Copyright 2026 The AnyFlow Team, NVIDIA Corp., and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AnyFlowTransformer3DModel

The bidirectional 3D Transformer used by [`AnyFlowPipeline`](../pipelines/anyflow#anyflowpipeline). It is the
v0.35.1 Wan2.1 backbone with one structural change: the timestep embedder is replaced by
`AnyFlowDualTimestepTextImageEmbedding`, so every forward call conditions on both the source timestep
`t` and the target timestep `r`. This is the embedding required to learn the flow map
$\Phi_{r \leftarrow t}$ introduced in
[AnyFlow](https://huggingface.co/papers/2605.13724) (Yuchao Gu, Guian Fang et al., NUS ShowLab × NVIDIA).

For frame-level autoregressive (FAR causal) generation, use
[`AnyFlowFARTransformer3DModel`](anyflow_far_transformer3d) instead.

```python
from diffusers import AnyFlowTransformer3DModel

# Bidirectional AnyFlow checkpoint (T2V):
transformer = AnyFlowTransformer3DModel.from_pretrained(
"nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers", subfolder="transformer"
)
```
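
To see what the dual-timestep conditioning looks like at call time, here is a minimal sketch. Only the `t`/`r` pairing is documented above; the `target_timestep` keyword and the tensor shapes are illustrative assumptions:

```python
import torch

# Assumed shapes for the 1.3B T2V backbone (illustration only).
latents = torch.randn(1, 16, 21, 60, 104, dtype=torch.bfloat16)
prompt_embeds = torch.randn(1, 512, 4096, dtype=torch.bfloat16)

# One flow-map evaluation transports latents from source time t toward target time r.
out = transformer(
    hidden_states=latents,
    encoder_hidden_states=prompt_embeds,
    timestep=torch.tensor([1000.0]),        # t
    target_timestep=torch.tensor([500.0]),  # r (keyword name is illustrative)
)
```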

## AnyFlowTransformer3DModel

[[autodoc]] AnyFlowTransformer3DModel
216 changes: 216 additions & 0 deletions docs/source/en/api/pipelines/anyflow.md
@@ -0,0 +1,216 @@
<!-- Copyright 2026 The AnyFlow Team, NVIDIA Corp., and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<a href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders/lora_pipeline.py">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-supported-green">
</a>
</div>
</div>

# AnyFlow

[AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation](https://huggingface.co/papers/2605.13724) by Yuchao Gu, Guian Fang, and collaborators from [NUS ShowLab](https://sites.google.com/view/showlab) and NVIDIA.

*Few-step video generation has been significantly advanced by consistency models. However, their performance often degrades in any-step video diffusion models due to the fixed-point formulation. To address this limitation, we present AnyFlow, the first any-step video diffusion distillation framework built on flow maps. Instead of learning only the mapping z_t → z_0, AnyFlow learns transitions z_t → z_r over arbitrary time intervals, enabling a single model to adapt to different inference budgets. We design an improved forward flow map training recipe that fine-tunes pretrained video diffusion models into flow map models, and introduce Flow Map Backward Simulation to enable on-policy distillation for flow map models. Extensive experiments across both bidirectional and causal architectures, at scales ranging from 1.3B to 14B, on text-to-video and image-to-video tasks demonstrate that AnyFlow outperforms consistency-based baselines while preserving high fidelity and flexible sampling under varying step budgets.*

The original training code is at [`NVlabs/AnyFlow`](https://github.com/NVlabs/AnyFlow). The project page is at [nvlabs.github.io/AnyFlow](https://nvlabs.github.io/AnyFlow).

The following AnyFlow checkpoints are supported:

| Checkpoint | Backbone | Description |
|------------|----------|-------------|
| [`nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers`](https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers) | Wan2.1 1.3B | Bidirectional T2V, lightweight |
| [`nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers`](https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers) | Wan2.1 14B | Bidirectional T2V, full quality |
| [`nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers`](https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers) | FAR + Wan2.1 1.3B | Causal T2V / I2V / V2V |
| [`nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers`](https://huggingface.co/nvidia/AnyFlow-FAR-Wan2.1-14B-Diffusers) | FAR + Wan2.1 14B | Causal T2V / I2V / V2V |

All four are grouped under the [`nvidia/anyflow`](https://huggingface.co/collections/nvidia/anyflow) Hugging Face collection.

> [!TIP]
> Choose `AnyFlowPipeline` for traditional bidirectional text-to-video generation. Choose `AnyFlowFARPipeline` for streaming I2V, video continuation (V2V), or any setup that benefits from frame-by-frame autoregressive sampling.

> [!TIP]
> AnyFlow supports any-step sampling: a single distilled checkpoint can be evaluated at 1, 2, 4, 8, 16, or more NFE without retraining. In the paper's benchmarks, quality scales monotonically with step count.
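
For example, assuming `pipe` is an `AnyFlowPipeline` loaded as in the usage section below and `export_to_video` is imported from `diffusers.utils`, one object can sweep several budgets:

```py
# One checkpoint, several inference budgets -- no reloading or retraining.
for steps in (1, 2, 4, 8):
    video = pipe(
        "A red panda eating bamboo in a forest, cinematic lighting",
        num_inference_steps=steps,
        num_frames=33,
    ).frames[0]
    export_to_video(video, f"out_{steps}step.mp4", fps=16)
```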

### Optimizing Memory and Inference Speed

<hfoptions id="optimization">
<hfoption id="memory">

```py
import torch
from diffusers import AnyFlowPipeline
from diffusers.hooks import apply_group_offloading

pipe = AnyFlowPipeline.from_pretrained(
"nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
)
apply_group_offloading(pipe.transformer, onload_device=torch.device("cuda"), offload_type="leaf_level")
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
```

</hfoption>
<hfoption id="inference speed">

```py
import torch
from diffusers import AnyFlowPipeline

pipe = AnyFlowPipeline.from_pretrained(
"nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune-no-cudagraphs")
```

</hfoption>
</hfoptions>

### Generation with AnyFlow (Bidirectional T2V)

<hfoptions id="anyflow-bidi">
<hfoption id="usage">

```py
import torch
from diffusers import AnyFlowPipeline
from diffusers.utils import export_to_video

pipe = AnyFlowPipeline.from_pretrained(
"nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A red panda eating bamboo in a forest, cinematic lighting"
video = pipe(prompt, num_inference_steps=4, num_frames=33).frames[0]
export_to_video(video, "out.mp4", fps=16)
```

</hfoption>
</hfoptions>

### Generation with AnyFlow (FAR Causal)

The causal pipeline selects between T2V / I2V / V2V via the `context_sequence` argument: pass `None`
for plain text-to-video, or a dict with a `"raw"` key holding a video tensor of shape
`(B, C, T, H, W)` with `T = 4n + 1` to condition on existing frames. Use a single conditioning frame
for I2V and a longer clip for V2V continuation.

> [!IMPORTANT]
> `AnyFlowFARPipeline.default_chunk_partition = [1, 3, 3, 3, 3, 3, 3, 2]` (sum 21) is matched to the
> released checkpoints' canonical 81 raw frames (21 latent frames at the VAE temporal stride of 4). When
> you change `num_frames`, you must also pass a matching `chunk_partition` summing to
> `(num_frames - 1) // 4 + 1`, otherwise the pipeline raises an `AssertionError`.
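
The raw-to-latent frame relationship is plain arithmetic, so a custom schedule can be validated up front. The helper below is illustrative, not part of the pipeline API:

```py
def latent_frames(num_frames: int) -> int:
    # VAE temporal stride is 4, so raw frame counts must satisfy T = 4n + 1.
    assert (num_frames - 1) % 4 == 0, "num_frames must be of the form 4n + 1"
    return (num_frames - 1) // 4 + 1

assert latent_frames(81) == 21 == sum([1, 3, 3, 3, 3, 3, 3, 2])  # default partition
assert latent_frames(33) == 9 == sum([1, 3, 3, 2])               # one valid custom schedule
```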

<hfoptions id="anyflow-far">
<hfoption id="t2v">

```py
import torch
from diffusers import AnyFlowFARPipeline
from diffusers.utils import export_to_video

pipe = AnyFlowFARPipeline.from_pretrained(
"nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

video = pipe(
prompt="A cat surfing a wave, sunset",
num_inference_steps=4,
num_frames=81,
).frames[0]
export_to_video(video, "out.mp4", fps=16)
```

</hfoption>
<hfoption id="i2v">

```py
import numpy as np
import torch
from diffusers import AnyFlowFARPipeline
from diffusers.utils import export_to_video, load_image

pipe = AnyFlowFARPipeline.from_pretrained(
"nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Wrap the conditioning image as a one-frame video tensor: (1, 3, 1, H, W) in [0, 1].
first_frame = load_image("path/to/first_frame.png").resize((832, 480))
arr = np.asarray(first_frame).astype("float32") / 255.0 # (480, 832, 3)
context_tensor = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0).unsqueeze(2).to("cuda")

video = pipe(
prompt="a cat walks across a sunlit lawn",
context_sequence={"raw": context_tensor},
num_inference_steps=4,
num_frames=81,
).frames[0]
export_to_video(video, "out.mp4", fps=16)
```

</hfoption>
<hfoption id="v2v">

```py
import numpy as np
import torch
from diffusers import AnyFlowFARPipeline
from diffusers.utils import export_to_video, load_video

pipe = AnyFlowFARPipeline.from_pretrained(
"nvidia/AnyFlow-FAR-Wan2.1-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Context clip — 9 raw frames map to 3 latent frames (9 = 4·2 + 1, 3 = 2 + 1).
context_frames = load_video("path/to/context.mp4")[:9]
arr = np.stack([np.asarray(f.resize((832, 480))) for f in context_frames]).astype("float32") / 255.0
context_tensor = torch.from_numpy(arr).permute(3, 0, 1, 2).unsqueeze(0).to("cuda") # (1, 3, 9, 480, 832)

video = pipe(
prompt="continue the story",
context_sequence={"raw": context_tensor},
num_inference_steps=4,
num_frames=81,
# Override chunk_partition so the first chunk covers exactly the 3 latent context frames.
chunk_partition=[3, 3, 3, 3, 3, 3, 3],
).frames[0]
export_to_video(video, "out.mp4", fps=16)
```

</hfoption>
</hfoptions>

## Notes

- Classifier-free guidance is fused into the released checkpoints, so inference does not run a second, unconditional forward pass for guidance. Keep the default `guidance_scale=1.0` unless your own checkpoint requires otherwise.
- `FlowMapEulerDiscreteScheduler` is general-purpose. You can attach it to any flow-map-distilled checkpoint via `from_pretrained(..., scheduler=FlowMapEulerDiscreteScheduler.from_config(...))`; see the sketch after this list.
- `AnyFlowPipeline` uses [`AnyFlowTransformer3DModel`](../models/anyflow_transformer3d) (bidirectional). `AnyFlowFARPipeline` uses [`AnyFlowFARTransformer3DModel`](../models/anyflow_far_transformer3d), which adds a compressed-frame patch embedding and the FAR causal block-mask.
- LoRA loading is supported via `WanLoraLoaderMixin`, the same mixin used by the upstream Wan pipelines.
- For training recipes (forward flow-map training and on-policy distillation), refer to the original AnyFlow training framework at [`NVlabs/AnyFlow`](https://github.com/NVlabs/AnyFlow); training is out of scope for diffusers.
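
As a concrete version of the scheduler note above, the standard diffusers swap pattern looks like this (a minimal sketch reusing a checkpoint from the table):

```py
import torch
from diffusers import AnyFlowPipeline, FlowMapEulerDiscreteScheduler

pipe = AnyFlowPipeline.from_pretrained(
    "nvidia/AnyFlow-Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
)
# Rebuild the scheduler from the checkpoint's own config, then attach it.
pipe.scheduler = FlowMapEulerDiscreteScheduler.from_config(pipe.scheduler.config)
```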

## AnyFlowPipeline

[[autodoc]] AnyFlowPipeline
- all
- __call__

## AnyFlowFARPipeline

[[autodoc]] AnyFlowFARPipeline
- all
- __call__

## AnyFlowPipelineOutput

[[autodoc]] pipelines.anyflow.pipeline_output.AnyFlowPipelineOutput
28 changes: 28 additions & 0 deletions docs/source/en/api/schedulers/flow_map_euler_discrete.md
@@ -0,0 +1,28 @@
<!-- Copyright 2026 The AnyFlow Team, NVIDIA Corp., and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# FlowMapEulerDiscreteScheduler

`FlowMapEulerDiscreteScheduler` is an Euler-style sampler designed for flow-map-distilled diffusion
models. Flow-map models learn arbitrary-interval transitions $\mathbf{z}_t \to \mathbf{z}_r$ rather than
the fixed $\mathbf{z}_t \to \mathbf{z}_0$ mapping of consistency models. Both endpoints of the step are
caller-provided, which is what enables any-step sampling: a single distilled checkpoint can be evaluated at
1, 2, 4, 8, 16, or more NFE without retraining.

The scheduler was introduced in
[AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation](https://huggingface.co/papers/2605.13724)
and ships with the `AnyFlowPipeline` and `AnyFlowFARPipeline` integrations, but it is not
AnyFlow-specific — any flow-map-distilled checkpoint can use it.
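
A minimal standalone sketch, assuming the scheduler follows the common diffusers scheduler interface (`set_timesteps` plus a `timesteps` attribute) and that its constructor defaults are usable as-is:

```py
from diffusers import FlowMapEulerDiscreteScheduler

scheduler = FlowMapEulerDiscreteScheduler()

# The same object serves any step budget; consecutive entries of `timesteps`
# bound the (t, r) endpoints of each flow-map transport z_t -> z_r.
for nfe in (1, 2, 4, 8):
    scheduler.set_timesteps(nfe)
    print(nfe, scheduler.timesteps)
```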

## FlowMapEulerDiscreteScheduler

[[autodoc]] FlowMapEulerDiscreteScheduler