Conversation
Thanks for the PR! Some preliminary design questions and comments:
```python
self.register_to_config(
    seq_len=seq_len,
    num_inference_steps=num_inference_steps,
    inject_start_token=inject_start_token,
)
```
Generally we don't register default `__call__` arguments to the config, but rather set them as default arguments to the `__call__` method:
diffusers/src/diffusers/pipelines/ltx2/pipeline_ltx2.py
Lines 744 to 752 in d4f97d1
```python
    *,
    batch_size: int = 1,
```
diffusers pipelines usually don't set `__call__` arguments to be keyword-only. (That's not to say that there are no arguments for it, but because other pipelines allow positional arguments I think the expectation is that discrete diffusion pipelines will allow them as well.)
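For illustration, a minimal sketch of the difference (the class and argument names here are hypothetical, not the PR's actual signature):

```python
class ExamplePipeline:
    # Hypothetical sketch: dropping the `*,` separator makes the arguments
    # keyword-or-positional, matching the calling convention of other
    # diffusers pipelines.
    def __call__(self, prompt=None, batch_size: int = 1):
        return prompt, batch_size


pipe = ExamplePipeline()
pipe("a prompt", 2)                     # positional call works
pipe(prompt="a prompt", batch_size=2)   # keyword call still works
```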
```python
if seq_len is None:
    seq_len = int(self.config.seq_len)
if num_inference_steps is None:
    num_inference_steps = int(self.config.num_inference_steps)
if inject_start_token is None:
    inject_start_token = bool(self.config.inject_start_token)
```
Following up on #12911 (comment), this logic could be removed if we don't register default arguments to the config.
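A minimal sketch of the suggested shape (the class name and default values here are hypothetical):

```python
class TokenDiffusionPipeline:
    # Hypothetical sketch: defaults live directly on __call__ instead of
    # being registered to the config, so the `if x is None: x = self.config.x`
    # fallback logic disappears.
    def __call__(
        self,
        seq_len: int = 128,
        num_inference_steps: int = 64,
        inject_start_token: bool = True,
    ):
        # Arguments already hold concrete values here; no config fallback needed.
        return seq_len, num_inference_steps, inject_start_token
```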
```python
if infill_mask is not None:
    if infill_mask.shape != (batch_size, seq_len):
        raise ValueError(
            f"`infill_mask` must have shape {(batch_size, seq_len)}, got {tuple(infill_mask.shape)}."
        )
```
I think input checking and exceptions should be moved to a `check_inputs` method, which is the usual practice for diffusers pipelines:
diffusers/src/diffusers/pipelines/flux2/pipeline_flux2.py
Lines 686 to 693 in d4f97d1
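For example, something along these lines (a sketch; the exact argument list is hypothetical):

```python
def check_inputs(self, batch_size, seq_len, infill_mask=None):
    # Hypothetical check_inputs sketch: all input validation is grouped here
    # and called at the top of __call__, before any tensors are allocated,
    # mirroring the usual diffusers pipeline layout.
    if infill_mask is not None and tuple(infill_mask.shape) != (batch_size, seq_len):
        raise ValueError(
            f"`infill_mask` must have shape {(batch_size, seq_len)}, got {tuple(infill_mask.shape)}."
        )
```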
```python
        return int(token_id)
    return None

def _init_latents(
```
We usually name methods which sample latents from the prior distribution `prepare_latents`:
```python
if hasattr(self.scheduler, "forward_process") and getattr(self.scheduler, "forward_process") == "uniform":
    # Uniform prior over token IDs. Mirror scheduler's exclude-mask behavior.
    if getattr(self.scheduler, "exclude_mask_from_uniform", False) and hasattr(
        self.scheduler, "_sample_uniform_tokens"
    ):
        return self.scheduler._sample_uniform_tokens(
            torch.Size((batch_size, seq_len)),
            device=device,
            dtype=torch.long,
            generator=generator,
        )
    vocab_size = int(getattr(self.scheduler, "vocab_size", 0))
    if vocab_size <= 0:
        raise ValueError("Scheduler must define `vocab_size` for uniform prior sampling.")
    return torch.randint(
        0, vocab_size, (batch_size, seq_len), device=device, dtype=torch.long, generator=generator
    )
```
Suggestion: maybe it would be cleaner to define a scheduler method called (say) `sample_prior` which samples from the prior distribution based on the configured `forward_process`? So if `self.forward_process == "uniform"`, we would call `_sample_uniform_tokens` under the hood in `sample_prior` to sample from a uniform prior distribution.
I think this would allow for more graceful support of other possible forward processes, and make the pipeline code cleaner (as most of the logic would be handled inside the scheduler).
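As a rough sketch of what that could look like (all names here, and the exclude-mask shift trick, are illustrative, not the PR's actual implementation):

```python
import torch


class DiscreteScheduler:
    # Hypothetical sketch of the suggested `sample_prior` scheduler method:
    # the pipeline just calls scheduler.sample_prior(...) and the scheduler
    # dispatches on its configured forward_process.
    def __init__(self, vocab_size, mask_token_id, forward_process="uniform",
                 exclude_mask_from_uniform=False):
        self.vocab_size = vocab_size
        self.mask_token_id = mask_token_id
        self.forward_process = forward_process
        self.exclude_mask_from_uniform = exclude_mask_from_uniform

    def sample_prior(self, shape, device=None, generator=None):
        if self.forward_process == "uniform":
            if self.exclude_mask_from_uniform:
                # Sample from a vocab of size (vocab_size - 1), then shift IDs
                # at or above mask_token_id up by one to skip the mask token.
                tokens = torch.randint(0, self.vocab_size - 1, shape,
                                       device=device, dtype=torch.long, generator=generator)
                return tokens + (tokens >= self.mask_token_id).long()
            return torch.randint(0, self.vocab_size, shape,
                                 device=device, dtype=torch.long, generator=generator)
        if self.forward_process == "absorbing":
            # Absorbing (masked) prior: every position starts at the mask token.
            return torch.full(shape, self.mask_token_id, device=device, dtype=torch.long)
        raise ValueError(f"Unknown forward_process: {self.forward_process!r}")
```

Adding a new forward process then only touches the scheduler, and the pipeline's `prepare_latents` stays a one-liner.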
- Attention mask: use 1/0 float instead of 0/-inf (matches official LLaDA2 code)
- Sampling: add shift-right in top-p to preserve at least one token
- Sampling: apply temperature before top-k/top-p filtering
- Sampling: separate greedy branch for temperature=0
- LLaDA2Pipeline: update defaults to LLaDA2.1 quality mode (threshold=0.7, editing_threshold=0.5, max_post_steps=16, eos_early_stop=True)
- LLaDA2Pipeline: remove duplicate `_prepare_prompt_ids`, reuse mixin's `_prepare_input_ids`
- LLaDA2Pipeline: fix EXAMPLE_DOC_STRING (model ID, dtype, scheduler)
- Simplify `_extract_input_ids` since return_dict=True
- Replace deprecated `torch_dtype` with `dtype` in example scripts
- Add all discrete diffusion pipeline/scheduler pages to docs toctree
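The top-p shift-right fix mentioned above is a standard nucleus-sampling detail: shifting the removal mask right by one keeps the first token that crosses the threshold, so at least one candidate always survives. A minimal sketch (helper name is hypothetical):

```python
import torch


def top_p_filter(logits, top_p=0.9):
    # Nucleus (top-p) filtering with the shift-right trick: after computing
    # cumulative probability mass over sorted logits, the removal mask is
    # shifted right by one position so the first token exceeding the
    # threshold is kept, guaranteeing a non-empty candidate set.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    remove = cum_probs > top_p
    remove[..., 1:] = remove[..., :-1].clone()  # shift right by one
    remove[..., 0] = False                      # always keep the top token
    # Map the mask back from sorted order to the original vocabulary order.
    mask = remove.scatter(-1, sorted_idx, remove)
    return logits.masked_fill(mask, float("-inf"))
```

Without the shift, a very peaked distribution (where the top token alone exceeds `top_p`) would mask out every token.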
These files were part of the merged llada2-support PR and will be replaced by main's versions during merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove BlockTokenDiffusion (folded into LLaDA2Pipeline in main)
- TokenDiffusion, HybridTokenDiffusion, DFlash, SDAR: apply output_type, check_inputs first, 3.9+ annotations, progress bars, scheduler-owned stopping logic
- SDAR: rename denoising_steps to num_inference_steps, remove attention_mask_mode, add add_noise/check_should_stop to scheduler
- DFlash: add add_noise/check_should_stop to scheduler
- TokenDiffusion: add enforce_fixed_masks to scheduler
…mask_mode, fix broken doc link
- Add kv_cache support: reset_kv_cache() at start, store_kv=True after each block
- Apply subs parameterization in scheduler: mask token -> -inf, log_softmax, identity for unmasked
- These match the official BD3LM sampling code
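The subs parameterization steps listed above (mask logit to -inf, log_softmax, identity for unmasked positions) can be sketched roughly as follows (function name and shapes are illustrative, not the scheduler's actual API):

```python
import torch


def subs_logits(logits, x_t, mask_token_id):
    # Hypothetical sketch of the SUBS parameterization for masked diffusion
    # (following the BD3LM/MDLM convention):
    #  1. the mask token can never be predicted: its logit is set to -inf
    #  2. logits are normalized with log_softmax
    #  3. unmasked positions are carried over unchanged (identity: log-prob 0
    #     at the current token, -inf elsewhere)
    logits = logits.clone()
    logits[..., mask_token_id] = float("-inf")
    log_probs = torch.log_softmax(logits, dim=-1)
    unmasked = x_t != mask_token_id
    identity = torch.full_like(log_probs, float("-inf"))
    identity.scatter_(-1, x_t.unsqueeze(-1), 0.0)
    return torch.where(unmasked.unsqueeze(-1), identity, log_probs)
```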
- check_inputs called first, before any processing
- mask_token_id: vocab_size - 1 (not vocab_size)
- Sigma computation moved to scheduler.compute_sigma()
- EOS checking moved to scheduler.check_eos_finished()
- Deduplicated p_x0_cache handling
- Added 36 tests (10 pipeline + 26 scheduler)
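For reference, a common log-linear noise schedule of the kind a scheduler-owned `compute_sigma` method might implement (this is the MDLM-style schedule, not necessarily the exact one in this PR):

```python
import math


def compute_sigma(t, eps=1e-3):
    # Hypothetical log-linear noise schedule often used for masked diffusion:
    # sigma(t) = -log(1 - (1 - eps) * t), monotonically increasing on t in [0, 1],
    # with sigma(0) = 0 and sigma(1) = -log(eps) (large but finite).
    return -math.log(1.0 - (1.0 - eps) * t)
```

Keeping this inside the scheduler (rather than the pipeline) means swapping schedules only requires a scheduler change.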
What does this PR do?
Add experimental support for discrete token diffusion methods and pipeline
moved llada2 to its own PR: #13226
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.