[pipelines] fix SD3 crash with pre-computed prompt_embeds and num_images_per_prompt #13755
Open
zxuhan wants to merge 1 commit into
…ith `num_images_per_prompt` `StableDiffusion3Pipeline.encode_prompt` expands the encoded embeddings inside `_get_clip_prompt_embeds` / `_get_t5_prompt_embeds`, but skips that path when the user supplies `prompt_embeds` directly. The pipeline then multiplies `batch_size` by `num_images_per_prompt` for `prepare_latents`, so the latent batch and the transformer's `encoder_hidden_states` end up with mismatched shapes and the call dies inside the joint-attention block (huggingface#10712). Mirror SDXL's pattern by applying the same expansion to user-supplied `prompt_embeds` (and the matching `pooled_prompt_embeds` / negatives) at the end of `encode_prompt`. Propagated to the img2img, inpaint, controlnet, and PAG variants via `make fix-copies`. Adds a regression test that feeds `encode_prompt(num_images_per_prompt=k)` output back into the pipeline.
What does this PR do?
Fixes #10712.
`StableDiffusion3Pipeline.__call__` raises a `RuntimeError` when the caller passes pre-computed `prompt_embeds` together with `num_images_per_prompt > 1`.

Root cause
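The failure comes down to batch-size arithmetic, which can be sketched without any tensors. The snippet below is an illustration only: `batch_sizes` and its `embeds_expanded` flag are hypothetical names, not part of the pipeline, and classifier-free guidance is assumed to be enabled.

```python
def batch_sizes(batch_size, num_images_per_prompt, embeds_expanded, use_cfg=True):
    """Return (latent_batch, encoder_batch) as the transformer would see them.

    Hypothetical helper for illustration; it only models the batch dimension.
    """
    cfg_factor = 2 if use_cfg else 1
    # Latents are always prepared for batch_size * num_images_per_prompt.
    latent_batch = cfg_factor * batch_size * num_images_per_prompt
    # The embeddings only grow if the per-prompt expansion was applied.
    per_prompt = num_images_per_prompt if embeds_expanded else 1
    encoder_batch = cfg_factor * batch_size * per_prompt
    return latent_batch, encoder_batch


# Pre-computed embeddings without the expansion: the two batches disagree.
before = batch_sizes(batch_size=1, num_images_per_prompt=2, embeds_expanded=False)
# With the expansion applied to user-supplied embeddings: the batches agree.
after = batch_sizes(batch_size=1, num_images_per_prompt=2, embeds_expanded=True)

print(before)  # (4, 2)
print(after)   # (4, 4)
```

With one prompt and two images per prompt, the latent batch is 4 while the unexpanded embeddings stay at 2, which is the mismatch that surfaces in the joint-attention block.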
`encode_prompt` relies on `_get_clip_prompt_embeds` and `_get_t5_prompt_embeds` to apply the `num_images_per_prompt` expansion. Those helpers only run when `prompt_embeds is None`, so user-supplied embeddings keep their original batch dimension. `__call__` then computes `latents = prepare_latents(batch_size * num_images_per_prompt, ...)`, and CFG doubles that to `2 * batch_size * num_images_per_prompt`, while `encoder_hidden_states` stays at `2 * batch_size`. The joint-attention block fails when it concatenates the two.

`StableDiffusionXLPipeline` is not affected because its `encode_prompt` applies the `repeat(1, num_images_per_prompt, 1).view(...)` expansion unconditionally after the if/else branch.

Fix
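As a rough, dependency-free picture of what the per-prompt expansion does to the batch dimension, the sketch below uses nested lists in place of tensors. `expand_per_prompt` is a hypothetical helper, not a function in the pipeline; the real code uses the tensor `repeat(...).view(...)` idiom, which likewise keeps the copies of each prompt adjacent in the batch.

```python
def expand_per_prompt(embeds, num_images_per_prompt):
    """Repeat each prompt's embedding num_images_per_prompt times.

    `embeds` is a nested list shaped [batch][seq][hidden]. Copies of the
    same prompt stay next to each other in the result.
    """
    return [
        [list(token) for token in prompt]  # fresh copy for each generated image
        for prompt in embeds
        for _ in range(num_images_per_prompt)
    ]


# Two prompts, sequence length 1, hidden size 2.
prompt_embeds = [[[1.0, 1.0]], [[2.0, 2.0]]]
expanded = expand_per_prompt(prompt_embeds, 3)

print(len(expanded))                 # 6
print([p[0][0] for p in expanded])   # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

The batch grows from 2 to 6 while the sequence and hidden dimensions are unchanged, so the embeddings line up with latents prepared for `batch_size * num_images_per_prompt`.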
Apply the same expansion to user-supplied `prompt_embeds` (and the matching `pooled_prompt_embeds`, `negative_prompt_embeds`, `negative_pooled_prompt_embeds`) at the end of `encode_prompt`. The change is propagated via `make fix-copies` to the img2img, inpaint, controlnet, and PAG variants:

- `pipeline_stable_diffusion_3_img2img.py`
- `pipeline_stable_diffusion_3_inpaint.py`
- `controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py`
- `controlnet_sd3/pipeline_stable_diffusion_3_controlnet_inpainting.py`
- `pag/pipeline_pag_sd_3.py`
- `pag/pipeline_pag_sd_3_img2img.py`

Adds `test_pipeline_accepts_prompt_embeds_with_num_images_per_prompt`, which feeds the output of `encode_prompt(num_images_per_prompt=k)` back into the pipeline. Without the fix the test reproduces the `RuntimeError` from the issue; with the fix it passes.

Before submitting
`num_images_per_prompt` already describes the intended behavior; this PR makes the implementation match it.

Who can review?
@yiyixuxu @sayakpaul