
fix(qwen-tts): install flash-attn on cuda13 images (#9293) #10293

Open
localai-bot wants to merge 1 commit into master from fix/9293-qwen-tts-cuda13-flashattn


@localai-bot

Collaborator

Re: #9293 (the qwen-tts backend part)

Problem

On CUDA 13 images, the Qwen TTS backend logs flash-attn warnings and falls back to SDPA. The cuda12 image installs flash-attn via requirements-cublas12-after.txt, but no requirements-cublas13-after.txt existed, so the cuda13-qwen-tts image never installed flash_attn.
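
A quick way to confirm whether an image is affected is to check, inside the container, whether the flash_attn module can be found. The sketch below is illustrative only: the helper names are hypothetical, and the "flash_attention_2" / "sdpa" strings follow the Hugging Face Transformers attn_implementation convention, which is assumed (not confirmed by this PR) to be what the qwen-tts backend ends up using.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    # find_spec only resolves the module; it does not import it or touch CUDA.
    return importlib.util.find_spec(name) is not None


def pick_attn_implementation() -> str:
    """Hypothetical helper: prefer flash-attn when installed, else fall back to SDPA."""
    return "flash_attention_2" if has_module("flash_attn") else "sdpa"


if __name__ == "__main__":
    # On an image without flash-attn installed this is expected to report "sdpa".
    print(pick_attn_implementation())
```

Running this in the cuda12 and cuda13 images side by side should show whether the two now agree.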

Fix

Add backend/python/qwen-tts/requirements-cublas13-after.txt containing flash-attn, mirroring the cublas12 variant.
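
The same kind of gap could be caught for other backends with a small check like the one below. It is a sketch under assumptions: the function name is hypothetical, the requirements-<variant>-after.txt pattern is inferred from the two file names mentioned in this PR, and the default variant list covers only cublas12 and cublas13.

```python
from pathlib import Path


def missing_after_files(backend_dir, variants=("cublas12", "cublas13")):
    """Return the variants that have no requirements-<variant>-after.txt file."""
    root = Path(backend_dir)
    return [
        variant
        for variant in variants
        if not (root / f"requirements-{variant}-after.txt").is_file()
    ]


if __name__ == "__main__":
    # Path taken from this PR; run from the repository root.
    print(missing_after_files("backend/python/qwen-tts"))
```

A missing "-after" file is not necessarily a bug for every backend, so the result is a prompt for review rather than a failure condition.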

Scope

This addresses only the qwen-tts backend performance part of the issue. The separate part, in which vllm-omni fails entirely (related to #8536), needs its own reproduction and is not covered here. This is an additive requirements change and has not been built locally.

Assisted-by: claude:claude-opus-4-8 [Claude Code]

The cuda12 image installs flash-attn via requirements-cublas12-after.txt,
but there was no cublas13 equivalent, so cuda13-qwen-tts never installed
flash_attn and fell back to SDPA with warnings. Add the matching
requirements-cublas13-after.txt.

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
