
Rewrote flash attention to use BF16, transposed K and V, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. #835

Open

copybara-service[bot] wants to merge 1 commit into dev from test_868146247

Conversation

@copybara-service

Rewrote flash attention to use BF16, transposed K and V, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention.
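The PR does not include the kernel source in this view, but the core it describes is the standard flash-attention pattern: iterate over tiles of K and V, keeping a running row maximum and softmax denominator so the full score matrix never materializes. The sketch below is a hypothetical NumPy illustration of that online-softmax loop, not the PR's actual kernel; the real implementation would compute in BF16 with K and V transposed for better memory access, while this sketch uses float32 and plain row-major layout for clarity.

```python
import numpy as np

def flash_attention(q, k, v, tile=32):
    """Tiled attention with an online softmax (illustrative sketch).

    q has shape (Lq, d); k and v have shape (Lk, d). The loop visits
    K/V in tiles, rescaling the running output and denominator each
    time a larger row maximum is seen, so only O(tile) scores are
    live at once.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    out = np.zeros_like(q, dtype=np.float64)
    m = np.full(q.shape[0], -np.inf)   # running row maximum
    l = np.zeros(q.shape[0])           # running softmax denominator
    for start in range(0, k.shape[0], tile):
        kt = k[start:start + tile]
        vt = v[start:start + tile]
        s = (q @ kt.T) * scale                     # score tile
        m_new = np.maximum(m, s.max(axis=1))
        corr = np.exp(m - m_new)                   # rescale old state
        p = np.exp(s - m_new[:, None])             # tile probabilities
        l = l * corr + p.sum(axis=1)
        out = out * corr[:, None] + p @ vt
        m = m_new
    return out / l[:, None]
```

Against a direct `softmax(Q K^T / sqrt(d)) V` reference, the tiled loop is numerically equivalent; the PR's changes (BF16 storage, transposed K/V, more registers per thread, and a different work distribution on decode) affect speed and memory traffic, not this math.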

copybara-service[bot] force-pushed the test_868146247 branch 2 times, most recently from 7b55d41 to a814aa4 on February 16, 2026 at 11:55.
copybara-service[bot] force-pushed the test_868146247 branch 2 times, most recently from 0ad7b78 to 0ec1821 on February 27, 2026 at 10:59.
…ask distribution, increase parallelism on decode, and use double the registers for the core of flash attention.

PiperOrigin-RevId: 868146247