[CK_TILE]: PreshuffleB + PreshuffleBQuant for ABQuant pipeline#3687

Closed
ErwinTerpstra wants to merge 8 commits into develop from eterpstr/preshuffle-bquant-for-abquant-preshuffleb

@ErwinTerpstra (Contributor)

Proposed changes

Implement BQuantPreshuffle option for the ABQuant PreshuffleB pipeline.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

@ErwinTerpstra force-pushed the eterpstr/preshuffle-bquant-for-abquant-preshuffleb branch from 2b0aeef to 99264c6 on January 30, 2026 16:25
@ErwinTerpstra changed the title from "[CK_TILE]: BQuant preshuffle for PreshuffleB ABQuant pipeline" to "[CK_TILE]: PreshuffleB + PreshuffleBQuant for ABQuant pipeline" on Jan 30, 2026

static constexpr index_t NPerBlock = BlockGemmShape::kN;
static constexpr index_t KPerBlock = BlockGemmShape::kK;

static constexpr index_t NPerBlockBQ = (BQuantGroupSize::kN <= KPerBlock)
Contributor

May I ask why this compares kN <= KPerBlock?

Contributor Author

Well spotted, that should indeed be NPerBlock.
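
The exchange above can be illustrated with a minimal sketch. The block sizes, the group size, and the `group_fits` helper below are hypothetical values and names chosen for illustration only; they are not taken from the PR. The rest of the `NPerBlockBQ` expression is not visible in the review excerpt and is not reproduced here. The sketch only shows why comparing the N-dimension group size against `KPerBlock` can give a different answer than comparing it against `NPerBlock`.

```cpp
#include <cstdint>

using index_t = std::int32_t;

// Hypothetical stand-ins for the pipeline's template parameters (assumed values).
struct BlockGemmShape
{
    static constexpr index_t kN = 128;
    static constexpr index_t kK = 64;
};

struct BQuantGroupSize
{
    static constexpr index_t kN = 128; // quantization group extent along N
};

static constexpr index_t NPerBlock = BlockGemmShape::kN;
static constexpr index_t KPerBlock = BlockGemmShape::kK;

// Hypothetical helper: does a group of the given extent fit within one block extent?
constexpr bool group_fits(index_t group_extent, index_t block_extent)
{
    return group_extent <= block_extent;
}

// As written in the reviewed line: an N-dimension group size compared against K.
constexpr bool fits_as_written = group_fits(BQuantGroupSize::kN, KPerBlock);

// As agreed in the review: compare N against N.
constexpr bool fits_corrected = group_fits(BQuantGroupSize::kN, NPerBlock);

// With these assumed sizes the two conditions disagree.
static_assert(!fits_as_written, "128 <= 64 is false");
static_assert(fits_corrected, "128 <= 128 is true");
```

The two conditions only coincide when `NPerBlock` equals `KPerBlock` or when the group size sits on the same side of both, which may be why the mismatch would not surface in every test configuration.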

@wj-laskowski (Contributor) left a comment

LGTM, no comments

@ammallya (Contributor) commented Feb 3, 2026

Imported to ROCm/rocm-libraries

@ammallya closed this on Feb 3, 2026
ThomasNing pushed a commit to ROCm/rocm-libraries that referenced this pull request Feb 10, 2026
## Proposed changes

Implement BQuantPreshuffle option for the ABQuant PreshuffleB pipeline.

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [X] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [X] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [X] I have added inline documentation which enables the maintainers
with understanding the motivation
- [X] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [X] I have run `clang-format` on all changed files
- [X] Any dependent changes have been merged



---
🔁 Imported from
[ROCm/composable_kernel#3687](ROCm/composable_kernel#3687)
🧑‍💻 Originally authored by @ErwinTerpstra

---------

Co-authored-by: Erwin Terpstra <erwin.terpstra@streamhpc.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>

4 participants