173 implement device grouped gemm fixed nk for rdna4 #3668

Closed
bidlekm wants to merge 18 commits into ROCm:develop from StreamHPC:173-implement-device_grouped_gemm_fixed_nk-for-rdna4

@bidlekm bidlekm commented Jan 28, 2026

Proposed changes

Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

Contributor

@zsotakal zsotakal left a comment

Looks good! A few minor things, mostly related to code quality.

arg.c_element_op_);
};

// const auto tail_num =
Contributor

I couldn't find this tail number logic anywhere in the existing code. Also there are seemingly related commented out sections below. Can you explain a bit what was the motivation behind it?

Author

I couldn't find any logic either; I wanted to observe how it functions this way.

}
}

// private:
Contributor

I can see this comes from the XDL variant, but this line serves no purpose other than (in the best case) to annoy the reader or (in the worst case) to make them wonder about the reason behind it and lose time on it.

Author

It was useful for me to locate the struct's variables when I was quickly scrolling through the code. But if you think it is counterproductive to have it there in general, I can remove it.

bool pass = true;
for(int kbatch : kbatches)
{
pass &= ck::profiler::profile_grouped_gemm_fixed_nk_impl<ADataType,
Contributor

This function call may throw. What happens if it does? Should you maybe pass the fail_if_no_supported_instances as an argument and early return without throwing? Or at least use a try-catch block here?
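
For illustration, the try-catch option mentioned above could look roughly like the sketch below. `profile_one_kbatch` is a hypothetical stand-in for the profiler call, not the real `profile_grouped_gemm_fixed_nk_impl` signature, and the throwing condition is invented purely to exercise the handler.

```cpp
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the profiler call (NOT the real CK signature).
// Assumption for this sketch: it throws on an invalid kbatch and
// returns true otherwise.
bool profile_one_kbatch(int kbatch)
{
    if(kbatch <= 0)
        throw std::runtime_error("unsupported kbatch");
    return true;
}

// Runs every kbatch and records a thrown exception as a failed case
// instead of letting it escape the loop.
bool run_all_kbatches(const std::vector<int>& kbatches)
{
    bool pass = true;
    for(int kbatch : kbatches)
    {
        try
        {
            pass &= profile_one_kbatch(kbatch);
        }
        catch(const std::exception& e)
        {
            std::cerr << "kbatch " << kbatch << " failed: " << e.what() << '\n';
            pass = false;
        }
    }
    return pass;
}
```

With this shape, one throwing kbatch marks the whole run as failed while the remaining kbatches still execute. Whether that is preferable to letting the exception propagate depends on where the test harness expects errors to surface.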

Author

fail_if_no_supported_instances is used to modify the function's return value when certain conditions are met. The preexisting XDL implementation did not have it, so introducing it now would be an interface-breaking change for already existing functionality. The original profiler's throws are not handled here either. Given that, the error handling responsibility lies elsewhere, and introducing it here could hide some config errors and change the test behaviour.
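
As a rough illustration of a flag that only changes the return value, consider the sketch below. Everything in it is an assumption: `profile_sketch`, its parameters, and the rule "return false when nothing ran" are hypothetical and are not taken from the Composable Kernel profiler.

```cpp
#include <vector>

// Hypothetical profiler-style helper; names and semantics are assumptions.
// `results` holds one verification result per instance that supported
// the problem; it is empty when no instance was applicable.
bool profile_sketch(const std::vector<bool>& results,
                    bool fail_if_no_supported_instances = false)
{
    // With no supported instance, the flag alone decides the outcome.
    if(results.empty())
        return !fail_if_no_supported_instances;

    // Otherwise the run passes only if every instance verified.
    bool pass = true;
    for(bool ok : results)
        pass = pass && ok;
    return pass;
}
```

In this sketch the flag is a defaulted trailing parameter, so calls that omit it behave as before. Whether the real interface could be extended the same way is not established in this thread.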

@chris-tsiaousis-hpc
Contributor

Added some comments you might want to address. Great work overall! :)

@ammallya
Contributor

ammallya commented Feb 3, 2026

Imported to ROCm/rocm-libraries

@ammallya ammallya closed this Feb 3, 2026