implement device grouped gemm fixed nk for rdna4 #3668

bidlekm wants to merge 18 commits into ROCm:develop from …ement-device_grouped_gemm_fastgelu-for-rdna4

Conversation
zsotakal left a comment

Looks good! A few minor things, mostly related to code quality.
include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_wmma_fixed_nk.hpp (3 resolved review threads)
```cpp
                arg.c_element_op_);
        };

        // const auto tail_num =
```
Reviewer: I couldn't find this tail-number logic anywhere in the existing code, and there are seemingly related commented-out sections below. Can you explain a bit what the motivation behind it was?

Author: I couldn't find any existing logic either; I wanted to observe how it behaves this way.
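For context on what the thread leaves implicit: in the XDL-style gridwise GEMMs, a "tail number" typically encodes how many K-loop iterations remain after the software-pipelined main loop, so the kernel can select a matching pipeline epilogue. A minimal sketch of that idea follows; all names (`KPerBlock`, `PrefetchStages`, `TailNumber`) are illustrative assumptions, not taken from this PR.

```cpp
// Hedged sketch only: illustrative names, not code from the PR.
#include <cstdint>

enum class TailNumber { Odd, Even };

constexpr std::int32_t KPerBlock      = 64; // K tile consumed per main-loop step
constexpr std::int32_t PrefetchStages = 2;  // software-pipeline depth

TailNumber calculate_tail_num(std::int32_t K)
{
    const std::int32_t k_loops = K / KPerBlock; // total main-loop iterations
    // The parity of the loop count decides which pipeline drain path runs.
    return (k_loops % PrefetchStages == 1) ? TailNumber::Odd : TailNumber::Even;
}
```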
include/ck/tensor_operation/gpu/grid/gridwise_gemm_wmma_cshuffle_v3_common.hpp (resolved review thread)
include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_wmma_fixed_nk.hpp (4 resolved review threads)
```cpp
        }
    }

    // private:
```
Reviewer: I can see this comes from the XDL variant, but this line serves no purpose other than (in the best case) annoying the reader or (in the worst case) making them wonder about the reason behind it and lose time on it.

Author: It was useful for me to locate the struct's member variables when quickly scrolling through the code, but if you think it is counterproductive to have it there in general, I can remove it.
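For context, the pattern under discussion looks roughly like the sketch below; the struct and member names are hypothetical, not copied from the PR.

```cpp
// Hypothetical illustration of the leftover marker; not code from the PR.
struct Argument
{
    void Run() {}

    // private:            <- inert comment carried over from the XDL variant;
    //                        it is not an access specifier, so everything
    //                        below it is still public
    int group_count_ = 0;  // remains publicly accessible despite the marker
};
```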
include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_wmma_fixed_nk.hpp (2 resolved review threads)
...ry/tensor_operation_instance/gpu/grouped_gemm/device_grouped_gemm_wmma_fixed_nk_instance.hpp (3 resolved review threads)
```cpp
    bool pass = true;
    for(int kbatch : kbatches)
    {
        pass &= ck::profiler::profile_grouped_gemm_fixed_nk_impl<ADataType,
```
Reviewer: This function call may throw. What happens if it does? Should you maybe pass fail_if_no_supported_instances as an argument and return early without throwing? Or at least use a try-catch block here?

Author: fail_if_no_supported_instances modifies the function's return value when certain conditions are met. The preexisting XDL implementation did not have it, so introducing it now would be an interface-breaking change to already existing functionality. The original profiler's throws are not handled here either; the error-handling responsibility therefore lies elsewhere, and introducing it here could hide some configuration errors and change the test behaviour.
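For reference, the reviewer's try-catch suggestion would look roughly like the sketch below; it was not adopted, per the reply above. The template arguments and call parameters are elided, since the thread only shows the first one.

```cpp
// Sketch of the suggested (not adopted) handling, inside the kbatch loop;
// "..." marks elisions. Requires <exception> and <iostream>.
try
{
    pass &= ck::profiler::profile_grouped_gemm_fixed_nk_impl<ADataType /*, ...*/>(/*...*/);
}
catch(const std::exception& e)
{
    // Report and record the failure instead of letting the exception
    // propagate out of the test.
    std::cerr << "profiling threw for kbatch " << kbatch << ": " << e.what() << '\n';
    pass = false;
}
```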
Added some comments you might want to address. Great work overall! :)

Imported to ROCm/rocm-libraries
Proposed changes
Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] clang-format on all changed files

Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.