[SM6.10][Bugfix] MultiplyAdd - Convert Bias vector to output vector type#8394
hekota wants to merge 5 commits into microsoft:main from
Conversation
If the Bias vector type on MultiplyAdd is different than the output vector type, convert it first to the output vector type before calling the `dx.op.linAlgMatVecMulAdd` op. The interpretation for the bias vector is always going to be set to the output vector interpretation on the op. To accommodate this, the MultiplyAdd functions have been split into two variants: one where the bias interpretation matches the output vector interpretation and no conversion is necessary, and a second where the types don't match and the bias vector is first converted to the output vector type before being passed into `__builtin_LinAlg_MatrixVectorMultiplyAdd`. Adds the `hlsl::__detail::TypeTraits` struct to enable mapping of HLSL scalar types to component type enum values. Fixes microsoft#8390
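As a rough C++ sketch of what a `TypeTraits`-style mapping from scalar types to component-type enum values can look like (names and enum values here are illustrative stand-ins, not the actual `hlsl::__detail` definitions):

```cpp
#include <cstdint>

// Illustrative trait mapping a scalar type to a component-type enum value.
// Enum names/values are stand-ins, not the actual HLSL headers.
enum class ComponentType : uint32_t { I32 = 4, U32 = 5, F16 = 8, F32 = 9 };

template <typename T> struct TypeTraits; // primary template left undefined

template <> struct TypeTraits<float> {
  static constexpr ComponentType Enum = ComponentType::F32;
};
template <> struct TypeTraits<int32_t> {
  static constexpr ComponentType Enum = ComponentType::I32;
};
template <> struct TypeTraits<uint32_t> {
  static constexpr ComponentType Enum = ComponentType::U32;
};
```

Leaving the primary template undefined makes use of an unsupported scalar type a compile-time error rather than a silent wrong mapping.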
✅ With the latest revision this PR passed the C/C++ code formatter.
llvm-beanz
left a comment
A couple comments, but none that need to block this.
```hlsl
MultiplyAdd(Matrix<MatrixDT, M, K, MatrixUse::A, MatrixScope::Thread> MatrixA,
            InterpretedVector<InputElTy, VecK, InputInterp> InterpVec,
            vector<BiasElTy, M> Bias) {
  vector<OutputElTy, M> Result;
```
I think the approach of having two overloads is fine, but we could alternatively have less SFINAE if we had just one overload with the convert under an if statement. The if condition would be what you have in the SFINAE now, so it would always be a constant-evaluated expression and the branch would always get optimized away.
Either approach is fine.
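The single-overload alternative described above can be sketched in C++ with `if constexpr` (types and names here are illustrative stand-ins, not the actual HLSL code):

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// One function instead of two SFINAE-constrained overloads: the conversion
// is guarded by `if constexpr`, whose condition is a compile-time constant,
// so the untaken branch is discarded entirely.
template <typename OutT, typename BiasT, std::size_t M>
std::array<OutT, M> AddBias(std::array<OutT, M> Acc,
                            std::array<BiasT, M> Bias) {
  if constexpr (std::is_same_v<OutT, BiasT>) {
    // Bias already matches the output element type; no conversion needed.
    for (std::size_t I = 0; I < M; ++I)
      Acc[I] += Bias[I];
  } else {
    // Convert each bias element to the output element type first.
    for (std::size_t I = 0; I < M; ++I)
      Acc[I] += static_cast<OutT>(Bias[I]);
  }
  return Acc;
}
```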
I was just thinking: what about if the Convert template function had either a) a do-nothing overload for when CompType is the same (no conversion necessary), or b) an if around the builtin operation?
Then you could use the templated Convert in these methods instead of the builtin directly, and it would auto-resolve to a no-op when the type matches, instead of adding the logic to each of these methods.
I slightly prefer a), the separate overload, just because it makes the initial IR gen cleaner and you can test for it. When it's just Convert, a redundant Convert call in the no-op case is less ugly than doubling every overload that uses Convert.
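The suggested do-nothing `Convert` overload could look something like this C++ sketch (illustrative stand-in types, not the actual HLSL headers):

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Callers always call Convert; overload resolution picks the no-op when the
// element types already match.

// No-op overload: source and destination element types are the same.
template <typename DestT, typename SrcT, std::size_t N,
          std::enable_if_t<std::is_same_v<DestT, SrcT>, int> = 0>
std::array<DestT, N> Convert(std::array<SrcT, N> V) {
  return V;
}

// Converting overload: element types differ, cast each component.
template <typename DestT, typename SrcT, std::size_t N,
          std::enable_if_t<!std::is_same_v<DestT, SrcT>, int> = 0>
std::array<DestT, N> Convert(std::array<SrcT, N> V) {
  std::array<DestT, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = static_cast<DestT>(V[I]);
  return R;
}
```

With this shape, the calling code is the same either way (`Convert<OutputElTy>(Bias)`), and the no-op case produces no conversion instructions to begin with, which keeps the initial IR clean and testable.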
I had a version with the convert under an if statement, but for packed data types that was reporting a warning about vector truncation. I could eliminate the warning by changing the __builtin_LinAlg_MatrixVectorMultiplyAdd signature to allow for vectors of 3 different sizes, but that is not correct either - the bias count and result count must always be the same. That's why I split all of the MultiplyAdd functions.
I will try the suggestion of using the templated Convert and having do-nothing Convert overload for no conversion path. I think that should work and look nicer.
I pushed an update that adds a no-op Convert version in case DestTy==OriginTy and removed the double versions of MultiplyAdd.
I did, however, find an issue with Convert: it does not support packed-type vector sizes that are not a multiple of the number of elements per scalar. I used a workaround to make it work and filed #8418 to fix it.
tex3d
left a comment
I think this looks good, but wouldn't it be even cleaner to overload Convert to do-nothing when types match, then just always call that?
Removed the double versions of MultiplyAdd; these now call the Convert function, which will or will not result in a no-op depending on whether the bias component type matches the output type. Found an issue where Convert does not support packed-type vector sizes that are not a multiple of the number of elements per scalar. Implemented a workaround and filed issue microsoft#8418 to fix this.
If the `Bias` vector type/interpretation on `MultiplyAdd` differs from the output vector type, convert it first to the output vector type before passing it into the `__builtin_LinAlg_MatrixVectorMultiplyAdd` built-in function (which gets translated to the `dx.op.linAlgMatVecMulAdd` op). The bias vector interpretation is always going to be the same as the output vector interpretation on the call.
To accommodate this, each `MultiplyAdd` now has two SFINAE-constrained variants.
Adds the `hlsl::__detail::TypeTraits` struct to enable mapping of HLSL scalar types to component type enum values.
Fixes #8390