[SM6.10] Implement LinAlg groupshared Builtins#8254
Conversation
hekota
left a comment
There was a problem hiding this comment.
Looks good! Just a few comments about the tests.
tools/clang/test/CodeGenDXIL/hlsl/linalg/builtins/matrixaccumulatetomemory/nominal.hlsl
Outdated
Show resolved
Hide resolved
tools/clang/test/CodeGenDXIL/hlsl/linalg/builtins/matrixaccumulatetomemory/nominal.hlsl
Outdated
Show resolved
Hide resolved
|
Also the PR title and description do not mention that this is for LinAlg Matrix. |
Implements the Load/Store/Accumulate to memory groupshared builtins following the pattern of the previous builtins
1f308f9 to
e520d0a
Compare
There was a problem hiding this comment.
First, I think we can do this without adding the array overload, by simply accepting an address space 3 pointer to the overload type. This will require a new type code (perhaps $gs_x1), instead of $x1, mapped to a new macro, instead of EXT in GetOpFunc, which creates an address space 3 pointer to the element type.
There are problems with the array overload mechanism, as defined here:
- The array size is not captured to the overload name, yet it impacts the parameter type. This would lead to conflicting DXIL overloads when different array sizes are used in the same module.
- You can't define a set of allowed element types for the array, so any arbitrary type could be allowed (even non-scalar types).
- You don't capture the pointer address space constraint anywhere, so it could easily be created with the wrong address space. This would also create overload conflicts.
- You don't constrain it to a pointer, so you could use a by-value array argument, if that's what's passed in to create the function. This would also create overload conflicts.
If we decide instead that we want to keep the array overload, we should do it differently, more like the vector overload, where the accepted scalar component type set is still defined. This could use the same pattern as vector, using the [ character to denote an array type overload, overloads to the left indicating allowed non-array types, and ones to the right for what's allowed inside an array. Support could be limited at first to requiring [ to be left-most at first (not allowing array and non-array types). This would still require the new type code to indicate the address space 3 pointer to the element type.
My strong preference at this time is to simply use a pointer to the first array element (like an array-to-pointer decay). This would be a relatively simple change, requiring no new broader overload mechanisms.
|
I'm fine not adding the array overload type. It was a means to an end and I can't say I was exactly happy with it. I remember exploring a new |
Yeah, the spec says I can help with adding the new overload type usage in the DXIL op parameter definition (the |
| } else if (TypeSlot == TS_UDT) { | ||
| } | ||
|
|
||
| switch (TypeSlot) { |
There was a problem hiding this comment.
Since this function no longer has any functional changes, this is now a pure refactor. In fact, there are no other changes in this file that are not a result of generated code changes.
That's fine, but I thought it might be good to call that out. Think of this as just a review note.
| void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromDescriptor(out LinAlgMatrix ret, in ByteAddressBuffer buf, in uint offset, in uint stride, in uint layout); | ||
| void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromDescriptor(out LinAlgMatrix ret, in RWByteAddressBuffer buf, in uint offset, in uint stride, in uint layout); | ||
| void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromMemory(out LinAlgMatrix ret, in int GroupSharedMem, in uint offset, in uint stride, in uint layout); | ||
| void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromMemory(out LinAlgMatrix ret, groupshared numeric[] memory, in uint offset, in uint stride, in uint layout); |
There was a problem hiding this comment.
The experimental builtins above use LinAlg for the HLSL component set usable with the LinAlg operations. We should be able to use that here, unless our plan was to remove it for some reason.
The same would apply to the rest of the HLSL component templates in these builtins, so perhaps this change can be made in a separate follow-up PR.
There was a problem hiding this comment.
Yeah I'd prefer the put that into a separate PR. Do you happen to know where LinAlg<> is defined? The acceptable values are (slightly) different between CoopVec and LinAlg so it would need to be updated
There was a problem hiding this comment.
If this is about the component type enumeration, I checked in ComponentType enum into linalg.h with the same values for as we have for cooperative vectors, and filed an issue for Chris to update the spec to match: microsoft/hlsl-specs#779
utils/hct/hctdb.py
Outdated
| # _gs is extra metadata info on the overload. It has no impact on | ||
| # the ordering rules so it can be erased for the check. | ||
| # $x_gs7 -> $x7 | ||
| ty = i.llvm_type.replace("_gs", "") |
There was a problem hiding this comment.
Would be better to be a bit more strict:
| ty = i.llvm_type.replace("_gs", "") | |
| ty = i.llvm_type.replace("$x_gs", "$x") |
| "col_major": "AR_QUAL_COLMAJOR", | ||
| "row_major": "AR_QUAL_ROWMAJOR", | ||
| "groupshared": "AR_QUAL_GROUPSHARED", | ||
| "groupshared": "AR_QUAL_IN | AR_QUAL_GROUPSHARED", |
There was a problem hiding this comment.
Why this? It doesn't seem necessary or correct to me.
There was a problem hiding this comment.
https://github.com/V-FEXrt/DirectXShaderCompiler/blob/main/tools/clang/lib/Sema/SemaHLSL.cpp#L1950
This function asserts if none of IN/OUT/IN_OUT/REF are set. We could update that function instead but it seem like one of those should infact be set so I put it here.
Happy to take a recommendation but we do have to do something here :)
There was a problem hiding this comment.
Oh yeah, that function was going to need to be updated to handle this qualifier.
Since we now have AR_QUAL_REF, that seems like a better match, because the groupshared parameter semantics don't match those of AR_QUAL_IN, though it seems like we should still be setting some qualifier on the parameter.
This made me take a closer look at the AST, and it doesn't look right, see new comment.
utils/hct/hctdb_instrhelp.py
Outdated
| + ", ".join([get_type_at_index(index, unwrap_pointer) for index, unwrap_pointer in index_tuple]) | ||
| + "});\n" |
There was a problem hiding this comment.
Nit: long line could be wrapped (my fault, since I provided the original change)
| + ", ".join([get_type_at_index(index, unwrap_pointer) for index, unwrap_pointer in index_tuple]) | |
| + "});\n" | |
| + ", ".join([ | |
| get_type_at_index(index, unwrap_pointer) | |
| for index, unwrap_pointer in index_tuple | |
| ]) + "});\n" |
| // REQUIRES: dxil-1-10 | ||
| // RUN: %dxc -T lib_6_10 -E main %s -ast-dump-implicit | FileCheck %s | ||
|
|
||
| // CHECK: FunctionDecl {{.*}} implicit used __builtin_LinAlg_MatrixStoreToMemory 'void (__builtin_LinAlgMatrix {{.*}}, float const __attribute__((address_space(3))) (&)[64], unsigned int, unsigned int, unsigned int)' extern |
There was a problem hiding this comment.
I think the const here reveals a problem from adding AR_QUAL_IN. I'd like to see the expansion of the SharedArr parameter passed to the function. Is there an RValue cast? I think this should be treated more like a reference, and there shouldn't be a const qualifier.
There was a problem hiding this comment.
Tried AR_QUAL_REF but that didn't change the const. Only diff is && became & is that what you expected?
IN: memory 'float const __attribute__((address_space(3))) (&)[64]'
REF: memory 'float const __attribute__((address_space(3))) (&&)[64]'
There was a problem hiding this comment.
Oh, yeah that's strange. Now that I think about it, maybe the const refers to the address as opposed to the memory it references. Also, I think groupshared might be adding the reference elsewhere, and adding REF might double that up, which would be bad. I think that function just needs to be updated to handle groupshared. I think that function is only used for intrinsics, which would be why the earlier change that added groupshared support for user function parameters didn't update it.
There was a problem hiding this comment.
Actually, I think const does apply to the elements. And (&&) is an rvalue reference, so I think that's wrong.
There was a problem hiding this comment.
What would you consider the "right" thing here? Happy to work on the function but I need to know what to work towards lol
There was a problem hiding this comment.
Maybe another question, is this particular issue something we can push off to an immediate follow up? Helena has a couple PRs blocked on this going in and I have some work that would be nice to build on top of Helena's PRs.
Getting this PR in unclogs the system, but I am committed to getting the signature right here.
Implements the LinAlg Load/Store/Accumulate to memory groupshared builtins following the pattern of the previous builtins in alignment with the spec. https://github.com/microsoft/hlsl-specs/blob/main/proposals/0035-linalg-matrix.md