Skip to content

[SM6.10] Implement LinAlg groupshared Builtins#8254

Open
V-FEXrt wants to merge 5 commits intomainfrom
user/vfexrt/linalg-groupshared-ops
Open

[SM6.10] Implement LinAlg groupshared Builtins#8254
V-FEXrt wants to merge 5 commits intomainfrom
user/vfexrt/linalg-groupshared-ops

Conversation

@V-FEXrt
Copy link
Collaborator

@V-FEXrt V-FEXrt commented Mar 10, 2026

Implements the LinAlg Load/Store/Accumulate to memory groupshared builtins following the pattern of the previous builtins in alignment with the spec. https://github.com/microsoft/hlsl-specs/blob/main/proposals/0035-linalg-matrix.md

Copy link
Member

@hekota hekota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few comments about the tests.

@hekota
Copy link
Member

hekota commented Mar 10, 2026

Also the PR title and description do not mention that this is for LinAlg Matrix.

Implements the Load/Store/Accumulate to memory groupshared
builtins following the pattern of the previous builtins
@V-FEXrt V-FEXrt force-pushed the user/vfexrt/linalg-groupshared-ops branch from 1f308f9 to e520d0a Compare March 10, 2026 20:45
@V-FEXrt V-FEXrt changed the title [SM6.10] Implement groupshared Builtins [SM6.10] Implement LinAlg groupshared Builtins Mar 10, 2026
Copy link
Contributor

@tex3d tex3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I think we can do this without adding the array overload, by simply accepting an address space 3 pointer to the overload type. This will require a new type code (perhaps $gs_x1), instead of $x1, mapped to a new macro, instead of EXT in GetOpFunc, which creates an address space 3 pointer to the element type.

There are problems with the array overload mechanism, as defined here:

  1. The array size is not captured to the overload name, yet it impacts the parameter type. This would lead to conflicting DXIL overloads when different array sizes are used in the same module.
  2. You can't define a set of allowed element types for the array, so any arbitrary type could be allowed (even non-scalar types).
  3. You don't capture the pointer address space constraint anywhere, so it could easily be created with the wrong address space. This would also create overload conflicts.
  4. You don't constrain it to a pointer, so you could use a by-value array argument, if that's what's passed in to create the function. This would also create overload conflicts.

If we decide instead that we want to keep the array overload, we should do it differently, more like the vector overload, where the accepted scalar component type set is still defined. This could use the same pattern as vector, using the [ character to denote an array type overload, overloads to the left indicating allowed non-array types, and ones to the right for what's allowed inside an array. Support could be limited at first to requiring [ to be left-most at first (not allowing array and non-array types). This would still require the new type code to indicate the address space 3 pointer to the element type.

My strong preference at this time is to simply use a pointer to the first array element (like an array-to-pointer decay). This would be a relatively simple change, requiring no new broader overload mechanisms.

@github-project-automation github-project-automation bot moved this from New to In progress in HLSL Roadmap Mar 11, 2026
@V-FEXrt
Copy link
Collaborator Author

V-FEXrt commented Mar 11, 2026

I'm fine not adding the array overload type. It was a means to an end and I can't say I was exactly happy with it. I remember exploring a new $ directive without much luck but I likely just got the details wrong. Specificially I need to enforce addrspace(3) and also that the overload generation is the correct shape per the spec. I though I was having issues with one of those but I can't recall them now. I'll poke around at making this change.

@tex3d
Copy link
Contributor

tex3d commented Mar 11, 2026

I'm fine not adding the array overload type. It was a means to an end and I can't say I was exactly happy with it. I remember exploring a new $ directive without much luck but I likely just got the details wrong. Specificially I need to enforce addrspace(3) and also that the overload generation is the correct shape per the spec. I though I was having issues with one of those but I can't recall them now. I'll poke around at making this change.

Yeah, the spec says [Ty] * addrspace(3), which implies a pointer to a component type [Ty], not an array with a fixed size.

I can help with adding the new overload type usage in the DXIL op parameter definition (the $ directive).

} else if (TypeSlot == TS_UDT) {
}

switch (TypeSlot) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function no longer has any functional changes, this is now a pure refactor. In fact, there are no other changes in this file that are not a result of generated code changes.

That's fine, but I thought it might be good to call that out. Think of this as just a review note.

void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromDescriptor(out LinAlgMatrix ret, in ByteAddressBuffer buf, in uint offset, in uint stride, in uint layout);
void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromDescriptor(out LinAlgMatrix ret, in RWByteAddressBuffer buf, in uint offset, in uint stride, in uint layout);
void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromMemory(out LinAlgMatrix ret, in int GroupSharedMem, in uint offset, in uint stride, in uint layout);
void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromMemory(out LinAlgMatrix ret, groupshared numeric[] memory, in uint offset, in uint stride, in uint layout);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The experimental builtins above use LinAlg for the HLSL component set usable with the LinAlg operations. We should be able to use that here, unless our plan was to remove it for some reason.

The same would apply to the rest of the HLSL component templates in these builtins, so perhaps this change can be made in a separate follow-up PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd prefer the put that into a separate PR. Do you happen to know where LinAlg<> is defined? The acceptable values are (slightly) different between CoopVec and LinAlg so it would need to be updated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is about the component type enumeration, I checked in ComponentType enum into linalg.h with the same values for as we have for cooperative vectors, and filed an issue for Chris to update the spec to match: microsoft/hlsl-specs#779

# _gs is extra metadata info on the overload. It has no impact on
# the ordering rules so it can be erased for the check.
# $x_gs7 -> $x7
ty = i.llvm_type.replace("_gs", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to be a bit more strict:

Suggested change
ty = i.llvm_type.replace("_gs", "")
ty = i.llvm_type.replace("$x_gs", "$x")

"col_major": "AR_QUAL_COLMAJOR",
"row_major": "AR_QUAL_ROWMAJOR",
"groupshared": "AR_QUAL_GROUPSHARED",
"groupshared": "AR_QUAL_IN | AR_QUAL_GROUPSHARED",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this? It doesn't seem necessary or correct to me.

Copy link
Collaborator Author

@V-FEXrt V-FEXrt Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/V-FEXrt/DirectXShaderCompiler/blob/main/tools/clang/lib/Sema/SemaHLSL.cpp#L1950

This function asserts if none of IN/OUT/IN_OUT/REF are set. We could update that function instead but it seem like one of those should infact be set so I put it here.

Happy to take a recommendation but we do have to do something here :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that function was going to need to be updated to handle this qualifier.

Since we now have AR_QUAL_REF, that seems like a better match, because the groupshared parameter semantics don't match those of AR_QUAL_IN, though it seems like we should still be setting some qualifier on the parameter.

This made me take a closer look at the AST, and it doesn't look right, see new comment.

Comment on lines 905 to 906
+ ", ".join([get_type_at_index(index, unwrap_pointer) for index, unwrap_pointer in index_tuple])
+ "});\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: long line could be wrapped (my fault, since I provided the original change)

Suggested change
+ ", ".join([get_type_at_index(index, unwrap_pointer) for index, unwrap_pointer in index_tuple])
+ "});\n"
+ ", ".join([
get_type_at_index(index, unwrap_pointer)
for index, unwrap_pointer in index_tuple
]) + "});\n"

// REQUIRES: dxil-1-10
// RUN: %dxc -T lib_6_10 -E main %s -ast-dump-implicit | FileCheck %s

// CHECK: FunctionDecl {{.*}} implicit used __builtin_LinAlg_MatrixStoreToMemory 'void (__builtin_LinAlgMatrix {{.*}}, float const __attribute__((address_space(3))) (&)[64], unsigned int, unsigned int, unsigned int)' extern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the const here reveals a problem from adding AR_QUAL_IN. I'd like to see the expansion of the SharedArr parameter passed to the function. Is there an RValue cast? I think this should be treated more like a reference, and there shouldn't be a const qualifier.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried AR_QUAL_REF but that didn't change the const. Only diff is && became & is that what you expected?

 IN: memory 'float const __attribute__((address_space(3))) (&)[64]'
REF: memory 'float const __attribute__((address_space(3))) (&&)[64]'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah that's strange. Now that I think about it, maybe the const refers to the address as opposed to the memory it references. Also, I think groupshared might be adding the reference elsewhere, and adding REF might double that up, which would be bad. I think that function just needs to be updated to handle groupshared. I think that function is only used for intrinsics, which would be why the earlier change that added groupshared support for user function parameters didn't update it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think const does apply to the elements. And (&&) is an rvalue reference, so I think that's wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you consider the "right" thing here? Happy to work on the function but I need to know what to work towards lol

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another question, is this particular issue something we can push off to an immediate follow up? Helena has a couple PRs blocked on this going in and I have some work that would be nice to build on top of Helena's PRs.

Getting this PR in unclogs the system, but I am committed to getting the signature right here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

3 participants