
Execution Tests: Add min precision test cases to the long vector test#8260

Open
alsepkow wants to merge 38 commits into microsoft:main from alsepkow:user/alsepkow/MinPrecision

Conversation


@alsepkow (Contributor) commented Mar 11, 2026

This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — D3D12_SHADER_MIN_PRECISION_SUPPORT only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing DxilConf_SM69_Vectorized_Core class alongside other types.

Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage.

Key design decisions

Full-precision buffer I/O: Min precision types have implementation-defined buffer storage width, so we use full-precision types (float/int/uint) for all Load/Store operations via the IO_TYPE/IO_OUT_TYPE shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation.
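A minimal C++ emulation may make the define indirection concrete. The names below mirror the PR's shader defines, but everything else is illustrative, not the actual HLSL shader or test source:

```cpp
#include <cstdint>

#define MIN_PRECISION            // defined only for min precision test types
typedef int16_t compute_t;       // stand-in for the min16int compute type
#ifdef MIN_PRECISION
typedef int32_t io_t;            // IO_TYPE: full-precision buffer type
#else
typedef compute_t io_t;          // fallback: IO_TYPE == TYPE, DXIL unchanged
#endif

// Load at full precision, cast into the compute type, cast back on store.
static io_t roundTrip(io_t loaded) {
  compute_t computed = (compute_t)loaded; // narrowing happens only here
  return (io_t)computed;                  // buffer layout stays 32-bit
}
```

The point is that `sizeof(io_t)` is fixed regardless of what precision the device chooses for the compute type, so the buffer layout is deterministic.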

Half-precision tolerances: Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less.
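The fp16-space comparison can be sketched as follows. This is a standalone reimplementation for illustration only (the real tests use HLSLHalf_t's comparison helpers), and it handles only finite, normal-range values since the tests exclude FP specials anyway:

```cpp
#include <cstdint>
#include <cstring>
#include <cstdlib>

// Convert a finite float in fp16 normal range to IEEE half bits,
// round-to-nearest-even. No denormal/INF/NaN handling (sketch only).
static uint16_t FloatToHalfBits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  uint32_t sign = (bits >> 16) & 0x8000u;
  int32_t exp = (int32_t)((bits >> 23) & 0xFF) - 127 + 15;
  uint32_t mant = bits & 0x7FFFFFu;
  uint32_t m10 = mant >> 13;               // round 23-bit mantissa to 10 bits
  uint32_t rem = mant & 0x1FFFu;
  if (rem > 0x1000u || (rem == 0x1000u && (m10 & 1u)))
    ++m10;
  if (m10 == 0x400u) { m10 = 0; ++exp; }   // mantissa overflow into exponent
  return (uint16_t)(sign | ((uint32_t)exp << 10) | m10);
}

// ULP distance between two values in half-precision space, using the
// standard ordered-bit-pattern trick.
static int HalfUlpDistance(float a, float b) {
  auto ordered = [](uint16_t h) -> int32_t {
    return (h & 0x8000u) ? 0x8000 - (int32_t)(h & 0x7FFFu)
                         : 0x8000 + (int32_t)(h & 0x7FFFu);
  };
  return std::abs(ordered(FloatToHalfBits(a)) - ordered(FloatToHalfBits(b)));
}
```

A device computing at float32 and one computing at float16 then land within the same half-ULP budget, which is why fp16 tolerances are a safe upper bound.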

Test coverage mirrors existing patterns:

  • min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store)
  • min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store)
  • min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store)

Wave and quad op support: Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use #ifdef MIN_PRECISION guards to store results via IO_OUT_TYPE for deterministic buffer layout without changing DXIL for existing non-min-precision tests.

Excluded operations:

  • Signed div/mod on min16int: HLSL does not support signed integer division on min precision types
  • Bit shifting on min16int/min16uint: Not supported for min precision types
  • FP specials (INF/NaN/denorm): min precision types do not support them

Resolves #7780

All tests require the rawBufferVectorLoad/Store fix from #8274.
The array accessor and wave/quad op tests for min precision require the optimizer fix from #8269.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

alsepkow and others added 7 commits March 10, 2026 17:20
Add device support helper, wrapper types, input data sets, type
registration, and validation for min16float, min16int, min16uint.

- doesDeviceSupportMinPrecision() checks D3D12 MinPrecisionSupport
- HLSLMin16Float_t/HLSLMin16Int_t/HLSLMin16Uint_t wrapper structs
  (32-bit storage, matching DXIL layout without -enable-16bit-types)
- Input data constrained to 16-bit representable range
- DATA_TYPE registrations and isFloatingPointType/isMinPrecisionType traits
- doValuesMatch overloads: min16float compares in half-precision space
  (reuses CompareHalfULP/CompareHalfEpsilon), integers use exact match
- TrigonometricValidation specializations matching HLSLHalf_t tolerances

Part of: microsoft#7780

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add DxilConf_SM69_Vectorized_MinPrecision test class with HLK_TEST_MINP
and HLK_WAVEOP_TEST_MINP macros. Mirrors 16-bit counterpart coverage
(HLSLHalf_t/int16_t/uint16_t) minus documented exclusions.

- New test class with Kits.Specification =
  Device.Graphics.D3D12.DXILCore.ShaderModel69.MinPrecision
- setupClass skips when device lacks min precision support
- ~160 test entries across 3 types (min16float/min16int/min16uint)
- MakeDifferent overloads in ShaderOpArith.xml (not gated by
  __HLSL_ENABLE_16_BIT since min precision is always available)
- Excluded: FP specials, AsType, Cast, bit-manipulation ops

Part of: microsoft#7780

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make all but one conversion operator explicit per wrapper type to
avoid C2666 ambiguity with built-in arithmetic operators. Matches
HLSLHalf_t pattern: one implicit conversion to the natural type
(float/int32_t/uint32_t), all others explicit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The implicit operator float() combined with implicit constructors from
int/uint32_t created ambiguity for expressions like 'A + 4': the
compiler could not choose between member operator+(HLSLMin16Float_t)
via constructor and built-in float+int via conversion. Making the
int/uint constructors explicit eliminates the member operator+ path
for int literals while preserving T(0) direct construction and
implicit float conversion for std:: math functions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
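A minimal hypothetical wrapper shows why explicit constructors resolve the ambiguity. This is illustrative, not the actual HLSLMin16Float_t code; for the sketch to compile unambiguously, all constructors are explicit and only the conversion out remains implicit:

```cpp
#include <cmath>

struct Min16Float {
  float Val;
  explicit Min16Float(float F) : Val(F) {}
  explicit Min16Float(int I) : Val((float)I) {}
  // The single implicit conversion out, so built-in operators and
  // std:: math functions work directly on the wrapper.
  operator float() const { return Val; }
  Min16Float operator+(Min16Float O) const { return Min16Float(Val + O.Val); }
};
```

With the int constructor explicit, `Min16Float(2.0f) + 4` has exactly one viable interpretation: convert the wrapper to float and use built-in float + int. Direct construction like `Min16Float(0)` and calls like `std::sqrt` through the implicit float conversion keep working.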
HLSLMin16Int_t: add uint32_t and uint64_t constructors for
static_cast<T>(UINT) and static_cast<T>(size_t) patterns used in
shift masking and wave ops. Add operator~() for bitwise NOT in
WaveMultiPrefixBit ops.

HLSLMin16Uint_t: add operator~() for the same reason.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UnaryMathAbs: extend unsigned check to include HLSLMin16Uint_t
(std::is_unsigned_v is false for class types, so abs was called
with ambiguous overloads via implicit operator uint32_t).

MaskShiftAmount: change constexpr to const since wrapper types are
not literal types (no constexpr constructors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
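The trait pitfall behind the UnaryMathAbs change can be shown in two lines (`Min16Uint` here is a hypothetical stand-in for the wrapper type):

```cpp
#include <type_traits>

struct Min16Uint { unsigned Val; };  // hypothetical wrapper stand-in

// Built-in unsigned types report true...
static_assert(std::is_unsigned_v<unsigned short>, "builtin is unsigned");
// ...but ANY class type reports false, so dispatch keyed on this trait
// silently takes the signed branch for wrapper types.
static_assert(!std::is_unsigned_v<Min16Uint>, "class type is never unsigned");
```

This is why the unsigned check had to name the wrapper type explicitly rather than rely on the standard trait.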
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
alsepkow and others added 2 commits March 10, 2026 18:09
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ~27 Cast test entries (CastToBool, CastToInt16, CastToInt32,
CastToInt64, CastToUint16/32/64, CastToFloat16/32) for all three
min precision types. The generic Cast templates work via the single
implicit conversion operator on each wrapper type — C-style casts
chain through it (e.g. (int32_t)min16float goes float->int32_t).

Remove explicit conversion operators (operator double, operator
int32_t, etc.) that were not exercised since Cast tests were not
previously included and no other code paths use them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
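The cast-chaining behavior can be demonstrated with a stripped-down wrapper (hypothetical, not the real type): a C-style cast to any arithmetic type goes through the single implicit conversion operator and then a standard conversion.

```cpp
#include <cstdint>

struct Min16Float {
  float Val;
  explicit Min16Float(float F) : Val(F) {}
  operator float() const { return Val; }  // the single conversion operator
};

// (int32_t)v chains: operator float() -> float-to-int32_t truncation.
// (uint64_t)v and (bool)v chain through the same operator, which is why
// the generic Cast templates need no per-target-type operators.
```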
@alsepkow changed the title from "Execution Tetsts: Add min precision test cases to the long vector test" to "Execution Tests: Add min precision test cases to the long vector test" on Mar 11, 2026
alsepkow and others added 8 commits March 11, 2026 15:56
WaveMultiPrefixBitAnd/BitOr/BitXor use the any_int type set (g_AnyIntCT)
which is defined as {int16, int32, int64, uint16, uint32, uint64} and does
not include min precision integer types (min16int, min16uint). Remove the 6
invalid test entries and the now-unused operator~() from both integer
wrapper types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move all min precision test entries (min16float, min16int, min16uint)
from the separate DxilConf_SM69_Vectorized_MinPrecision class into
DxilConf_SM69_Vectorized_Core, using HLK_TEST/HLK_WAVEOP_TEST macros.

Remove the HLK_TEST_MINP and HLK_WAVEOP_TEST_MINP macro definitions,
the DxilConf_SM69_Vectorized_MinPrecision class, and the
doesDeviceSupportMinPrecision utility function since min precision
support checking is not required.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Min precision types (min16float, min16int, min16uint) are hints that
allow hardware to use any precision >= the specified minimum, making
buffer storage width implementation-defined. Add IO_TYPE/IO_OUT_TYPE
compiler defines that map min precision types to their full-precision
equivalents (float, int, uint) for buffer Load/Store operations. For
all other types, IO_TYPE equals TYPE and IO_OUT_TYPE equals OUT_TYPE.

This ensures deterministic buffer data layout regardless of the
device's min precision implementation, while still testing min
precision computation via explicit casts between the I/O types and
the min precision types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HLSL does not support signed integer division on minimum-precision
types. The compiler rejects these with: 'signed integer division is
not supported on minimum-precision types, cast to int to use 32-bit
division'. Remove the Divide and Modulus test entries for
HLSLMin16Int_t.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace min16float input values that are not exactly representable
in float16 with values that are. This avoids precision mismatches
between CPU-side expected value computation (float32) and GPU-side
min precision results, where the cast to min16float rounds values
to the nearest float16 representation.

Key changes:
- Default1: -0.01f -> -0.03125f (exact power-of-2 fraction)
- Positive: 0.01f -> 0.03125f, 5531.0f -> 5504.0f,
  331.233f -> 331.25f, 3250.01f -> 3250.0f
- RangeHalfPi/RangeOne: replaced with float16-exact fractions
  covering the same ranges

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
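One way to test float16 exactness on the CPU side, assuming normal-range values (an illustrative helper, not part of the test suite): a value is fp16-exact when its significand fits in 11 bits and its exponent is in range.

```cpp
#include <cmath>

// True if f is exactly representable as an IEEE half (normal range only;
// denormals, INF, and NaN are out of scope for these tests anyway).
static bool isFloat16Exact(float f) {
  if (f == 0.0f) return true;
  int exp;
  float m = std::frexp(f, &exp);        // f = m * 2^exp, 0.5 <= |m| < 1
  float scaled = std::ldexp(m, 11);     // expose the 11 significand bits
  return scaled == std::trunc(scaled)   // no bits below the fp16 LSB
         && exp >= -13 && exp <= 16;    // fp16 normal exponent range
}
```

This matches the substitutions in the commit: 0.03125 and 5504.0 pass, while 0.01 and 5531.0 do not.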
Adjust input values so that arithmetic results (multiply, mad,
subtract, left-shift, wave prefix products) do not overflow 16-bit
integer range. Min precision types compute at >= 16 bits, so results
that overflow at 16 bits differ from the 32-bit expected values.

min16uint changes:
- Default1: reduced large values (699->199, 1023->200) so products
  and wave prefix products fit in uint16
- Default1: ensured all values >= Default2 to avoid subtract underflow
  (1->3, 6->10, 0->22)
- BitShiftRhs: reduced large shifts (13->12, 14->12, 15->12) so
  shifted values fit in uint16

min16int changes:
- BitShiftRhs: reduced large shifts (13->11, 14->11, 15->14) so
  shifted values fit in int16

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
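The 16-bit versus 32-bit divergence behind these input changes can be shown directly (values taken from the commit message above; the helpers are illustrative):

```cpp
#include <cstdint>

// What a device computing at 32 bits produces for a product of inputs.
static uint32_t productAt32(uint16_t a, uint16_t b) {
  return (uint32_t)a * b;
}

// What a device computing at exactly 16 bits produces: the same product
// wrapped modulo 2^16.
static uint16_t productAt16(uint16_t a, uint16_t b) {
  return (uint16_t)((uint32_t)a * b);
}
```

699 * 1023 overflows uint16 and the two widths disagree, while the replacement values 199 * 200 fit and agree at any precision >= 16 bits.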
Wave and quad intrinsics (WaveReadLaneAt, WaveReadLaneFirst,
WaveActiveSum/Min/Max/Product/AllEqual, WavePrefixSum/Product,
WaveMultiPrefixSum/Product, WaveMatch, QuadReadLaneAt,
QuadReadAcrossX/Y/Diagonal) do not support min precision types
(min16float, min16int, min16uint). The DXIL wave/quad shuffle
operations operate on 32-bit or 64-bit register slots and do not
handle 16-bit min precision payloads.

Removes 48 test entries (16 per min precision type) and adds
explanatory comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dot product tolerance computation was using float32 ULPs for
HLSLMin16Float_t, but the GPU may compute at float16 precision.
With NUM=256 elements the accumulated error exceeds the float32-based
epsilon. Use HLSLHalf_t::GetULP to compute half-precision ULPs for
min16float, matching the approach already used for HLSLHalf_t.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions bot commented Mar 14, 2026

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff f62b9b4be05f76adb41134e3abd9323be71e68f4 470302b8b01fa350eafe64215fb65e380ea14ab7 -- tools/clang/unittests/HLSLExec/LongVectorTestData.h tools/clang/unittests/HLSLExec/LongVectors.cpp
View the diff from clang-format here.
diff --git a/tools/clang/unittests/HLSLExec/LongVectorTestData.h b/tools/clang/unittests/HLSLExec/LongVectorTestData.h
index 2e8458b9..f77bab92 100644
--- a/tools/clang/unittests/HLSLExec/LongVectorTestData.h
+++ b/tools/clang/unittests/HLSLExec/LongVectorTestData.h
@@ -718,8 +718,8 @@ INPUT_SET(InputSet::Zero, 0);
 INPUT_SET(InputSet::BitShiftRhs, 1, 6, 3, 0, 9, 3, 8, 8, 8, 8);
 INPUT_SET(InputSet::SelectCond, 0, 1);
 INPUT_SET(InputSet::AllOnes, 1);
-INPUT_SET(InputSet::WaveMultiPrefixBitwise, 0x0, 0x1, 0x3, 0x4, 0x10, 0x12,
-          0xF, 0x7FFF);
+INPUT_SET(InputSet::WaveMultiPrefixBitwise, 0x0, 0x1, 0x3, 0x4, 0x10, 0x12, 0xF,
+          0x7FFF);
 END_INPUT_SETS()
 
 #undef BEGIN_INPUT_SETS

@alsepkow force-pushed the user/alsepkow/MinPrecision branch from c6a3918 to d8cfc9e on March 16, 2026 at 20:49
alsepkow and others added 2 commits March 16, 2026 16:46
Cast operations have different input and output types (e.g. min16float
input with int32 output), so each side needs its own IO type mapping.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Enables constexpr on MaskShiftAmount's ShiftMask local variable,
restoring the original constexpr qualifier that was downgraded to const.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@alsepkow

Discussion: Shift operator semantics for min precision integer types

While working on these tests, I investigated how << and >> work on min16int/min16uint and found a potential semantic ambiguity worth discussing with the team.

How shifts are implemented

Shift operators aren't intrinsics — they're built-in binary operators handled through Clang's Sema/CodeGen layers:

  • Sema validation: SemaHLSL.cpp classifies shifts as "bitwise" ops requiring integral types (BinaryOperatorKindIsBitwise -> BinaryOperatorKindRequiresIntegrals -> IsBasicKindIntegral). min16int and min16uint have BPROP_INTEGER set, so they pass validation. min16float is correctly rejected.

  • Shift amount wrapping: HLSL wraps shift amounts like OpenCL — the RHS is masked by (bit_width - 1). This happens in CGExprScalar.cpp:EmitShl/EmitShr:

    if (CGF.getLangOpts().OpenCL || CGF.getLangOpts().HLSL)
        RHS = Builder.CreateAnd(RHS, GetWidthMinusOneValue(Ops.LHS, RHS), "shl.mask");

The ambiguity

min16int has AST Width = 16 (ASTContext.cpp:1683), producing LLVM IR type i16 (CodeGenTypes.cpp:437). So GetWidthMinusOneValue returns 15, and the shift amount is masked with & 15 (mod 16).

However, in min precision mode (UseMinPrecision=true), the hardware executes on 32-bit registers. The & 15 mask is baked into the IR before any min-precision-to-32-bit promotion. This means:

  • The shift amount is clamped to [0, 15] based on the declared 16-bit type
  • But the actual shift executes on a 32-bit value

For left shift, this is arguably fine — bits above 15 are "extra precision." For right shift, it's murkier — if the 32-bit register has bits set above bit 15 from prior operations, a right shift masked to [0, 15] won't clear them the way a true 16-bit right shift would.

Compare to int32_t where the mask is & 31 and execution width matches. For min precision, the wrapping width (16) doesn't necessarily match the execution width (potentially 32).
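A small standalone sketch (not DXC code) makes the right-shift discrepancy concrete:

```cpp
#include <cstdint>

// HLSL-style wrapping with the declared 16-bit width: amount & 15.
static uint32_t shiftMasked16(uint32_t reg, unsigned amt) {
  return reg >> (amt & 15u);
}

// True 16-bit semantics: truncate the value to 16 bits first.
static uint32_t shiftTrue16(uint32_t reg, unsigned amt) {
  return (uint32_t)((uint16_t)reg >> (amt & 15u));
}

// Wrapping with the 32-bit execution width instead: amount & 31.
static uint32_t shiftMasked32(uint32_t reg, unsigned amt) {
  return reg >> (amt & 31u);
}
```

With a 32-bit register value of 0x18000 (bit 16 set by a prior wide operation) and a shift amount of 17, the three interpretations produce 0xC000, 0x4000, and 0 respectively, so the choice of mask width is observable.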

Test data is safe

The BitShiftRhs input set for int16_t/uint16_t (reused by Min16Int_t/Min16UInt_t) uses values {1, 6, 3, 0, 9, 3, 12, 13, 14, 15} — all within [0, 15], so we avoid the ambiguous boundary. This was intentional.

Existing test coverage

  • Sema tests (scalar-operators.hlsl): verify type rules for min16 shifts ✅
  • FileCheck tests for shift IR output on min precision: none ❌
  • Execution tests for min precision shifts: this PR adds the first ones ✅

Questions for the team

  1. Is the & 15 mask correct for min precision types, or should it be & 31 to match the actual execution width?
  2. Should we add a boundary-probing test (e.g., shift by 16) to document/pin down the current behavior?
  3. Is this a known design decision or a gap that should be tracked as a separate issue?

alsepkow and others added 7 commits March 16, 2026 17:38
These input sets are not referenced by any test entry since wave ops
are excluded for min precision types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only define -DMIN_PRECISION, -DIO_TYPE, and -DIO_OUT_TYPE for min
precision test types. Shader templates use #ifdef MIN_PRECISION to
gate the load-with-cast paths, leaving non-min-precision shaders
completely unchanged. Fallback #ifndef defines ensure IO_OUT_TYPE
resolves to OUT_TYPE for Store calls when MIN_PRECISION is not set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wave, quad, splitdouble, frexp, and modf stores will never be reached
by min precision test types, so IO_OUT_TYPE is unnecessary. Keep it
only for the final main store and derivative stores which are exercised
by min precision tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move min precision IO_TYPE/IO_OUT_TYPE/MIN_PRECISION defines from
getCompilerOptionsString into a dedicated dispatchMinPrecisionTest
function that passes them via AdditionalCompilerOptions. This matches
the existing pattern used by dispatchWaveOpTest and keeps the shared
compiler options builder clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use if constexpr on the C++ type instead of strcmp on HLSL type
strings. Cleaner and resolved at compile time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add IOTypeString field to DataType and MIN_PRECISION_DATA_TYPE macro.
Remove standalone getIOTypeString template function. The IO type
mapping now lives alongside the other type metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@alsepkow alsepkow marked this pull request as ready for review March 17, 2026 01:21
alsepkow and others added 11 commits March 17, 2026 13:43
Bit shifting is not supported for min precision data types (min16int,
min16uint). Remove LeftShift/RightShift test entries and their associated
BitShiftRhs input data sets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add wave and quad op test entries for min16float, min16int, and min16uint
long vector types, mirroring the ops supported by their 16-bit equivalents.

Test entries added:
- Min16Float: 12 wave ops + 4 quad ops
- Min16Int: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops
- Min16Uint: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops

Infrastructure changes:
- ShaderOpArith.xml: Add #ifdef MIN_PRECISION guards to 7 wave op Store
  calls so they use IO_OUT_TYPE for min precision buffer I/O while keeping
  OUT_TYPE for standard types (no DXIL change for existing tests)
- LongVectors.cpp: Add dispatchMinPrecisionWaveOpTest combining WaveSize
  with IO_TYPE/MIN_PRECISION compiler options, and HLK_MIN_PRECISION_WAVEOP_TEST
  macro
- LongVectorTestData.h: Add WaveMultiPrefixBitwise input sets for
  HLSLMin16Int_t and HLSLMin16Uint_t
- Fix ambiguous operator& in waveMultiPrefixBitOr for min precision types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename the shader defines and C++ field to better communicate their purpose:
these specify the full-precision buffer storage type for Load/Store
operations, not a generic I/O type.

- IO_TYPE -> BUFFER_TYPE
- IO_OUT_TYPE -> BUFFER_OUT_TYPE
- IOTypeString -> BufferTypeString

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract duplicated WaveSize computation from runWaveOpTest and
runMinPrecisionWaveOpTest into a shared getWaveSize() method.

Update section comments to say 'mirrors applicable ops' since not all
ops from the 16-bit equivalent types are supported for min precision.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add LeftShift and RightShift test entries for min16int and min16uint.
Both produce valid min-precision DXIL (shl/ashr/lshr i16 with 4-bit
shift masking).

ReverseBits, CountBits, FirstBitHigh, FirstBitLow are excluded — DXC
promotes min precision to i32 before calling these DXIL intrinsics,
so they don't actually test min precision behavior.

Infrastructure changes:
- LongVectorTestData.h: Add Bitwise and BitShiftRhs input sets for
  HLSLMin16Int_t and HLSLMin16Uint_t matching int16_t/uint16_t names.
  Values constrained to 16-bit safe range.
- LongVectorTestData.h: Add compound assignment operators (<<=, >>=,
  |=, &=, ^=) and unary ~ to both wrapper types to resolve ambiguity
  with integer promotion in template functions.
- LongVectorTestData.h: Specialize std::is_signed for wrapper types
  so FirstBitHigh SFINAE selects the correct signed/unsigned variant.
- LongVectors.cpp: Fix ReverseBits, ScanFromMSB, FirstBitLow to use
  explicit static_cast<T> for integer literals, avoiding ambiguous
  operator overload resolution with wrapper types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The DXC compiler fix in PR microsoft#8274 handles min precision vector
load/store correctly at the DXIL level, so the test-side IO type
indirection (MIN_PRECISION, IO_TYPE, IO_OUT_TYPE, BUFFER_TYPE,
dispatchMinPrecisionTest, HLK_MIN_PRECISION_TEST) is no longer needed.
Min precision tests now use the same HLK_TEST/HLK_WAVEOP_TEST macros
and dispatch paths as all other types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert static_cast changes in ReverseBits, ScanFromMSB, FirstBitLow
that were only needed if these templates were instantiated with min
precision wrapper types (they aren't - these ops are excluded).
Revert extra static_cast in waveMultiPrefixBitOr. Inline getWaveSize()
back into runWaveOpTest() since it no longer has multiple callers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove cast-to-16bit test entries (CastToInt16, CastToUint16,
CastToFloat16) for min precision types — 16-bit output types require
-enable-16bit-types which changes min precision semantics.

Constrain HLSLMin16Uint_t input data so results stay below 0x8000.
WARP computes min precision at 16-bit and sign-extends bit 15 on
32-bit store, causing mismatches when results exceed 0x7FFF.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
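The sign-extension mismatch can be reproduced on the CPU (a sketch of the behavior the commit describes, not WARP's actual code):

```cpp
#include <cstdint>

// A device computing min16uint at 16 bits that sign-extends bit 15 when
// widening for the 32-bit store:
static uint32_t widenSignExtended(uint16_t v) {
  return (uint32_t)(int32_t)(int16_t)v;
}

// The zero-extension the test's 32-bit reference expects:
static uint32_t widenZeroExtended(uint16_t v) {
  return (uint32_t)v;
}
```

For a result like 0x8001 the two widenings give 0xFFFF8001 versus 0x00008001, which is why inputs are constrained so results stay below 0x8000.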
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@tex3d left a comment


LGTM!



Development

Successfully merging this pull request may close these issues.

Long Vector Execution Tests: Add test cases using min precision values
