Execution Tests: Add min precision test cases to the long vector test#8260
alsepkow wants to merge 38 commits into microsoft:main from
Conversation
Add device support helper, wrapper types, input data sets, type registration, and validation for min16float, min16int, min16uint.
- doesDeviceSupportMinPrecision() checks D3D12 MinPrecisionSupport
- HLSLMin16Float_t/HLSLMin16Int_t/HLSLMin16Uint_t wrapper structs (32-bit storage, matching DXIL layout without -enable-16bit-types)
- Input data constrained to 16-bit representable range
- DATA_TYPE registrations and isFloatingPointType/isMinPrecisionType traits
- doValuesMatch overloads: min16float compares in half-precision space (reuses CompareHalfULP/CompareHalfEpsilon); integers use exact match
- TrigonometricValidation specializations matching HLSLHalf_t tolerances

Part of: microsoft#7780
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add DxilConf_SM69_Vectorized_MinPrecision test class with HLK_TEST_MINP and HLK_WAVEOP_TEST_MINP macros. Mirrors 16-bit counterpart coverage (HLSLHalf_t/int16_t/uint16_t) minus documented exclusions.
- New test class with Kits.Specification = Device.Graphics.D3D12.DXILCore.ShaderModel69.MinPrecision
- setupClass skips when device lacks min precision support
- ~160 test entries across 3 types (min16float/min16int/min16uint)
- MakeDifferent overloads in ShaderOpArith.xml (not gated by __HLSL_ENABLE_16_BIT since min precision is always available)
- Excluded: FP specials, AsType, Cast, bit-manipulation ops

Part of: microsoft#7780
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make all but one conversion operator explicit per wrapper type to avoid C2666 ambiguity with built-in arithmetic operators. Matches HLSLHalf_t pattern: one implicit conversion to the natural type (float/int32_t/uint32_t), all others explicit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The implicit operator float() combined with implicit constructors from int/uint32_t created ambiguity for expressions like 'A + 4': the compiler could not choose between member operator+(HLSLMin16Float_t) via constructor and built-in float+int via conversion. Making the int/uint constructors explicit eliminates the member operator+ path for int literals while preserving T(0) direct construction and implicit float conversion for std:: math functions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HLSLMin16Int_t: add uint32_t and uint64_t constructors for static_cast<T>(UINT) and static_cast<T>(size_t) patterns used in shift masking and wave ops. Add operator~() for bitwise NOT in WaveMultiPrefixBit ops. HLSLMin16Uint_t: add operator~() for the same reason. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UnaryMathAbs: extend unsigned check to include HLSLMin16Uint_t (std::is_unsigned_v is false for class types, so abs was called with ambiguous overloads via implicit operator uint32_t). MaskShiftAmount: change constexpr to const since wrapper types are not literal types (no constexpr constructors). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ~27 Cast test entries (CastToBool, CastToInt16, CastToInt32, CastToInt64, CastToUint16/32/64, CastToFloat16/32) for all three min precision types. The generic Cast templates work via the single implicit conversion operator on each wrapper type — C-style casts chain through it (e.g. (int32_t)min16float goes float->int32_t). Remove explicit conversion operators (operator double, operator int32_t, etc.) that were not exercised since Cast tests were not previously included and no other code paths use them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
WaveMultiPrefixBitAnd/BitOr/BitXor use the any_int type set (g_AnyIntCT)
which is defined as {int16, int32, int64, uint16, uint32, uint64} and does
not include min precision integer types (min16int, min16uint). Remove the 6
invalid test entries and the now-unused operator~() from both integer
wrapper types.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move all min precision test entries (min16float, min16int, min16uint) from the separate DxilConf_SM69_Vectorized_MinPrecision class into DxilConf_SM69_Vectorized_Core, using HLK_TEST/HLK_WAVEOP_TEST macros. Remove the HLK_TEST_MINP and HLK_WAVEOP_TEST_MINP macro definitions, the DxilConf_SM69_Vectorized_MinPrecision class, and the doesDeviceSupportMinPrecision utility function since min precision support checking is not required. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Min precision types (min16float, min16int, min16uint) are hints that allow hardware to use any precision >= the specified minimum, making buffer storage width implementation-defined. Add IO_TYPE/IO_OUT_TYPE compiler defines that map min precision types to their full-precision equivalents (float, int, uint) for buffer Load/Store operations. For all other types, IO_TYPE equals TYPE and IO_OUT_TYPE equals OUT_TYPE. This ensures deterministic buffer data layout regardless of the device's min precision implementation, while still testing min precision computation via explicit casts between the I/O types and the min precision types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HLSL does not support signed integer division on minimum-precision types. The compiler rejects these with: 'signed integer division is not supported on minimum-precision types, cast to int to use 32-bit division'. Remove the Divide and Modulus test entries for HLSLMin16Int_t. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace min16float input values that are not exactly representable in float16 with values that are. This avoids precision mismatches between CPU-side expected value computation (float32) and GPU-side min precision results, where the cast to min16float rounds values to the nearest float16 representation.

Key changes:
- Default1: -0.01f -> -0.03125f (exact power-of-2 fraction)
- Positive: 0.01f -> 0.03125f, 5531.0f -> 5504.0f, 331.233f -> 331.25f, 3250.01f -> 3250.0f
- RangeHalfPi/RangeOne: replaced with float16-exact fractions covering the same ranges

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adjust input values so that arithmetic results (multiply, mad, subtract, left-shift, wave prefix products) do not overflow 16-bit integer range. Min precision types compute at >= 16 bits, so results that overflow at 16 bits differ from the 32-bit expected values.

min16uint changes:
- Default1: reduced large values (699->199, 1023->200) so products and wave prefix products fit in uint16
- Default1: ensured all values >= Default2 to avoid subtract underflow (1->3, 6->10, 0->22)
- BitShiftRhs: reduced large shifts (13->12, 14->12, 15->12) so shifted values fit in uint16

min16int changes:
- BitShiftRhs: reduced large shifts (13->11, 14->11, 15->14) so shifted values fit in int16

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wave and quad intrinsics (WaveReadLaneAt, WaveReadLaneFirst, WaveActiveSum/Min/Max/Product/AllEqual, WavePrefixSum/Product, WaveMultiPrefixSum/Product, WaveMatch, QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) do not support min precision types (min16float, min16int, min16uint). The DXIL wave/quad shuffle operations operate on 32-bit or 64-bit register slots and do not handle 16-bit min precision payloads. Removes 48 test entries (16 per min precision type) and adds explanatory comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dot product tolerance computation was using float32 ULPs for HLSLMin16Float_t, but the GPU may compute at float16 precision. With NUM=256 elements the accumulated error exceeds the float32-based epsilon. Use HLSLHalf_t::GetULP to compute half-precision ULPs for min16float, matching the approach already used for HLSLHalf_t. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
You can test this locally with the following command:

git-clang-format --diff f62b9b4be05f76adb41134e3abd9323be71e68f4 470302b8b01fa350eafe64215fb65e380ea14ab7 -- tools/clang/unittests/HLSLExec/LongVectorTestData.h tools/clang/unittests/HLSLExec/LongVectors.cpp

View the diff from clang-format here:

diff --git a/tools/clang/unittests/HLSLExec/LongVectorTestData.h b/tools/clang/unittests/HLSLExec/LongVectorTestData.h
index 2e8458b9..f77bab92 100644
--- a/tools/clang/unittests/HLSLExec/LongVectorTestData.h
+++ b/tools/clang/unittests/HLSLExec/LongVectorTestData.h
@@ -718,8 +718,8 @@ INPUT_SET(InputSet::Zero, 0);
INPUT_SET(InputSet::BitShiftRhs, 1, 6, 3, 0, 9, 3, 8, 8, 8, 8);
INPUT_SET(InputSet::SelectCond, 0, 1);
INPUT_SET(InputSet::AllOnes, 1);
-INPUT_SET(InputSet::WaveMultiPrefixBitwise, 0x0, 0x1, 0x3, 0x4, 0x10, 0x12,
- 0xF, 0x7FFF);
+INPUT_SET(InputSet::WaveMultiPrefixBitwise, 0x0, 0x1, 0x3, 0x4, 0x10, 0x12, 0xF,
+ 0x7FFF);
END_INPUT_SETS()
#undef BEGIN_INPUT_SETS
Force-pushed from c6a3918 to d8cfc9e
Cast operations have different input and output types (e.g. min16float input with int32 output), so each side needs its own IO type mapping. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Enables constexpr on MaskShiftAmount's ShiftMask local variable, restoring the original constexpr qualifier that was downgraded to const. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Discussion: Shift operator semantics for min precision integer types

While working on these tests, I investigated how shifts are implemented for min precision types.

How shifts are implemented

Shift operators aren't intrinsics — they're built-in binary operators handled through Clang's Sema/CodeGen layers:

The ambiguity

However, in min precision mode (

For left shift, this is arguably fine — bits above 15 are "extra precision." For right shift, it's murkier — if the 32-bit register has bits set above bit 15 from prior operations, a right shift masked to

Compare to

Test data is safe

The

Existing test coverage

Questions for the team
These input sets are not referenced by any test entry since wave ops are excluded for min precision types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only define -DMIN_PRECISION, -DIO_TYPE, and -DIO_OUT_TYPE for min precision test types. Shader templates use #ifdef MIN_PRECISION to gate the load-with-cast paths, leaving non-min-precision shaders completely unchanged. Fallback #ifndef defines ensure IO_OUT_TYPE resolves to OUT_TYPE for Store calls when MIN_PRECISION is not set. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wave, quad, splitdouble, frexp, and modf stores will never be reached by min precision test types, so IO_OUT_TYPE is unnecessary. Keep it only for the final main store and derivative stores which are exercised by min precision tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move min precision IO_TYPE/IO_OUT_TYPE/MIN_PRECISION defines from getCompilerOptionsString into a dedicated dispatchMinPrecisionTest function that passes them via AdditionalCompilerOptions. This matches the existing pattern used by dispatchWaveOpTest and keeps the shared compiler options builder clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use if constexpr on the C++ type instead of strcmp on HLSL type strings. Cleaner and resolved at compile time. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add IOTypeString field to DataType and MIN_PRECISION_DATA_TYPE macro. Remove standalone getIOTypeString template function. The IO type mapping now lives alongside the other type metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bit shifting is not supported for min precision data types (min16int, min16uint). Remove LeftShift/RightShift test entries and their associated BitShiftRhs input data sets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add wave and quad op test entries for min16float, min16int, and min16uint long vector types, mirroring the ops supported by their 16-bit equivalents.

Test entries added:
- Min16Float: 12 wave ops + 4 quad ops
- Min16Int: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops
- Min16Uint: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops

Infrastructure changes:
- ShaderOpArith.xml: Add #ifdef MIN_PRECISION guards to 7 wave op Store calls so they use IO_OUT_TYPE for min precision buffer I/O while keeping OUT_TYPE for standard types (no DXIL change for existing tests)
- LongVectors.cpp: Add dispatchMinPrecisionWaveOpTest combining WaveSize with IO_TYPE/MIN_PRECISION compiler options, and HLK_MIN_PRECISION_WAVEOP_TEST macro
- LongVectorTestData.h: Add WaveMultiPrefixBitwise input sets for HLSLMin16Int_t and HLSLMin16Uint_t
- Fix ambiguous operator& in waveMultiPrefixBitOr for min precision types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename the shader defines and C++ field to better communicate their purpose: these specify the full-precision buffer storage type for Load/Store operations, not a generic I/O type.
- IO_TYPE -> BUFFER_TYPE
- IO_OUT_TYPE -> BUFFER_OUT_TYPE
- IOTypeString -> BufferTypeString

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract duplicated WaveSize computation from runWaveOpTest and runMinPrecisionWaveOpTest into a shared getWaveSize() method. Update section comments to say 'mirrors applicable ops' since not all ops from the 16-bit equivalent types are supported for min precision. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add LeftShift and RightShift test entries for min16int and min16uint. Both produce valid min-precision DXIL (shl/ashr/lshr i16 with 4-bit shift masking). ReverseBits, CountBits, FirstBitHigh, FirstBitLow are excluded — DXC promotes min precision to i32 before calling these DXIL intrinsics, so they don't actually test min precision behavior.

Infrastructure changes:
- LongVectorTestData.h: Add Bitwise and BitShiftRhs input sets for HLSLMin16Int_t and HLSLMin16Uint_t matching int16_t/uint16_t names. Values constrained to 16-bit safe range.
- LongVectorTestData.h: Add compound assignment operators (<<=, >>=, |=, &=, ^=) and unary ~ to both wrapper types to resolve ambiguity with integer promotion in template functions.
- LongVectorTestData.h: Specialize std::is_signed for wrapper types so FirstBitHigh SFINAE selects the correct signed/unsigned variant.
- LongVectors.cpp: Fix ReverseBits, ScanFromMSB, FirstBitLow to use explicit static_cast<T> for integer literals, avoiding ambiguous operator overload resolution with wrapper types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The DXC compiler fix in PR microsoft#8274 handles min precision vector load/store correctly at the DXIL level, so the test-side IO type indirection (MIN_PRECISION, IO_TYPE, IO_OUT_TYPE, BUFFER_TYPE, dispatchMinPrecisionTest, HLK_MIN_PRECISION_TEST) is no longer needed. Min precision tests now use the same HLK_TEST/HLK_WAVEOP_TEST macros and dispatch paths as all other types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert static_cast changes in ReverseBits, ScanFromMSB, FirstBitLow that were only needed if these templates were instantiated with min precision wrapper types (they aren't - these ops are excluded). Revert extra static_cast in waveMultiPrefixBitOr. Inline getWaveSize() back into runWaveOpTest() since it no longer has multiple callers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove cast-to-16bit test entries (CastToInt16, CastToUint16, CastToFloat16) for min precision types — 16-bit output types require -enable-16bit-types which changes min precision semantics. Constrain HLSLMin16Uint_t input data so results stay below 0x8000. WARP computes min precision at 16-bit and sign-extends bit 15 on 32-bit store, causing mismatches when results exceed 0x7FFF. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — D3D12_SHADER_MIN_PRECISION_SUPPORT only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing DxilConf_SM69_Vectorized_Core class alongside other types.

Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage.
Key design decisions
Full-precision buffer I/O: Min precision types have implementation-defined buffer storage width, so we use full-precision types (float/int/uint) for all Load/Store operations via the IO_TYPE/IO_OUT_TYPE shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation.

Half-precision tolerances: Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less.
Test coverage mirrors existing patterns:
Wave and quad op support: Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use #ifdef MIN_PRECISION guards to store results via IO_OUT_TYPE for deterministic buffer layout without changing DXIL for existing non-min-precision tests.

Excluded operations:
Resolves #7780
All tests require the rawBufferVectorLoad/Store fix from #8274.
The array accessor and wave/quad op tests for min precision require the optimizer fix from: #8269
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>