
Execution Tests: Add min precision test cases to the long vector test#8260

Open
alsepkow wants to merge 38 commits into microsoft:main from alsepkow:user/alsepkow/MinPrecision

Conversation


@alsepkow (Contributor) commented Mar 11, 2026

This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — D3D12_SHADER_MIN_PRECISION_SUPPORT only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing DxilConf_SM69_Vectorized_Core class alongside other types.

Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage.

Key design decisions

Full-precision buffer I/O: Min precision types have implementation-defined buffer storage width, so we use full-precision types (float/int/uint) for all Load/Store operations via the IO_TYPE/IO_OUT_TYPE shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation.
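A minimal C++ emulation may make the define indirection concrete. The names below mirror the PR's shader defines, but everything else is illustrative, not the actual HLSL shader or test source:

```cpp
#include <cstdint>

#define MIN_PRECISION            // defined only for min precision test types
typedef int16_t compute_t;       // stand-in for the min16int compute type
#ifdef MIN_PRECISION
typedef int32_t io_t;            // IO_TYPE: full-precision buffer type
#else
typedef compute_t io_t;          // fallback: IO_TYPE == TYPE, DXIL unchanged
#endif

// Load at full precision, cast into the compute type, cast back on store.
static io_t roundTrip(io_t loaded) {
  compute_t computed = (compute_t)loaded; // narrowing happens only here
  return (io_t)computed;                  // buffer layout stays 32-bit
}
```

The point is that `sizeof(io_t)` is fixed regardless of what precision the device chooses for the compute type, so the buffer layout is deterministic.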

Half-precision tolerances: Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less.
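The fp16-space comparison can be sketched as follows. This is a standalone reimplementation for illustration only (the real tests use HLSLHalf_t's comparison helpers), and it handles only finite, normal-range values since the tests exclude FP specials anyway:

```cpp
#include <cstdint>
#include <cstring>
#include <cstdlib>

// Convert a finite float in fp16 normal range to IEEE half bits,
// round-to-nearest-even. No denormal/INF/NaN handling (sketch only).
static uint16_t FloatToHalfBits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  uint32_t sign = (bits >> 16) & 0x8000u;
  int32_t exp = (int32_t)((bits >> 23) & 0xFF) - 127 + 15;
  uint32_t mant = bits & 0x7FFFFFu;
  uint32_t m10 = mant >> 13;               // round 23-bit mantissa to 10 bits
  uint32_t rem = mant & 0x1FFFu;
  if (rem > 0x1000u || (rem == 0x1000u && (m10 & 1u)))
    ++m10;
  if (m10 == 0x400u) { m10 = 0; ++exp; }   // mantissa overflow into exponent
  return (uint16_t)(sign | ((uint32_t)exp << 10) | m10);
}

// ULP distance between two values in half-precision space, using the
// standard ordered-bit-pattern trick.
static int HalfUlpDistance(float a, float b) {
  auto ordered = [](uint16_t h) -> int32_t {
    return (h & 0x8000u) ? 0x8000 - (int32_t)(h & 0x7FFFu)
                         : 0x8000 + (int32_t)(h & 0x7FFFu);
  };
  return std::abs(ordered(FloatToHalfBits(a)) - ordered(FloatToHalfBits(b)));
}
```

A device computing at float32 and one computing at float16 then land within the same half-ULP budget, which is why fp16 tolerances are a safe upper bound.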

Test coverage mirrors existing patterns:

  • min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store)
  • min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store)
  • min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store)

Wave and quad op support: Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use #ifdef MIN_PRECISION guards to store results via IO_OUT_TYPE for deterministic buffer layout without changing DXIL for existing non-min-precision tests.

Excluded operations:

  • Signed div/mod on min16int: HLSL does not support signed integer division on min precision types
  • Bit shifting on min16int/min16uint: Not supported for min precision types
  • FP specials (INF/NaN/denorm): min precision types do not support them

Resolves #7780

All tests require the rawBufferVectorLoad/Store fix from #8274.
The array accessor and wave/quad op tests for min precision require the optimizer fix from #8269.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

alsepkow and others added 7 commits March 10, 2026 17:20
Add device support helper, wrapper types, input data sets, type
registration, and validation for min16float, min16int, min16uint.

- doesDeviceSupportMinPrecision() checks D3D12 MinPrecisionSupport
- HLSLMin16Float_t/HLSLMin16Int_t/HLSLMin16Uint_t wrapper structs
  (32-bit storage, matching DXIL layout without -enable-16bit-types)
- Input data constrained to 16-bit representable range
- DATA_TYPE registrations and isFloatingPointType/isMinPrecisionType traits
- doValuesMatch overloads: min16float compares in half-precision space
  (reuses CompareHalfULP/CompareHalfEpsilon), integers use exact match
- TrigonometricValidation specializations matching HLSLHalf_t tolerances

Part of: microsoft#7780

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add DxilConf_SM69_Vectorized_MinPrecision test class with HLK_TEST_MINP
and HLK_WAVEOP_TEST_MINP macros. Mirrors 16-bit counterpart coverage
(HLSLHalf_t/int16_t/uint16_t) minus documented exclusions.

- New test class with Kits.Specification =
  Device.Graphics.D3D12.DXILCore.ShaderModel69.MinPrecision
- setupClass skips when device lacks min precision support
- ~160 test entries across 3 types (min16float/min16int/min16uint)
- MakeDifferent overloads in ShaderOpArith.xml (not gated by
  __HLSL_ENABLE_16_BIT since min precision is always available)
- Excluded: FP specials, AsType, Cast, bit-manipulation ops

Part of: microsoft#7780

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make all but one conversion operator explicit per wrapper type to
avoid C2666 ambiguity with built-in arithmetic operators. Matches
HLSLHalf_t pattern: one implicit conversion to the natural type
(float/int32_t/uint32_t), all others explicit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The implicit operator float() combined with implicit constructors from
int/uint32_t created ambiguity for expressions like 'A + 4': the
compiler could not choose between member operator+(HLSLMin16Float_t)
via constructor and built-in float+int via conversion. Making the
int/uint constructors explicit eliminates the member operator+ path
for int literals while preserving T(0) direct construction and
implicit float conversion for std:: math functions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
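A minimal hypothetical wrapper shows why explicit constructors resolve the ambiguity. This is illustrative, not the actual HLSLMin16Float_t code; for the sketch to compile unambiguously, all constructors are explicit and only the conversion out remains implicit:

```cpp
#include <cmath>

struct Min16Float {
  float Val;
  explicit Min16Float(float F) : Val(F) {}
  explicit Min16Float(int I) : Val((float)I) {}
  // The single implicit conversion out, so built-in operators and
  // std:: math functions work directly on the wrapper.
  operator float() const { return Val; }
  Min16Float operator+(Min16Float O) const { return Min16Float(Val + O.Val); }
};
```

With the int constructor explicit, `Min16Float(2.0f) + 4` has exactly one viable interpretation: convert the wrapper to float and use built-in float + int. Direct construction like `Min16Float(0)` and calls like `std::sqrt` through the implicit float conversion keep working.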
HLSLMin16Int_t: add uint32_t and uint64_t constructors for
static_cast<T>(UINT) and static_cast<T>(size_t) patterns used in
shift masking and wave ops. Add operator~() for bitwise NOT in
WaveMultiPrefixBit ops.

HLSLMin16Uint_t: add operator~() for the same reason.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UnaryMathAbs: extend unsigned check to include HLSLMin16Uint_t
(std::is_unsigned_v is false for class types, so abs was called
with ambiguous overloads via implicit operator uint32_t).

MaskShiftAmount: change constexpr to const since wrapper types are
not literal types (no constexpr constructors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
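The trait pitfall behind the UnaryMathAbs change can be shown in two lines (`Min16Uint` here is a hypothetical stand-in for the wrapper type):

```cpp
#include <type_traits>

struct Min16Uint { unsigned Val; };  // hypothetical wrapper stand-in

// Built-in unsigned types report true...
static_assert(std::is_unsigned_v<unsigned short>, "builtin is unsigned");
// ...but ANY class type reports false, so dispatch keyed on this trait
// silently takes the signed branch for wrapper types.
static_assert(!std::is_unsigned_v<Min16Uint>, "class type is never unsigned");
```

This is why the unsigned check had to name the wrapper type explicitly rather than rely on the standard trait.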
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
alsepkow and others added 2 commits March 10, 2026 18:09
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ~27 Cast test entries (CastToBool, CastToInt16, CastToInt32,
CastToInt64, CastToUint16/32/64, CastToFloat16/32) for all three
min precision types. The generic Cast templates work via the single
implicit conversion operator on each wrapper type — C-style casts
chain through it (e.g. (int32_t)min16float goes float->int32_t).

Remove explicit conversion operators (operator double, operator
int32_t, etc.) that were not exercised since Cast tests were not
previously included and no other code paths use them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
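The cast-chaining behavior can be demonstrated with a stripped-down wrapper (hypothetical, not the real type): a C-style cast to any arithmetic type goes through the single implicit conversion operator and then a standard conversion.

```cpp
#include <cstdint>

struct Min16Float {
  float Val;
  explicit Min16Float(float F) : Val(F) {}
  operator float() const { return Val; }  // the single conversion operator
};

// (int32_t)v chains: operator float() -> float-to-int32_t truncation.
// (uint64_t)v and (bool)v chain through the same operator, which is why
// the generic Cast templates need no per-target-type operators.
```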
@alsepkow changed the title from "Execution Tetsts: Add min precision test cases to the long vector test" to "Execution Tests: Add min precision test cases to the long vector test" on Mar 11, 2026
alsepkow and others added 8 commits March 11, 2026 15:56
WaveMultiPrefixBitAnd/BitOr/BitXor use the any_int type set (g_AnyIntCT)
which is defined as {int16, int32, int64, uint16, uint32, uint64} and does
not include min precision integer types (min16int, min16uint). Remove the 6
invalid test entries and the now-unused operator~() from both integer
wrapper types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move all min precision test entries (min16float, min16int, min16uint)
from the separate DxilConf_SM69_Vectorized_MinPrecision class into
DxilConf_SM69_Vectorized_Core, using HLK_TEST/HLK_WAVEOP_TEST macros.

Remove the HLK_TEST_MINP and HLK_WAVEOP_TEST_MINP macro definitions,
the DxilConf_SM69_Vectorized_MinPrecision class, and the
doesDeviceSupportMinPrecision utility function since min precision
support checking is not required.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Min precision types (min16float, min16int, min16uint) are hints that
allow hardware to use any precision >= the specified minimum, making
buffer storage width implementation-defined. Add IO_TYPE/IO_OUT_TYPE
compiler defines that map min precision types to their full-precision
equivalents (float, int, uint) for buffer Load/Store operations. For
all other types, IO_TYPE equals TYPE and IO_OUT_TYPE equals OUT_TYPE.

This ensures deterministic buffer data layout regardless of the
device's min precision implementation, while still testing min
precision computation via explicit casts between the I/O types and
the min precision types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HLSL does not support signed integer division on minimum-precision
types. The compiler rejects these with: 'signed integer division is
not supported on minimum-precision types, cast to int to use 32-bit
division'. Remove the Divide and Modulus test entries for
HLSLMin16Int_t.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace min16float input values that are not exactly representable
in float16 with values that are. This avoids precision mismatches
between CPU-side expected value computation (float32) and GPU-side
min precision results, where the cast to min16float rounds values
to the nearest float16 representation.

Key changes:
- Default1: -0.01f -> -0.03125f (exact power-of-2 fraction)
- Positive: 0.01f -> 0.03125f, 5531.0f -> 5504.0f,
  331.233f -> 331.25f, 3250.01f -> 3250.0f
- RangeHalfPi/RangeOne: replaced with float16-exact fractions
  covering the same ranges

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
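One way to test float16 exactness on the CPU side, assuming normal-range values (an illustrative helper, not part of the test suite): a value is fp16-exact when its significand fits in 11 bits and its exponent is in range.

```cpp
#include <cmath>

// True if f is exactly representable as an IEEE half (normal range only;
// denormals, INF, and NaN are out of scope for these tests anyway).
static bool isFloat16Exact(float f) {
  if (f == 0.0f) return true;
  int exp;
  float m = std::frexp(f, &exp);        // f = m * 2^exp, 0.5 <= |m| < 1
  float scaled = std::ldexp(m, 11);     // expose the 11 significand bits
  return scaled == std::trunc(scaled)   // no bits below the fp16 LSB
         && exp >= -13 && exp <= 16;    // fp16 normal exponent range
}
```

This matches the substitutions in the commit: 0.03125 and 5504.0 pass, while 0.01 and 5531.0 do not.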
Adjust input values so that arithmetic results (multiply, mad,
subtract, left-shift, wave prefix products) do not overflow 16-bit
integer range. Min precision types compute at >= 16 bits, so results
that overflow at 16 bits differ from the 32-bit expected values.

min16uint changes:
- Default1: reduced large values (699->199, 1023->200) so products
  and wave prefix products fit in uint16
- Default1: ensured all values >= Default2 to avoid subtract underflow
  (1->3, 6->10, 0->22)
- BitShiftRhs: reduced large shifts (13->12, 14->12, 15->12) so
  shifted values fit in uint16

min16int changes:
- BitShiftRhs: reduced large shifts (13->11, 14->11, 15->14) so
  shifted values fit in int16

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
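The 16-bit versus 32-bit divergence behind these input changes can be shown directly (values taken from the commit message above; the helpers are illustrative):

```cpp
#include <cstdint>

// What a device computing at 32 bits produces for a product of inputs.
static uint32_t productAt32(uint16_t a, uint16_t b) {
  return (uint32_t)a * b;
}

// What a device computing at exactly 16 bits produces: the same product
// wrapped modulo 2^16.
static uint16_t productAt16(uint16_t a, uint16_t b) {
  return (uint16_t)((uint32_t)a * b);
}
```

699 * 1023 overflows uint16 and the two widths disagree, while the replacement values 199 * 200 fit and agree at any precision >= 16 bits.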
Wave and quad intrinsics (WaveReadLaneAt, WaveReadLaneFirst,
WaveActiveSum/Min/Max/Product/AllEqual, WavePrefixSum/Product,
WaveMultiPrefixSum/Product, WaveMatch, QuadReadLaneAt,
QuadReadAcrossX/Y/Diagonal) do not support min precision types
(min16float, min16int, min16uint). The DXIL wave/quad shuffle
operations operate on 32-bit or 64-bit register slots and do not
handle 16-bit min precision payloads.

Removes 48 test entries (16 per min precision type) and adds
explanatory comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dot product tolerance computation was using float32 ULPs for
HLSLMin16Float_t, but the GPU may compute at float16 precision.
With NUM=256 elements the accumulated error exceeds the float32-based
epsilon. Use HLSLHalf_t::GetULP to compute half-precision ULPs for
min16float, matching the approach already used for HLSLHalf_t.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions bot commented Mar 14, 2026

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff f62b9b4be05f76adb41134e3abd9323be71e68f4 470302b8b01fa350eafe64215fb65e380ea14ab7 -- tools/clang/unittests/HLSLExec/LongVectorTestData.h tools/clang/unittests/HLSLExec/LongVectors.cpp
View the diff from clang-format here.
diff --git a/tools/clang/unittests/HLSLExec/LongVectorTestData.h b/tools/clang/unittests/HLSLExec/LongVectorTestData.h
index 2e8458b9..f77bab92 100644
--- a/tools/clang/unittests/HLSLExec/LongVectorTestData.h
+++ b/tools/clang/unittests/HLSLExec/LongVectorTestData.h
@@ -718,8 +718,8 @@ INPUT_SET(InputSet::Zero, 0);
 INPUT_SET(InputSet::BitShiftRhs, 1, 6, 3, 0, 9, 3, 8, 8, 8, 8);
 INPUT_SET(InputSet::SelectCond, 0, 1);
 INPUT_SET(InputSet::AllOnes, 1);
-INPUT_SET(InputSet::WaveMultiPrefixBitwise, 0x0, 0x1, 0x3, 0x4, 0x10, 0x12,
-          0xF, 0x7FFF);
+INPUT_SET(InputSet::WaveMultiPrefixBitwise, 0x0, 0x1, 0x3, 0x4, 0x10, 0x12, 0xF,
+          0x7FFF);
 END_INPUT_SETS()
 
 #undef BEGIN_INPUT_SETS

@alsepkow force-pushed the user/alsepkow/MinPrecision branch from c6a3918 to d8cfc9e on March 16, 2026 at 20:49
alsepkow and others added 2 commits March 16, 2026 16:46
Cast operations have different input and output types (e.g. min16float
input with int32 output), so each side needs its own IO type mapping.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Enables constexpr on MaskShiftAmount's ShiftMask local variable,
restoring the original constexpr qualifier that was downgraded to const.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@alsepkow

Discussion: Shift operator semantics for min precision integer types

While working on these tests, I investigated how << and >> work on min16int/min16uint and found a potential semantic ambiguity worth discussing with the team.

How shifts are implemented

Shift operators aren't intrinsics — they're built-in binary operators handled through Clang's Sema/CodeGen layers:

  • Sema validation: SemaHLSL.cpp classifies shifts as "bitwise" ops requiring integral types (BinaryOperatorKindIsBitwise -> BinaryOperatorKindRequiresIntegrals -> IsBasicKindIntegral). min16int and min16uint have BPROP_INTEGER set, so they pass validation. min16float is correctly rejected.

  • Shift amount wrapping: HLSL wraps shift amounts like OpenCL — the RHS is masked by (bit_width - 1). This happens in CGExprScalar.cpp:EmitShl/EmitShr:

    if (CGF.getLangOpts().OpenCL || CGF.getLangOpts().HLSL)
        RHS = Builder.CreateAnd(RHS, GetWidthMinusOneValue(Ops.LHS, RHS), "shl.mask");

The ambiguity

min16int has AST Width = 16 (ASTContext.cpp:1683), producing LLVM IR type i16 (CodeGenTypes.cpp:437). So GetWidthMinusOneValue returns 15, and the shift amount is masked with & 15 (mod 16).

However, in min precision mode (UseMinPrecision=true), the hardware executes on 32-bit registers. The & 15 mask is baked into the IR before any min-precision-to-32-bit promotion. This means:

  • The shift amount is clamped to [0, 15] based on the declared 16-bit type
  • But the actual shift executes on a 32-bit value

For left shift, this is arguably fine — bits above 15 are "extra precision." For right shift, it's murkier — if the 32-bit register has bits set above bit 15 from prior operations, a right shift masked to [0, 15] won't clear them the way a true 16-bit right shift would.

Compare to int32_t where the mask is & 31 and execution width matches. For min precision, the wrapping width (16) doesn't necessarily match the execution width (potentially 32).
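A small standalone sketch (not DXC code) makes the right-shift discrepancy concrete:

```cpp
#include <cstdint>

// HLSL-style wrapping with the declared 16-bit width: amount & 15.
static uint32_t shiftMasked16(uint32_t reg, unsigned amt) {
  return reg >> (amt & 15u);
}

// True 16-bit semantics: truncate the value to 16 bits first.
static uint32_t shiftTrue16(uint32_t reg, unsigned amt) {
  return (uint32_t)((uint16_t)reg >> (amt & 15u));
}

// Wrapping with the 32-bit execution width instead: amount & 31.
static uint32_t shiftMasked32(uint32_t reg, unsigned amt) {
  return reg >> (amt & 31u);
}
```

With a 32-bit register value of 0x18000 (bit 16 set by a prior wide operation) and a shift amount of 17, the three interpretations produce 0xC000, 0x4000, and 0 respectively, so the choice of mask width is observable.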

Test data is safe

The BitShiftRhs input set for int16_t/uint16_t (reused by Min16Int_t/Min16UInt_t) uses values {1, 6, 3, 0, 9, 3, 12, 13, 14, 15} — all within [0, 15], so we avoid the ambiguous boundary. This was intentional.

Existing test coverage

  • Sema tests (scalar-operators.hlsl): verify type rules for min16 shifts ✅
  • FileCheck tests for shift IR output on min precision: none ❌
  • Execution tests for min precision shifts: this PR adds the first ones ✅

Questions for the team

  1. Is the & 15 mask correct for min precision types, or should it be & 31 to match the actual execution width?
  2. Should we add a boundary-probing test (e.g., shift by 16) to document/pin down the current behavior?
  3. Is this a known design decision or a gap that should be tracked as a separate issue?

alsepkow and others added 7 commits March 16, 2026 17:38
These input sets are not referenced by any test entry since wave ops
are excluded for min precision types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only define -DMIN_PRECISION, -DIO_TYPE, and -DIO_OUT_TYPE for min
precision test types. Shader templates use #ifdef MIN_PRECISION to
gate the load-with-cast paths, leaving non-min-precision shaders
completely unchanged. Fallback #ifndef defines ensure IO_OUT_TYPE
resolves to OUT_TYPE for Store calls when MIN_PRECISION is not set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wave, quad, splitdouble, frexp, and modf stores will never be reached
by min precision test types, so IO_OUT_TYPE is unnecessary. Keep it
only for the final main store and derivative stores which are exercised
by min precision tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move min precision IO_TYPE/IO_OUT_TYPE/MIN_PRECISION defines from
getCompilerOptionsString into a dedicated dispatchMinPrecisionTest
function that passes them via AdditionalCompilerOptions. This matches
the existing pattern used by dispatchWaveOpTest and keeps the shared
compiler options builder clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use if constexpr on the C++ type instead of strcmp on HLSL type
strings. Cleaner and resolved at compile time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add IOTypeString field to DataType and MIN_PRECISION_DATA_TYPE macro.
Remove standalone getIOTypeString template function. The IO type
mapping now lives alongside the other type metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@alsepkow alsepkow marked this pull request as ready for review March 17, 2026 01:21
alsepkow and others added 11 commits March 17, 2026 13:43
Bit shifting is not supported for min precision data types (min16int,
min16uint). Remove LeftShift/RightShift test entries and their associated
BitShiftRhs input data sets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add wave and quad op test entries for min16float, min16int, and min16uint
long vector types, mirroring the ops supported by their 16-bit equivalents.

Test entries added:
- Min16Float: 12 wave ops + 4 quad ops
- Min16Int: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops
- Min16Uint: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops

Infrastructure changes:
- ShaderOpArith.xml: Add #ifdef MIN_PRECISION guards to 7 wave op Store
  calls so they use IO_OUT_TYPE for min precision buffer I/O while keeping
  OUT_TYPE for standard types (no DXIL change for existing tests)
- LongVectors.cpp: Add dispatchMinPrecisionWaveOpTest combining WaveSize
  with IO_TYPE/MIN_PRECISION compiler options, and HLK_MIN_PRECISION_WAVEOP_TEST
  macro
- LongVectorTestData.h: Add WaveMultiPrefixBitwise input sets for
  HLSLMin16Int_t and HLSLMin16Uint_t
- Fix ambiguous operator& in waveMultiPrefixBitOr for min precision types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename the shader defines and C++ field to better communicate their purpose:
these specify the full-precision buffer storage type for Load/Store
operations, not a generic I/O type.

- IO_TYPE -> BUFFER_TYPE
- IO_OUT_TYPE -> BUFFER_OUT_TYPE
- IOTypeString -> BufferTypeString

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract duplicated WaveSize computation from runWaveOpTest and
runMinPrecisionWaveOpTest into a shared getWaveSize() method.

Update section comments to say 'mirrors applicable ops' since not all
ops from the 16-bit equivalent types are supported for min precision.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add LeftShift and RightShift test entries for min16int and min16uint.
Both produce valid min-precision DXIL (shl/ashr/lshr i16 with 4-bit
shift masking).

ReverseBits, CountBits, FirstBitHigh, FirstBitLow are excluded — DXC
promotes min precision to i32 before calling these DXIL intrinsics,
so they don't actually test min precision behavior.

Infrastructure changes:
- LongVectorTestData.h: Add Bitwise and BitShiftRhs input sets for
  HLSLMin16Int_t and HLSLMin16Uint_t matching int16_t/uint16_t names.
  Values constrained to 16-bit safe range.
- LongVectorTestData.h: Add compound assignment operators (<<=, >>=,
  |=, &=, ^=) and unary ~ to both wrapper types to resolve ambiguity
  with integer promotion in template functions.
- LongVectorTestData.h: Specialize std::is_signed for wrapper types
  so FirstBitHigh SFINAE selects the correct signed/unsigned variant.
- LongVectors.cpp: Fix ReverseBits, ScanFromMSB, FirstBitLow to use
  explicit static_cast<T> for integer literals, avoiding ambiguous
  operator overload resolution with wrapper types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The DXC compiler fix in PR microsoft#8274 handles min precision vector
load/store correctly at the DXIL level, so the test-side IO type
indirection (MIN_PRECISION, IO_TYPE, IO_OUT_TYPE, BUFFER_TYPE,
dispatchMinPrecisionTest, HLK_MIN_PRECISION_TEST) is no longer needed.
Min precision tests now use the same HLK_TEST/HLK_WAVEOP_TEST macros
and dispatch paths as all other types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert static_cast changes in ReverseBits, ScanFromMSB, FirstBitLow
that were only needed if these templates were instantiated with min
precision wrapper types (they aren't - these ops are excluded).
Revert extra static_cast in waveMultiPrefixBitOr. Inline getWaveSize()
back into runWaveOpTest() since it no longer has multiple callers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove cast-to-16bit test entries (CastToInt16, CastToUint16,
CastToFloat16) for min precision types — 16-bit output types require
-enable-16bit-types which changes min precision semantics.

Constrain HLSLMin16Uint_t input data so results stay below 0x8000.
WARP computes min precision at 16-bit and sign-extends bit 15 on
32-bit store, causing mismatches when results exceed 0x7FFF.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
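The sign-extension mismatch can be reproduced on the CPU (a sketch of the behavior the commit describes, not WARP's actual code):

```cpp
#include <cstdint>

// A device computing min16uint at 16 bits that sign-extends bit 15 when
// widening for the 32-bit store:
static uint32_t widenSignExtended(uint16_t v) {
  return (uint32_t)(int32_t)(int16_t)v;
}

// The zero-extension the test's 32-bit reference expects:
static uint32_t widenZeroExtended(uint16_t v) {
  return (uint32_t)v;
}
```

For a result like 0x8001 the two widenings give 0xFFFF8001 versus 0x00008001, which is why inputs are constrained so results stay below 0x8000.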
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@tex3d left a comment


LGTM!



Development

Successfully merging this pull request may close these issues.

Long Vector Execution Tests: Add test cases using min precision values
