Fix GVN and SROA miscompilation of min precision vector element access#8269
Conversation
Multiple optimization passes mishandle min precision vector types due to DXC's padded data layout (i16:32, f16:32), where getTypeSizeInBits returns padded sizes for vectors but primitive sizes for scalars. Bug 1 - GVN ICE: CanCoerceMustAliasedValueToLoad computes a padded integer type (e.g., i96 for <3 x half>) then attempts to bitcast from the 48-bit LLVM type, triggering an assert. Fix: reject coercion when type sizes include padding. Bug 2 - GVN incorrect store forwarding: processLoad forwards a 'store <3 x i16> zeroinitializer' past partial element stores because MemoryDependence uses padded sizes for aliasing. Fix: skip store-to-load forwarding for padded types. Bug 3 - SROA element misindexing: getNaturalGEPRecursively uses getTypeSizeInBits (2 bytes for i16) for element offsets while GEP uses getTypeAllocSize (4 bytes with i16:32). Byte offset 4 (element 1) maps to index 4/2=2 instead of 4/4=1, causing SROA to misplace or eliminate element stores. Fix: use getTypeAllocSizeInBits consistently for vector element sizes throughout SROA. Fixes microsoft#8268 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…thub.com/alsepkow/DirectXShaderCompiler into user/alsepkow/fix-min-precision-opt-bugs
Add lit tests that verify GVN and SROA handle DXC's padded min precision types (i16:32, f16:32) correctly. GVN tests (test/Transforms/GVN/min-precision-padding.ll): - Verify store-to-load forwarding is blocked for padded vector types - Verify type coercion is rejected for padded types - Cover <3 x i16>, <3 x half>, <5 x i16>, <8 x half> SROA tests (test/Transforms/SROA/min-precision-padding.ll): - Verify GEP element indices use alloc size (4 bytes) not prim size (2 bytes) - Verify all element stores survive with correct indices - Cover <3 x i16>, <3 x half>, <5 x i16>, <8 x half> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The processLoad padding guard was unconditionally blocking store-to-load forwarding for all padded vector types, including trivially correct same-type forwarding. Narrow the check to only fire when stored and loaded types differ, preserving defense-in-depth for cross-type forwarding while allowing valid same-type optimizations for i16/half vectors. Also fix the comment to accurately describe the root cause: BasicAA returns MustAlias at offset 0 regardless of access sizes, causing partial element stores to appear as Defs to MemoryDependence. Add test cases verifying same-type forwarding still works for padded types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes optimizer miscompilations affecting min-precision vector element ([]) access under padded HLSL data layouts by aligning GVN/SROA behavior with allocation-size-based strides and preventing unsafe GVN coercions/forwarding.
Changes:
- Update SROA vector element stride calculations to use allocation size in bits (matching GEP byte offset stride under padded layouts).
- Harden GVN against coercion and cross-type store-to-load forwarding when data layout padding would make the transformation invalid.
- Add new regression tests covering SROA and GVN behavior for padded
i16/halfvectors, including partial element stores.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/Transforms/SROA/min-precision-padding.ll | Adds SROA regression coverage ensuring vector element GEP indexing matches padded element stride. |
| test/Transforms/GVN/min-precision-padding.ll | Adds GVN regression coverage preventing unsafe coercion/forwarding with padded min-precision vectors while preserving same-type forwarding. |
| lib/Transforms/Scalar/SROA.cpp | Switches vector element sizing to alloc-size-in-bits in multiple SROA decision points. |
| lib/Transforms/Scalar/GVN.cpp | Rejects coercion and adds forwarding guardrails when type size implies padding-related hazards. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
V-FEXrt
left a comment
There was a problem hiding this comment.
Overall LGTM but this part of the code base (min precision) is notorious and I would really like someone whose worked on it to review. Otherwise a couple smaller comment
also, I've notice that code generated by LLMs seem to way over comment compared to code a human would write. A lot of these comments are fairly repetitive (I think I've seen "the issue is 2/1 is 2 not 1" at least 4 times in the PR). Not sure what we should do about it but in general I prefer for "why not how" type comments and LLMs seem to create a lot of "how" comments (I assume for the benefit of the human-in-the-loop?) Maybe we should have a editorial policy here if its a common trope others notice
damyanp
left a comment
There was a problem hiding this comment.
This LGTM, and I did some poking to see if these changes might impact non-min-precision types and convinced myself it wouldn't.
We should get an approval from someone more familiar with this area of the code though.
- GVN CanCoerceMustAliasedValueToLoad: clarify comment to describe the vector padding mismatch (not scalar). - SROA AllocaSliceRewriter: add missing // HLSL Change marker on assert. - GVN test: make coercion rejection check more specific (CHECK: load i96, CHECK-NOT: bitcast). - SROA test: trim verbose header comment to focus on what, not how. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tex3d
left a comment
There was a problem hiding this comment.
I think this looks good. I do wonder if there are other places with a similar assumption (that vector width is dependent on element size, not element alloc size), but this certainly seems to fix things without regressing anything as far as I can tell.
…#8260) This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — `D3D12_SHADER_MIN_PRECISION_SUPPORT` only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing `DxilConf_SM69_Vectorized_Core` class alongside other types. Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage. ## Key design decisions **Full-precision buffer I/O:** Min precision types have implementation-defined buffer storage width, so we use full-precision types (`float`/`int`/`uint`) for all Load/Store operations via the `IO_TYPE`/`IO_OUT_TYPE` shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation. **Half-precision tolerances:** Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less. **Test coverage mirrors existing patterns:** - min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store) - min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store) - min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store) **Wave and quad op support:** Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use `#ifdef MIN_PRECISION` guards to store results via `IO_OUT_TYPE` for deterministic buffer layout without changing DXIL for existing non-min-precision tests. **Excluded operations:** - Signed div/mod on min16int: HLSL does not support signed integer division on min precision types - Bit shifting on min16int/min16uint: Not supported for min precision types - FP specials (INF/NaN/denorm): min precision types do not support them Resolves #7780 All tests require the rawBufferVectorLoad/Store fix from : #8274 The array accessor and wave/quad op tests for min precision require the optimizer fix from: #8269 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
microsoft#8269) ## Summary Fixes three related optimizer bugs that cause miscompilation of min precision vector element access (`[]` operator) on `min16float`, `min16int`, and `min16uint` types at optimization levels O1+. Resolves microsoft#8268 ## Root Cause DXC's data layout pads min precision types (`i16:32`, `f16:32`). The HLSL change in `DataLayout::getTypeSizeInBits` makes vector sizes use alloc size per element (e.g., `<3 x i16>` = 96 bits), but scalar `getTypeSizeInBits(i16)` returns the primitive width (16 bits). This inconsistency causes three bugs: 1. **GVN ICE**: `CanCoerceMustAliasedValueToLoad` creates a padded-width integer (i96) then attempts a bitcast from the 48-bit LLVM vector type — assert fires. 2. **GVN incorrect store forwarding**: `processLoad` forwards a zeroinitializer store past partial element stores because MemoryDependence uses padded sizes for aliasing. 3. **SROA element misindexing** (primary bug): `getNaturalGEPRecursively` uses `getTypeSizeInBits(i16)/8 = 2` for element offsets, while GEP uses `getTypeAllocSize(i16) = 4`. Byte offset 4 (element 1) maps to index `4/2 = 2` instead of `4/4 = 1`, causing SROA to misplace or eliminate element stores. ## Changes **`lib/Transforms/Scalar/GVN.cpp`** - `CanCoerceMustAliasedValueToLoad`: Reject coercion when type sizes include padding (`getTypeSizeInBits != getPrimitiveSizeInBits`) - `processLoad` StoreInst handler: Skip store-to-load forwarding for padded types **`lib/Transforms/Scalar/SROA.cpp`** - `getNaturalGEPRecursively`: Use `getTypeAllocSizeInBits` for vector element size to match GEP offset stride - `isVectorPromotionViable`: Same fix for element size calculation - `AllocaSliceRewriter` constructor: Same fix for `ElementSize` ## Testing - All 6 min precision ArrayOperator tests (StaticAccess + DynamicAccess × 3 types) pass with the fix - Verified optimized DXIL output retains all 3 element stores with correct indices ## Co-authored This fix was investigated and implemented with the assistance of GitHub Copilot (AI pair programming). The root cause analysis — tracing the bug through `-print-after-all` pass dumps, identifying SROA as the culprit, and understanding the `getTypeSizeInBits` vs `getTypeAllocSize` mismatch — was a collaborative effort. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…microsoft#8260) This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — `D3D12_SHADER_MIN_PRECISION_SUPPORT` only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing `DxilConf_SM69_Vectorized_Core` class alongside other types. Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage. ## Key design decisions **Full-precision buffer I/O:** Min precision types have implementation-defined buffer storage width, so we use full-precision types (`float`/`int`/`uint`) for all Load/Store operations via the `IO_TYPE`/`IO_OUT_TYPE` shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation. **Half-precision tolerances:** Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less. **Test coverage mirrors existing patterns:** - min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store) - min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store) - min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store) **Wave and quad op support:** Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use `#ifdef MIN_PRECISION` guards to store results via `IO_OUT_TYPE` for deterministic buffer layout without changing DXIL for existing non-min-precision tests. **Excluded operations:** - Signed div/mod on min16int: HLSL does not support signed integer division on min precision types - Bit shifting on min16int/min16uint: Not supported for min precision types - FP specials (INF/NaN/denorm): min precision types do not support them Resolves microsoft#7780 All tests require the rawBufferVectorLoad/Store fix from : microsoft#8274 The array accessor and wave/quad op tests for min precision require the optimizer fix from: microsoft#8269 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary
Fixes three related optimizer bugs that cause miscompilation of min precision vector element access (
[]operator) onmin16float,min16int, andmin16uinttypes at optimization levels O1+.Resolves #8268
Root Cause
DXC's data layout pads min precision types (
i16:32,f16:32). The HLSL change inDataLayout::getTypeSizeInBitsmakes vector sizes use alloc size per element (e.g.,<3 x i16>= 96 bits), but scalargetTypeSizeInBits(i16)returns the primitive width (16 bits). This inconsistency causes three bugs:CanCoerceMustAliasedValueToLoadcreates a padded-width integer (i96) then attempts a bitcast from the 48-bit LLVM vector type — assert fires.processLoadforwards a zeroinitializer store past partial element stores because MemoryDependence uses padded sizes for aliasing.getNaturalGEPRecursivelyusesgetTypeSizeInBits(i16)/8 = 2for element offsets, while GEP usesgetTypeAllocSize(i16) = 4. Byte offset 4 (element 1) maps to index4/2 = 2instead of4/4 = 1, causing SROA to misplace or eliminate element stores.Changes
lib/Transforms/Scalar/GVN.cppCanCoerceMustAliasedValueToLoad: Reject coercion when type sizes include padding (getTypeSizeInBits != getPrimitiveSizeInBits)processLoadStoreInst handler: Skip store-to-load forwarding for padded typeslib/Transforms/Scalar/SROA.cppgetNaturalGEPRecursively: UsegetTypeAllocSizeInBitsfor vector element size to match GEP offset strideisVectorPromotionViable: Same fix for element size calculationAllocaSliceRewriterconstructor: Same fix forElementSizeTesting
Co-authored
This fix was investigated and implemented with the assistance of GitHub Copilot (AI pair programming). The root cause analysis — tracing the bug through
-print-after-allpass dumps, identifying SROA as the culprit, and understanding thegetTypeSizeInBitsvsgetTypeAllocSizemismatch — was a collaborative effort.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com