Fix GVN and SROA miscompilation of min precision vector element access by alsepkow · Pull Request #8269 · microsoft/DirectXShaderCompiler

alsepkow · 2026-03-14T06:49:17Z

Summary

Fixes three related optimizer bugs that cause miscompilation of min precision vector element access ([] operator) on min16float, min16int, and min16uint types at optimization levels O1+.

Resolves #8268

Root Cause

DXC's data layout pads min precision types (i16:32, f16:32). The HLSL change in DataLayout::getTypeSizeInBits makes vector sizes use alloc size per element (e.g., <3 x i16> = 96 bits), but scalar getTypeSizeInBits(i16) returns the primitive width (16 bits). This inconsistency causes three bugs:

GVN ICE: CanCoerceMustAliasedValueToLoad creates a padded-width integer (i96) then attempts a bitcast from the 48-bit LLVM vector type — assert fires.
GVN incorrect store forwarding: processLoad forwards a zeroinitializer store past partial element stores because MemoryDependence uses padded sizes for aliasing.
SROA element misindexing (primary bug): getNaturalGEPRecursively uses getTypeSizeInBits(i16)/8 = 2 for element offsets, while GEP uses getTypeAllocSize(i16) = 4. Byte offset 4 (element 1) maps to index 4/2 = 2 instead of 4/4 = 1, causing SROA to misplace or eliminate element stores.

Changes

lib/Transforms/Scalar/GVN.cpp

CanCoerceMustAliasedValueToLoad: Reject coercion when type sizes include padding (getTypeSizeInBits != getPrimitiveSizeInBits)
processLoad StoreInst handler: Skip store-to-load forwarding for padded types

lib/Transforms/Scalar/SROA.cpp

getNaturalGEPRecursively: Use getTypeAllocSizeInBits for vector element size to match GEP offset stride
isVectorPromotionViable: Same fix for element size calculation
AllocaSliceRewriter constructor: Same fix for ElementSize

Testing

All 6 min precision ArrayOperator tests (StaticAccess + DynamicAccess × 3 types) pass with the fix
Verified optimized DXIL output retains all 3 element stores with correct indices

Co-authored

This fix was investigated and implemented with the assistance of GitHub Copilot (AI pair programming). The root cause analysis — tracing the bug through -print-after-all pass dumps, identifying SROA as the culprit, and understanding the getTypeSizeInBits vs getTypeAllocSize mismatch — was a collaborative effort.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Multiple optimization passes mishandle min precision vector types due to DXC's padded data layout (i16:32, f16:32), where getTypeSizeInBits returns padded sizes for vectors but primitive sizes for scalars. Bug 1 - GVN ICE: CanCoerceMustAliasedValueToLoad computes a padded integer type (e.g., i96 for <3 x half>) then attempts to bitcast from the 48-bit LLVM type, triggering an assert. Fix: reject coercion when type sizes include padding. Bug 2 - GVN incorrect store forwarding: processLoad forwards a 'store <3 x i16> zeroinitializer' past partial element stores because MemoryDependence uses padded sizes for aliasing. Fix: skip store-to-load forwarding for padded types. Bug 3 - SROA element misindexing: getNaturalGEPRecursively uses getTypeSizeInBits (2 bytes for i16) for element offsets while GEP uses getTypeAllocSize (4 bytes with i16:32). Byte offset 4 (element 1) maps to index 4/2=2 instead of 4/4=1, causing SROA to misplace or eliminate element stores. Fix: use getTypeAllocSizeInBits consistently for vector element sizes throughout SROA. Fixes microsoft#8268 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-03-14T06:50:07Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…thub.com/alsepkow/DirectXShaderCompiler into user/alsepkow/fix-min-precision-opt-bugs

Add lit tests that verify GVN and SROA handle DXC's padded min precision types (i16:32, f16:32) correctly. GVN tests (test/Transforms/GVN/min-precision-padding.ll): - Verify store-to-load forwarding is blocked for padded vector types - Verify type coercion is rejected for padded types - Cover <3 x i16>, <3 x half>, <5 x i16>, <8 x half> SROA tests (test/Transforms/SROA/min-precision-padding.ll): - Verify GEP element indices use alloc size (4 bytes) not prim size (2 bytes) - Verify all element stores survive with correct indices - Cover <3 x i16>, <3 x half>, <5 x i16>, <8 x half> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The processLoad padding guard was unconditionally blocking store-to-load forwarding for all padded vector types, including trivially correct same-type forwarding. Narrow the check to only fire when stored and loaded types differ, preserving defense-in-depth for cross-type forwarding while allowing valid same-type optimizations for i16/half vectors. Also fix the comment to accurately describe the root cause: BasicAA returns MustAlias at offset 0 regardless of access sizes, causing partial element stores to appear as Defs to MemoryDependence. Add test cases verifying same-type forwarding still works for padded types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Fixes optimizer miscompilations affecting min-precision vector element ([]) access under padded HLSL data layouts by aligning GVN/SROA behavior with allocation-size-based strides and preventing unsafe GVN coercions/forwarding.

Changes:

Update SROA vector element stride calculations to use allocation size in bits (matching GEP byte offset stride under padded layouts).
Harden GVN against coercion and cross-type store-to-load forwarding when data layout padding would make the transformation invalid.
Add new regression tests covering SROA and GVN behavior for padded i16/half vectors, including partial element stores.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
test/Transforms/SROA/min-precision-padding.ll	Adds SROA regression coverage ensuring vector element GEP indexing matches padded element stride.
test/Transforms/GVN/min-precision-padding.ll	Adds GVN regression coverage preventing unsafe coercion/forwarding with padded min-precision vectors while preserving same-type forwarding.
lib/Transforms/Scalar/SROA.cpp	Switches vector element sizing to alloc-size-in-bits in multiple SROA decision points.
lib/Transforms/Scalar/GVN.cpp	Rejects coercion and adds forwarding guardrails when type size implies padding-related hazards.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lib/Transforms/Scalar/GVN.cpp

lib/Transforms/Scalar/SROA.cpp

V-FEXrt

Overall LGTM but this part of the code base (min precision) is notorious and I would really like someone whose worked on it to review. Otherwise a couple smaller comment

also, I've notice that code generated by LLMs seem to way over comment compared to code a human would write. A lot of these comments are fairly repetitive (I think I've seen "the issue is 2/1 is 2 not 1" at least 4 times in the PR). Not sure what we should do about it but in general I prefer for "why not how" type comments and LLMs seem to create a lot of "how" comments (I assume for the benefit of the human-in-the-loop?) Maybe we should have a editorial policy here if its a common trope others notice

lib/Transforms/Scalar/SROA.cpp

test/Transforms/GVN/min-precision-padding.ll

test/Transforms/SROA/min-precision-padding.ll

damyanp

This LGTM, and I did some poking to see if these changes might impact non-min-precision types and convinced myself it wouldn't.

We should get an approval from someone more familiar with this area of the code though.

- GVN CanCoerceMustAliasedValueToLoad: clarify comment to describe the vector padding mismatch (not scalar). - SROA AllocaSliceRewriter: add missing // HLSL Change marker on assert. - GVN test: make coercion rejection check more specific (CHECK: load i96, CHECK-NOT: bitcast). - SROA test: trim verbose header comment to focus on what, not how. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

tex3d

I think this looks good. I do wonder if there are other places with a similar assumption (that vector width is dependent on element size, not element alloc size), but this certainly seems to fix things without regressing anything as far as I can tell.

…#8260) This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — `D3D12_SHADER_MIN_PRECISION_SUPPORT` only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing `DxilConf_SM69_Vectorized_Core` class alongside other types. Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage. ## Key design decisions **Full-precision buffer I/O:** Min precision types have implementation-defined buffer storage width, so we use full-precision types (`float`/`int`/`uint`) for all Load/Store operations via the `IO_TYPE`/`IO_OUT_TYPE` shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation. **Half-precision tolerances:** Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less. **Test coverage mirrors existing patterns:** - min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store) - min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store) - min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store) **Wave and quad op support:** Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use `#ifdef MIN_PRECISION` guards to store results via `IO_OUT_TYPE` for deterministic buffer layout without changing DXIL for existing non-min-precision tests. **Excluded operations:** - Signed div/mod on min16int: HLSL does not support signed integer division on min precision types - Bit shifting on min16int/min16uint: Not supported for min precision types - FP specials (INF/NaN/denorm): min precision types do not support them Resolves #7780 All tests require the rawBufferVectorLoad/Store fix from : #8274 The array accessor and wave/quad op tests for min precision require the optimizer fix from: #8269 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

microsoft#8269) ## Summary Fixes three related optimizer bugs that cause miscompilation of min precision vector element access (`[]` operator) on `min16float`, `min16int`, and `min16uint` types at optimization levels O1+. Resolves microsoft#8268 ## Root Cause DXC's data layout pads min precision types (`i16:32`, `f16:32`). The HLSL change in `DataLayout::getTypeSizeInBits` makes vector sizes use alloc size per element (e.g., `<3 x i16>` = 96 bits), but scalar `getTypeSizeInBits(i16)` returns the primitive width (16 bits). This inconsistency causes three bugs: 1. **GVN ICE**: `CanCoerceMustAliasedValueToLoad` creates a padded-width integer (i96) then attempts a bitcast from the 48-bit LLVM vector type — assert fires. 2. **GVN incorrect store forwarding**: `processLoad` forwards a zeroinitializer store past partial element stores because MemoryDependence uses padded sizes for aliasing. 3. **SROA element misindexing** (primary bug): `getNaturalGEPRecursively` uses `getTypeSizeInBits(i16)/8 = 2` for element offsets, while GEP uses `getTypeAllocSize(i16) = 4`. Byte offset 4 (element 1) maps to index `4/2 = 2` instead of `4/4 = 1`, causing SROA to misplace or eliminate element stores. ## Changes **`lib/Transforms/Scalar/GVN.cpp`** - `CanCoerceMustAliasedValueToLoad`: Reject coercion when type sizes include padding (`getTypeSizeInBits != getPrimitiveSizeInBits`) - `processLoad` StoreInst handler: Skip store-to-load forwarding for padded types **`lib/Transforms/Scalar/SROA.cpp`** - `getNaturalGEPRecursively`: Use `getTypeAllocSizeInBits` for vector element size to match GEP offset stride - `isVectorPromotionViable`: Same fix for element size calculation - `AllocaSliceRewriter` constructor: Same fix for `ElementSize` ## Testing - All 6 min precision ArrayOperator tests (StaticAccess + DynamicAccess × 3 types) pass with the fix - Verified optimized DXIL output retains all 3 element stores with correct indices ## Co-authored This fix was investigated and implemented with the assistance of GitHub Copilot (AI pair programming). The root cause analysis — tracing the bug through `-print-after-all` pass dumps, identifying SROA as the culprit, and understanding the `getTypeSizeInBits` vs `getTypeAllocSize` mismatch — was a collaborative effort. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…microsoft#8260) This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — `D3D12_SHADER_MIN_PRECISION_SUPPORT` only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing `DxilConf_SM69_Vectorized_Core` class alongside other types. Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage. ## Key design decisions **Full-precision buffer I/O:** Min precision types have implementation-defined buffer storage width, so we use full-precision types (`float`/`int`/`uint`) for all Load/Store operations via the `IO_TYPE`/`IO_OUT_TYPE` shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation. **Half-precision tolerances:** Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less. **Test coverage mirrors existing patterns:** - min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store) - min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store) - min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store) **Wave and quad op support:** Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use `#ifdef MIN_PRECISION` guards to store results via `IO_OUT_TYPE` for deterministic buffer layout without changing DXIL for existing non-min-precision tests. **Excluded operations:** - Signed div/mod on min16int: HLSL does not support signed integer division on min precision types - Bit shifting on min16int/min16uint: Not supported for min precision types - FP specials (INF/NaN/denorm): min precision types do not support them Resolves microsoft#7780 All tests require the rawBufferVectorLoad/Store fix from : microsoft#8274 The array accessor and wave/quad op tests for min precision require the optimizer fix from: microsoft#8269 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

github-project-automation bot added this to HLSL Roadmap Mar 14, 2026

github-project-automation bot moved this to New in HLSL Roadmap Mar 14, 2026

github-actions bot and others added 4 commits March 14, 2026 06:53

chore: autopublish 2026-03-14T06:53:16Z

70294fa

Apply clang-format to SROA.cpp HLSL Change comments

e48fba7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'user/alsepkow/fix-min-precision-opt-bugs' of https://gi…

c0aadc6

…thub.com/alsepkow/DirectXShaderCompiler into user/alsepkow/fix-min-precision-opt-bugs

alsepkow marked this pull request as ready for review March 16, 2026 23:14

alsepkow mentioned this pull request Mar 16, 2026

Execution Tests: Add min precision test cases to the long vector test #8260

Merged

alsepkow mentioned this pull request Mar 19, 2026

GVN and SROA miscompile min precision vector element access #8268

Closed

damyanp requested a review from Copilot March 25, 2026 23:19

Copilot started reviewing on behalf of damyanp March 25, 2026 23:25 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

lib/Transforms/Scalar/GVN.cpp Outdated Show resolved Hide resolved

lib/Transforms/Scalar/GVN.cpp Show resolved Hide resolved

lib/Transforms/Scalar/SROA.cpp Show resolved Hide resolved

lib/Transforms/Scalar/SROA.cpp Outdated Show resolved Hide resolved

V-FEXrt reviewed Mar 25, 2026

View reviewed changes

lib/Transforms/Scalar/SROA.cpp Outdated Show resolved Hide resolved

test/Transforms/GVN/min-precision-padding.ll Outdated Show resolved Hide resolved

test/Transforms/SROA/min-precision-padding.ll Outdated Show resolved Hide resolved

damyanp reviewed Mar 26, 2026

View reviewed changes

alsepkow and others added 2 commits March 30, 2026 17:07

Trim verbose comments to focus on why, not what

5b5d05c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

alsepkow assigned tex3d Mar 31, 2026

tex3d approved these changes Apr 1, 2026

View reviewed changes

alsepkow merged commit ccba9f8 into microsoft:main Apr 1, 2026
12 checks passed

github-project-automation bot moved this from New to Done in HLSL Roadmap Apr 1, 2026

alsepkow mentioned this pull request Apr 14, 2026

[Release SM 6.9] Cherry-Pick Fix GVN and SROA miscompilation of min precision vector element access #8370

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix GVN and SROA miscompilation of min precision vector element access#8269

Fix GVN and SROA miscompilation of min precision vector element access#8269
alsepkow merged 8 commits intomicrosoft:mainfrom
alsepkow:user/alsepkow/fix-min-precision-opt-bugs

alsepkow commented Mar 14, 2026

Uh oh!

github-actions bot commented Mar 14, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

V-FEXrt left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

damyanp left a comment

Uh oh!

tex3d left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

alsepkow commented Mar 14, 2026

Summary

Root Cause

Changes

Testing

Co-authored

Uh oh!

github-actions bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

V-FEXrt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

damyanp left a comment

Choose a reason for hiding this comment

Uh oh!

tex3d left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

github-actions bot commented Mar 14, 2026 •

edited

Loading