Skip to content

Fix GVN and SROA miscompilation of min precision vector element access#8269

Merged
alsepkow merged 8 commits intomicrosoft:mainfrom
alsepkow:user/alsepkow/fix-min-precision-opt-bugs
Apr 1, 2026
Merged

Fix GVN and SROA miscompilation of min precision vector element access#8269
alsepkow merged 8 commits intomicrosoft:mainfrom
alsepkow:user/alsepkow/fix-min-precision-opt-bugs

Conversation

@alsepkow
Copy link
Copy Markdown
Contributor

Summary

Fixes three related optimizer bugs that cause miscompilation of min precision vector element access ([] operator) on min16float, min16int, and min16uint types at optimization levels O1+.

Resolves #8268

Root Cause

DXC's data layout pads min precision types (i16:32, f16:32). The HLSL change in DataLayout::getTypeSizeInBits makes vector sizes use alloc size per element (e.g., <3 x i16> = 96 bits), but scalar getTypeSizeInBits(i16) returns the primitive width (16 bits). This inconsistency causes three bugs:

  1. GVN ICE: CanCoerceMustAliasedValueToLoad creates a padded-width integer (i96) then attempts a bitcast from the 48-bit LLVM vector type — assert fires.
  2. GVN incorrect store forwarding: processLoad forwards a zeroinitializer store past partial element stores because MemoryDependence uses padded sizes for aliasing.
  3. SROA element misindexing (primary bug): getNaturalGEPRecursively uses getTypeSizeInBits(i16)/8 = 2 for element offsets, while GEP uses getTypeAllocSize(i16) = 4. Byte offset 4 (element 1) maps to index 4/2 = 2 instead of 4/4 = 1, causing SROA to misplace or eliminate element stores.

Changes

lib/Transforms/Scalar/GVN.cpp

  • CanCoerceMustAliasedValueToLoad: Reject coercion when type sizes include padding (getTypeSizeInBits != getPrimitiveSizeInBits)
  • processLoad StoreInst handler: Skip store-to-load forwarding for padded types

lib/Transforms/Scalar/SROA.cpp

  • getNaturalGEPRecursively: Use getTypeAllocSizeInBits for vector element size to match GEP offset stride
  • isVectorPromotionViable: Same fix for element size calculation
  • AllocaSliceRewriter constructor: Same fix for ElementSize

Testing

  • All 6 min precision ArrayOperator tests (StaticAccess + DynamicAccess × 3 types) pass with the fix
  • Verified optimized DXIL output retains all 3 element stores with correct indices

Co-authored

This fix was investigated and implemented with the assistance of GitHub Copilot (AI pair programming). The root cause analysis — tracing the bug through -print-after-all pass dumps, identifying SROA as the culprit, and understanding the getTypeSizeInBits vs getTypeAllocSize mismatch — was a collaborative effort.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Multiple optimization passes mishandle min precision vector types due to
DXC's padded data layout (i16:32, f16:32), where getTypeSizeInBits
returns padded sizes for vectors but primitive sizes for scalars.

Bug 1 - GVN ICE: CanCoerceMustAliasedValueToLoad computes a padded
integer type (e.g., i96 for <3 x half>) then attempts to bitcast from
the 48-bit LLVM type, triggering an assert. Fix: reject coercion when
type sizes include padding.

Bug 2 - GVN incorrect store forwarding: processLoad forwards a 'store
<3 x i16> zeroinitializer' past partial element stores because
MemoryDependence uses padded sizes for aliasing. Fix: skip store-to-load
forwarding for padded types.

Bug 3 - SROA element misindexing: getNaturalGEPRecursively uses
getTypeSizeInBits (2 bytes for i16) for element offsets while GEP uses
getTypeAllocSize (4 bytes with i16:32). Byte offset 4 (element 1) maps
to index 4/2=2 instead of 4/4=1, causing SROA to misplace or eliminate
element stores. Fix: use getTypeAllocSizeInBits consistently for vector
element sizes throughout SROA.

Fixes microsoft#8268

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 14, 2026

✅ With the latest revision this PR passed the C/C++ code formatter.

github-actions bot and others added 4 commits March 14, 2026 06:53
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add lit tests that verify GVN and SROA handle DXC's padded min precision
types (i16:32, f16:32) correctly.

GVN tests (test/Transforms/GVN/min-precision-padding.ll):
- Verify store-to-load forwarding is blocked for padded vector types
- Verify type coercion is rejected for padded types
- Cover <3 x i16>, <3 x half>, <5 x i16>, <8 x half>

SROA tests (test/Transforms/SROA/min-precision-padding.ll):
- Verify GEP element indices use alloc size (4 bytes) not prim size (2 bytes)
- Verify all element stores survive with correct indices
- Cover <3 x i16>, <3 x half>, <5 x i16>, <8 x half>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The processLoad padding guard was unconditionally blocking store-to-load
forwarding for all padded vector types, including trivially correct same-type
forwarding. Narrow the check to only fire when stored and loaded types differ,
preserving defense-in-depth for cross-type forwarding while allowing valid
same-type optimizations for i16/half vectors.

Also fix the comment to accurately describe the root cause: BasicAA returns
MustAlias at offset 0 regardless of access sizes, causing partial element
stores to appear as Defs to MemoryDependence.

Add test cases verifying same-type forwarding still works for padded types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes optimizer miscompilations affecting min-precision vector element ([]) access under padded HLSL data layouts by aligning GVN/SROA behavior with allocation-size-based strides and preventing unsafe GVN coercions/forwarding.

Changes:

  • Update SROA vector element stride calculations to use allocation size in bits (matching GEP byte offset stride under padded layouts).
  • Harden GVN against coercion and cross-type store-to-load forwarding when data layout padding would make the transformation invalid.
  • Add new regression tests covering SROA and GVN behavior for padded i16/half vectors, including partial element stores.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
test/Transforms/SROA/min-precision-padding.ll Adds SROA regression coverage ensuring vector element GEP indexing matches padded element stride.
test/Transforms/GVN/min-precision-padding.ll Adds GVN regression coverage preventing unsafe coercion/forwarding with padded min-precision vectors while preserving same-type forwarding.
lib/Transforms/Scalar/SROA.cpp Switches vector element sizing to alloc-size-in-bits in multiple SROA decision points.
lib/Transforms/Scalar/GVN.cpp Rejects coercion and adds forwarding guardrails when type size implies padding-related hazards.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Collaborator

@V-FEXrt V-FEXrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM but this part of the code base (min precision) is notorious and I would really like someone whose worked on it to review. Otherwise a couple smaller comment

also, I've notice that code generated by LLMs seem to way over comment compared to code a human would write. A lot of these comments are fairly repetitive (I think I've seen "the issue is 2/1 is 2 not 1" at least 4 times in the PR). Not sure what we should do about it but in general I prefer for "why not how" type comments and LLMs seem to create a lot of "how" comments (I assume for the benefit of the human-in-the-loop?) Maybe we should have a editorial policy here if its a common trope others notice

Copy link
Copy Markdown
Member

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, and I did some poking to see if these changes might impact non-min-precision types and convinced myself it wouldn't.

We should get an approval from someone more familiar with this area of the code though.

alsepkow and others added 2 commits March 30, 2026 17:07
- GVN CanCoerceMustAliasedValueToLoad: clarify comment to describe the
  vector padding mismatch (not scalar).
- SROA AllocaSliceRewriter: add missing // HLSL Change marker on assert.
- GVN test: make coercion rejection check more specific (CHECK: load i96,
  CHECK-NOT: bitcast).
- SROA test: trim verbose header comment to focus on what, not how.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@tex3d tex3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. I do wonder if there are other places with a similar assumption (that vector width is dependent on element size, not element alloc size), but this certainly seems to fix things without regressing anything as far as I can tell.

@alsepkow alsepkow merged commit ccba9f8 into microsoft:main Apr 1, 2026
12 checks passed
@github-project-automation github-project-automation bot moved this from New to Done in HLSL Roadmap Apr 1, 2026
alsepkow added a commit that referenced this pull request Apr 1, 2026
…#8260)

This PR extends the SM 6.9 long vector execution tests to cover HLSL min
precision types (min16float, min16int, min16uint). These types are
always available — `D3D12_SHADER_MIN_PRECISION_SUPPORT` only reports
whether hardware actually uses reduced precision, not whether the types
compile — so no device capability check is needed and the tests live in
the existing `DxilConf_SM69_Vectorized_Core` class alongside other
types.

Note: I wasn't able to find any existing min precision HLK tests.
Unclear if we have coverage.

## Key design decisions

**Full-precision buffer I/O:** Min precision types have
implementation-defined buffer storage width, so we use full-precision
types (`float`/`int`/`uint`) for all Load/Store operations via the
`IO_TYPE`/`IO_OUT_TYPE` shader defines, with explicit casts to/from the
min precision compute type. This ensures deterministic data layout
regardless of the device implementation.

**Half-precision tolerances:** Validation compares results in fp16 space
using HLSLHalf_t ULP tolerances. Since min precision guarantees at least
16-bit, fp16 tolerances are a correct upper bound — devices computing at
higher precision will produce more accurate results, not less.

**Test coverage mirrors existing patterns:**
- min16float mirrors HLSLHalf_t
(float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store)
- min16int mirrors int16_t
(arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store)
- min16uint mirrors uint16_t
(arithmetic/bitwise/comparison/cast/wave/quad/load-store)

**Wave and quad op support:** Wave ops
(WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First,
WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt,
QuadReadAcrossX/Y/Diagonal) are tested for all three min precision
types, mirroring the ops supported by their 16-bit equivalents. The wave
op shader helpers use `#ifdef MIN_PRECISION` guards to store results via
`IO_OUT_TYPE` for deterministic buffer layout without changing DXIL for
existing non-min-precision tests.

**Excluded operations:**
- Signed div/mod on min16int: HLSL does not support signed integer
division on min precision types
- Bit shifting on min16int/min16uint: Not supported for min precision
types
- FP specials (INF/NaN/denorm): min precision types do not support them

Resolves #7780

All tests require the rawBufferVectorLoad/Store fix from :
#8274
The array accessor and wave/quad op tests for min precision require the
optimizer fix from:
#8269


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
alsepkow added a commit to alsepkow/DirectXShaderCompiler that referenced this pull request Apr 14, 2026
microsoft#8269)

## Summary

Fixes three related optimizer bugs that cause miscompilation of min
precision vector element access (`[]` operator) on `min16float`,
`min16int`, and `min16uint` types at optimization levels O1+.

Resolves microsoft#8268

## Root Cause

DXC's data layout pads min precision types (`i16:32`, `f16:32`). The
HLSL change in `DataLayout::getTypeSizeInBits` makes vector sizes use
alloc size per element (e.g., `<3 x i16>` = 96 bits), but scalar
`getTypeSizeInBits(i16)` returns the primitive width (16 bits). This
inconsistency causes three bugs:

1. **GVN ICE**: `CanCoerceMustAliasedValueToLoad` creates a padded-width
integer (i96) then attempts a bitcast from the 48-bit LLVM vector type —
assert fires.
2. **GVN incorrect store forwarding**: `processLoad` forwards a
zeroinitializer store past partial element stores because
MemoryDependence uses padded sizes for aliasing.
3. **SROA element misindexing** (primary bug):
`getNaturalGEPRecursively` uses `getTypeSizeInBits(i16)/8 = 2` for
element offsets, while GEP uses `getTypeAllocSize(i16) = 4`. Byte offset
4 (element 1) maps to index `4/2 = 2` instead of `4/4 = 1`, causing SROA
to misplace or eliminate element stores.

## Changes

**`lib/Transforms/Scalar/GVN.cpp`**
- `CanCoerceMustAliasedValueToLoad`: Reject coercion when type sizes
include padding (`getTypeSizeInBits != getPrimitiveSizeInBits`)
- `processLoad` StoreInst handler: Skip store-to-load forwarding for
padded types

**`lib/Transforms/Scalar/SROA.cpp`**
- `getNaturalGEPRecursively`: Use `getTypeAllocSizeInBits` for vector
element size to match GEP offset stride
- `isVectorPromotionViable`: Same fix for element size calculation
- `AllocaSliceRewriter` constructor: Same fix for `ElementSize`

## Testing

- All 6 min precision ArrayOperator tests (StaticAccess + DynamicAccess
× 3 types) pass with the fix
- Verified optimized DXIL output retains all 3 element stores with
correct indices

## Co-authored

This fix was investigated and implemented with the assistance of GitHub
Copilot (AI pair programming). The root cause analysis — tracing the bug
through `-print-after-all` pass dumps, identifying SROA as the culprit,
and understanding the `getTypeSizeInBits` vs `getTypeAllocSize` mismatch
— was a collaborative effort.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
alsepkow added a commit to alsepkow/DirectXShaderCompiler that referenced this pull request Apr 14, 2026
…microsoft#8260)

This PR extends the SM 6.9 long vector execution tests to cover HLSL min
precision types (min16float, min16int, min16uint). These types are
always available — `D3D12_SHADER_MIN_PRECISION_SUPPORT` only reports
whether hardware actually uses reduced precision, not whether the types
compile — so no device capability check is needed and the tests live in
the existing `DxilConf_SM69_Vectorized_Core` class alongside other
types.

Note: I wasn't able to find any existing min precision HLK tests.
Unclear if we have coverage.

## Key design decisions

**Full-precision buffer I/O:** Min precision types have
implementation-defined buffer storage width, so we use full-precision
types (`float`/`int`/`uint`) for all Load/Store operations via the
`IO_TYPE`/`IO_OUT_TYPE` shader defines, with explicit casts to/from the
min precision compute type. This ensures deterministic data layout
regardless of the device implementation.

**Half-precision tolerances:** Validation compares results in fp16 space
using HLSLHalf_t ULP tolerances. Since min precision guarantees at least
16-bit, fp16 tolerances are a correct upper bound — devices computing at
higher precision will produce more accurate results, not less.

**Test coverage mirrors existing patterns:**
- min16float mirrors HLSLHalf_t
(float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store)
- min16int mirrors int16_t
(arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store)
- min16uint mirrors uint16_t
(arithmetic/bitwise/comparison/cast/wave/quad/load-store)

**Wave and quad op support:** Wave ops
(WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First,
WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt,
QuadReadAcrossX/Y/Diagonal) are tested for all three min precision
types, mirroring the ops supported by their 16-bit equivalents. The wave
op shader helpers use `#ifdef MIN_PRECISION` guards to store results via
`IO_OUT_TYPE` for deterministic buffer layout without changing DXIL for
existing non-min-precision tests.

**Excluded operations:**
- Signed div/mod on min16int: HLSL does not support signed integer
division on min precision types
- Bit shifting on min16int/min16uint: Not supported for min precision
types
- FP specials (INF/NaN/denorm): min precision types do not support them

Resolves microsoft#7780

All tests require the rawBufferVectorLoad/Store fix from :
microsoft#8274
The array accessor and wave/quad op tests for min precision require the
optimizer fix from:
microsoft#8269


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

GVN and SROA miscompile min precision vector element access

5 participants