GCRoot: Report GCFrame and in-flight exception objects#129145
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends cDAC stack reference enumeration to also report two additional GC root sources that were previously missed: (1) objects protected by the thread’s GCFrame (GCPROTECT) chain and (2) in-flight exception objects held on the thread’s ExInfo chain. It also adds new dump debuggees and dump tests to validate these root sources show up in IStackWalk.WalkStackReferences.
Changes:
- Add cDAC data contract support for
GCFrameand expose the thread’sm_pGCFramechain via the Thread contract. - Extend
StackWalk_1.WalkStackReferencesto report GCFrame roots and ExInfo (exception tracker) roots. - Add two new dump debuggees and corresponding dump tests covering GCFrame roots and nested in-flight exceptions.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/DumpTests/StackReferenceDumpTests.cs | Adds new dump-based tests for ExInfo and GCFrame root reporting; GCFrame test needs to avoid potential false positives. |
| src/native/managed/cdac/tests/DumpTests/Debuggees/NestedException/Program.cs | New debuggee that crashes while nested exceptions are still in flight. |
| src/native/managed/cdac/tests/DumpTests/Debuggees/NestedException/NestedException.csproj | Registers the NestedException debuggee and requests Full dumps. |
| src/native/managed/cdac/tests/DumpTests/Debuggees/GCProtect/Program.cs | New debuggee that FailFasts inside an AssemblyResolve handler under a native GCPROTECT scope. |
| src/native/managed/cdac/tests/DumpTests/Debuggees/GCProtect/GCProtect.csproj | Registers the GCProtect debuggee and requests Full dumps. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/DataType.cs | Adds GCFrame to the public DataType enum (new public surface). |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Thread.cs | Adds an optional GCFrame pointer field to the managed Thread data contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/GCFrame.cs | New managed data contract type describing the native GCFrame layout. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/StackWalk_1.cs | Reports GCFrame-protected roots and ExInfo-tracked exception roots as stack references. |
| src/coreclr/vm/threads.h | Exposes Thread::m_pGCFrame offset via cdac_data<Thread>. |
| src/coreclr/vm/frames.h | Adds cdac_data<GCFrame> describing the native GCFrame layout. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Adds Thread.GCFrame field and a new GCFrame CDAC type descriptor. |
Copilot's findings
- Files reviewed: 12/12 changed files
- Comments generated: 4
noahfalk
left a comment
There was a problem hiding this comment.
Hey Lee, this looks good! Probably my only major comment is whether we should do some contract level refactoring but curious to get a 2nd opinion from Max.
deb2f6e to
5c183f8
Compare
The StackWalk contract reads Thread.GCFrame indirectly via the Thread contract, whose transitive field closure is assumed; only the directly read GCFrame type fields need listing here. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rather than collapsing ExInfo roots to SOS_StackSourceFrame, append a new SOS_StackSourceExInfo value to the SOSStackSourceType enum (sospriv.idl + prebuilt header) and emit it from SOSDacImpl. Appending keeps existing numeric values stable for pre-existing CLRMD/SOS consumers while letting the new ExInfo source be reported distinctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address review feedback: the new roots are no longer all 'frame-sourced'. Describe Source/StackPointer as the node address and note SourceType distinguishes Frame (GCFrame) from the dedicated ExInfo source type. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…logging The runtime-side GC stress oracle scanned the GCFrame (GCPROTECT) and in-flight ExInfo chains without re-stamping currentFrameSource/ currentFrameSourceType, so those roots inherited the last frame walk's source (or 0), producing spurious per-frame mismatches against cDAC, which attributes GCFrame roots to the GCFrame node (Frame source type) and ExInfo roots to the ExInfo node (ExInfo source type). Stamp the node address and matching source type before each scan. Also fix ResolveMethodName: SOS_StackSourceExInfo (2) is now a real source type, so don't dereference an ExInfo node as a Frame* when logging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This PR added SOS_StackSourceExInfo and SOSDacImpl.GetStackReferences now forwards ExInfo roots as that value rather than collapsing them to SOS_StackSourceFrame; update the doc to match the implementation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Incorporate PR feedback by adding StackSourceOther
ab733bf to
3aed379
Compare
|
/ba-g Helix timeouts, I can't get those to complete. Also the osx tests are actually fixed by #129462 and unrelated to my changes. |
Threads' GCFrame objects were not being reported correctly as GCRoots. Similarly, an in-flight exception was also not being reported, though it typically has other roots it's just good to fix the gap.
This is my first real cdac change, so detailed feedback and nitpicks are very appreciated.