Skip to content

Fix LoaderHeap's free list growing more than expected#129203

Open
eduardo-vp wants to merge 9 commits into
dotnet:mainfrom
eduardo-vp:xunit-reg-linux
Open

Fix LoaderHeap's free list growing more than expected#129203
eduardo-vp wants to merge 9 commits into
dotnet:mainfrom
eduardo-vp:xunit-reg-linux

Conversation

@eduardo-vp

@eduardo-vp eduardo-vp commented Jun 10, 2026

Copy link
Copy Markdown
Member

I was working with the test in https://github.com/korchak-aleksandr/net10-regression-repro and found out that the free list in UnlockedLoaderHeap grows to thousands of elements, which makes allocations very slow since we do a linear scan of this free list for each one of them.

In these scenarios multiple threads might need the same generic instantiation simultaneously and they all race to create/publish it. Multiple threads can lose the race and quickly add blocks to the free list since they don't need that memory. Subsequent calls that need generic instantiations do a linear scan of the free list to find a memory block to reuse. This ends up taking a lot of time due to its size (can be up to ten of thousands).

This PR stops making thread race to create/publish such that we don't insert several blocks in the free list.

.NET 10 .NET 10 + this PR
Time 91.2 s 3.9 s
Max RSS 114.1 MB 117.5 MB

Copilot AI review requested due to automatic review settings June 10, 2026 01:22

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures UnlockedLoaderHeap’s free list to reduce allocation-time overhead in backout-heavy scenarios by replacing a single linear-scanned free list with size-segregated buckets for common small block sizes plus an overflow list for larger blocks.

Changes:

  • Replaces m_pFirstFreeBlock with 32 size buckets (pointer-size increments) and a separate “large/overflow” free list.
  • Updates free-block insertion/allocation logic to use bucketed O(1) reuse for small sizes, and retains linear scanning only for the overflow list (including a stress-log warning on long scans).
  • Adjusts debug-only free-list dumping and validation to iterate across all buckets and the overflow list.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/coreclr/utilcode/loaderheap.cpp Implements bucket initialization, bucket-aware allocation/insertion, overflow scan warning, and updates debug dump/validation to traverse buckets.
src/coreclr/utilcode/loaderheap_shared.h Updates LoaderHeapFreeBlock API to no longer take an explicit head pointer (heap chooses bucket internally).
src/coreclr/inc/loaderheap.h Adds bucket/overflow free list fields and related constants to UnlockedLoaderHeap.

Comment thread src/coreclr/utilcode/loaderheap.cpp Outdated
Comment thread src/coreclr/utilcode/loaderheap.cpp Outdated
Comment thread src/coreclr/utilcode/loaderheap_shared.h Outdated
@jkotas

jkotas commented Jun 10, 2026

Copy link
Copy Markdown
Member

In these scenarios multiple threads might need the same generic instantiation simultaneously and they all race to create/publish it

Where is this race condition exactly? Can we add a lock there instead?

The freelist in LoaderHeap is meant to be only used in error conditions to backout types that failed to load, or to deal with rare race condition. If you see the freelist growing this much, it means that the loader heap is not used correctly. We should fix that instead.

@eduardo-vp

Copy link
Copy Markdown
Member Author

I'll take a look, this the part where many threads lose

InstantiatedMethodDesc *pOldMD = FindLoadedInstantiatedMethodDesc(pExactMT,
pGenericMDescInRepMT->GetMemberDef(),
methodInst,
getWrappedCode,
pGenericMDescInRepMT->IsAsyncVariantMethod());
if (pOldMD == NULL)
{
// No one else got there first, our MethodDesc wins.
amt.SuppressRelease();
#ifdef _DEBUG
SString name;
TypeString::AppendMethodDebug(name, pNewMD);
const char* pDebugNameUTF8 = name.GetUTF8();
const char* verb = "Created";
if (pWrappedMD)
LOG((LF_CLASSLOADER, LL_INFO1000,
"GENERICS: %s instantiating-stub method desc %s with dictionary size %d\n",
verb, pDebugNameUTF8, infoSize));
else
LOG((LF_CLASSLOADER, LL_INFO1000,
"GENERICS: %s instantiated method desc %s\n",
verb, pDebugNameUTF8));
S_SIZE_T safeLen = S_SIZE_T(strlen(pDebugNameUTF8))+S_SIZE_T(1);
if(safeLen.IsOverflow()) COMPlusThrowHR(COR_E_OVERFLOW);
size_t len = safeLen.Value();
pNewMD->m_pszDebugMethodName = (char*) (void*)pAllocator->GetLowFrequencyHeap()->AllocMem(safeLen);
_ASSERTE(pNewMD->m_pszDebugMethodName);
strcpy_s((char *) pNewMD->m_pszDebugMethodName, len, pDebugNameUTF8);
pNewMD->m_pszDebugClassName = pExactMT->GetDebugClassName();
pNewMD->m_pszDebugMethodSignature = (LPUTF8)pNewMD->m_pszDebugMethodName;
#endif // _DEBUG
// Generic methods can't be varargs. code:MethodTableBuilder::ValidateMethods should have checked it.
_ASSERTE(!pNewMD->IsVarArg());
// Verify that we are not creating redundant MethodDescs
_ASSERTE(!pNewMD->IsTightlyBoundToMethodTable());
// The method desc is fully set up; now add to the table
InstMethodHashTable* pTable = pExactMDLoaderModule->GetInstMethodHashTable();
pTable->InsertMethodDesc(pNewMD);
}
else
pNewMD = pOldMD;
// CrstHolder goes out of scope here
}

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/vm/genmeth.cpp
Comment thread src/coreclr/vm/genmeth.cpp
@eduardo-vp eduardo-vp changed the title Add size-segregated buckets in UnlockedLoaderHeap Fix LoaderHeap's free list growing more than expected Jun 11, 2026
@eduardo-vp eduardo-vp requested a review from Copilot June 11, 2026 00:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/vm/genmeth.cpp Outdated
Comment thread src/coreclr/vm/genmeth.cpp Outdated
Comment thread src/coreclr/vm/genmeth.cpp Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/vm/genmeth.cpp Outdated
Comment thread src/coreclr/vm/genmeth.cpp
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 02:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment thread src/coreclr/vm/genmeth.cpp
Comment thread src/coreclr/vm/genmeth.cpp Outdated
Comment thread src/coreclr/vm/genmeth.cpp
Comment thread src/coreclr/vm/genmeth.cpp
Copilot AI review requested due to automatic review settings June 15, 2026 21:39

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@jkotas

jkotas commented Jun 15, 2026

Copy link
Copy Markdown
Member

Could you please run a perf test that tries to create many different method instantiations on multiple threads, and see whether taking a lock around MethodDesc creation makes it measurably slower?

If we find that it is getting measurably slower, we may want to do something else about it - e.g. move more of the work outside the lock.

@eduardo-vp

Copy link
Copy Markdown
Member Author

I used a script to create a Gen.cs file and test the creation of 150k different method instantiations.

public static class W
{
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M1<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M2<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M3<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M4<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public static long M5<T>() => System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
}

public struct S0 { }
public struct S1 { }
public struct S2 { }
public struct S3 { }
public struct S4 { }
// ...
public struct S29998 { }
public struct S29999 { }

public static class Gen
{
    public static readonly System.Action[] Actions = new System.Action[]
    {
        static () => { W.M1<S0>(); W.M2<S0>(); W.M3<S0>(); W.M4<S0>(); W.M5<S0>(); },
        static () => { W.M1<S1>(); W.M2<S1>(); W.M3<S1>(); W.M4<S1>(); W.M5<S1>(); },
        static () => { W.M1<S2>(); W.M2<S2>(); W.M3<S2>(); W.M4<S2>(); W.M5<S2>(); },
        static () => { W.M1<S3>(); W.M2<S3>(); W.M3<S3>(); W.M4<S3>(); W.M5<S3>(); },
        static () => { W.M1<S4>(); W.M2<S4>(); W.M3<S4>(); W.M4<S4>(); W.M5<S4>(); },
        // ...
        static () => { W.M1<S29998>(); W.M2<S29998>(); W.M3<S29998>(); W.M4<S29998>(); W.M5<S29998>(); },
        static () => { W.M1<S29999>(); W.M2<S29999>(); W.M3<S29999>(); W.M4<S29999>(); W.M5<S29999>(); },
    }
}

The main programs just runs a loop

Parallel.For(0, N, new ParallelOptions { MaxDegreeOfParallelism = Workers }, i => actions[i]());

net10 takes ~690 ms while .net10 + this change takes ~705 ms (around 2% slower in this extreme case). I think this should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants