
[release/8.0-staging] Add an in-memory cache for CRLs on Linux #127626

Open
bartonjs wants to merge 4 commits into dotnet:release/8.0-staging from bartonjs:crl_cache_80

Conversation

@bartonjs
Member

Manual backport/cherry-pick of #123562 to release/8.0-staging

Customer Impact

  • Customer reported
  • Found internally

Customers have long reported "memory leaks" on Linux related to CRL processing. These "leaks" aren't actual leaks, but an interaction between how OpenSSL processes CRLs (using many small calls to malloc) and glibc's memory arenas and small-allocation caching: glibc holds onto small allocations after free so it can hand them out again later.

Because we handle CRLs by loading them, checking them, and discarding them, a process that does a lot of revocation checks will end up checking the CRL on every thread, and thus can end up with large "reserved" memory for the process. As the size of the CRL goes up, the number of threads goes up, and memory limits come down (e.g. Kubernetes), the reserved memory becomes more of a potential problem.

This change (originally introduced in .NET 11 Preview 3) switches CRL processing to an in-memory bounded MRU cache with GC cooperation. A process that repeatedly hits the same endpoints (or even multiple endpoints from the same CA+CRL) ideally only ever has to load the CRL once (unless it expires). Since the CRL isn't freed while still in use, it doesn't contribute to small-allocation accumulation.

Regression

  • Yes
  • No

Testing

As with the PR into main, most of the tests are existing tests. A new test is included to show cross-process disk cache recovery.

Risk

Medium-Low. The MRU cache isn't just a drop-in layering piece, so the volume of code carries inherent risk. The risk is largely mitigated by a large amount of coverage from unit tests, and manual stress tests against the feature when it was written in main (cycling through a few hundred HTTPS hosts randomly across several threads while doing a large amount of background memory allocation/deallocation).

Users experiencing the high RES memory problem on Linux have reported that .NET 11 Preview 3 ameliorated the problem. Otherwise, no feedback has been received regarding the change (implying that it has not caused a problem for anyone).

bartonjs and others added 2 commits April 30, 2026 14:50
Introduce an extra layer of caching for CRLs.

* The cache has a fixed size of 30 elements. When full, it evicts the
least-recently-used entry.
* Using the same finalizable object sentinel approach as ArrayPool, the
cache will purge entries every time the GC finalizes.
* During a finalize, the current MRU node is marked as what to purge
next time.
* Using that node moves the purge target to the next-older entry before
the node is promoted back to MRU.
* On the subsequent finalize, the marked node and everything after it
are purged.
* To avoid finalizing the CRL SafeHandles, the cache does an
AddReference on every item that is returned (so the caller must Release
it), and it calls Dispose on anything it evicts.
* GC/Finalization-triggered cooperative eviction is not performed until
the probe object is promoted to the final GC generation (currently Gen2)

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 30, 2026 21:54
bartonjs self-assigned this Apr 30, 2026
bartonjs added the Servicing-consider (Issue for next servicing release review) and area-System.Security labels Apr 30, 2026
bartonjs added this to the 8.0.x milestone Apr 30, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @bartonjs, @vcsjones, @dotnet/area-system-security
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment


Pull request overview

Backports the OpenSSL CRL processing changes to reduce Linux “high RES memory” behavior by adding a bounded in-memory MRU cache (with GC-cooperative pruning) on top of the existing on-disk CRL cache, plus additional diagnostics and a new Unix test covering disk-cache recovery.

Changes:

  • Add an in-memory MRU cache layer for CRL handles and integrate it into CRL attach/load flow (memory hit/expired/miss → disk cache → download).
  • Extend OpenSslX509ChainEventSource with new verbose events for in-memory cache behavior and clarify existing disk-cache event messages.
  • Add a Unix outerloop test that truncates a persisted CRL file and validates cross-process recovery.
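
The lookup order named in the first bullet (memory, then disk cache, then download) can be pictured with the following sketch. It is only an illustration of the layering: `CrlEntry`, `CrlSource`, and `TieredCrlLookup` are hypothetical names, the three tiers are passed in as delegates, and details such as writing a downloaded CRL back to disk are omitted.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for a loaded CRL; only the expiry time matters here.
public sealed record CrlEntry(string Name, DateTime NextUpdate);

public enum CrlSource { None, Memory, Disk, Download }

public static class TieredCrlLookup
{
    // Tries each tier in order and reports which one produced a usable CRL.
    public static (CrlEntry? Crl, CrlSource Source) Resolve(
        DateTime verificationTime,
        Func<CrlEntry?> tryMemory,
        Func<CrlEntry?> tryDisk,
        Func<CrlEntry?> tryDownload,
        Action<CrlEntry> storeInMemory)
    {
        // Memory hit: nothing to load. A missing or expired entry falls through.
        CrlEntry? crl = tryMemory();
        if (IsUsable(crl, verificationTime))
        {
            return (crl, CrlSource.Memory);
        }

        // Disk cache: feed the in-memory cache on success.
        crl = tryDisk();
        if (IsUsable(crl, verificationTime))
        {
            storeInMemory(crl!);
            return (crl, CrlSource.Disk);
        }

        // Last resort: download, then feed the in-memory cache.
        crl = tryDownload();
        if (IsUsable(crl, verificationTime))
        {
            storeInMemory(crl!);
            return (crl, CrlSource.Download);
        }

        return (null, CrlSource.None);
    }

    private static bool IsUsable(CrlEntry? crl, DateTime verificationTime) =>
        crl is not null && crl.NextUpdate > verificationTime;
}
```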

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File | Description
src/libraries/System.Security.Cryptography/tests/X509Certificates/X509FilesystemTests.Unix.cs | Adds an outerloop test for CRL disk-cache recovery and an EventListener helper to observe the cache filename.
src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/OpenSslX509ChainEventSource.cs | Adds new in-memory CRL cache ETW events and clarifies disk-cache messages.
src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/OpenSslCrlCache.cs | Introduces MRU in-memory cache for CRLs and refactors disk-cache/download paths to feed the cache.

@@ -174,28 +262,16 @@ private static bool AddCachedCrlCore(string crlFile, SafeX509StoreHandle store,
OpenSslX509ChainEventSource.Log.CrlCacheExpired(nextUpdate, verificationTime);
Comment on lines +325 to +350
try
{
    string crlFile = GetCachedCrlPath(crlFileName, mkDir: true);

    using (SafeBioHandle bio = Interop.Crypto.BioNewFile(crlFile, "wb"))
    {
        if (bio.IsInvalid || Interop.Crypto.PemWriteBioX509Crl(bio, crl) == 0)
        {
            // No bio, or write failed

            if (OpenSslX509ChainEventSource.Log.IsEnabled())
            {
                OpenSslX509ChainEventSource.Log.CrlCacheWriteFailed(crlFile);
            }

            Interop.Crypto.ErrClearError();
        }
    }
}
catch (UnauthorizedAccessException) { }
catch (IOException) { }

if (OpenSslX509ChainEventSource.Log.IsEnabled())
{
    OpenSslX509ChainEventSource.Log.CrlCacheWriteSucceeded();
}
Comment on lines +88 to +103
[OuterLoop]
[ConditionalFact(typeof(RemoteExecutor), nameof(RemoteExecutor.IsSupported))]
public static async Task CrlDiskCacheRecovers()
{
    using X509Certificate2 getDotNetCert = await GetGetDotNetCert();
    string crlFileName;

    using (CrlCacheNameFinderEventListener listener = new(getDotNetCert.Subject))
    using (CancellationTokenSource tokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(10)))
    using (ChainHolder chainHolder = new ChainHolder())
    {
        Task<string> nameTask = listener.GetCacheFileNameAsync(tokenSource.Token);

        _ = chainHolder.Chain.Build(getDotNetCert);
        crlFileName = await nameTask.ConfigureAwait(false);
    }
Comment on lines +136 to +142
SslOptions =
{
    RemoteCertificateValidationCallback = (sender, certificate, chain, errors) =>
    {
        getDotNetCert = (X509Certificate2)certificate;
        return errors == SslPolicyErrors.None;
    }
Comment on lines +768 to +790
protected override void OnEventWritten(EventWrittenEventArgs eventData)
{
    if (eventData.EventName == "CrlIdentifiersDetermined")
    {
        if (eventData.Payload?.Count == 3)
        {
            if (eventData.Payload[0] is string certName &&
                certName == _certificateName &&
                eventData.Payload[1] is string cdp &&
                eventData.Payload[2] is string cacheName)
            {
                _cacheName = cacheName;
            }
        }
    }
}

internal async Task<string> GetCacheFileNameAsync(CancellationToken cancellationToken)
{
    while (_cacheName == null)
    {
        await Task.Delay(100, cancellationToken).ConfigureAwait(false);
    }
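
The excerpt above waits for the event callback by polling a field every 100 ms. For comparison only, and not as something the PR or the review is known to propose, the same hand-off can be expressed with a TaskCompletionSource. `CacheNameWaiter` and `Publish` are hypothetical names, and `Task.WaitAsync(CancellationToken)` assumes .NET 6 or later.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: completes a task when the cache name becomes known.
public sealed class CacheNameWaiter
{
    // RunContinuationsAsynchronously keeps awaiting code off the publishing thread.
    private readonly TaskCompletionSource<string> _tcs =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    // Would be called from an event callback once the matching event arrives.
    // Later calls are ignored because the task is already completed.
    public void Publish(string cacheName) => _tcs.TrySetResult(cacheName);

    // Completes with the published name, or is canceled with the token.
    public Task<string> GetCacheFileNameAsync(CancellationToken cancellationToken) =>
        _tcs.Task.WaitAsync(cancellationToken);
}
```

The trade-off is a little more state in exchange for no polling interval and no cross-thread read of a plain field.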
Copilot AI review requested due to automatic review settings May 1, 2026 02:02
Contributor

Copilot AI left a comment


Pull request overview

Backports the OpenSSL (Linux) CRL handling change to introduce a bounded in-memory MRU cache (with GC-cooperative pruning) layered above the existing disk cache, reducing repeated CRL loads and mitigating high “reserved” memory growth caused by OpenSSL+glibc small-allocation behavior during revocation checking.

Changes:

  • Add an in-memory MRU cache for CRLs in OpenSslCrlCache, with GC-triggered pruning and new diagnostic events.
  • Update OpenSslX509ChainEventSource messages/events to distinguish disk cache activity and report in-memory cache behavior.
  • Add an OuterLoop cross-process test validating recovery when the on-disk CRL cache file becomes corrupted/truncated.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File | Description
src/libraries/System.Security.Cryptography/tests/X509Certificates/X509FilesystemTests.Unix.cs | Adds an OuterLoop test that corrupts the CRL disk cache and validates a separate process repairs it.
src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/OpenSslX509ChainEventSource.cs | Adds in-memory CRL cache ETW events and clarifies existing disk-cache event messages.
src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/OpenSslCrlCache.cs | Implements bounded in-memory MRU CRL cache, integrates it into chain-building flow, and adds GC-cooperative pruning.

Comment on lines +347 to +350
if (OpenSslX509ChainEventSource.Log.IsEnabled())
{
    OpenSslX509ChainEventSource.Log.CrlCacheWriteSucceeded();
}
Comment on lines +785 to +790
internal async Task<string> GetCacheFileNameAsync(CancellationToken cancellationToken)
{
    while (_cacheName == null)
    {
        await Task.Delay(100, cancellationToken).ConfigureAwait(false);
    }
@@ -174,28 +262,16 @@ private static bool AddCachedCrlCore(string crlFile, SafeX509StoreHandle store,
OpenSslX509ChainEventSource.Log.CrlCacheExpired(nextUpdate, verificationTime);
Comment on lines +743 to +747
~GCWatcher()
{
    GC.ReRegisterForFinalize(this);

    if (GC.GetGeneration(this) == GC.MaxGeneration)
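
The finalizer excerpt above is cut off after the generation check. A self-contained sketch of the general pattern (a finalizable sentinel that re-registers itself and only signals once it has reached the oldest generation) might look like the following. `GcSentinel` and its callback are hypothetical and are not the `GCWatcher` type from the PR.

```csharp
using System;

// Hypothetical sketch of a finalizable object used as a GC notification probe.
public sealed class GcSentinel
{
    private readonly Action _onCollection;

    private GcSentinel(Action onCollection) => _onCollection = onCollection;

    // Creates the sentinel and deliberately drops the reference, so the only
    // thing keeping it alive between collections is the finalization queue.
    public static void Register(Action onCollection) => _ = new GcSentinel(onCollection);

    ~GcSentinel()
    {
        // Resurrect the object so the finalizer runs again after a later GC.
        GC.ReRegisterForFinalize(this);

        // Stay quiet until the sentinel has been promoted to the final
        // generation, so that cheap young-generation GCs trigger nothing.
        if (GC.GetGeneration(this) == GC.MaxGeneration)
        {
            try
            {
                _onCollection();
            }
            catch
            {
                // An exception escaping a finalizer would terminate the process.
            }
        }
    }
}
```

A caller would register something like a cache's purge method once at startup; the callback runs on the finalizer thread, so it needs to be short and thread-safe.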

Labels

area-System.Security, Servicing-consider (Issue for next servicing release review)


2 participants