Skip to content

[release/9.0-staging] Add an in-memory cache for CRLs on Linux#127625

Open
bartonjs wants to merge 2 commits intodotnet:release/9.0-stagingfrom
bartonjs:crl_cache_90
Open

[release/9.0-staging] Add an in-memory cache for CRLs on Linux#127625
bartonjs wants to merge 2 commits intodotnet:release/9.0-stagingfrom
bartonjs:crl_cache_90

Conversation

@bartonjs
Copy link
Copy Markdown
Member

Manual backport/cherry-pick of #123562 to release/9.0-staging

Customer Impact

  • Customer reported
  • Found internally

Customers have been reporting "memory leaks" on Linux related to CRL processing for quite a while. These "leaks" aren't actual leaks, but an interaction with how OpenSSL processes CRLs (using many small calls to malloc), and glibc memory arenas and small-allocation caching -- glibc holds onto the small allocs from free so it can hand them out again later.

Because we handle CRLs by loading them, checking them, and discarding them, a process that does a lot of revocation checks will end up checking the CRL on every thread, and thus can end up with large "reserved" memory for their process. As the size of the CRL goes up, the number of threads goes up, and memory limits come down (e.g. Kubernetes) the reserved memory becomes more of a potential problem.

This change (originally introduced for 11 preview 3) changes the CRL processing to use an in-memory bounded MRU cache with GC cooperation. So, a process that repeatedly hits the same endpoints over and over (or even multiple endpoints from the same CA+CRL) ideally only ever has to load the CRL once (unless it expires). Since it isn't freed while still in use, it doesn't contribute to small-allocation accumulation.

Regression

  • Yes
  • No

Testing

As with the PR into main, most of the tests are existing tests. A new test is included to show cross-process disk cache recovery.

Risk

Medium-Low. The MRU cache isn't just a drop-in layering piece, so the volume of code carries inherent risk. The risk is largely mitigated by a large amount of coverage from unit tests, and manual stress tests against the feature when it was written in main (cycling through a few hundred HTTPS hosts randomly across several threads while doing a large amount of background memory allocation/deallocation).

Users experiencing the high RES memory problem on Linux have reported that .NET 11 Preview 3 ameliorated the problem. Otherwise, no feedback has been received regarding the change (implying that it has not caused a problem for anyone).

bartonjs and others added 2 commits April 30, 2026 12:38
Introduce an extra layer of caching for CRLs.

* The cache has a fixed size of 30 elements. When full, it evicts the
least-recently-used entry.
* Using the same finalizable object sentinel approach as ArrayPool, the
cache will purge entries every time the GC finalizes.
* During a finalize, the current MRU node is marked as what to purge
next time.
* Using that node moves the purge target to the next-older entry before
the node is promoted back to MRU.
* On the subsequent finalize, the marked node (and everything after it)
are purged.
* To avoid finalizing the CRL SafeHandles, the cache does an
AddReference on every item that is returned (so the caller must Release
it), and it calls Dispose on anything it evicts.
* GC/Finalization-triggered cooperative eviction is not performed until
the probe object is promoted to the final GC generation (currently Gen2)

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@bartonjs bartonjs added this to the 9.0.x milestone Apr 30, 2026
@bartonjs bartonjs self-assigned this Apr 30, 2026
Copilot AI review requested due to automatic review settings April 30, 2026 21:53
@bartonjs bartonjs added area-System.Security Servicing-consider Issue for next servicing release review labels Apr 30, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @bartonjs, @vcsjones, @dotnet/area-system-security
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This backport introduces an in-memory MRU cache for OpenSSL CRL handling on Linux to reduce repeated CRL parsing/allocations and mitigate high resident memory growth from frequent revocation checks across threads.

Changes:

  • Added an in-memory bounded MRU cache layer for CRLs (with GC-cooperative pruning) and integrated it into the OpenSSL CRL load/attach flow.
  • Extended OpenSSL chain ETW diagnostics with new in-memory CRL cache events and clarified existing “disk cache” messages.
  • Added an OuterLoop Unix test that corrupts a persisted CRL file and validates that a subsequent process rebuild recovers the disk cache.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/libraries/System.Security.Cryptography/tests/X509Certificates/X509FilesystemTests.Unix.cs Adds a cross-process disk-cache recovery test and an ETW listener helper to discover the CRL cache filename.
src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/OpenSslX509ChainEventSource.cs Adds ETW events for in-memory CRL cache hit/miss/expired/prune/full and updates disk-cache message text.
src/libraries/System.Security.Cryptography/src/System/Security/Cryptography/X509Certificates/OpenSslCrlCache.cs Implements and wires up the in-memory MRU CRL cache, refactors disk-cache load to return cached entries, and adds GC-driven pruning.

@@ -174,28 +262,16 @@ private static bool AddCachedCrlCore(string crlFile, SafeX509StoreHandle store,
OpenSslX509ChainEventSource.Log.CrlCacheExpired(nextUpdate, verificationTime);
Comment on lines +322 to +350
// Saving the CRL to the disk is just a performance optimization for later requests to not
// need to use the network again, so failure to save shouldn't throw an exception or mark
// the chain as invalid.
try
{
string crlFile = GetCachedCrlPath(crlFileName, mkDir: true);

Interop.Crypto.ErrClearError();
}
using (SafeBioHandle bio = Interop.Crypto.BioNewFile(crlFile, "wb"))
{
if (bio.IsInvalid || Interop.Crypto.PemWriteBioX509Crl(bio, crl) == 0)
{
// No bio, or write failed

if (OpenSslX509ChainEventSource.Log.IsEnabled())
{
OpenSslX509ChainEventSource.Log.CrlCacheWriteFailed(crlFile);
}
}
catch (UnauthorizedAccessException) { }
catch (IOException) { }

if (OpenSslX509ChainEventSource.Log.IsEnabled())
{
OpenSslX509ChainEventSource.Log.CrlCacheWriteSucceeded();
Interop.Crypto.ErrClearError();
}
}
}
catch (UnauthorizedAccessException) { }
catch (IOException) { }

if (OpenSslX509ChainEventSource.Log.IsEnabled())
{
OpenSslX509ChainEventSource.Log.CrlCacheWriteSucceeded();
}
{
RemoteCertificateValidationCallback = (sender, certificate, chain, errors) =>
{
getDotNetCert = (X509Certificate2)certificate;
Comment on lines +774 to +790
if (eventData.Payload[0] is string certName &&
certName == _certificateName &&
eventData.Payload[1] is string cdp &&
eventData.Payload[2] is string cacheName)
{
_cacheName = cacheName;
}
}
}
}

internal async Task<string> GetCacheFileNameAsync(CancellationToken cancellationToken)
{
while (_cacheName == null)
{
await Task.Delay(100, cancellationToken).ConfigureAwait(false);
}
Comment on lines +743 to +752
~GCWatcher()
{
GC.ReRegisterForFinalize(this);

if (GC.GetGeneration(this) == GC.MaxGeneration)
{
try
{
_owner.PruneForGC();
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Security Servicing-consider Issue for next servicing release review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants