Skip to content

Restart async workload after GlobalCleanup re-entry (MemoryRandomization)#3103

Open
DrewScoggins wants to merge 1 commit intodotnet:masterfrom
DrewScoggins:fix-async-workload-restart-with-memory-randomization
Open

Restart async workload after GlobalCleanup re-entry (MemoryRandomization)#3103
DrewScoggins wants to merge 1 commit intodotnet:masterfrom
DrewScoggins:fix-async-workload-restart-with-memory-randomization

Conversation

@DrewScoggins
Copy link
Copy Markdown
Member

When MemoryRandomization=true, Engine.RunCore calls RandomizeManagedHeapMemory between iterations, which invokes GlobalCleanupAction followed by GlobalSetupAction. The generated __GlobalCleanup previously called
workloadContinuerAndValueTaskSource?.Complete() to signal the async workload loop to exit, but left the field non-null in a terminal state. The next iteration's WorkloadActionNoUnroll then called Continue() on the same source, which calls SetResult(false) on a ManualResetValueTaskSourceCore that had already been signaled, throwing InvalidOperationException from SignalCompletion. This broke every async benchmark that opted into memory randomization.

Fix:

  • Add WorkloadValueTaskSource.IsContinuerPending so the generated cleanup can detect whether the workload loop is still awaiting the next iteration. Calling Complete() when not pending would mask the original exception (e.g. when the workload itself threw and the loop has already exited via the catch block) with a double-signal InvalidOperationException.
  • Update PrependExtraGlobalCleanupImpl to:
    1. Only call Complete() when the continuer source is pending.
    2. Set workloadContinuerAndValueTaskSource = null after signalling.

The existing WorkloadActionNoUnroll codegen already handles the null case by allocating a fresh WorkloadValueTaskSource and restarting __StartWorkload(), so iterations after a between-iteration GlobalCleanup now run against a fresh, pending source.

@DrewScoggins
Copy link
Copy Markdown
Member Author

This is output from copilot, so not sure if this is the correct approach.

@timcassell
Copy link
Copy Markdown
Collaborator

timcassell commented Apr 29, 2026

It would be good to add a test for this case.

The existing WorkloadActionNoUnroll codegen already handles the null case by allocating a fresh WorkloadValueTaskSource and restarting __StartWorkload(), so iterations after a between-iteration GlobalCleanup now run against a fresh, pending source.

This will affect memory allocation measurements. I'm not sure the best way to handle it. My initial thought was just implement a custom IClock that does the diagnostic actions before starting the true clock, but it doesn't really work. Maybe we could change the API to use a new interface to make it work. I'm also open to any other ideas. [Edit] Actually looking at the IClock interface and Start extension, they just call IClock.GetTimestamp(), so we could theoretically implement it with that, though it does feel a bit hacky.

@timcassell
Copy link
Copy Markdown
Collaborator

timcassell commented Apr 29, 2026

Maybe a simpler way would be to create the workload continuer in __IterationSetup, which is already called before the diagnosers, and then the WorkloadActionNoUnroll can simply call Continue() branchlessly. And then we can even also complete it in __IterationCleanup instead of __GlobalCleanup.

[Edit] Or more simply, we can create it in __GlobalSetup unconditionally, complete it in __GlobalCleanup unconditionally. That way for non-MemoryRandomization jobs it won't allocate between iterations.

Two issues with async benchmarks under MemoryRandomization=true:

1. The engine re-invokes __GlobalCleanup/__GlobalSetup between iterations
   to randomize the managed heap. The async workload's
   WorkloadValueTaskSource was lazily created in WorkloadActionNoUnroll
   on first call and never recreated, so the second iteration would call
   Continue() on a source already in a terminal state and crash with
   System.InvalidOperationException from
   ManualResetValueTaskSourceCore.SignalCompletion.

2. If the workload itself threw, __WorkloadCore would catch and
   SetException on the source. A subsequent __GlobalCleanup that
   unconditionally called Complete() would double-signal the source and
   mask the original exception with InvalidOperationException.

Fix: allocate the WorkloadValueTaskSource and start the async workload
loop in __GlobalSetup (not lazily in the measured WorkloadActionNoUnroll
hot path), and null it out in __GlobalCleanup. Because the engine calls
__GlobalSetup again between iterations under MemoryRandomization, the
next iteration always gets a fresh source allocated *outside* the
MemoryDiagnoser-measured window. WorkloadActionNoUnroll becomes
branchless and allocation-free.

Add IsContinuerPending to WorkloadValueTaskSource and use it to guard
the Complete() call in __GlobalCleanup, so we skip the signal when the
workload has already SetException'd.
@DrewScoggins DrewScoggins force-pushed the fix-async-workload-restart-with-memory-randomization branch from 4805ed6 to a8e2310 Compare April 30, 2026 00:07
@DrewScoggins
Copy link
Copy Markdown
Member Author

Decided to go with the second suggestion.

Copy link
Copy Markdown
Collaborator

@timcassell timcassell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to update both InProcess toolchains.

"public long invokeCount;"
];

protected override string PrependExtraGlobalSetupImpl(string impl)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be Append to setup, Prepend to cleanup.

{impl}
=> $$"""
if (this.__fieldsContainer.workloadContinuerAndValueTaskSource != null
&& this.__fieldsContainer.workloadContinuerAndValueTaskSource.IsContinuerPending)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guard is unnecessary.

// is awaiting the next iteration. When the workload threw or the loop has already exited,
// calling SetResult(true) again on an already-signaled source would throw.
public bool IsContinuerPending
=> continuerSource.GetStatus(continuerSource.Version) == ValueTaskSourceStatus.Pending;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary.

@timcassell timcassell added this to the v0.16.0 milestone Apr 30, 2026
@timcassell timcassell added the bug label Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants