Replaced WFE instruction with ISB in mi_atomic_yield on ARM64 #1215

Open
akaStiX wants to merge 1 commit into microsoft:dev from akaStiX:atomic_yield_arm_fix

akaStiX commented Feb 2, 2026

Replaced the WFE instruction with ISB in mi_atomic_yield on ARM64 for non-Windows platforms, as it significantly improves performance.
Also, on one closed platform I ported mimalloc to, WFE resulted in an application freeze, because no event was generated to wake the threads that were waiting with WFE.
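To make the proposed change concrete, here is a minimal sketch of a spin-wait hint that uses ISB on ARM64. It is not the PR's diff or mimalloc's actual source: `example_atomic_yield` and `example_spin_until` are hypothetical names, and the x86 and fallback branches are assumptions added only so the block compiles on other targets.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a yield hint like mi_atomic_yield
   (illustration only, not mimalloc's code). */
static inline void example_atomic_yield(void) {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  /* ISB adds a short delay by flushing the pipeline.
     Unlike WFE, it never blocks waiting for an event. */
  __asm__ volatile("isb" ::: "memory");
#elif (defined(__x86_64__) || defined(__i386__)) && \
      (defined(__GNUC__) || defined(__clang__))
  /* x86 spin-wait hint. */
  __asm__ volatile("pause" ::: "memory");
#else
  /* Assumed fallback: compiler barrier only. */
  atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Spin until *flag is true or max_spins iterations have passed.
   Returns true if the flag was observed set. */
static bool example_spin_until(atomic_bool *flag, unsigned long max_spins) {
  for (unsigned long i = 0; i < max_spins; i++) {
    if (atomic_load_explicit(flag, memory_order_acquire)) return true;
    example_atomic_yield();
  }
  return atomic_load_explicit(flag, memory_order_acquire);
}
```

Because the hint only delays the spinning thread, the loop always gets back to re-checking the flag, whether or not the other thread signals anything.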

daanx (Collaborator) commented Feb 3, 2026

Interesting PR. Mmm, it would be best if we didn't need atomic yield at all :-( Anyway, in v3 there is (almost) no use of atomic yield, so that improves matters. In the other cases, the atomic yield is really used to signify that another thread needs to make progress for the current thread to advance. As such, ISB (essentially a memory barrier) doesn't quite do that? WFE (wait for an event from another thread) seems more appropriate... but then, you are seeing improved performance? Is that on v2?

akaStiX (Author) commented Feb 23, 2026

I tested v3, and for my use case it was worse than v2 in both performance and memory usage.

> As such ISB (essentially a memory barrier) doesn't quite do that?

It's actually a recommended way to do it on ARM. ARM has no instruction that behaves like the x64 PAUSE instruction, and the code that uses this yield was designed with x64 in mind. As I said before, on one closed-source ARM platform that I ported mimalloc to, WFE resulted in an app freeze because no event was raised. I don't remember all the details about WFE off the top of my head, but I think one must also raise an event explicitly so that WFE doesn't keep waiting long after the lock has been released.
Going in and changing all the code to raise the event manually would be a much bigger fix.
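As a rough illustration of what raising the event manually would involve, here is a sketch of a spin lock whose waiters use WFE and whose unlock path issues SEV. It is not mimalloc code: every `example_` name is hypothetical, the `dsb ishst` before `sev` is an assumption based on general ARM guidance that the store should be visible before the event is sent, and on non-ARM64 targets both helpers compile to nothing.

```c
#include <stdatomic.h>

/* Hypothetical helper: block until an event is signalled (ARM64 only). */
static inline void example_wait_for_event(void) {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  __asm__ volatile("wfe" ::: "memory");
#endif
}

/* Hypothetical helper: make prior stores visible, then signal an event
   to all cores so that threads parked in WFE wake up (ARM64 only). */
static inline void example_send_event(void) {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  __asm__ volatile("dsb ishst\n\tsev" ::: "memory");
#endif
}

typedef struct {
  atomic_flag locked;
} example_spinlock_t;

static void example_lock(example_spinlock_t *l) {
  /* test_and_set returns the previous value: true means still held. */
  while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
    example_wait_for_event();
  }
}

static void example_unlock(example_spinlock_t *l) {
  atomic_flag_clear_explicit(&l->locked, memory_order_release);
  /* Without this explicit event, a waiter parked in WFE relies on
     some other event source to wake it. */
  example_send_event();
}
```

Every release path that a WFE-based waiter depends on would need a matching event like the one in `example_unlock`, which is why this is a larger change than swapping the instruction in the yield itself.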
Also, if you look in the WinSDK at what YieldProcessor() does on ARM, you'll see this:
    dmb ishst
    yield
Essentially the same as ISB, but with more instructions. I tested this approach and it was slightly slower, so I went with ISB.

> but then, you are seeing improved performance? Is that on v2?

Yes, I am seeing a huge perf improvement with ISB on v2.
I have also observed perf improvements with ISB in other code.
