fix: njit fallback for broken runtime with torch #164
flying-sheep merged 17 commits into scverse:main from
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   99.22%   99.29%   +0.06%
==========================================
  Files          20       21       +1
  Lines         519      566      +47
==========================================
+ Hits          515      562      +47
  Misses          4        4
flying-sheep left a comment
Hi, thanks for this! I like the idea; with a subprocess, even UB is unlikely to cause problems. Is the real fix tracked somewhere, in the torch issue tracker or so?
I’m less of a fan of the gratuitous monkeypatching in the tests, but I get that avoiding it would mean re-architecting everything, e.g. by using a (data)class that can be initialized with all the variables that it by default gets from global constants and numba settings. No need to do that, I’ll do it when you’re done with the rest!
Thanks!!!
thank you!
This PR adds a narrow workaround in fast-array-utils’ custom njit wrapper for the Apple Silicon crash reported in scanpy when torch is loaded and a numba parallel path is used. The goal is the same “workaround” path suggested by flying-sheep in the Scanpy discussion: handle this in the shared runtime-dispatch layer instead of removing parallelism in Scanpy itself.

The approach keeps the normal parallel path by default. But on macOS arm64, when torch is already loaded and the current threading config is not explicitly pinned to a safe layer like workqueue or tbb, the wrapper runs a small cached subprocess probe before using the parallel implementation. The probe mirrors the current Numba threading config, reproduces the relevant import context, and checks whether a tiny @numba.njit(parallel=True) function actually runs successfully. If it succeeds, the parallel version is used as usual. If it fails, the wrapper falls back to the already-compiled serial version and emits a warning.

I also narrowed the probe context based on reproduction work in the failing environment. In that setup, torch was the relevant one, so the probe now mirrors only the loaded torch state, and the cache key includes that state as well. This keeps the workaround smaller and avoids reusing a cached “safe” result after torch is imported later.

Tests cover the probe gating logic, lazy detection behavior, env/config mirroring for the subprocess, cache-key behavior, subprocess success and failure cases, wrapper dispatch between serial and parallel implementations, and correctness of the serial fallback path.