
fix: njit fallback for broken runtime with torch #164

Merged
flying-sheep merged 17 commits into scverse:main from JhonatanFelix:main
Apr 10, 2026

Conversation

@JhonatanFelix
Contributor

This PR adds a narrow workaround in fast-array-utils’ custom njit wrapper for the Apple Silicon crash reported in scanpy when torch is loaded and a numba parallel path is used. The goal is the same “workaround” path suggested by flying-sheep in the Scanpy discussion: handle this in the shared runtime-dispatch layer, instead of removing parallelism in Scanpy itself.

The approach keeps the normal parallel path by default. On macOS arm64, however, when torch is already loaded and the current threading config is not explicitly pinned to a safe layer like workqueue or tbb, the wrapper runs a small cached subprocess probe before using the parallel implementation. The probe mirrors the current Numba threading config, reproduces the relevant import context, and checks whether a tiny @numba.njit(parallel=True) function actually runs successfully. If it does, the parallel version is used as usual. If it fails, the wrapper falls back to the already-compiled serial version and emits a warning.
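In simplified sketch form (not the exact code in this PR; `parallel_runtime_works` and `_PROBE_SRC` are illustrative names), the probe looks something like:

```python
import functools
import os
import subprocess
import sys

# Tiny parallel kernel: if the threading runtime is broken (e.g. clashing
# OpenMP libraries), running this crashes the *child* process instead of
# the user's process.
_PROBE_SRC = """\
import numpy as np
import numba

@numba.njit(parallel=True)
def kernel(x):
    acc = 0.0
    for i in numba.prange(x.shape[0]):
        acc += x[i]
    return acc

kernel(np.ones(1024))
"""


@functools.cache  # probe at most once per configuration per process
def parallel_runtime_works(threading_layer: str, torch_loaded: bool) -> bool:
    """Run a parallel numba kernel in a subprocess mirroring our config."""
    # Mirror the import context: load torch first if the parent has it loaded.
    src = ("import torch\n" if torch_loaded else "") + _PROBE_SRC
    env = dict(os.environ, NUMBA_THREADING_LAYER=threading_layer)
    result = subprocess.run([sys.executable, "-c", src], env=env, capture_output=True)
    return result.returncode == 0
```

Because any crash happens in a throwaway child process, the parent only ever observes a nonzero return code, and `functools.cache` keeps the probe a one-time cost per configuration.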

I also narrowed the probe context based on reproduction work in the failing environment. In that setup, torch was the relevant factor, so the probe now mirrors only the loaded torch state, and the cache key includes that state as well. This keeps the workaround smaller and avoids reusing a cached “safe” result after torch is imported later.
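The gating condition can be sketched as follows (again illustrative names, simplified from the real code; note that `"torch" in sys.modules` detects an already-loaded torch lazily, without importing it, and reading `numba.config.THREADING_LAYER` is one possible way to check the configured layer):

```python
import platform
import sys

import numba


def needs_probe() -> bool:
    """True only on Apple Silicon, with torch already loaded, and with no
    explicitly pinned safe threading layer."""
    return (
        sys.platform == "darwin"
        and platform.machine() == "arm64"
        and "torch" in sys.modules
        and numba.config.THREADING_LAYER not in ("workqueue", "tbb")
    )
```

Passing the torch state into `parallel_runtime_works` above makes it part of the cache key, so a result probed before torch was imported is never reused afterwards.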

Tests cover the probe gating logic, lazy detection behavior, env/config mirroring for the subprocess, cache-key behavior, subprocess success and failure cases, wrapper dispatch between serial and parallel implementations, and correctness of the serial fallback path.
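In sketch form, the dispatch those tests exercise looks roughly like this (simplified, building on the illustrative helpers sketched above):

```python
import functools
import warnings

import numba


def njit(fn):
    """Create both jitted variants; pick one at call time."""
    parallel = numba.njit(cache=True, parallel=True)(fn)
    serial = numba.njit(cache=True, parallel=False)(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # needs_probe / parallel_runtime_works: see the sketches above
        if needs_probe() and not parallel_runtime_works(
            numba.config.THREADING_LAYER, torch_loaded=True
        ):
            warnings.warn(
                "numba's parallel runtime appears broken with torch loaded; "
                "falling back to the serial implementation",
                RuntimeWarning,
                stacklevel=2,
            )
            return serial(*args, **kwargs)
        return parallel(*args, **kwargs)

    return wrapper
```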

@codecov

codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.29%. Comparing base (d5176d6) to head (57be99a).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   99.22%   99.29%   +0.06%     
==========================================
  Files          20       21       +1     
  Lines         519      566      +47     
==========================================
+ Hits          515      562      +47     
  Misses          4        4              


@codspeed-hq

codspeed-hq bot commented Apr 8, 2026

Merging this PR will not alter performance

✅ 232 untouched benchmarks


Comparing JhonatanFelix:main (57be99a) with main (d5176d6)


Member

@flying-sheep flying-sheep left a comment


Hi, thanks for this! I like the idea, with a subprocess even UB is unlikely to cause problems. Is the real fix tracked somewhere in the torch issue tracker or so?

I’m less of a fan of the gratuitous monkeypatching in the tests, but I get that avoiding it would mean re-architecting everything, e.g. by using a (data)class that can be initialized with all the variables that it by default gets from global constants and numba settings. No need to do that, I’ll do it when you’re done with the rest!
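For illustration, such a (data)class might look roughly like this (purely a hypothetical sketch, not code from this PR):

```python
import sys
from dataclasses import dataclass, field

import numba


@dataclass
class ProbeConfig:
    """Defaults come from global state, but tests can construct instances
    with explicit values instead of monkeypatching modules."""

    threading_layer: str = field(default_factory=lambda: numba.config.THREADING_LAYER)
    torch_loaded: bool = field(default_factory=lambda: "torch" in sys.modules)
```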

@JhonatanFelix
Contributor Author

JhonatanFelix commented Apr 9, 2026

> Hi, thanks for this! I like the idea, with a subprocess even UB is unlikely to cause problems. Is the real fix tracked somewhere in the torch issue tracker or so?
>
> I’m less of a fan of the gratuitous monkeypatching in the tests, but I get that avoiding it would mean re-architecting everything, e.g. by using a (data)class that can be initialized with all the variables that it by default gets from global constants and numba settings. No need to do that, I’ll do it when you’re done with the rest!

Thanks!!!
About torch: I actually didn't find a pytorch issue that tracks the scanpy/numba crash itself. The closest upstream tracking I found was pytorch's open macOS/OpenMP duplication issues, especially pytorch/pytorch#44282 and pytorch/pytorch#127973.

Contributor Author

@JhonatanFelix JhonatanFelix left a comment


Made a wrong comment here, sorry!

@flying-sheep flying-sheep changed the title from "Goal 1 to Fix #4026 (scanpy): apple silicon njit fallback for broken numba parallel runtime with torch" to "fix: apple silicon njit fallback for broken numba parallel runtime with torch" Apr 9, 2026
@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Apr 10, 2026
@flying-sheep flying-sheep changed the title from "fix: apple silicon njit fallback for broken numba parallel runtime with torch" to "fix: njit fallback for broken runtime with torch" Apr 10, 2026
@flying-sheep flying-sheep merged commit f07b25f into scverse:main Apr 10, 2026
20 checks passed
@flying-sheep
Member

thank you!


Labels

run-gpu-ci Apply this label to run GPU CI once


Development

Successfully merging this pull request may close these issues.

segmentation fault in normalize_total (Numba omp) on Apple Silicon
segmentation fault (SIGSEGV) when computing neighbors

2 participants