Run service-thread event loops on a selector loop on Windows (fix debugger deadlock under Proactor) #1
Merged
When the kernel runs under a ProactorEventLoop on Windows (enabled by ipython#1469 so the main loop can spawn asyncio subprocesses, ipython#1468), debugging deadlocks on Python >= 3.12.

ProactorEventLoop has no native add_reader, so tornado drives a Proactor loop's zmq sockets via a helper "Tornado selector" thread that does select() then call_soon_threadsafe() to wake the loop. ipykernel exempts its own service threads from the debugger but not tornado's helper. When debugpy suspends every thread at a breakpoint under sys.monitoring (3.12+, interpreter-global), that un-exempt helper freezes mid-wake and the control/debug read path never advances -- the loop sits in the IOCP poll forever. On 3.11 (sys.settrace, per-thread) the helper is not frozen, so it does not reproduce there.

The ipykernel service loops -- control, IOPub, the shell channel and subshells -- never need Proactor's subprocess support, so run them on a SelectorEventLoop instead: it implements add_reader natively and needs no helper thread. Only the main/user-code loop stays on Proactor. Off Windows the default loop is already selector-based, so this is a no-op there.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
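The property this change relies on, a selector loop registering a socket through add_reader without any helper thread, can be checked in isolation. The following is a minimal sketch under stated assumptions, not the code in this PR: `new_service_loop` and `read_one_byte` are hypothetical names used only for illustration, and plain asyncio stands in for tornado and zmq.

```python
import asyncio
import socket
import sys


def new_service_loop() -> asyncio.AbstractEventLoop:
    """Hypothetical helper (not ipykernel's): build a selector-based loop."""
    if sys.platform == "win32":
        # The Windows default may be Proactor, which does not implement add_reader.
        return asyncio.SelectorEventLoop()
    # Elsewhere the default loop is already selector-based.
    return asyncio.new_event_loop()


def read_one_byte() -> bytes:
    """Register a socket with add_reader and wait for a single byte."""
    loop = new_service_loop()
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    received = loop.create_future()

    def on_readable() -> None:
        # Runs on the loop's own thread; no helper thread is involved.
        loop.remove_reader(reader)
        received.set_result(reader.recv(1))

    try:
        loop.add_reader(reader, on_readable)
        writer.send(b"x")
        return loop.run_until_complete(asyncio.wait_for(received, timeout=5))
    finally:
        reader.close()
        writer.close()
        loop.close()
```

Calling `read_one_byte()` exercises the whole path on a single thread. On a Proactor loop the same `add_reader` call would raise NotImplementedError, which is the gap tornado's helper thread exists to fill.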
BoykoNeov added a commit to BoykoNeov/steel-sim that referenced this pull request Jun 21, 2026
…on both PRs

GitHub status re-check 2026-06-21:
- NewUserHa/ipykernel#1 (Fix A) MERGED into patch-2 (self-merge 3638154); NewUserHa then reverted dead %asyncio (the #1532 cleanup we'd kept out).
- ipython/ipykernel#1469 now 40/40 green; NewUserHa endorsed + pinged ianthomas23 for review. Latest comment is an endorsement, not a question.
- ipython/ipykernel#1529 unchanged: awaiting ianthomas23 re-review, green except the known macOS-pypy test_run_concurrently_sequence timeout flake.

Nothing owed from us on either PR; both await maintainer review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Builds on ipython#1469 (patch-2) and fixes the debugger deadlock that appears once the main loop runs on ProactorEventLoop on Windows with Python ≥ 3.12.

The bug
ProactorEventLoop has no native add_reader, so tornado drives a Proactor loop's zmq sockets through a helper "Tornado selector" thread (AddThreadSelectorEventLoop): select() then call_soon_threadsafe() to wake the loop. ipykernel's service loops (control, IOPub, the shell channel and subshells) each get one. ipykernel marks its own service threads debugger-exempt (is_pydev_daemon_thread / pydev_do_not_trace) but not tornado's helper. When debugpy suspends every thread at a breakpoint under sys.monitoring (3.12+, interpreter-global), that un-exempt helper freezes mid-wake, after select() returns but before call_soon_threadsafe() completes, and the control/debug read path never advances; the loop sits in the IOCP poll. On 3.11 (sys.settrace, per-thread) the helper isn't frozen, so the deadlock doesn't reproduce there.

The fix
Run the service loops on a SelectorEventLoop (small make_selector_io_loop() helper): it implements add_reader natively, so no helper thread is spawned and there's nothing for the debugger to freeze. Only the main/user-code loop stays on Proactor, so this PR's subprocess support (ipython#1468) is preserved. Off Windows the default loop is already selector-based, so it's a no-op there. 2 files, ~28 lines, win32-guarded.

Verification (current patch-2)

- …patch-2.
- …test_print_to_correct_cell_from_asyncio), so moving IOPubThread / the service loops to selector under a Proactor main loop is fine.
- thread.py is unchanged on patch-2.

Why no CI test here (honest caveat)
The debugger suite can't exercise this path today: tests/conftest.py forces WindowsSelectorEventLoopPolicy and the debug fixtures run an in-process MockKernel bound to IOLoop.current(), so there are no real Control/IOPub service threads. I confirmed that, with the harness forced onto Proactor, the in-process breakpoint tests pass with and without this fix. That's why the deadlock slipped past CI, and why a faithful regression test would need a subprocess-kernel-under-debugger harness that doesn't exist yet; that is best done as a follow-up rather than bundled here. The deadlock→pass evidence I cite is from manual reproduction on the earlier base under a forced-Proactor harness.

test_attach_debug needs no change here: it's already handled on patch-2 by the debugpy >= 1.8.21 version gate (from ipython#1524); that's a debugpy behavior change, unrelated to the loop.
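For background on the 3.11 versus 3.12+ difference noted in the bug description, the per-thread scope of sys.settrace can be seen with the standard library alone. This is a rough sketch, not debugpy's code: `threads_seen_by_settrace` is a hypothetical name, and it only shows the settrace side. It does not try to reproduce debugpy's behavior under sys.monitoring.

```python
import sys
import threading


def threads_seen_by_settrace() -> set[str]:
    """Return the names of threads whose calls reach a sys.settrace hook."""
    seen: set[str] = set()

    def tracer(frame, event, arg):
        seen.add(threading.current_thread().name)
        return None  # no per-line tracing needed

    def work() -> int:
        return sum(range(10))

    previous = sys.gettrace()
    sys.settrace(tracer)  # installs the hook for the calling thread only
    try:
        helper = threading.Thread(target=work, name="helper")
        helper.start()
        helper.join()
        work()
    finally:
        sys.settrace(previous)
    return seen
```

The thread named "helper" runs the same function as the caller but never reaches the hook. That matches the observation that a helper thread left untraced on 3.11 is not frozen at a breakpoint.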