Simplify execution model to worker + owngil modes (v3.0.0) #61
Merged
Breaking changes for v3.0.0:
- py:execution_mode/0 now returns worker | owngil only
- Removed py:num_executors/0 (each context has own worker thread)
- Removed multi_executor pool and dead dispatch code
Architecture changes:
- Per-context worker threads with stable thread affinity
- Async NIF dispatch with message passing (no dirty scheduler blocking)
- Request queue per context (replaces single-slot pattern)
Fixes numpy/torch/tensorflow thread-local state issues.
Async API:
- Reimplement py:async_gather/1 over py_event_loop (concurrent submit,
sequential await); add async_gather/2 with explicit timeout.
- Remove py:async_stream/3,4 from public API (was always returning
{error, stream_not_implemented}).
- Fix py:async_call/3,4 + async_await/1,2 round-trip: use
py_event_loop:create_task and py_event_loop:await directly so the
receive matches the {async_result, _, _} message the NIF sends.
- Delete the legacy py_async_pool gen_server (the async_call call site
now goes through py_event_loop directly).
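The "concurrent submit, sequential await" shape of async_gather can be illustrated with plain asyncio. This is only a sketch of the pattern, not the py_event_loop implementation: the helper names (gather_sequential_await, _work, demo) are hypothetical, and the timeout here is applied to each await rather than to the whole batch, which is an assumption.

```python
import asyncio


async def gather_sequential_await(coros, timeout=None):
    """Submit every coroutine first, then await the tasks one by one."""
    # Submit phase: every task is scheduled before anything is awaited,
    # so the work overlaps even though the awaits below are sequential.
    tasks = [asyncio.create_task(c) for c in coros]
    results = []
    try:
        # Await phase: results come back in submission order.
        for task in tasks:
            results.append(await asyncio.wait_for(task, timeout))
    except BaseException:
        # On failure or timeout, cancel whatever is still pending.
        for task in tasks:
            task.cancel()
        raise
    return results


async def _work(value, delay):
    await asyncio.sleep(delay)
    return value * 2


def demo():
    async def main():
        return await gather_sequential_await(
            [_work(1, 0.03), _work(2, 0.01), _work(3, 0.02)], timeout=1.0
        )

    return asyncio.run(main())
```

Results are returned in submission order regardless of which task finishes first, because the await loop walks the task list in order.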
C-side dead code:
- Remove multi_executor_* functions, g_executors[], multi_executor_thread_main,
multi_executor_enqueue, MAX/MIN_EXECUTORS macros and the executor_t struct.
- Remove the PY_MODE_MULTI_EXECUTOR case in executor_enqueue and the unused
tl_current_req_* thread-locals from the legacy compatibility layer.
- Remove unused shared_dict_down (resource type was registered without
process monitoring).
- Remove context_dispatch_call/eval/exec (declared but never called).
- Drop executor_id field from py_worker_t and py_context_t.
Internal mode enum collapse:
- py_execution_mode_t reduced to {PY_MODE_FREE_THREADED, PY_MODE_GIL}.
Both retired values (PY_MODE_SUBINTERP, PY_MODE_MULTI_EXECUTOR) had
identical behavior in every switch.
- Replace runtime numpy-cache check with precise compile-time gate
(#if defined(HAVE_SUBINTERPRETERS) && !defined(HAVE_FREE_THREADED));
preserves today's truth table on every supported build.
- nif_execution_mode now returns free_threaded | gil; update spec on
py_nif:execution_mode/0.
Configuration cleanup:
- Remove num_executors / num_async_workers env keys (no-ops post-rework).
- Reset .app.src env to [] (the prior keys were never read).
Doc + example sync:
- README.md: drop SHARED_GIL bullet; py:execution_mode/0 returns
worker | owngil; configuration block uses num_contexts / context_mode
/ max_concurrent.
- docs/getting-started.md, scalability.md, migration.md, owngil_internals.md,
preload.md, process-bound-envs.md, reactor.md, testing-free-threading.md:
drop subinterp as a context mode; correct py_event_loop:run signature.
- src/py.erl, src/py_context.erl, src/py_context_router.erl,
src/py_context_sup.erl, src/py_reactor_context.erl: drop subinterp
from @doc mode lists.
- c_src/py_nif.c nif_context_create header doc: same scrub.
- examples/bench_owngil.erl: rename labels (worker baseline vs OWN_GIL),
gate on py_nif:owngil_supported (3.14+).
- examples/bench_reactor_modes.erl: convert subinterp arm to OWN_GIL
(run_reactor_owngil_bench, Reactor/OG legend).
- examples/reactor_subinterp_example.erl: deleted (reactor_owngil_example
covers OWN_GIL; reactor_echo covers worker).
- examples/benchmark.erl: replace py:num_executors() with
py_context_router:num_contexts().
Tests:
- New test/py_thread_affinity_SUITE.erl asserts:
exec/eval/call on one context share threading.get_native_id;
N processes targeting one context converge on its worker thread;
distinct contexts use distinct threads;
same invariants hold under owngil mode.
Replaces 8 untracked diagnostic escripts (deleted).
- test/py_SUITE.erl: tighten test_asyncio_call and test_asyncio_gather
to assert real success values instead of swallowing failures.
CHANGELOG entries describe each breaking change and added behavior.
30 files changed, +244/-1199 LOC, plus the new affinity suite.
py_context:new(#{mode => owngil}) covers the same OWN_GIL parallelism
with OTP supervision and automatic cleanup, so the older handle-based
surface (subinterp_create/destroy/call/eval/exec/cast/async_call/await
+ subinterp_pool_*) is redundant. Drop the public API, the wrapper NIFs,
and the dead reactor example that called nonexistent subinterp_reactor_*
functions. Internal subinterp_thread_pool_* (used by py_event_loop_pool's
OWN_GIL backend) and the capability probe py:subinterp_supported/0 stay.
-1390/+27 LOC across 8 files; py_subinterp_SUITE deleted.
- inspiration.md: 'auto' mode -> 'worker'; py_pool:start_link -> py_context_router:start_pool
- asyncio.md: paired async_call with async_await (not await)
- py.erl: bind_context lives in py_context_router, not py
- reactor_echo example: 'auto' mode -> 'worker'
- bench_async_task header: spawn -> spawn_task
- .app.src: drop stale py_pool from registered list
py_context already validates worker | owngil via pattern matching, but
py_reactor_context calls py_nif:context_create/1 directly, which silently
mapped any non-owngil atom to worker. That hid genuine misuse — including
the auto atom that lingered in a doc snippet and one CT test.
Tighten nif_context_create to return {error, {invalid_mode, Atom}} for
anything other than worker | owngil, and update the one test that relied
on the silent-accept behavior.
- py_actor_SUITE: prefix unused Pid variable with underscore
- py_async_task_SUITE: match +0.0 instead of 0.0 (OTP 27+ no longer matches -0.0 to 0.0)
- py_reentrant_SUITE, py_pid_send_SUITE: replace deprecated code:lib_dir/2 with filename:join(code:lib_dir/1, "test")
Python 3.14 deprecated asyncio.set_event_loop_policy; 3.16 removes it. Our run path (erlang.run / asyncio.Runner with loop_factory=) doesn't need the global policy; it was only a convenience for bare asyncio.run() inside py:exec. Gate both Erlang-side helpers and the Python erlang.install() entry point on sys.version_info < (3, 14); the latter now raises with a migration message on 3.14+. Behavior on Python 3.9-3.13 is unchanged.
Adds test/py_asyncio_policy_SUITE.erl (7 self-skipping cases) pinning the gate, the run-path-without-policy invariant, and a sentinel asserting no set_event_loop_policy DeprecationWarning fires on 3.14+ init.
EDoc expects code-quote spans as `text', not `text`; the markdown-style backticks I added in the v3.14 deprecation note crashed edoc_doclet_chunks.
The handle-level entry points (subinterp_thread_handle_create/destroy, subinterp_thread_call/eval/exec/cast/async_call) along with the py_subinterp_handle_t struct and PY_SUBINTERP_HANDLE_RESOURCE_TYPE became unreachable when the public py:subinterp_* API was removed. Only the OWN_GIL session backend (subinterp_thread_pool_*) is still used by py_event_loop_pool, so keep that path and drop the rest.
has_result and is_error in suspended_state_t / suspended_context_state_t were declared volatile, which doesn't guarantee atomicity or memory ordering on weakly-ordered architectures (ARM). Upgrade to _Atomic bool so the read in the resumer thread sees the write from the callback thread without relying on incidental sequencing.
If a worker pthread is stuck in long-running Python, the 30s shutdown join times out, ctx->leaked is set, and the helper returns. Today the BEAM later runs context_destructor (which sees ctx->destroyed and returns early) and frees the resource memory while the stuck thread is still dereferencing it. Call enif_keep_resource(ctx) on the leak path so the refcount stays above zero forever and the memory survives the stuck thread.
Replace the CTX_REQ_SHUTDOWN sentinel enqueue with a direct pthread_cond_broadcast under the queue mutex. The sentinel could be orphaned (worker mid-request returns to top of loop, sees shutdown_requested, exits without dequeuing), and allocations can now fail more often once ctx_request_create() honors init failures. The broadcast wakes any parked worker so it observes the predicate and exits cleanly.
Gate the callback_pipe close in nif_context_destroy on !ctx->leaked. A leaked pthread inside Python that triggers erlang.call() reads from callback_pipe[0] and writes to callback_pipe[1]; closing those fds lets the kernel reissue the numbers to unrelated files and silently corrupt them. The leaked pipes stay alive with the pinned resource until VM exit.
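The sentinel-free shutdown can be pictured with Python's threading.Condition standing in for the pthread mutex and condition variable. This is an illustrative sketch only: ContextWorker and its fields are hypothetical names, and the real worker is C code that this does not reproduce.

```python
import threading
from collections import deque


class ContextWorker:
    """Toy worker that parks on a condition variable between requests."""

    def __init__(self):
        self._cond = threading.Condition()  # mutex + condition variable
        self._queue = deque()
        self._shutdown_requested = False
        self.handled = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, item):
        with self._cond:
            self._queue.append(item)
            self._cond.notify()

    def shutdown(self, timeout=5.0):
        # No sentinel request is enqueued: set the flag under the lock
        # and wake every waiter so it re-checks the predicate.
        with self._cond:
            self._shutdown_requested = True
            self._cond.notify_all()
        self._thread.join(timeout)
        return not self._thread.is_alive()

    def _run(self):
        while True:
            with self._cond:
                # Predicate loop: wake on new work or on shutdown.
                while not self._queue and not self._shutdown_requested:
                    self._cond.wait()
                if self._shutdown_requested:
                    return
                item = self._queue.popleft()
            # The request runs outside the lock.
            self.handled.append(item)
```

Because shutdown is a flag checked under the lock rather than a queued item, nothing can be left orphaned in the queue and no allocation is needed on the shutdown path.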
wait_for_async_result/2 returns {error, async_timeout} after 5
minutes, but the C worker can still finish later and deliver the
result. Without a drain those messages would accumulate on the
context process's mailbox forever. Drain stale ids before the
matching receive — safe because the context process is the sole
receiver and only one wait is in flight at a time.
New test/py_context_async_drain_SUITE.erl pins the behavior by
injecting a stale {py_result, FakeRef, _} into Ctx's mailbox and
asserting it's gone after the next async-dispatched call.
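The drain-before-receive idea can be modeled in Python with a queue.Queue standing in for the context process's mailbox. This is a sketch under simplifying assumptions: the mailbox holds only ("py_result", ref, value) tuples, the function names are hypothetical, and the timeout applies to each blocking get rather than to the whole wait.

```python
import queue


def drain_stale(mailbox, wanted_ref):
    """Drop queued results whose ref is not wanted_ref; return how many."""
    kept, dropped = [], 0
    while True:
        try:
            msg = mailbox.get_nowait()
        except queue.Empty:
            break
        if msg[1] != wanted_ref:
            dropped += 1  # late result from an earlier, timed-out wait
        else:
            kept.append(msg)
    for msg in kept:
        mailbox.put(msg)  # restore matching results in their original order
    return dropped


def wait_for_result(mailbox, wanted_ref, timeout):
    """Drain stale results, then block for the result tagged wanted_ref."""
    drain_stale(mailbox, wanted_ref)
    try:
        while True:
            _tag, ref, value = mailbox.get(timeout=timeout)
            if ref == wanted_ref:
                return ("ok", value)
            # A stale result that arrived during the wait is skipped too.
    except queue.Empty:
        return ("error", "async_timeout")
```

The drain is safe in this model for the same reason given above: a single receiver with one wait in flight means any result carrying a different ref can no longer be claimed by anyone.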
pthread_mutex_init, pthread_cond_init, and enif_alloc_env can each fail under resource pressure. Today their returns are ignored, so mutex/cond init failures leave the request in an unusable state and a NULL request_env causes the next enif_make_copy to segfault. Check each return and free what was allocated on partial failure. All 14 callers already test the result for NULL.
_ErlangChildWatcher in priv/_erlang_impl/_policy.py was never instantiated (the actual watcher is asyncio.ThreadedChildWatcher / SafeChildWatcher); delete the class along with its TODO. Replace "process monitoring disabled for now to debug crash" / "Using simple resource type without process monitoring for now" in the SharedDict path with positive-form notes describing the actual GC-scoped lifecycle.
dispatch_to_worker_thread_impl blocked on a 30s pthread_cond_timedwait
in the env path. ML inference and other long-running calls returned
{error, worker_timeout} while the worker kept going. Add three
async-with-env NIFs that wrap the existing dispatch_to_worker_thread_async
(which already takes a local_env), and rewire handle_*_with_suspension_and_env
plus the {exec,_,_,_,EnvRef} loop arm to async-first / sync-fallback.
The Erlang side waits in wait_for_async_result/2, which has the
stale-result drain from ef31f76, so the env path now matches the
non-env path's behavior.
Adds test/py_context_async_env_SUITE (2 cases).
The asyncio.md and migration.md tables claimed sync erlang.sleep() always releases the dirty scheduler. In v3.0 the dirty scheduler isn't held during sync calls anyway (async NIF dispatch returns immediately); what blocks is the worker pthread for py:call, the Erlang process for py:exec/py:eval, or no thread at all for awaited async sleep. Update the table and the sleep() docstring accordingly.
erlang.install() emits a DeprecationWarning on 3.12-3.13 by design, but users who knowingly use the legacy pattern had no clean local opt-out. Add a keyword-only silent=False; passing silent=True suppresses the warning without disabling DeprecationWarning globally. The 3.14+ RuntimeError stays unconditional.
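The version gate and the silent opt-out can be sketched as follows. This is a hypothetical stand-in, not the real erlang.install(): the function body, the message texts, the version_info parameter (added only so the gate can be exercised on any interpreter) and the return value are all assumptions, as is the absence of a warning below 3.12.

```python
import sys
import warnings


def install(*, silent=False, version_info=None):
    """Illustrative gate only; it does not touch any event loop policy."""
    version = tuple((version_info or sys.version_info)[:2])
    if version >= (3, 14):
        # Unconditional on 3.14+: silent=True does not bypass this.
        raise RuntimeError(
            "install() is unavailable on Python 3.14+; use erlang.run(main) "
            "or asyncio.Runner(loop_factory=erlang.new_event_loop) instead"
        )
    if version >= (3, 12) and not silent:
        # Local opt-out: silent=True skips only this one warning.
        warnings.warn(
            "install() relies on the deprecated event loop policy",
            DeprecationWarning,
            stacklevel=2,
        )
    return "legacy-policy-path"
```

Keeping silent keyword-only means the opt-out has to be spelled out at the call site, so it cannot be passed by accident as a positional argument.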
Summary
This is the preparation for the v3.0.0 release. The public surface gets smaller, while the internals are reorganized around per-context worker pthreads.
The set of public modes is now worker | owngil. Other atoms like subinterp, auto, multi_executor or free_threaded are no longer accepted. The NIF boundary validates the mode strictly and returns {error, {invalid_mode, _}} for anything else, which closes a small loophole where py_reactor_context used to silently fall back to worker.
A few APIs that did not really earn their keep have been removed: py:num_executors/0, py:async_stream/3,4, and the entire py:subinterp_* family of explicit handles. For OWN_GIL parallelism, the path is now py_context:new(#{mode => owngil}), which gives the same isolation with proper OTP supervision. The num_executors and num_async_workers config keys are gone too, since they were already no-ops, and the legacy py_async_pool gen_server with them.
On the async side, two things were genuinely broken and are now fixed. py:async_call/3,4 followed by py:async_await/1,2 used to time out silently, because the await was listening for the wrong message tag. And py:async_gather/1,2 simply returned gather_not_implemented. Both round-trip correctly now, on top of py_event_loop.
The architectural change underneath is that each context owns its dedicated pthread, with stable thread affinity. This is what we needed to make numpy, torch and tensorflow happy with their thread-local state. The multi-executor pool, the subinterpreter pool, and the various legacy scaffolds that used to live alongside are gone.
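The thread-affinity property can be illustrated with a toy model in pure Python: one dedicated thread and one request queue per context, with every call routed through that queue. The Context class and demo function are hypothetical and only mirror the invariant the affinity suite checks via threading.get_native_id; they are not the library's implementation.

```python
import queue
import threading


class Context:
    """Toy context: one dedicated worker thread plus its own request queue."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            func, reply = self._requests.get()
            try:
                reply.put(("ok", func()))
            except Exception as exc:  # report failures instead of dying
                reply.put(("error", exc))

    def call(self, func):
        # Every request for this context runs on the same thread, so
        # thread-local state set by one call is visible to the next.
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, reply))
        return reply.get()


def demo():
    a, b = Context(), Context()
    ids_a = {a.call(threading.get_native_id)[1] for _ in range(3)}
    id_b = b.call(threading.get_native_id)[1]
    return ids_a, id_b
```

In this model repeated calls on one context report a single native thread id, and a second context reports a different one, which is the property libraries with thread-local state depend on.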
For Python 3.14+, we no longer install the deprecated global asyncio.set_event_loop_policy. The two recommended entry points, erlang.run(main) and asyncio.Runner(loop_factory=erlang.new_event_loop), do not need it and work on every supported version from 3.9 to 3.14+. The legacy erlang.install() helper now raises a RuntimeError on 3.14+ with a small message pointing at those alternatives. On 3.12-3.13 it still works, with a DeprecationWarning you can suppress per call via erlang.install(silent=True) if you knowingly stick to the legacy pattern.
Review-driven fixes
A second round of review surfaced four issues that have all been addressed:
- A context whose worker thread is still stuck in Python when the shutdown join times out is now pinned with enif_keep_resource on the leak path so its memory survives the stuck thread; the callback pipes are kept open with it for the same reason. We also dropped the CTX_REQ_SHUTDOWN sentinel (which could be orphaned in the queue if the worker exited mid-loop) in favor of a direct pthread_cond_broadcast.
- {py_result, _, _} messages were piling up on the context process whenever wait_for_async_result/2 timed out (the worker can still finish later and deliver the result). A small drain now runs before the matching receive, keeping the mailbox bounded. Pinned by a new CT case that injects a fake stale result and checks the mailbox is empty after the next call.
- ctx_request_create did not check the returns of pthread_mutex_init, pthread_cond_init and enif_alloc_env. Under resource pressure that could segfault on the first enif_make_copy of a NULL request env. The function now rolls back partial state on any init failure; all 14 call sites already test the result.
- Dead code and stale comments were cleaned up: the _ErlangChildWatcher class on the Python side, which was never instantiated, and a couple of stale "for now" comments on the SharedDict path that no longer reflected the design.
Long-running env calls
The env-bearing call, eval and exec paths used to take a synchronous dispatch with a 30-second pthread_cond_timedwait deadline. Long Python work would return {error, worker_timeout} while the worker kept going, and the eventual result would be lost. The non-env paths had already moved to async dispatch, so we just had to add the env variants on top. Three new NIFs wrap the existing async dispatch with the local env threaded through, and the Erlang side falls back to the blocking NIF only when the worker thread isn't available. The worker_timeout error is no longer reachable on the env path.
Tests and verification
The test suite now has dedicated coverage for thread affinity, the asyncio policy gate on 3.14+, the stale-result drain, and the env-async dispatch path. On Python 3.14 the final tally is 529 passed, 5 user-skipped, 0 failed, with py_async_e2e_SUITE and py_udp_e2e_SUITE both at 10/10 and no deprecation warnings during init.
The user-facing docs (README, getting-started, scalability, migration, owngil_internals, process-bound-envs, reactor, preload, testing-free-threading, asyncio) have been refreshed to remove dead snippets and stale mode names. The erlang.sleep behavior table has been rewritten to match the v3.0 worker-pthread architecture rather than the older "dirty scheduler released in all sync contexts" framing. The migration guide intentionally keeps the historical references, so users coming from v1.8 or v2.0 can still find their way.
Net delta versus v2.3.1: about 1,000 lines removed.