
Simplify execution model to worker + owngil modes (v3.0.0)#61

Merged
benoitc merged 17 commits into main from feature/simplify-execution-model
May 1, 2026
Conversation


@benoitc benoitc commented Apr 8, 2026

Summary

This is the preparation for the v3.0.0 release. The public surface gets smaller, while the internals are reorganized around per-context worker pthreads.

The set of public modes is now worker | owngil. Other atoms, such as subinterp, auto, multi_executor, or free_threaded, are no longer accepted. The NIF boundary validates the mode strictly and returns {error, {invalid_mode, _}} for anything else, which closes a small loophole where py_reactor_context used to silently fall back to worker.

A few APIs that did not really earn their keep have been removed: py:num_executors/0, py:async_stream/3,4, and the entire py:subinterp_* family of explicit handles. For OWN_GIL parallelism, the path is now py_context:new(#{mode => owngil}), which gives the same isolation with proper OTP supervision. The num_executors and num_async_workers config keys are gone too (they were already no-ops), and the legacy py_async_pool gen_server goes with them.

On the async side, two things were genuinely broken and are now fixed. py:async_call/3,4 followed by py:async_await/1,2 used to time out silently, because the await was listening for the wrong message tag. And py:async_gather/1,2 simply returned gather_not_implemented. Both now round-trip correctly on top of py_event_loop.

The architectural change underneath is that each context owns its dedicated pthread, with stable thread affinity. This is what we needed to make numpy, torch and tensorflow happy with their thread-local state. The multi-executor pool, the subinterpreter pool, and the various legacy scaffolds that used to live alongside are gone.

For Python 3.14+, we no longer install the deprecated global asyncio.set_event_loop_policy. The two recommended entry points, erlang.run(main) and asyncio.Runner(loop_factory=erlang.new_event_loop), do not need it and work on every supported version from 3.9 to 3.14+. The legacy erlang.install() helper now raises a RuntimeError on 3.14+ with a small message pointing at those alternatives. On 3.12-3.13 it still works, with a DeprecationWarning you can suppress per call via erlang.install(silent=True) if you knowingly stick to the legacy pattern.

Review-driven fixes

A second round of review surfaced four issues that have all been addressed:

  • A potential use-after-free on shutdown timeout: when a worker pthread was stuck in long-running Python, the 30-second join would fail, the helper returned, and the BEAM later freed the resource memory under the still-running thread. We now pin the resource via enif_keep_resource on the leak path so its memory survives the stuck thread; the callback pipes are kept open with it for the same reason. We also dropped the CTX_REQ_SHUTDOWN sentinel (which could be orphaned in the queue if the worker exited mid-loop) in favor of a direct pthread_cond_broadcast.
  • Stale {py_result, _, _} messages were piling up on the context process whenever wait_for_async_result/2 timed out (the worker can still finish later and deliver the result). A small drain now runs before the matching receive, keeping the mailbox bounded. Pinned by a new CT case that injects a fake stale result and checks the mailbox is empty after the next call.
  • ctx_request_create did not check the returns of pthread_mutex_init, pthread_cond_init and enif_alloc_env. Under resource pressure that could segfault on the first enif_make_copy of a NULL request env. The function now rolls back partial state on any init failure; all 14 call sites already test the result.
  • A small bit of dead code went out: the _ErlangChildWatcher class on the Python side, which was never instantiated, and a couple of stale "for now" comments on the SharedDict path that no longer reflected the design.

Long-running env calls

The env-bearing call, eval, and exec paths used a synchronous dispatch with a 30-second pthread_cond_timedwait deadline. Long Python work would return {error, worker_timeout} while the worker kept going, and the eventual result would be lost. The non-env paths had already moved to async dispatch, so the env variants just had to be added on top. Three new NIFs wrap the existing async dispatch with the local env threaded through, and the Erlang side falls back to the blocking NIF only when the worker thread isn't available. The worker_timeout error is no longer reachable on the env path.

Tests and verification

The test suite now has dedicated coverage for thread affinity, the asyncio policy gate on 3.14+, the stale-result drain, and the env-async dispatch path. On Python 3.14 the final tally is 529 passed, 5 user-skipped, 0 failed, with py_async_e2e_SUITE and py_udp_e2e_SUITE both at 10/10 and no deprecation warnings during init.

The user-facing docs (README, getting-started, scalability, migration, owngil_internals, process-bound-envs, reactor, preload, testing-free-threading, asyncio) have been refreshed to remove dead snippets and stale mode names. The erlang.sleep behavior table has been rewritten to match the v3.0 worker-pthread architecture rather than the older "dirty scheduler released in all sync contexts" framing. The migration guide intentionally keeps the historical references, so users coming from v1.8 or v2.0 can still find their way.

Net delta versus v2.3.1: about 1,000 lines removed.

benoitc added 17 commits April 8, 2026 18:17
Breaking changes for v3.0.0:
- py:execution_mode/0 now returns worker | owngil only
- Removed py:num_executors/0 (each context has own worker thread)
- Removed multi_executor pool and dead dispatch code

Architecture changes:
- Per-context worker threads with stable thread affinity
- Async NIF dispatch with message passing (no dirty scheduler blocking)
- Request queue per context (replaces single-slot pattern)

Fixes numpy/torch/tensorflow thread-local state issues.
Async API:
- Reimplement py:async_gather/1 over py_event_loop (concurrent submit,
  sequential await); add async_gather/2 with explicit timeout.
- Remove py:async_stream/3,4 from public API (was always returning
  {error, stream_not_implemented}).
- Fix py:async_call/3,4 + async_await/1,2 round-trip: use
  py_event_loop:create_task and py_event_loop:await directly so the
  receive matches the {async_result, _, _} message the NIF sends.
- Delete the legacy py_async_pool gen_server (the async_call call site
  now goes through py_event_loop directly).

C-side dead code:
- Remove multi_executor_* functions, g_executors[], multi_executor_thread_main,
  multi_executor_enqueue, MAX/MIN_EXECUTORS macros and the executor_t struct.
- Remove the PY_MODE_MULTI_EXECUTOR case in executor_enqueue and the unused
  tl_current_req_* thread-locals from the legacy compatibility layer.
- Remove unused shared_dict_down (resource type was registered without
  process monitoring).
- Remove context_dispatch_call/eval/exec (declared but never called).
- Drop executor_id field from py_worker_t and py_context_t.

Internal mode enum collapse:
- py_execution_mode_t reduced to {PY_MODE_FREE_THREADED, PY_MODE_GIL}.
  Both retired values (PY_MODE_SUBINTERP, PY_MODE_MULTI_EXECUTOR) had
  identical behavior in every switch.
- Replace runtime numpy-cache check with precise compile-time gate
  (#if defined(HAVE_SUBINTERPRETERS) && !defined(HAVE_FREE_THREADED));
  preserves today's truth table on every supported build.
- nif_execution_mode now returns free_threaded | gil; update spec on
  py_nif:execution_mode/0.

Configuration cleanup:
- Remove num_executors / num_async_workers env keys (no-ops post-rework).
- Reset .app.src env to [] (the prior keys were never read).

Doc + example sync:
- README.md: drop SHARED_GIL bullet; py:execution_mode/0 returns
  worker | owngil; configuration block uses num_contexts / context_mode
  / max_concurrent.
- docs/getting-started.md, scalability.md, migration.md, owngil_internals.md,
  preload.md, process-bound-envs.md, reactor.md, testing-free-threading.md:
  drop subinterp as a context mode; correct py_event_loop:run signature.
- src/py.erl, src/py_context.erl, src/py_context_router.erl,
  src/py_context_sup.erl, src/py_reactor_context.erl: drop subinterp
  from @doc mode lists.
- c_src/py_nif.c nif_context_create header doc: same scrub.
- examples/bench_owngil.erl: rename labels (worker baseline vs OWN_GIL),
  gate on py_nif:owngil_supported (3.14+).
- examples/bench_reactor_modes.erl: convert subinterp arm to OWN_GIL
  (run_reactor_owngil_bench, Reactor/OG legend).
- examples/reactor_subinterp_example.erl: deleted (reactor_owngil_example
  covers OWN_GIL; reactor_echo covers worker).
- examples/benchmark.erl: replace py:num_executors() with
  py_context_router:num_contexts().

Tests:
- New test/py_thread_affinity_SUITE.erl asserts:
  exec/eval/call on one context share threading.get_native_id;
  N processes targeting one context converge on its worker thread;
  distinct contexts use distinct threads;
  same invariants hold under owngil mode.
  Replaces 8 untracked diagnostic escripts (deleted).
- test/py_SUITE.erl: tighten test_asyncio_call and test_asyncio_gather
  to assert real success values instead of swallowing failures.

CHANGELOG entries describe each breaking change and added behavior.

30 files changed, +244/-1199 LOC, plus the new affinity suite.
py_context:new(#{mode => owngil}) covers the same OWN_GIL parallelism
with OTP supervision and automatic cleanup, so the older handle-based
surface (subinterp_create/destroy/call/eval/exec/cast/async_call/await
+ subinterp_pool_*) is redundant. Drop the public API, the wrapper NIFs,
and the dead reactor example that called nonexistent subinterp_reactor_*
functions. Internal subinterp_thread_pool_* (used by py_event_loop_pool's
OWN_GIL backend) and the capability probe py:subinterp_supported/0 stay.

-1390/+27 LOC across 8 files; py_subinterp_SUITE deleted.
- inspiration.md: 'auto' mode -> 'worker'; py_pool:start_link -> py_context_router:start_pool
- asyncio.md: paired async_call with async_await (not await)
- py.erl: bind_context lives in py_context_router, not py
- reactor_echo example: 'auto' mode -> 'worker'
- bench_async_task header: spawn -> spawn_task
- .app.src: drop stale py_pool from registered list
py_context already validates worker | owngil via pattern matching, but
py_reactor_context calls py_nif:context_create/1 directly, which silently
mapped any non-owngil atom to worker. That hid genuine misuse — including
the auto atom that lingered in a doc snippet and one CT test.

Tighten nif_context_create to return {error, {invalid_mode, Atom}} for
anything other than worker | owngil, and update the one test that relied
on the silent-accept behavior.
- py_actor_SUITE: prefix unused Pid variable with underscore
- py_async_task_SUITE: match +0.0 instead of 0.0 (OTP 27+ no longer
  matches -0.0 to 0.0)
- py_reentrant_SUITE, py_pid_send_SUITE: replace deprecated
  code:lib_dir/2 with filename:join(code:lib_dir/1, "test")
3.14 deprecated the call; 3.16 removes it. Our run path
(erlang.run / asyncio.Runner with loop_factory=) doesn't need the
global policy — it was only convenience for bare asyncio.run()
inside py:exec. Gate both Erlang-side helpers and the Python
erlang.install() entry point on sys.version_info < (3, 14); the
latter now raises with a migration message on 3.14+.

Behavior on Python 3.9-3.13 is unchanged. Adds
test/py_asyncio_policy_SUITE.erl (7 self-skipping cases) pinning the
gate, the run-path-without-policy invariant, and a sentinel asserting
no set_event_loop_policy DeprecationWarning fires on 3.14+ init.
EDoc expects code-quote spans as `text', not `text`; the markdown-style
backticks I added in the v3.14 deprecation note crashed edoc_doclet_chunks.
The handle-level entry points (subinterp_thread_handle_create/destroy,
subinterp_thread_call/eval/exec/cast/async_call) along with the
py_subinterp_handle_t struct and PY_SUBINTERP_HANDLE_RESOURCE_TYPE
became unreachable when the public py:subinterp_* API was removed.
Only the OWN_GIL session backend (subinterp_thread_pool_*) is still
used by py_event_loop_pool, so keep that path and drop the rest.
has_result and is_error in suspended_state_t / suspended_context_state_t
were declared volatile, which doesn't guarantee atomicity or memory
ordering on weakly-ordered architectures (ARM). Upgrade to _Atomic bool
so the read in the resumer thread sees the write from the callback
thread without relying on incidental sequencing.
If a worker pthread is stuck in long-running Python, the 30s shutdown
join times out, ctx->leaked is set, and the helper returns. Today the
BEAM later runs context_destructor (which sees ctx->destroyed and
returns early) and frees the resource memory — under a thread that is
still dereferencing it. Call enif_keep_resource(ctx) on the leak path
so the refcount stays above zero forever and the memory survives the
stuck thread.

Replace the CTX_REQ_SHUTDOWN sentinel enqueue with a direct
pthread_cond_broadcast under the queue mutex. The sentinel could be
orphaned (worker mid-request returns to top of loop, sees
shutdown_requested, exits without dequeuing) and now allocations can
fail more often after ctx_request_create() honors init failures.
The broadcast wakes any parked worker so it observes the predicate
and exits cleanly.

Gate the callback_pipe close in nif_context_destroy on !ctx->leaked.
A leaked pthread inside Python that triggers erlang.call() reads from
callback_pipe[0] and writes to callback_pipe[1]; closing those fds
lets the kernel reissue the numbers to unrelated files and silently
corrupt them. The leaked pipes stay alive with the pinned resource
until VM exit.
wait_for_async_result/2 returns {error, async_timeout} after 5
minutes, but the C worker can still finish later and deliver the
result. Without a drain those messages would accumulate on the
context process's mailbox forever. Drain stale ids before the
matching receive — safe because the context process is the sole
receiver and only one wait is in flight at a time.

New test/py_context_async_drain_SUITE.erl pins the behavior by
injecting a stale {py_result, FakeRef, _} into Ctx's mailbox and
asserting it's gone after the next async-dispatched call.
pthread_mutex_init, pthread_cond_init, and enif_alloc_env can each
fail under resource pressure. Today their returns are ignored, so
mutex/cond init failures leave the request in an unusable state and
a NULL request_env causes the next enif_make_copy to segfault.

Check each return and free what was allocated on partial failure.
All 14 callers already test the result for NULL.
_ErlangChildWatcher in priv/_erlang_impl/_policy.py was never
instantiated (the actual watcher is asyncio.ThreadedChildWatcher /
SafeChildWatcher); delete the class along with its TODO.

Replace "process monitoring disabled for now to debug crash" /
"Using simple resource type without process monitoring for now" in
the SharedDict path with positive-form notes describing the actual
GC-scoped lifecycle.
dispatch_to_worker_thread_impl blocked on a 30s pthread_cond_timedwait
in the env path. ML inference and other long-running calls returned
{error, worker_timeout} while the worker kept going. Add three
async-with-env NIFs that wrap the existing dispatch_to_worker_thread_async
(which already takes a local_env), and rewire handle_*_with_suspension_and_env
plus the {exec,_,_,_,EnvRef} loop arm to async-first / sync-fallback.

The Erlang side waits in wait_for_async_result/2, which has the
stale-result drain from ef31f76, so the env path now matches the
non-env path's behavior.

Adds test/py_context_async_env_SUITE (2 cases).
The asyncio.md and migration.md tables claimed sync erlang.sleep()
always releases the dirty scheduler. In v3.0 the dirty scheduler isn't
held during sync calls anyway (async NIF dispatch returns immediately);
what blocks is the worker pthread for py:call, the Erlang process for
py:exec/py:eval, or no thread at all for awaited async sleep. Update
the table and the sleep() docstring accordingly.

erlang.install() emits a DeprecationWarning on 3.12-3.13 by design,
but users who knowingly use the legacy pattern had no clean local
opt-out. Add a keyword-only silent=False; passing silent=True
suppresses the warning without disabling DeprecationWarning globally.
The 3.14+ RuntimeError stays unconditional.
@benoitc benoitc merged commit 2075f2d into main May 1, 2026
14 checks passed