Fix pdb / breakpoint() hang in workflow code #1568
    - self._deadlock_timeout_seconds = None if debug_mode else 2
    + self._deadlock_timeout_seconds = None if self._debug_mode else 2

    + _install_workflow_breakpoint_hook()
This should probably only happen during debug mode as well. It may not make any actual difference, but it would be good to give that assurance that nothing is changing outside of debug mode.
The hook is intentionally always installed. The only case it catches is breakpoint() called from workflow code without debug_mode set, which is #1104's original silent hang. Gating on debug_mode would remove the error in exactly the scenario I think should be converted from silent hang to loud error.
Maybe I should add a code comment making the always-on rationale explicit?
I don't see anything in #1104 which indicates a need to enable breakpoints without debug_mode. Doesn't it explicitly say that it in fact fails quite loudly with sandboxing errors? What exactly is the scenario you are trying to address with this? Unsandboxed attempts to use a debug breakpoint without debug mode?
Yeah, you're right.
What I was actually hitting: I dropped a breakpoint() into a sandboxed workflow and ran it under pytest, and it looked like a silent hang to me — the workflow task kept retrying forever and pytest was buffering the warning until I re-ran with -s. At that point the sandbox error did show up, just like #1104 describes. So it's not actually silent — the real problem is that the error tells me to "mark the import as pass through," which isn't really helpful feedback. In any case, the process-wide hook was the wrong tool.
Changed approach in the latest push:
- `_install_workflow_breakpoint_hook()` now only fires inside the `if self._debug_mode:` block in `_workflow.py`.
- The sandbox error itself now points users at `debug_mode=True` instead of the generic "mark the import as pass through" advice. It uses the existing `SandboxMatcher.leaf_message` mechanism (the same one already used for `asyncio.as_completed` → "use `workflow.as_completed()` instead"). `breakpoint` is restructured from a flat `use` entry to a child matcher with a custom `leaf_message`, and `_importer.py:restrict_built_in` was wired to read `child.leaf_message` so per-builtin overrides actually flow through.
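For readers who have not seen the per-entry message idea before, here is a minimal sketch of how a flat entry and a child matcher with its own message could differ. The `Matcher` class, `GENERIC_ADVICE`, and `restriction_message` are hypothetical stand-ins written for this illustration; they are not the SDK's `SandboxMatcher` or `restrict_built_in`, whose real fields and signatures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Assumed wording of the generic advice, taken from the discussion above.
GENERIC_ADVICE = "mark the import as pass through"


@dataclass(frozen=True)
class Matcher:
    """Hypothetical, simplified restriction matcher (not the SDK class)."""

    # Flat entries: restricted names that only ever get the generic advice.
    use: frozenset[str] = frozenset()
    # Child matchers: restricted names that can carry their own message.
    children: dict[str, "Matcher"] = field(default_factory=dict)
    leaf_message: Optional[str] = None


def restriction_message(builtins: Matcher, name: str) -> Optional[str]:
    """Return the error text for a restricted builtin, or None if it is allowed."""
    child = builtins.children.get(name)
    if child is not None:
        # A per-builtin override wins over the generic advice when present.
        return f"Cannot use {name!r} in workflow code: {child.leaf_message or GENERIC_ADVICE}"
    if name in builtins.use:
        return f"Cannot use {name!r} in workflow code: {GENERIC_ADVICE}"
    return None


# Before: `breakpoint` is a flat entry, so it can only get the generic advice.
flat = Matcher(use=frozenset({"breakpoint", "input"}))

# After: `breakpoint` is a child matcher with a targeted message.
nested = Matcher(
    use=frozenset({"input"}),
    children={"breakpoint": Matcher(leaf_message="set debug_mode=True on the Worker")},
)
```

The point of the restructuring is visible in the two lookups: `restriction_message(flat, "breakpoint")` can only produce the generic text, while `restriction_message(nested, "breakpoint")` surfaces the targeted hint, provided the lookup code actually reads the child's message.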
When debug_mode=True (or TEMPORAL_DEBUG=1), breakpoint() inside workflow code now opens an interactive pdb prompt -- including from a sandboxed workflow run under pytest. Four pieces:

- Inline dispatch on the asyncio main thread (via loop.call_soon to avoid nesting inside the dispatch task's __step() and tripping Python 3.14's task-entry validation).
- breakpoint removed from the sandbox's invalid builtins so the call reaches the worker hook. Nothing else is relaxed.
- A Pdb subclass that lands at the workflow's own frame, suspends sandbox checks during each REPL interaction, and overrides q/Ctrl-D to continue the workflow instead of failing it with BdbQuit.
- A defensive sys.breakpointhook that raises a clear RuntimeError when breakpoint() is called from a workflow worker thread without debug_mode, replacing the previous silent hang.

When debug_mode is not set, the worker's dispatch and sandbox config are unchanged.

Adds a README subsection on debugging workflows and five tests at tests/worker/test_breakpoint_hang.py. Verified on Python 3.13 and 3.14.

Closes temporalio#1104.
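The q/Ctrl-D behaviour described in the commit message can be illustrated with the standard library alone. This is a rough sketch, not the PR's implementation: `ContinueOnQuitPdb` and `run_with_scripted_debugger` are names invented here, and the sketch leaves out the frame placement and sandbox suspension that the real subclass is said to handle. It only shows that rebinding pdb's quit commands to `continue` lets the debugged code keep running.

```python
import io
import pdb
import sys


class ContinueOnQuitPdb(pdb.Pdb):
    """Hypothetical Pdb subclass: quitting resumes the program instead of aborting it."""

    def do_quit(self, arg):
        # Stock pdb ends the debugged code with bdb.BdbQuit on quit;
        # here the command simply behaves like `continue`.
        return self.do_continue("")

    # pdb binds these aliases to the base method, so rebind them to the override.
    do_q = do_exit = do_quit

    def do_EOF(self, arg):
        # Ctrl-D (end of input) is treated the same way.
        return self.do_continue("")


def run_with_scripted_debugger(commands: str):
    """Stop once in this frame, feed `commands` to the prompt, report what happened."""
    transcript = io.StringIO()
    debugger = ContinueOnQuitPdb(
        stdin=io.StringIO(commands),  # scripted input instead of a TTY
        stdout=transcript,
        nosigint=True,
        readrc=False,  # ignore any local .pdbrc
    )
    debugger.set_trace(sys._getframe())
    result = "finished"  # reached only if the debugger let execution continue
    return result, transcript.getvalue()
```

Calling `run_with_scripted_debugger("q\n")` or `run_with_scripted_debugger("")` (immediate end of input) should both return normally; with an unmodified `pdb.Pdb` the same input would abort the function with `bdb.BdbQuit`.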
What was changed
When `debug_mode=True` on the Worker (or `TEMPORAL_DEBUG=1`), `breakpoint()` inside workflow code now opens an interactive pdb prompt, including from a sandboxed workflow run under `pytest`. Without `debug_mode`, the sandbox raises a clear error pointing the user at `debug_mode=True`. Pieces:

- With `debug_mode`, activations run on the asyncio main thread (scheduled via `loop.call_soon` to avoid nesting inside the dispatch task's `__step()`), so pdb's `input()` reaches the TTY.
- With `debug_mode`, `breakpoint` is removed from the sandbox's invalid builtins so the call can reach the worker hook. Nothing else is relaxed.
- A Pdb subclass lands at the workflow's own frame, suspends sandbox checks during each REPL interaction, and overrides `q` / Ctrl-D to continue the workflow instead of failing it with `BdbQuit`.
- Without `debug_mode`, the sandbox's `breakpoint` restriction now carries a `leaf_message` directing the user at `debug_mode=True` (rather than the generic "mark the import as pass through" advice, which is the wrong fix here). It uses the same `SandboxMatcher.leaf_message` mechanism already in place for `asyncio.as_completed`.

The workflow `sys.breakpointhook` is only installed when `debug_mode=True`. With `debug_mode` off, dispatch, sandbox config, and `sys.breakpointhook` are all identical to upstream main.

Why?

`breakpoint()` and `pdb.set_trace()` inside workflow code can't reach a debugger today. In a sandboxed workflow, the sandbox raises an error, but the error tells the user to "mark the import as pass through," which is the wrong fix. In an unsandboxed workflow, the call falls into pdb on a thread without a working stdin and the worker thread hangs.

Three overlapping issues for the debug-mode path:

- Activations run on a `ThreadPoolExecutor` thread, so pdb's `input()` can't read the controlling TTY.
- The sandbox treats `breakpoint` as non-deterministic, so the call can't reach the debugger.
- pdb's `cmdloop` touches more sandbox-restricted internals at runtime (e.g. `readline.get_completer`), so relaxing the builtin alone isn't enough.

Direct synchronous activation from the dispatch coroutine doesn't work on Python 3.14: the dispatch task is mid-`__step()` when `workflow.activate` tries to step the workflow's own task, and 3.14 refuses. `await future` after `loop.call_soon` suspends the dispatch task first.

Complements #1249 (sandbox passthrough for IDE debuggers). Independent change, different debugger.
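The `loop.call_soon` plus `await future` pattern mentioned above can be shown in isolation. The following is a minimal sketch using only asyncio; `run_outside_current_task`, `current_task_name`, and `dispatch` are hypothetical names, and the real worker code presumably does more (error handling, activation plumbing). It demonstrates the one property that matters here: a callback scheduled with `call_soon` runs on the loop thread while no task is current, whereas a direct call runs inside the calling task's step.

```python
import asyncio
from typing import Any, Callable, Optional


async def run_outside_current_task(fn: Callable[[], Any]) -> Any:
    """Run a synchronous callable as its own loop callback and await its result."""
    loop = asyncio.get_running_loop()
    done: asyncio.Future = loop.create_future()

    def _callback() -> None:
        try:
            done.set_result(fn())
        except Exception as exc:  # forward failures to the awaiting coroutine
            done.set_exception(exc)

    loop.call_soon(_callback)
    # Awaiting suspends the calling task, so the callback runs after its step ends.
    return await done


def current_task_name() -> Optional[str]:
    """Name of the task that is mid-step right now, or None if there is none."""
    task = asyncio.current_task()
    return task.get_name() if task is not None else None


async def dispatch() -> tuple:
    direct = current_task_name()  # called inline: nested inside this task's step
    deferred = await run_outside_current_task(current_task_name)
    return direct, deferred


def demo() -> tuple:
    return asyncio.run(dispatch())
```

In `demo()`, the first element is the name of the dispatching task and the second is `None`, because the deferred call ran with no task entered. Anything that needs to step another task, as the description says `workflow.activate` does, is therefore not nested inside the dispatcher's own step.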
Checklist
Closes Setting debug_mode in a Worker still doesn't allow the user of breakpoints #1104
How was this tested:
- `tests/worker/test_breakpoint_hang.py`: five tests covering thread placement (both modes), a breakpoint in a sandboxed workflow landing at the user's frame with locals visible, `q` / Ctrl-D continuing cleanly, and (without `debug_mode`) the sandbox error message pointing at `debug_mode=True`. 5/5 pass on Python 3.13 and 3.14.
- Manual check: drop `breakpoint()` into any workflow's `run()` body, run via `pytest -s` (or a standalone `python` script), and confirm the `(Pdb)` prompt opens at the user's frame with locals in scope.

Docs: added a README subsection on debugging with `breakpoint()` / pdb under Workflow Sandbox, with a runnable example and the workflow-task-timeout caveat.