Skip to content

Route DebuggerController operations through a per-controller worker queue#1090

Open
xusheng6 wants to merge 4 commits into
devfrom
test_refactor-worker-queue
Open

Route DebuggerController operations through a per-controller worker queue#1090
xusheng6 wants to merge 4 commits into
devfrom
test_refactor-worker-queue

Conversation

@xusheng6
Copy link
Copy Markdown
Member

Supersedes #1087 (the prior PR was auto-closed when the branch was renamed for CI to pick it up).

Summary

Two stacked commits:

  1. Worker-queue refactor. Replace the per-operation std::thread(...).detach() pattern (17 sites in Launch/Attach/Connect/Go/GoReverse/Step…/RunTo…/Restart/Detach/Quit/Pause) with one persistent worker thread per controller plus a FIFO work queue. Every public op routes through Submit() / SubmitAndWait(). The worker is owned by the controller and joined in the destructor (after draining), so async work cannot outlive the controller — lifetime is structural rather than reliant on per-call DbgRef captures.
  2. Adapter-stop channel. Adapter stops are now an internal signal from the adapter thread to the worker (m_adapterStopMutex / m_adapterStopCv / m_adapterStopPending), not events on the public dispatcher queue. The conditional-breakpoint silent-resume logic moves into ExecuteAdapterAndWait, eliminating the previous "drive m_adapter->Go() from the dispatcher thread" hack and the m_suppressResumeEvent flag that papered over it. Spontaneous adapter stops (user types si directly into the LLDB REPL while no controller op is in flight) are detected via m_inAdapterWait and routed through the worker queue.

Implements #1088, #1089.

Design notes (worker queue)

  • Submit detects re-entrant calls from the worker itself (std::this_thread::get_id() == m_workerThreadId) and runs them inline — lets RestartAndWaitOnWorker call QuitAndWaitOnWorker + LaunchAndWaitOnWorker without deadlocking.
  • SubmitAndWait submits a task and blocks on the future. Every XxxAndWait now takes an optional std::chrono::milliseconds timeout (default milliseconds::max() = wait forever, bypasses wait_for to avoid stdlib overflow). On non-infinite timeout: signals m_adapter->BreakInto() and still waits for the in-flight op to settle.
  • Public/Internal/OnWorker split. The existing XxxAndWaitInternal does the actual work; the old XxxAndWait (lock + Internal + notify) is renamed XxxAndWaitOnWorker (private) and invoked from the worker; the new public XxxAndWait(timeout) is a one-line SubmitAndWait wrapper.
  • Pause() / PauseAndWait() bypass the queue and call m_adapter->BreakInto() directly on the caller's thread — the worker is blocked inside ExecuteAdapterAndWait for whatever resume op is in flight; BreakInto unsticks it.

Design notes (adapter-stop channel)

  • PostDebuggerEvent intercepts AdapterStoppedEventType and routes the reason to the internal channel. If m_inAdapterWait is true the in-flight ExecuteAdapterAndWait consumes it; otherwise the stop is queued on the worker as a spontaneous handler.
  • ExecuteAdapterAndWait claims the channel for its entire duration (including the silent-resume loop), so a stop between iterations is still consumed by us, not synthesized as spontaneous.
  • TargetExitedEventType / DetachedEventType still go through the dispatcher queue (user-visible) but also signal the adapter-stop channel so any in-flight WaitForAdapterStop unblocks when the target dies.
  • Deleted: m_lastAdapterStopEventConsumed, m_suppressResumeEvent, the temporary RegisterEventCallback/Semaphore pattern inside ExecuteAdapterAndWait, the dispatcher's entire AdapterStoppedEventType-specific block, and the Go-from-dispatcher hack.
  • AdapterStoppedEventType stays in the public enum so adapters keep posting it the same way. Zero source changes across the 8 adapters.

Public API compatibility

  • Every public method keeps its existing signature; sync XxxAndWait only gains an appended optional timeout parameter defaulted to infinite. Existing callers in core/ffi.cpp and ui/uinotification.cpp don't change.

Spontaneous-stop case (e.g. si typed in the LLDB REPL)

When the user issues a command directly to the underlying adapter REPL while no controller op is queued, the engine runs and stops. m_inAdapterWait is false at that point, so PostDebuggerEvent Submits HandleSpontaneousAdapterStop onto the worker, which updates caches and calls NotifyStopped — same observable behavior as the dispatcher synthesis the old code did at debuggercontroller.cpp:2279, just on the worker. Only one engine op can run at a time, so an adapter stop is unambiguously either "the response to whatever the worker last asked for" (consumed by WaitForAdapterStop) or "the user did this themselves while we were idle" (queued as spontaneous).

Out of scope (follow-ups)

  • Adapter-internal threads: DbgEngAdapter::Attach's spawned thread (BINARYNINJA-1A), DbgEngAdapter::EngineLoop, LLDB EventListener, RSP send/receive. These need adapter destructors to act as join barriers — separate from controller-side scheduling.
  • m_targetControlMutex cleanup. Redundant once the queue serializes work, but left in place to minimize behavioral risk.
  • BinaryDataNotification / settings notification unregister in Destroy (BINARYNINJA-75, -2H — the RebaseToAddress family).
  • m_userRequestedBreak should become std::atomic<bool> now that Pause writes it from a non-worker thread.

Test plan

  • Builds cleanly (ninja — both libdebuggercore.dylib and libdebuggerui.dylib)
  • Manual: launch, step, hit breakpoint, pause while running, detach mid-step, close tab mid-operation
  • Verify spontaneous-stop case: type si directly into LLDB while idle, confirm UI updates
  • Conditional breakpoint with false condition: confirm silent resume still works
  • IL stepping: confirm step-into HLIL only surfaces one stop per user request
  • Python regression tests in test/debugger_test.py

🤖 Generated with Claude Code

xusheng6 and others added 4 commits May 20, 2026 15:02
The 17 `Launch`/`Attach`/`Connect`/`Go`/`Step*`/`RunTo*`/`Restart`/`Detach`/
`Quit`/`Pause` methods each spawn a detached `std::thread` that runs the
corresponding `*AndWait` work, capturing `this` by reference. Nothing keeps
the controller alive for the duration of the detached call, so if the
controller's refcount drops to zero before the worker exits (e.g. the user
closes the tab), the worker dereferences a dangling pointer.

Capture a `DbgRef<DebuggerController>` by value into each lambda so the
controller stays alive until the worker thread finishes.

Fixes #1083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er queue

Replaces the previous "spawn one std::thread per operation and detach it"
pattern with a single persistent worker thread per controller plus a FIFO
work queue. Every public Xxx()/XxxAndWait() method now goes through Submit().
The worker thread is owned by the controller, started in the constructor
and joined in the destructor (after draining), so any task can no longer
outlive the controller it touches -- the lifetime guarantee is structural
rather than reliant on per-call DbgRef captures (which this PR removes).

Highlights:
* Submit() detects re-entrant calls from the worker itself and runs them
  inline, so e.g. RestartAndWaitOnWorker can call QuitAndWaitOnWorker and
  LaunchAndWaitOnWorker without deadlocking on the queue.
* SubmitAndWait() submits a task and waits on the resulting future. Every
  XxxAndWait now takes an optional std::chrono::milliseconds timeout,
  defaulting to milliseconds::max() ("wait forever") so existing callers
  in ffi.cpp and uinotification.cpp keep their current behavior. If a
  non-infinite timeout elapses, the engine is signaled to break and we
  still wait for the in-flight op to settle before returning.
* Pause()/PauseAndWait() are special-cased: they bypass the queue and call
  m_adapter->BreakInto() directly on the caller's thread. The worker is
  blocked inside ExecuteAdapterAndWait for whatever resume op is in
  flight; BreakInto unsticks it. PauseAndWait then waits for the worker
  to drain past the running task by submitting a no-op probe.

The existing public/internal split (XxxAndWaitInternal does the actual
work; the old XxxAndWait was the lock-Internal-notify wrapper) is
preserved. The old wrapper is renamed to XxxAndWaitOnWorker (private) and
the new public XxxAndWait(timeout) is a thin SubmitAndWait wrapper around
it. This keeps the public API surface identical to today for any caller
that doesn't pass a timeout.

Out of scope for this PR: adapter-internal threads (DbgEngAdapter::Attach
spawned thread, EngineLoop, LLDB EventListener, RSP send/receive loops).
Those have their own lifetime concerns and will be handled separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stop machinery

Adapter stops are now an internal signal from the adapter thread to the
worker, not events on the public dispatcher queue. PostDebuggerEvent
intercepts AdapterStoppedEventType and delivers the stop reason to a
new mutex+condvar pair (m_adapterStopMutex / m_adapterStopCv /
m_adapterStopPending). The worker consumes them via two paths:

1. In-flight ops: ExecuteAdapterAndWait now claims the channel for its
   entire duration (m_inAdapterWait = true), kicks off the adapter
   operation, then loops on WaitForAdapterStop. The conditional-
   breakpoint check moves here -- on a false condition we silently
   resume via m_adapter->Go() and wait again, without releasing the
   channel. This eliminates the old "drive m_adapter->Go() from the
   dispatcher thread" hack and the m_suppressResumeEvent flag that
   papered over it.

2. Spontaneous stops: when the user types e.g. `si` directly into the
   LLDB REPL while no controller op is in flight, m_inAdapterWait is
   false, so PostDebuggerEvent queues HandleSpontaneousAdapterStop on
   the worker -- which updates caches and calls NotifyStopped, the
   same effect as the dispatcher synthesis the old code did at
   debuggercontroller.cpp:2279.

Target-exit / detach events continue through the dispatcher queue
(they're user-visible) but ALSO signal the adapter-stop channel so an
in-flight WaitForAdapterStop unblocks when the target dies.

The dispatcher (DebuggerMainThread) no longer has any
AdapterStoppedEventType-specific code paths. m_lastAdapterStopEventConsumed
and m_suppressResumeEvent are removed; the temporary
RegisterEventCallback/Semaphore pattern inside ExecuteAdapterAndWait is
replaced by the direct WaitForAdapterStop wait.

AdapterStoppedEventType remains in the public enum so adapters keep
posting it the same way -- the change is purely in how the controller
routes it once it arrives at PostDebuggerEvent. Adapter implementations
need no changes.

Stacked on the worker-queue refactor (this PR / #1087). Implements
#1089.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs in how interrupting ops (Restart/Quit/Detach) interact with the
worker queue:

1. The public Restart/Quit/Detach methods just Submit'd their work to
   the queue. If a resume op (Go/Step/RunTo) was in flight, the worker
   was blocked inside WaitForAdapterStop -- the queued Restart/Quit/Detach
   would sit indefinitely with nothing to interrupt the running target.
   To the user, clicking Restart while running looked like a no-op.

   Fix: these methods now call RequestInterrupt() (a small helper that
   sets m_userRequestedBreak and invokes m_adapter->BreakInto()) on the
   caller's thread before queueing the task -- same pattern as Pause().
   The BreakInto unsticks the in-flight op's WaitForAdapterStop, that
   op drains, and the queued Restart/Quit/Detach runs next. Applied to
   both the async public methods and their *AndWait(timeout) variants.

2. QuitAndWaitOnWorker, when it observed IsRunning() inside a re-entrant
   call (e.g. from Restart), called the public PauseAndWait() -- which
   does BreakInto + Submit([]{}).wait(). On the worker thread Submit
   detects re-entry and runs the empty lambda inline, so the wait was
   a no-op. We then issued Quit on a still-running target, which DbgEng
   in particular doesn't handle well.

   Fix: switch that call to PauseAndWaitInternal(), which routes through
   ExecuteAdapterAndWait(Pause) and actually waits for the engine via
   the adapter-stop channel. With (1) in place this branch should rarely
   fire, but it's the correct defensive behavior.

Also pulled the BreakInto + m_userRequestedBreak pair out of Pause and
PauseAndWait into the shared RequestInterrupt() helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant