Join worker vCPU threads before tearing down guest memory#85
Conversation
Don't mention this as this commit never existed in public. |
| * fault on freed guest memory and crash the host with SIGSEGV, masking the | ||
| * real exit code. thread_join_workers() is a no-op once the workers have | ||
| * already wound down (the common single-threaded case). */ | ||
| thread_join_workers(); |
There was a problem hiding this comment.
Ordering vs gdb_stub_shutdown() at the next line leaves a residual race. gdb_stub_shutdown() (gdbstub.c:1435-1440) broadcasts resume_cond to release any GDB-paused workers so they can exit. A worker stuck in gdb_stub_handle_stop has active=1 (paused in ptrace-stop, not deactivated), so the new join here polls 100ms, sees no deactivation, and detaches. Then gdb_stub_shutdown releases the worker, it resumes execution, and main proceeds to shim_globals_counters_dump / cleanup_main_resources -- same SIGSEGV pattern the PR is fixing, just gated on a GDB session.
Move the join one line down so the GDB-paused case is also covered:
int exit_code = vcpu_run_loop(...);
/* Tear down debugger state before freeing guest/vCPU resources. */
gdb_stub_shutdown();
thread_join_workers();
if (shim_globals_stats_enabled())
shim_globals_counters_dump(&g);| * fault on freed guest memory and crash the host with SIGSEGV, masking the | ||
| * real exit code. thread_join_workers() is a no-op once the workers have | ||
| * already wound down (the common single-threaded case). */ | ||
| thread_join_workers(); |
There was a problem hiding this comment.
sc_exit_group at syscall.c:719 already calls thread_join_workers(), so this is a second join. Two consequences worth tracking:
- If a worker calls
exit_group, that worker'sthread_join_workerssnapshots all others including main (slot 0, which neverthread_deactivates itself) -- so it polls 100ms then detaches main. Detaching the host process's main pthread is legal but architecturally odd. - After that first call returns and main eventually runs this new one, any worker that wound down in between has
active=0-- main's call then triespthread_joinon a thread that was already joined or detached. POSIX says undefined; macOS libpthread returnsEINVALin practice (return is unchecked at thread.c:354), so not catastrophic but contract drift.
Cleaner: drop the thread_join_workers() from sc_exit_group. The exit_group caller still does proc_request_exit_group + thread_for_each(thread_force_exit_cb) which forces every sibling out of hv_vcpu_run; main observes the flag, exits the loop, and runs the single authoritative join here. Removes the detach-main wart and the double-join. Can be a follow-up -- not a blocker for this PR.
951abef to
24287df
Compare
On exit_group the main thread leaves vcpu_run_loop as soon as it observes the exit-group flag and proceeds to cleanup_main_resources(), which unmaps the guest slab via guest_destroy(). Sibling vCPU threads may still be mid-iteration in their own run loops (e.g. in shim_globals_recompute_attention, which touches guest memory). A worker that reads the slab after the main thread frees it faults at the host level and the elfuse process dies with SIGSEGV, so a guest that requested exit_group(0) is reported as exit 139. This was masked until now because workloads that exercise it (multi-threaded JVMs) crashed earlier; with the fault-delivery fix javac runs to completion and reaches the exit_group teardown, exposing the race. Have the main thread call thread_join_workers() after vcpu_run_loop() returns and before any teardown. It waits for the workers to wind down (they respond to the hv_vcpus_exit() that exit_group already issued) and is a no-op once they have. javac now exits 0. Call gdb_stub_shutdown() before thread_join_workers() rather than after: a worker parked in gdb_stub_handle_stop() stays active (not deactivated) until gdb_stub_shutdown() broadcasts resume_cond, so joining first only times out and detaches it while it is still paused in the GDB stop, reintroducing the same freed-memory race whenever a GDB session is attached.
24287df to
b981ccc
Compare
On exit_group the main thread leaves vcpu_run_loop as soon as it observes
the exit-group flag and proceeds to cleanup_main_resources(), which unmaps
the guest slab via guest_destroy(). Sibling vCPU threads may still be
mid-iteration in their own run loops (e.g. in shim_globals_recompute_attention,
which touches guest memory). A worker that reads the slab after the main
thread frees it faults at the host level and the elfuse process dies with
SIGSEGV, so a guest that requested exit_group(0) is reported as exit 139.
This was masked until now because workloads that exercise it (multi-threaded
JVMs) crashed earlier; with the fault-delivery fix javac runs to completion
and reaches the exit_group teardown, exposing the race.
Have the main thread call thread_join_workers() after vcpu_run_loop()
returns and before any teardown. It waits for the workers to wind down
(they respond to the hv_vcpus_exit() that exit_group already issued) and is
a no-op once they have. javac now exits 0.
Summary by cubic
Join worker vCPU threads before tearing down guest memory to prevent host SIGSEGVs and preserve correct exit codes on exit_group. Shut down the debugger first so paused GDB sessions don’t cause join timeouts.
Written for commit b981ccc. Summary will update on new commits.