From b981ccc22fc49ce9ccd0c4b35084932ebc8f1985 Mon Sep 17 00:00:00 2001
From: Max042004
Date: Sat, 6 Jun 2026 21:09:52 +0800
Subject: [PATCH] Join worker vCPUs before tearing down guest memory

On exit_group the main thread leaves vcpu_run_loop as soon as it
observes the exit-group flag and proceeds to cleanup_main_resources(),
which unmaps the guest slab via guest_destroy(). Sibling vCPU threads
may still be mid-iteration in their own run loops (e.g. in
shim_globals_recompute_attention, which touches guest memory). A worker
that reads the slab after the main thread frees it faults at the host
level and the elfuse process dies with SIGSEGV, so a guest that
requested exit_group(0) is reported as exit 139.

This was masked until now because workloads that exercise it
(multi-threaded JVMs) crashed earlier; with the fault-delivery fix javac
runs to completion and reaches the exit_group teardown, exposing the
race.

Have the main thread call thread_join_workers() after vcpu_run_loop()
returns and before any teardown. It waits for the workers to wind down
(they respond to the hv_vcpus_exit() that exit_group already issued) and
is a no-op once they have. javac now exits 0.

Call gdb_stub_shutdown() before thread_join_workers() rather than after:
a worker parked in gdb_stub_handle_stop() stays active (not deactivated)
until gdb_stub_shutdown() broadcasts resume_cond, so joining first only
times out and detaches it while it is still paused in the GDB stop,
reintroducing the same freed-memory race whenever a GDB session is
attached.
---
 src/main.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/main.c b/src/main.c
index b36b4f1..dc5aae3 100644
--- a/src/main.c
+++ b/src/main.c
@@ -35,6 +35,7 @@
 #include "runtime/forkipc.h"
 #include "runtime/proctitle.h"
+#include "runtime/thread.h"
 #include "syscall/fuse.h"
 #include "syscall/path.h"
@@ -547,9 +548,22 @@ int main(int argc, char **argv)
      */
     int exit_code = vcpu_run_loop(vcpu, vexit, &g, verbose, timeout_sec);
 
-    /* Tear down debugger state before freeing guest/vCPU resources. */
+    /* Tear down debugger state before joining workers: a worker parked in
+     * gdb_stub_handle_stop() stays active (not deactivated) until this
+     * broadcasts resume_cond, so joining first would just time out and
+     * detach it while it is still paused. */
     gdb_stub_shutdown();
 
+    /* Wait for worker vCPU threads to stop before tearing down guest memory.
+     * The main thread leaves the run loop as soon as it observes the
+     * exit_group flag, but sibling vCPU threads may still be mid-iteration in
+     * their own run loops (e.g. touching shim_globals). cleanup_main_resources
+     * unmaps the guest slab via guest_destroy, so a still-running worker would
+     * fault on freed guest memory and crash the host with SIGSEGV, masking the
+     * real exit code. thread_join_workers() is a no-op once the workers have
+     * already wound down (the common single-threaded case). */
+    thread_join_workers();
+
     /* Diagnostic counter dump runs before guest_destroy so the shim_data
      * mapping is still valid. ELFUSE_SHIM_STATS is the gate; an unset variable
      * produces no output.