From b981ccc22fc49ce9ccd0c4b35084932ebc8f1985 Mon Sep 17 00:00:00 2001
From: Max042004 <sau525@gmail.com>
Date: Sat, 6 Jun 2026 21:09:52 +0800
Subject: [PATCH] Join worker vCPUs before tearing down guest memory

On exit_group, the main thread leaves vcpu_run_loop() as soon as it
observes the exit-group flag and proceeds to cleanup_main_resources(),
which unmaps the guest slab via guest_destroy(). Sibling vCPU threads
may still be mid-iteration in their own run loops (e.g. in
shim_globals_recompute_attention(), which touches guest memory). A
worker that touches the slab after the main thread unmaps it faults at
the host level and the elfuse process dies with SIGSEGV, so a guest
that requested exit_group(0) is reported as exit 139.

The race was masked until now because the workloads that exercise it
(multi-threaded JVMs) crashed earlier. With the fault-delivery fix,
javac runs to completion and reaches the exit_group teardown, exposing
the race.

Have the main thread call thread_join_workers() after vcpu_run_loop()
returns and before any teardown. It waits for the workers to wind down
(they respond to the hv_vcpus_exit() that exit_group already issued) and
is a no-op once they have. javac now exits 0.
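
The join-before-teardown ordering can be sketched in isolation with
plain pthreads. This is only an illustration under stated assumptions,
not the elfuse implementation: slab, exit_requested, worker_loop() and
run_teardown_demo() are hypothetical stand-ins for the guest slab, the
exit-group flag, a worker's run loop and main()'s teardown path.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define N_WORKERS 4
#define SLAB_SIZE 4096

/* Hypothetical stand-ins for the guest slab and the exit-group flag. */
static unsigned char *slab;
static atomic_bool exit_requested;

struct worker {
    pthread_t tid;
    unsigned long reads;
};

/* Worker: reads the slab at least once, then until it sees the flag. */
static void *worker_loop(void *arg)
{
    struct worker *w = arg;
    const volatile unsigned char *p = slab;

    do {
        (void) p[w->reads % SLAB_SIZE];
        w->reads++;
    } while (!atomic_load(&exit_requested));
    return NULL;
}

/* Returns 0 on success, -1 on failure; adds slab reads to *total_reads. */
int run_teardown_demo(unsigned long *total_reads)
{
    struct worker workers[N_WORKERS];
    int started = 0, rc = 0;

    slab = calloc(1, SLAB_SIZE);
    if (!slab)
        return -1;
    atomic_store(&exit_requested, false);

    for (; started < N_WORKERS; started++) {
        workers[started].reads = 0;
        if (pthread_create(&workers[started].tid, NULL, worker_loop,
                           &workers[started]) != 0) {
            rc = -1;
            break;
        }
    }

    /* The "main thread" observes the exit request and leaves its loop. */
    atomic_store(&exit_requested, true);

    /* Join every started worker BEFORE releasing the slab. Freeing first
     * would let a worker that is still mid-iteration touch freed memory. */
    for (int i = 0; i < started; i++) {
        if (pthread_join(workers[i].tid, NULL) != 0)
            rc = -1;
        *total_reads += workers[i].reads;
    }

    free(slab);
    slab = NULL;
    return rc;
}
```

Swapping the join loop and the free() in this sketch reproduces the
shape of the bug: a worker can still be inside worker_loop() when the
buffer goes away.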

Call gdb_stub_shutdown() before thread_join_workers() rather than
after. A worker parked in gdb_stub_handle_stop() remains active until
gdb_stub_shutdown() broadcasts resume_cond, so joining first would only
time out and detach the worker while it is still paused in the GDB
stop. That reintroduces the same freed-memory race whenever a GDB
session is attached.
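
The shutdown-then-join ordering can likewise be sketched with a
condition variable. Only the name resume_cond comes from this patch;
stop_lock, stub_active, parked_worker(), stub_shutdown() and
shutdown_then_join() are hypothetical, and the real GDB stub may be
structured differently.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_PARKED 8

static pthread_mutex_t stop_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t resume_cond = PTHREAD_COND_INITIALIZER;
static bool stub_active = true;   /* hypothetical "debugger attached" state */
static int parked_workers;        /* workers currently inside the stop */

/* Worker: parks in the "GDB stop" until the stub is shut down. */
static void *parked_worker(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&stop_lock);
    parked_workers++;
    while (stub_active)
        pthread_cond_wait(&resume_cond, &stop_lock);
    parked_workers--;
    pthread_mutex_unlock(&stop_lock);
    return NULL;
}

/* Shutdown: clear the state and wake every parked worker. */
static void stub_shutdown(void)
{
    pthread_mutex_lock(&stop_lock);
    stub_active = false;
    pthread_cond_broadcast(&resume_cond);
    pthread_mutex_unlock(&stop_lock);
}

/* Returns the number of workers still parked after the join (0 when the
 * ordering is right), or -1 on a bad argument or thread-creation failure. */
int shutdown_then_join(int n_workers)
{
    pthread_t tids[MAX_PARKED];
    int started = 0, rc = 0;

    if (n_workers < 0 || n_workers > MAX_PARKED)
        return -1;

    pthread_mutex_lock(&stop_lock);
    stub_active = true;
    pthread_mutex_unlock(&stop_lock);

    for (; started < n_workers; started++) {
        if (pthread_create(&tids[started], NULL, parked_worker, NULL) != 0) {
            rc = -1;
            break;
        }
    }

    /* Release parked workers first; joining before this point would block
     * on workers that are still waiting on resume_cond. */
    stub_shutdown();

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return rc != 0 ? rc : parked_workers;
}
```

With an untimed pthread_join() the reversed order would simply hang. A
join that times out and detaches, as described above, would instead
continue into teardown with the worker still parked.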
---
 src/main.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/main.c b/src/main.c
index b36b4f1..dc5aae3 100644
--- a/src/main.c
+++ b/src/main.c
@@ -35,6 +35,7 @@
 
 #include "runtime/forkipc.h"
 #include "runtime/proctitle.h"
+#include "runtime/thread.h"
 
 #include "syscall/fuse.h"
 #include "syscall/path.h"
@@ -547,9 +548,22 @@ int main(int argc, char **argv)
      */
     int exit_code = vcpu_run_loop(vcpu, vexit, &g, verbose, timeout_sec);
 
-    /* Tear down debugger state before freeing guest/vCPU resources. */
+    /* Tear down debugger state before joining workers: a worker parked in
+     * gdb_stub_handle_stop() stays active (not deactivated) until this
+     * broadcasts resume_cond, so joining first would just time out and
+     * detach it while it is still paused. */
     gdb_stub_shutdown();
 
+    /* Wait for worker vCPU threads to stop before tearing down guest memory.
+     * The main thread leaves the run loop as soon as it observes the
+     * exit_group flag, but sibling vCPU threads may still be mid-iteration in
+     * their own run loops (e.g. touching shim_globals). cleanup_main_resources
+     * unmaps the guest slab via guest_destroy, so a still-running worker would
+     * fault on freed guest memory and crash the host with SIGSEGV, masking the
+     * real exit code. thread_join_workers() is a no-op when no workers remain
+     * active (the common single-threaded case). */
+    thread_join_workers();
+
     /* Diagnostic counter dump runs before guest_destroy so the shim_data
      * mapping is still valid. ELFUSE_SHIM_STATS is the gate; an unset variable
      * produces no output.