hook pthread_mutex_lock/pthread_mutex_unlock/WaitOnAddress #444
Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/a4199b83-4d02-43f6-aa5a-f79a5303135a Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #444 +/- ##
==========================================
+ Coverage 74.53% 74.78% +0.25%
==========================================
Files 45 45
Lines 4744 4744
==========================================
+ Hits 3536 3548 +12
+ Misses 1208 1196 -12
loongs-zhang
left a comment
Locate the root cause of the CI failures and fix CI.
…ss hooks, cfg-gate test Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/66bb6c33-ec19-440e-82f5-08d9566b66a5 Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/66bb6c33-ec19-440e-82f5-08d9566b66a5 Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Root causes identified and fixed in commits
Linux (
macOS (SIGBUS/SIGILL crash):
Windows (stack overflow / test timeout on all variants): The
loongs-zhang
left a comment
Locate the root cause of the CI failures and fix CI.
Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/541adcfb-08ea-4748-b20e-89587e89b562 Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Two remaining CI failures fixed in commit
Linux aarch64/riscv64 preemptive deadlock (
Windows nightly MSVC
loongs-zhang
left a comment
Locate the root cause of the failure in https://github.com/acl-dev/open-coroutine/actions/runs/25895183498?pr=444, then fix it.
…O overhead and recursion Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/78752aa5-9ec8-47c6-ac7c-b34468cc19ce Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Root causes identified and fixed in commit
Windows (all variants), socket test timeouts:
Fix:
macOS (unused import
Linux x86_64 / loongarch64 SIGSEGV in
Continues and finishes #443.
Changes
- `hook/src/syscall/unix.rs`: Enable hooks for `pthread_mutex_lock` and `pthread_mutex_unlock`. On macOS, `once_cell::sync::Lazy` initialization recurses through `pthread_mutex_t`, so the hooks use `AtomicPtr` plus a thread-local re-entrancy guard (`PTHREAD_MUTEX_IN_HOOK`) to break the cycle. On Linux and other non-macOS Unix platforms, `once_cell::Lazy` uses futex instead of pthread_mutex, so the standard `impl_hook!` macro is used. This also avoids a cross-coroutine deadlock that the per-thread flag would cause under preemptive scheduling (a coroutine yielding mid-NIO-loop leaves the flag set, blocking any other coroutine from using the NIO path).
- `hook/src/syscall/windows.rs`: Enable hook for `WaitOnAddress`. The hook is written manually (rather than via `impl_hook!`) so that the `OnceCell` storing the real function pointer is pre-initialised before any hook is active, preventing parking_lot recursion during initialisation. No re-entrancy guard is needed because `NioWaitOnAddressSyscall` no longer calls `EventLoops::wait_event`.
- `core/src/syscall/windows/WaitOnAddress.rs`: `NioWaitOnAddressSyscall::WaitOnAddress` is a simple pass-through that delegates directly to the real function. A NIO polling loop was tried but caused two problems: (1) recursion: `EventLoops::wait_event` accesses DashMap/parking_lot internals which call `WaitOnAddress`, creating an infinite chain; (2) excessive overhead: on nightly Windows, `std` uses `WaitOnAddress` for many internal mutex operations, and each call from within a coroutine incurred ~11 ms overhead, causing `socket_co_server` and similar tests to exceed their 30 s timeout. Passing through directly avoids both issues.
- `core/Cargo.toml`: Add `libc` to `[target.'cfg(unix)'.dev-dependencies]` to allow integration tests to use libc types directly.
- `core/tests/scheduler.rs`: Add `scheduler_pthread_mutex_lock` test (gated on `#[cfg(all(unix, feature = "syscall"))]`) that verifies `pthread_mutex_lock` and `pthread_mutex_unlock` work correctly within coroutines using a shared static mutex.

Make sure that:
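To illustrate the macOS approach described for hook/src/syscall/unix.rs, here is a minimal sketch of an AtomicPtr cache combined with a thread-local re-entrancy guard. It is not the code from this PR: `real_lock`, `slow_path`, `hooked_lock`, `IN_HOOK` and the counters are hypothetical stand-ins, and the "real" function is a plain Rust function rather than a symbol resolved from libc.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

type LockFn = fn() -> i32;

// Counters used only to observe which path was taken in this sketch.
static REAL_CALLS: AtomicUsize = AtomicUsize::new(0);
static SLOW_PATH_CALLS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the real pthread_mutex_lock.
fn real_lock() -> i32 {
    REAL_CALLS.fetch_add(1, Ordering::SeqCst);
    0
}

// Lock-free cache of the real function pointer (null until first resolved).
static REAL_LOCK: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());

thread_local! {
    // Set while the current thread is executing inside the hook.
    static IN_HOOK: Cell<bool> = Cell::new(false);
}

fn resolve_real() -> LockFn {
    let cached = REAL_LOCK.load(Ordering::Acquire);
    if !cached.is_null() {
        // SAFETY: the only non-null value ever stored is a valid LockFn.
        return unsafe { std::mem::transmute::<*mut (), LockFn>(cached) };
    }
    // Assumption: a real hook would look the symbol up here instead.
    let f: LockFn = real_lock;
    REAL_LOCK.store(f as *mut (), Ordering::Release);
    f
}

// Stand-in for the coroutine-aware path, which itself ends up locking a mutex.
fn slow_path(real: LockFn) -> i32 {
    SLOW_PATH_CALLS.fetch_add(1, Ordering::SeqCst);
    let nested = hooked_lock(); // would recurse forever without the guard
    assert_eq!(nested, 0);
    real()
}

fn hooked_lock() -> i32 {
    let real = resolve_real();
    // replace(true) returns the previous value: true means we re-entered.
    if IN_HOOK.with(|flag| flag.replace(true)) {
        return real();
    }
    let rc = slow_path(real);
    IN_HOOK.with(|flag| flag.set(false));
    rc
}

fn main() {
    let rc = hooked_lock();
    println!(
        "rc = {rc}, slow path calls = {}, real calls = {}",
        SLOW_PATH_CALLS.load(Ordering::SeqCst),
        REAL_CALLS.load(Ordering::SeqCst)
    );
}
```

Because the flag is per thread rather than per coroutine, a coroutine that yields while the flag is set would leave it set for every other coroutine on that thread, which is the deadlock the description gives as the reason for not using this guard on Linux.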
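The Windows side (pre-initialising the cell that holds the real function pointer, then passing straight through) can be sketched in the same spirit. This is an assumption-laden illustration, not the PR's implementation: it uses `std::sync::OnceLock` in place of the `OnceCell` mentioned above, and `real_wait`, `install_hook`, `hooked_wait`, `HOOK_ACTIVE` and the simplified signature are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::OnceLock;

// Simplified, hypothetical signature; the real WaitOnAddress takes more arguments.
type WaitFn = fn(u32) -> bool;

// Holds the real function pointer; filled before the hook is switched on.
static REAL_WAIT: OnceLock<WaitFn> = OnceLock::new();
static HOOK_ACTIVE: AtomicBool = AtomicBool::new(false);

// Hypothetical stand-in for the real WaitOnAddress.
fn real_wait(timeout_ms: u32) -> bool {
    timeout_ms > 0
}

fn install_hook() {
    // Step 1: initialise the cell while nothing is hooked, so any locking done
    // by the initialisation itself cannot re-enter the hook.
    REAL_WAIT.get_or_init(|| real_wait as WaitFn);
    // Step 2: only now mark the hook as active.
    HOOK_ACTIVE.store(true, Ordering::Release);
}

fn hooked_wait(timeout_ms: u32) -> bool {
    // get() only reads the cell and never runs an initialiser.
    let real = REAL_WAIT
        .get()
        .copied()
        .expect("install_hook must run before the hook is called");
    // Pass-through: delegate directly to the real function, no polling loop.
    real(timeout_ms)
}

fn main() {
    install_hook();
    println!("hook active: {}", HOOK_ACTIVE.load(Ordering::Acquire));
    println!("hooked_wait(5) = {}", hooked_wait(5));
}
```

The ordering in `install_hook` is the point of the sketch: the cell is populated first and the hook is activated second, so the hook body only ever performs a read.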