fix(iocp): bound wait reactor WSAPoll timeout to prevent lost-wakeup …#259
…hang
The auxiliary wait reactor blocked in WSAPoll(-1, infinite), relying entirely on the self-pipe wakeup. wake_self() coalesces wakes via the wake_pending_ flag and ignores send()'s return value, so a failed or lost wakeup leaves wake_pending_ stuck true: every subsequent wake is coalesced away and the reactor never re-checks pending_register_ / pending_cancel_ / stop_. A newly registered wait fd then never enters the poll set and its readiness is never detected, hanging ioc.run() forever.
This surfaced as Windows coverage-build (gcc + gcov) timeouts in the local_stream_socket.iocp, native.local_stream_socket.iocp, and wait.iocp suites, whose newly enabled local-stream-on-IOCP tests exercise acceptor wait readiness through the reactor. The heavy gcov instrumentation widens the timing window; the regular (clang/msvc) CI and uninstrumented builds pass.
Use a bounded 500 ms WSAPoll timeout as a safety net so a missed wakeup costs at most one poll interval of latency instead of a permanent hang. This mirrors the existing 500 ms GQCS safety timeout in win_scheduler.
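The failure mode and the safety net described above can be illustrated with a small portable model. This is only a sketch under stated assumptions, not the code in this PR: `toy_reactor`, `drop_wakeup`, and `lost_wakeup_recovers` are hypothetical names, and a `std::condition_variable` with `wait_for` stands in for the self-pipe and `WSAPoll` so the example builds without Winsock. The `drop_wakeup` flag simulates a `send()` whose failure goes unnoticed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the wait reactor; all names are illustrative.
class toy_reactor {
public:
    explicit toy_reactor(std::chrono::milliseconds poll_timeout)
        : poll_timeout_(poll_timeout) {}

    // Queue a registration, then try to wake the reactor.
    // drop_wakeup simulates a self-pipe send() that fails silently.
    void register_fd(int fd, bool drop_wakeup) {
        assert(fd >= 0);
        {
            std::lock_guard<std::mutex> lk(mtx_);
            pending_register_.push_back(fd);
        }
        wake_self(drop_wakeup);
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stop_ = true;
        }
        wake_self(false);
    }

    // Reactor loop: the bounded wait plays the role of the poll timeout.
    void run() {
        std::unique_lock<std::mutex> lk(mtx_);
        while (!stop_) {
            // Returns on a delivered wakeup or after poll_timeout_ elapses.
            cv_.wait_for(lk, poll_timeout_, [this] { return signaled_; });
            signaled_ = false;
            wake_pending_.store(false);
            // Re-check shared state whether or not a wakeup arrived.
            for (int fd : pending_register_) poll_set_.push_back(fd);
            pending_register_.clear();
        }
    }

    std::size_t poll_set_size() {
        std::lock_guard<std::mutex> lk(mtx_);
        return poll_set_.size();
    }

private:
    void wake_self(bool drop_wakeup) {
        if (wake_pending_.exchange(true)) return;  // coalesced with an earlier wake
        if (drop_wakeup) return;                   // lost: wake_pending_ stays true
        {
            std::lock_guard<std::mutex> lk(mtx_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

    std::chrono::milliseconds poll_timeout_;
    std::mutex mtx_;
    std::condition_variable cv_;
    std::atomic<bool> wake_pending_{false};
    bool signaled_ = false;
    bool stop_ = false;
    std::vector<int> pending_register_;
    std::vector<int> poll_set_;
};

// Loses the first wakeup, lets the second be coalesced away, then reports
// whether both fds still reached the poll set before a generous deadline.
bool lost_wakeup_recovers(std::chrono::milliseconds poll_timeout) {
    toy_reactor r(poll_timeout);
    std::thread t([&r] { r.run(); });
    r.register_fd(3, /*drop_wakeup=*/true);   // wakeup lost
    r.register_fd(4, /*drop_wakeup=*/false);  // coalesced, never delivered
    auto deadline = std::chrono::steady_clock::now()
                    + poll_timeout * 20 + std::chrono::seconds(1);
    while (r.poll_set_size() < 2 && std::chrono::steady_clock::now() < deadline)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    bool ok = r.poll_set_size() == 2;
    r.stop();
    t.join();
    return ok;
}
```

In this model, both registrations are picked up once the bounded wait expires, even though no wakeup was ever delivered. Replacing `wait_for` with an unbounded `wait` would leave the loop blocked in the same scenario, which corresponds to the hang described above.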
An automated preview of the documentation is available at https://259.corosio.prtest3.cppalliance.org/index.html
If more commits are pushed to the pull request, the docs will rebuild at the same URL.
2026-05-29 23:33:35 UTC
GCOVR code coverage report https://259.corosio.prtest3.cppalliance.org/gcovr/index.html
Build time: 2026-05-29 23:44:21 UTC
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff            @@
##           develop    #259    +/-   ##
========================================
  Coverage    77.77%   77.77%
========================================
  Files           96       96
  Lines         7262     7262
  Branches      1773     1773
========================================
  Hits          5648     5648
  Misses        1104     1104
  Partials       510      510