fix(iocp): bound wait reactor WSAPoll timeout to prevent lost-wakeup …#259

Merged
sgerbino merged 1 commit into cppalliance:develop from sgerbino:fix/wait-reactor-bounded-timeout
May 30, 2026

@sgerbino
Collaborator

…hang

The auxiliary wait reactor blocked in WSAPoll with an infinite timeout (-1), relying entirely on the self-pipe wakeup. wake_self() coalesces wakes via the wake_pending_ flag and ignores send()'s return value, so a failed or lost wakeup leaves wake_pending_ stuck at true: every subsequent wake is coalesced away, and the reactor never re-checks pending_register_ / pending_cancel_ / stop_. A newly registered wait fd then never enters the poll set, its readiness is never detected, and ioc.run() hangs forever.
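
The failure mode described above can be illustrated with a small, portable model. This is only a sketch: `coalescing_waker`, `on_wake_drained()`, and the injected `send` callable are hypothetical names chosen for illustration, and the real `wake_self()` in the library may be structured differently. The model shows how ignoring a failed send leaves the coalescing flag set, so later wakes are silently dropped.

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Hypothetical model of a coalescing self-wake. The member names echo the
// PR description, but this is an illustration, not the library's code.
class coalescing_waker
{
    std::atomic<bool> wake_pending_{false};
    std::function<bool()> send_; // stands in for send() on the wake socket
    int send_attempts_ = 0;

public:
    explicit coalescing_waker(std::function<bool()> send)
        : send_(std::move(send))
    {
    }

    // Returns true if this call attempted a send, false if it was coalesced.
    bool wake_self()
    {
        // Only the caller that flips the flag from false to true sends.
        if (wake_pending_.exchange(true, std::memory_order_acq_rel))
            return false;
        ++send_attempts_;
        (void)send_(); // result ignored: a failure leaves the flag set
        return true;
    }

    // The reactor would call this after draining the wake byte.
    void on_wake_drained()
    {
        wake_pending_.store(false, std::memory_order_release);
    }

    bool pending() const
    {
        return wake_pending_.load(std::memory_order_acquire);
    }

    int send_attempts() const { return send_attempts_; }
};
```

If the injected `send` fails, the first `wake_self()` call sets the flag and every later call returns early. Since no wake byte ever arrives, nothing calls `on_wake_drained()`, and a reactor blocked with an infinite timeout has no other way to notice pending work.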

This surfaced as Windows coverage-build (gcc + gcov) timeouts in the local_stream_socket.iocp, native.local_stream_socket.iocp, and wait.iocp suites, whose newly enabled local-stream-on-IOCP tests exercise acceptor wait readiness through the reactor. The heavy gcov instrumentation widens the timing window; the regular (clang/msvc) CI and uninstrumented builds pass.

Use a bounded 500 ms WSAPoll timeout as a safety net so a missed wakeup costs at most one poll interval of latency instead of a permanent hang. This mirrors the existing 500 ms GQCS safety timeout in win_scheduler.
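
The effect of the bounded timeout can be sketched without any Windows APIs. In the model below a `std::condition_variable` wait stands in for WSAPoll, whose timeout argument is given in milliseconds, with a negative value meaning wait indefinitely. `reactor_model`, `poll_once()`, and its fields are hypothetical names for illustration, not the library's implementation.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the wait reactor loop body. A condition
// variable replaces WSAPoll so the sketch runs on any platform.
struct reactor_model
{
    std::mutex m;
    std::condition_variable cv;
    bool woken = false;                        // set only by a delivered wakeup
    std::atomic<bool> pending_register{false}; // work queued by another thread

    // One loop iteration: block until woken or until the timeout expires,
    // then re-check pending work either way.
    // Returns true if pending work was observed on this pass.
    bool poll_once(std::chrono::milliseconds timeout)
    {
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait_for(lk, timeout, [this] { return woken; });
            woken = false;
        }
        // Reached even when no wakeup was delivered, because the wait
        // above is bounded rather than infinite.
        return pending_register.exchange(false);
    }
};
```

With a bounded wait, a registration whose wakeup was lost is still picked up on the next pass, at a cost of at most one timeout interval of latency. With an unbounded wait in the same position, `poll_once()` would never return in that scenario.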

@cppalliance-bot

An automated preview of the documentation is available at https://259.corosio.prtest3.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-05-29 23:33:35 UTC

@cppalliance-bot

GCOVR code coverage report https://259.corosio.prtest3.cppalliance.org/gcovr/index.html
LCOV code coverage report https://259.corosio.prtest3.cppalliance.org/genhtml/index.html
Coverage Diff Report https://259.corosio.prtest3.cppalliance.org/diff-report/index.html

Build time: 2026-05-29 23:44:21 UTC

@codecov

codecov Bot commented May 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.77%. Comparing base (2046d55) to head (f0e6000).

@@           Coverage Diff            @@
##           develop     #259   +/-   ##
========================================
  Coverage    77.77%   77.77%           
========================================
  Files           96       96           
  Lines         7262     7262           
  Branches      1773     1773           
========================================
  Hits          5648     5648           
  Misses        1104     1104           
  Partials       510      510           

@sgerbino sgerbino merged commit ca33581 into cppalliance:develop May 30, 2026
41 checks passed
@sgerbino sgerbino deleted the fix/wait-reactor-bounded-timeout branch May 30, 2026 00:11
2 participants