Skip to content

fix(bindgen): avoid async future and stream host injection deadlocks#1437

Open
andreiltd wants to merge 2 commits intobytecodealliance:mainfrom
andreiltd:fix/future-deadlock
Open

fix(bindgen): avoid async future and stream host injection deadlocks#1437
andreiltd wants to merge 2 commits intobytecodealliance:mainfrom
andreiltd:fix/future-deadlock

Conversation

@andreiltd
Copy link
Copy Markdown
Member

@andreiltd andreiltd commented Apr 29, 2026

The spec is subtle here but IIUC we should start host side future and stream injection without awaiting it before the canonical read operation can see BLOCKED.

So if the copy does not immediately produce a pending event, the implementation must set ASYNC_COPYING and return BLOCKED. Our old code awaited the host promise during host injection before allowing this path to return BLOCKED, which deadlocks the guest scheduler.

This is consistent with python code from canon ABI spec:

  if not e.has_pending_event():
    if not opts.async_:
      e.state = CopyState.SYNC_COPYING
      thread.wait_until(e.has_pending_event)
    else:
      e.state = CopyState.ASYNC_COPYING
      return [BLOCKED]
  code,index,payload = e.get_pending_event()
  assert(code == event_code and index == i)
  return [payload]

This is the same for futures and streams. An example of deadlock on the guest side with an old implementation looks like this:

let (mut tx, rx) = wit_stream::new();
futures::join!(
    async {
        wasi::cli::stdout::write_via_stream(rx).await.unwrap();
    },
    async {
        tx.write(b"hello, world\n".to_vec()).await;
        drop(tx);
    },
);

I've added concurrency tests for both streams and futures that this patch fixes.

@andreiltd andreiltd force-pushed the fix/future-deadlock branch from 8353412 to 2f3384d Compare April 29, 2026 17:28
@andreiltd andreiltd marked this pull request as draft April 29, 2026 18:08
@andreiltd
Copy link
Copy Markdown
Member Author

This requires some more thinking - I will keep it as a draft until I figure out what caused a regression.

The spec is subtle here but IIUC we should start host side future and
stream injection without awaiting it before the canonical read operation
can see BLOCKED.

So if the copy does not immediately produce a pending event, the
implementation must set ASYNC_COPYING and return BLOCKED. Our old code
awaited the host promise during host injection before allowing this path
to return BLOCKED, which deadlocks the guest scheduler.

This is consistent with python code from canon ABI spec:

```python
  if not e.has_pending_event():
    if not opts.async_:
      e.state = CopyState.SYNC_COPYING
      thread.wait_until(e.has_pending_event)
    else:
      e.state = CopyState.ASYNC_COPYING
      return [BLOCKED]
  code,index,payload = e.get_pending_event()
  assert(code == event_code and index == i)
  return [payload]
```

This is the same for futures and streams. An example of deadlock on the
guest side with an old implementation looks like this:

```rust
let (mut tx, rx) = wit_stream::new();
futures::join!(
    async {
        wasi::cli::stdout::write_via_stream(rx).await.unwrap();
    },
    async {
        tx.write(b"hello, world\n".to_vec()).await;
        drop(tx);
    },
);
```

I've added concurrency tests for both streams and futures that this
patch fixes.
@andreiltd andreiltd force-pushed the fix/future-deadlock branch from 2f3384d to 36fdf90 Compare April 30, 2026 07:24
@andreiltd
Copy link
Copy Markdown
Member Author

I think the problem is in this code:

https://github.com/bytecodealliance/jco/blob/main/crates/js-component-bindgen/src/intrinsics/p3/async_stream.rs#L912

I'm not sure why we call copy() again here instead of getting result from pending event? I believe the python code counterpart does that:

  e.copy(thread.task.inst, buffer, on_copy, on_copy_done)

  if not e.has_pending_event():
    if not opts.async_:
      e.state = CopyState.SYNC_COPYING
      thread.wait_until(e.has_pending_event)
    else:
      e.state = CopyState.ASYNC_COPYING
      return [BLOCKED]
  code,index,payload = e.get_pending_event()  # <= here, do not copy again but just get result from event
  assert(code == event_code and index == i and payload != BLOCKED)
  return [payload]

The way I understand this is that after a blocked copy is awaken, the pending event contains the completed copy and calling copy() again will re-enter the stream state machine? The problem is symmetric so it's the same for read and write.

I can try to fix that, but I won't be confident to land it without thorough review 😄

@andreiltd andreiltd marked this pull request as ready for review April 30, 2026 10:01
@andreiltd
Copy link
Copy Markdown
Member Author

Last commit seems to fix the regression and with previous changes the deadlock is also fixed.

Copy link
Copy Markdown
Collaborator

@vados-cosmonic vados-cosmonic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

I left some nits, but in general this looks good to me -- thanks for adding the new tests and fixing so many impl bugs!

if (injectedWritePromise) {{
const cleanupFn = await injectedWritePromise;
cleanupFn();
if (this.hasPendingEvent()) {{
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: what do you think about doing this inside the cleanup function itself?

Comment on lines +745 to +746
if (injectHostWrite && !this.hasPendingEvent()) {{
injectedWritePromise = this.#hostInjectFn({{ count }});
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: What do you think about moving this inside the host injection function?

i.e. having it check if there is an event and do nothing?

Definitely don't feel strongly about this, but this feels like it's saying "if there's a pending event, don't write" and the host injection function can be smart enough to essentially turn into a no-op itself, I think.

this.setCopyState({stream_end_class}.CopyState.ASYNC_COPYING);
{debug_log_fn}('[{stream_end_class}#copy()] blocked', {{ componentIdx, eventCode, self: this }});
if (injectedWritePromise) {{
injectedWritePromise.then(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a quick explainer on why we don't wait for the injected write to actually complete here?

And with the previous nits about trying to contain the logic to the injected write promise function itself or the cleanup fn it returns -- would be nice if we could do that here and avoid the chaining here (I think this logic is consistent across places we're using the injected write promise, no?)

jco: {
transpile: {
extraArgs: {
asyncExports: ["jco:test-components/local-run-async#run"],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to add this if the func is async in the WIT -- did you find that you had to? is there a bug there?

import { setupAsyncTest } from "../helpers.js";
import { LOCAL_TEST_COMPONENTS_DIR } from "../common.js";

function deferred() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems equivalent to Promise.withResolvers ? We use a pony fill in the bindgen code but here you can just use it IMO

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants