fix(bindgen): avoid async future and stream host injection deadlocks#1437
fix(bindgen): avoid async future and stream host injection deadlocks#1437andreiltd wants to merge 2 commits intobytecodealliance:mainfrom
Conversation
8353412 to
2f3384d
Compare
|
This requires some more thinking - I will keep it as a draft until I figure out what caused a regression. |
The spec is subtle here but IIUC we should start host side future and
stream injection without awaiting it before the canonical read operation
can see BLOCKED.
So if the copy does not immediately produce a pending event, the
implementation must set ASYNC_COPYING and return BLOCKED. Our old code
awaited the host promise during host injection before allowing this path
to return BLOCKED, which deadlocks the guest scheduler.
This is consistent with python code from canon ABI spec:
```python
if not e.has_pending_event():
if not opts.async_:
e.state = CopyState.SYNC_COPYING
thread.wait_until(e.has_pending_event)
else:
e.state = CopyState.ASYNC_COPYING
return [BLOCKED]
code,index,payload = e.get_pending_event()
assert(code == event_code and index == i)
return [payload]
```
This is the same for futures and streams. An example of deadlock on the
guest side with an old implementation looks like this:
```rust
let (mut tx, rx) = wit_stream::new();
futures::join!(
async {
wasi::cli::stdout::write_via_stream(rx).await.unwrap();
},
async {
tx.write(b"hello, world\n".to_vec()).await;
drop(tx);
},
);
```
I've added concurrency tests for both streams and futures that this
patch fixes.
2f3384d to
36fdf90
Compare
|
I think the problem is in this code: I'm not sure why we call e.copy(thread.task.inst, buffer, on_copy, on_copy_done)
if not e.has_pending_event():
if not opts.async_:
e.state = CopyState.SYNC_COPYING
thread.wait_until(e.has_pending_event)
else:
e.state = CopyState.ASYNC_COPYING
return [BLOCKED]
code,index,payload = e.get_pending_event() # <= here, do not copy again but just get result from event
assert(code == event_code and index == i and payload != BLOCKED)
return [payload]The way I understand this is that after a blocked copy is awaken, the pending event contains the completed copy and calling I can try to fix that, but I won't be confident to land it without thorough review 😄 |
|
Last commit seems to fix the regression and with previous changes the deadlock is also fixed. |
vados-cosmonic
left a comment
There was a problem hiding this comment.
LGTM 🚀
I left some nits, but in general this looks good to me -- thanks for adding the new tests and fixing so many impl bugs!
| if (injectedWritePromise) {{ | ||
| const cleanupFn = await injectedWritePromise; | ||
| cleanupFn(); | ||
| if (this.hasPendingEvent()) {{ |
There was a problem hiding this comment.
NIT: what do you think about doing this inside the cleanup function itself?
| if (injectHostWrite && !this.hasPendingEvent()) {{ | ||
| injectedWritePromise = this.#hostInjectFn({{ count }}); |
There was a problem hiding this comment.
NIT: What do you think about moving this inside the host injection function?
i.e. having it check if there is an event and do nothing?
Definitely don't feel strongly about this, but this feels like it's saying "if there's a pending event, don't write" and the host injection function can be smart enough to essentially turn into a no-op itself, I think.
| this.setCopyState({stream_end_class}.CopyState.ASYNC_COPYING); | ||
| {debug_log_fn}('[{stream_end_class}#copy()] blocked', {{ componentIdx, eventCode, self: this }}); | ||
| if (injectedWritePromise) {{ | ||
| injectedWritePromise.then( |
There was a problem hiding this comment.
Would you mind adding a quick explainer on why we don't wait for the injected write to actually complete here?
And with the previous nits about trying to contain the logic to the injected write promise function itself or the cleanup fn it returns -- would be nice if we could do that here and avoid the chaining here (I think this logic is consistent across places we're using the injected write promise, no?)
| jco: { | ||
| transpile: { | ||
| extraArgs: { | ||
| asyncExports: ["jco:test-components/local-run-async#run"], |
There was a problem hiding this comment.
You shouldn't need to add this if the func is async in the WIT -- did you find that you had to? is there a bug there?
| import { setupAsyncTest } from "../helpers.js"; | ||
| import { LOCAL_TEST_COMPONENTS_DIR } from "../common.js"; | ||
|
|
||
| function deferred() { |
There was a problem hiding this comment.
This seems equivalent to Promise.withResolvers ? We use a pony fill in the bindgen code but here you can just use it IMO
The spec is subtle here but IIUC we should start host side future and stream injection without awaiting it before the canonical read operation can see
BLOCKED.So if the copy does not immediately produce a pending event, the implementation must set
ASYNC_COPYINGand returnBLOCKED. Our old code awaited the host promise during host injection before allowing this path to returnBLOCKED, which deadlocks the guest scheduler.This is consistent with python code from canon ABI spec:
This is the same for futures and streams. An example of deadlock on the guest side with an old implementation looks like this:
I've added concurrency tests for both streams and futures that this patch fixes.