shim/streaming: return Stream on first bridge direction completion#167
shim/streaming: return Stream on first bridge direction completion#167eginez wants to merge 4 commits intocontainerd:mainfrom
Conversation
The Stream handler waited for both bridge directions to finish. When the VM closes its side first, the ttrpc->vm goroutine stays blocked on srv.Recv; the server stream never closes and the daemon's ReceiveStream never sees EOF. Return on the first direction. The other unblocks via deferred vmConn.Close and ttrpc's per-handler context cancellation. Signed-off-by: Esteban Ginez <175813+eginez@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Updates the shim’s TTRPC streaming Stream handler to return when either bridge direction completes, ensuring the server-side stream closes promptly (preventing daemon-side ReceiveStream hangs when the VM closes first).
Changes:
- Change
service.Streamto return after the first bridge direction completes (instead of waiting for both). - Add end-to-end tests validating stream closure on VM-side close and repeated sequential streams.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
plugins/shim/streaming/plugin.go |
Returns from Stream on first bridge completion to ensure the server stream closes and unblocks daemon ReceiveStream. |
plugins/shim/streaming/plugin_test.go |
Adds regression/stress tests that reproduce the hang and validate prompt stream release. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Drop unused ptypes import and the dead var _ = ptypes.Empty{} guard.
- Check MarshalAnyToProto error in TestMultipleStreamsDoNotAccumulate
for consistency with TestStreamReturnsOnVMClose.
- Capture ttrpc.Server.Serve error and surface it on cleanup so test
infra failures are not silently masked. Allow ErrServerClosed and
net.ErrClosed as expected outcomes when the cleanup closes the
server.
Signed-off-by: Esteban Ginez <175813+eginez@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- TestStreamReturnsOnVMClose: fail on non-EOF close errors instead of silently logging. - TestMultipleStreamsDoNotAccumulate: assert each Recv returns an EOF-like close, not just any value. - Use a per-iteration StreamInit ID to avoid masking ID-uniqueness bugs. Signed-off-by: Esteban Ginez <175813+eginez@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
I was able to implement some stress testing on the bidirection stream and this issue fixed one case, but regressed in another. I'll keep look into this one. |
Signed-off-by: Esteban Ginez <175813+eginez@users.noreply.github.com>
Summary
The shim's TTRPC streaming
Streamhandler waits for both bridgedirections (
ttrpc->vmandvm->ttrpc) to finish before returning.When the VM closes its side first — the common case for short
transfers — the
ttrpc->vmgoroutine remains parked onsrv.Recv,the server stream never closes, and the daemon's
ReceiveStreamnever observes EOF.
vmConn.Closeandttrpc's per-handler context cancellation.
Why
end with the VM closing its data channel after the payload is
delivered. Without this fix the daemon-side stream sits open
indefinitely, surfacing as a hang on follow-up RPCs sharing the
connection's flow-control state.
server: cancel per-stream context when handler returns ttrpc#231.
Testing
TestStreamReturnsOnVMClose— VM closes its side; handlermust release the server stream within 3s. Times out without the
fix.
TestMultipleStreamsDoNotAccumulate— five sequentialstreams, each released by the VM, each must close promptly.
go test ./plugins/shim/streaming/...clean. (Some./integration/...tests require
libkrun.dyliband were skipped locally; CI coversthose.)