netty: Propagate initial handshake failure before close #12626
becomeStar wants to merge 1 commit into grpc:master
When a handshake failure occurs before any writes are buffered on the server side, WriteBufferingAndExceptionHandler can record the failure internally but never surface it to downstream inbound handlers. This makes the original handshake error unobservable and complicates debugging and instrumentation. Propagate only the first failure via exceptionCaught, gated on the absence of a previous failure, so that the canonical error becomes observable while avoiding duplicate propagation and preserving existing close semantics.
I replied with my thoughts on issue #8495.
Thank you very much for your detailed analysis and for taking the time to simulate the failure. Your observation about the object handle changing is incredibly helpful and provides a clear clue as to why the original root cause may be getting lost. It does seem that failCause can effectively be reset when a new instance of WriteBufferingAndExceptionHandler is introduced into the pipeline, which explains why a secondary exception ends up being surfaced instead of the original handshake failure. I’ll dig further into where and why the handler instance is being replaced and look for a way to ensure the first meaningful exception is preserved across instances. Based on your feedback, I’ll work toward a refined solution that addresses this state-loss issue directly. Once I have a clearer fix, I can either update this PR or follow up with a new one, depending on what you think makes the most sense. Thanks again for the detailed investigation and guidance; it’s been extremely helpful.
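To make the suspected state loss concrete, here is a minimal plain-Java sketch. It is not Netty or grpc-java code: `FailCauseResetSketch` and `RecordingHandler` are hypothetical stand-ins, and the only behavior modeled is "remember the first failure in an instance field". It assumes, as discussed above, that a fresh handler instance starts with an empty failCause.

```java
// Plain-Java model, not Netty code; every name here is hypothetical.
final class FailCauseResetSketch {

  /** Stand-in for a handler that remembers only the first failure it sees. */
  static final class RecordingHandler {
    private Throwable failCause; // per-instance state, null until first failure

    // Returns the cause this instance would report for the given failure.
    Throwable record(Throwable cause) {
      if (failCause == null) {
        failCause = cause;
      }
      return failCause;
    }
  }

  static String[] run() {
    Throwable handshake = new RuntimeException("handshake failed");
    Throwable secondary = new RuntimeException("connection closed");

    RecordingHandler original = new RecordingHandler();
    String first = original.record(handshake).getMessage();

    // Same instance: the secondary failure does not displace the root cause.
    String sameInstance = original.record(secondary).getMessage();

    // A replacement instance starts with failCause == null, so it reports
    // the secondary failure as if it were the first one.
    RecordingHandler replacement = new RecordingHandler();
    String newInstance = replacement.record(secondary).getMessage();

    return new String[] {first, sameInstance, newInstance};
  }

  public static void main(String[] args) {
    // prints "handshake failed | handshake failed | connection closed"
    System.out.println(String.join(" | ", run()));
  }
}
```

In this model the root cause survives only as long as the same instance handles both failures, which matches the symptom of a secondary exception being surfaced after the handler is replaced.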
Handshake failures that occur before any writes are buffered can currently be lost to downstream inbound handlers. In this case, the failure is surfaced via the write / promise path, but exceptionCaught is never observed by handlers placed after WriteBufferingAndExceptionHandler.
This makes the original handshake error difficult to diagnose and inconsistent with failures that occur after buffering has started.
This change propagates the exception via fireExceptionCaught before closing the channel when handling the first failure on an active channel. Doing so preserves the original failure while the pipeline is still intact and avoids losing the exception due to close-triggered teardown or reentrancy.
Fixes #8495
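The ordering described in this PR can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not the actual patch: it does not depend on Netty, and `FailureOrderingSketch`, `MiniChannel`, and `BufferingHandler` are hypothetical names. It assumes downstream handlers can only observe an exception while the channel is still open, and that only the first failure is propagated.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "propagate first, then close"; it does not use Netty.
// All names here are hypothetical.
final class FailureOrderingSketch {

  /** Stand-in for a channel whose downstream handlers go away on close. */
  static final class MiniChannel {
    final List<String> events = new ArrayList<>();
    boolean open = true;

    // Models fireExceptionCaught: downstream sees it only while open.
    void fireExceptionCaught(Throwable t) {
      if (open) {
        events.add("exceptionCaught:" + t.getMessage());
      }
    }

    void close() {
      if (open) {
        open = false;
        events.add("closed");
      }
    }
  }

  /** Stand-in for the buffering handler's failure path. */
  static final class BufferingHandler {
    private Throwable failCause; // null until the first failure is recorded

    void onFailure(MiniChannel ch, Throwable cause) {
      if (failCause == null) {
        failCause = cause;
        ch.fireExceptionCaught(cause); // propagate before teardown
      }
      ch.close(); // close semantics are unchanged for every failure
    }
  }

  static List<String> run() {
    MiniChannel ch = new MiniChannel();
    BufferingHandler handler = new BufferingHandler();
    handler.onFailure(ch, new RuntimeException("handshake failed"));
    handler.onFailure(ch, new RuntimeException("secondary failure"));
    return ch.events;
  }

  public static void main(String[] args) {
    // prints "[exceptionCaught:handshake failed, closed]"
    System.out.println(run());
  }
}
```

In this model, swapping the two calls in `onFailure` so that `close()` runs first leaves only the `closed` event, which is the lost-exception case the change is meant to avoid.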