fix: reconnect MCP client on terminated session#8694
Conversation
There was a problem hiding this comment.
Code Review
This pull request expands the MCP client's reconnection logic to trigger on a broader set of session termination errors, rather than just anyio.ClosedResourceError. It introduces a helper function _is_mcp_reconnect_error to identify these errors, updates the retry decorator, and adds comprehensive unit tests. The review feedback points out a potential NameError at runtime if anyio is not installed, as it is an optional dependency, and suggests guarding the check with "anyio" in globals().
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def _is_mcp_reconnect_error(exc: BaseException) -> bool: | ||
| if isinstance(exc, anyio.ClosedResourceError): | ||
| return True | ||
|
|
||
| message = str(exc).lower() | ||
| return any(marker in message for marker in _MCP_RECONNECT_ERROR_MESSAGES) |
There was a problem hiding this comment.
Since anyio is an optional dependency (imported inside a try...except block on line 100), referencing anyio.ClosedResourceError directly will raise a NameError at runtime if anyio is not installed and _is_mcp_reconnect_error is executed (for example, when call_tool_with_reconnect raises a ValueError because self.session is None). Guarding the check with "anyio" in globals() prevents this runtime error.
| def _is_mcp_reconnect_error(exc: BaseException) -> bool: | |
| if isinstance(exc, anyio.ClosedResourceError): | |
| return True | |
| message = str(exc).lower() | |
| return any(marker in message for marker in _MCP_RECONNECT_ERROR_MESSAGES) | |
| def _is_mcp_reconnect_error(exc: BaseException) -> bool: | |
| if "anyio" in globals() and isinstance(exc, anyio.ClosedResourceError): | |
| return True | |
| message = str(exc).lower() | |
| return any(marker in message for marker in _MCP_RECONNECT_ERROR_MESSAGES) |
There was a problem hiding this comment.
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="tests/unit/test_mcp_client_reconnect.py" line_range="30-33" />
<code_context>
+ }
+
+
+def test_mcp_reconnect_error_detection_is_narrow() -> None:
+ assert mcp_client._is_mcp_reconnect_error(RuntimeError("Session terminated"))
+ assert not mcp_client._is_mcp_reconnect_error(
+ RuntimeError("business flow terminated normally")
+ )
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add more edge cases to the reconnect error detection test to better pin down the intended matching behavior.
Since `_is_mcp_reconnect_error` drives retry behavior, it would be good to exercise a few more explicit cases here. Consider adding:
- An `anyio.ClosedResourceError` to cover the type-based branch.
- A case-insensitivity check (e.g. `"SESSION TERMINATED"`).
- The alternate message `"session was terminated"` from `_MCP_RECONNECT_ERROR_MESSAGES`.
- A `RuntimeError("terminated")` or similar to show that plain "terminated" without "session" is not treated as reconnectable.
Parametrizing these cases would clearly document the contract and help prevent regressions in the matching logic.
Suggested implementation:
```python
import anyio
import pytest
from tenacity import wait_none
from astrbot.core.agent import mcp_client
```
```python
@pytest.mark.parametrize(
("error", "expected"),
[
# Exact known reconnectable message
(RuntimeError("Session terminated"), True),
# Case insensitivity
(RuntimeError("SESSION TERMINATED"), True),
# Alternate configured reconnectable message
(RuntimeError("session was terminated"), True),
# Type-based reconnectable error
(anyio.ClosedResourceError(), True),
# Non-reconnectable messages
(RuntimeError("business flow terminated normally"), False),
# "terminated" without "session" should not be treated as reconnectable
(RuntimeError("terminated"), False),
],
)
def test_mcp_reconnect_error_detection_is_narrow(error: BaseException, expected: bool) -> None:
assert mcp_client._is_mcp_reconnect_error(error) is expected
```
</issue_to_address>
### Comment 2
<location path="tests/unit/test_mcp_client_reconnect.py" line_range="37-46" />
<code_context>
+ )
+
+
+@pytest.mark.asyncio
+async def test_call_tool_reconnects_on_session_terminated(monkeypatch) -> None:
+ monkeypatch.setattr(mcp_client, "wait_exponential", lambda **_: wait_none())
+
+ client = mcp_client.MCPClient()
+ session = FlakyMcpSession()
+ reconnects = 0
+
+ async def reconnect() -> None:
+ nonlocal reconnects
+ reconnects += 1
+ client.session = session
+
+ client.session = session
+ client._reconnect = reconnect
+
+ result = await client.call_tool_with_reconnect(
+ tool_name="lookup",
+ arguments={"url": "https://example.com"},
+ read_timeout_seconds=timedelta(seconds=5),
+ )
+
+ assert result == {
+ "name": "lookup",
+ "arguments": {"url": "https://example.com"},
+ "timeout": 5.0,
+ }
+ assert session.calls == 2
+ assert reconnects == 1
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test ensuring non-reconnectable exceptions are not retried and propagate instead of triggering reconnect.
To exercise the new narrow error handling, please add a test where the first call raises a non-reconnectable exception (e.g., `ValueError("business logic failed")`) and assert that no reconnects occur (e.g., `reconnects == 0`) and the exception is re-raised (e.g., `with pytest.raises(ValueError): call_tool_with_reconnect(...)`). This will cover the `if not _is_mcp_reconnect_error(exc): raise` path and guard against masking unrelated failures.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def test_mcp_reconnect_error_detection_is_narrow() -> None: | ||
| assert mcp_client._is_mcp_reconnect_error(RuntimeError("Session terminated")) | ||
| assert not mcp_client._is_mcp_reconnect_error( | ||
| RuntimeError("business flow terminated normally") |
There was a problem hiding this comment.
suggestion (testing): Add more edge cases to the reconnect error detection test to better pin down the intended matching behavior.
Since _is_mcp_reconnect_error drives retry behavior, it would be good to exercise a few more explicit cases here. Consider adding:
- An
anyio.ClosedResourceErrorto cover the type-based branch. - A case-insensitivity check (e.g.
"SESSION TERMINATED"). - The alternate message
"session was terminated"from_MCP_RECONNECT_ERROR_MESSAGES. - A
RuntimeError("terminated")or similar to show that plain "terminated" without "session" is not treated as reconnectable.
Parametrizing these cases would clearly document the contract and help prevent regressions in the matching logic.
Suggested implementation:
import anyio
import pytest
from tenacity import wait_none
from astrbot.core.agent import mcp_client@pytest.mark.parametrize(
("error", "expected"),
[
# Exact known reconnectable message
(RuntimeError("Session terminated"), True),
# Case insensitivity
(RuntimeError("SESSION TERMINATED"), True),
# Alternate configured reconnectable message
(RuntimeError("session was terminated"), True),
# Type-based reconnectable error
(anyio.ClosedResourceError(), True),
# Non-reconnectable messages
(RuntimeError("business flow terminated normally"), False),
# "terminated" without "session" should not be treated as reconnectable
(RuntimeError("terminated"), False),
],
)
def test_mcp_reconnect_error_detection_is_narrow(error: BaseException, expected: bool) -> None:
assert mcp_client._is_mcp_reconnect_error(error) is expected| @pytest.mark.asyncio | ||
| async def test_call_tool_reconnects_on_session_terminated(monkeypatch) -> None: | ||
| monkeypatch.setattr(mcp_client, "wait_exponential", lambda **_: wait_none()) | ||
|
|
||
| client = mcp_client.MCPClient() | ||
| session = FlakyMcpSession() | ||
| reconnects = 0 | ||
|
|
||
| async def reconnect() -> None: | ||
| nonlocal reconnects |
There was a problem hiding this comment.
suggestion (testing): Add a test ensuring non-reconnectable exceptions are not retried and propagate instead of triggering reconnect.
To exercise the new narrow error handling, please add a test where the first call raises a non-reconnectable exception (e.g., ValueError("business logic failed")) and assert that no reconnects occur (e.g., reconnects == 0) and the exception is re-raised (e.g., with pytest.raises(ValueError): call_tool_with_reconnect(...)). This will cover the if not _is_mcp_reconnect_error(exc): raise path and guard against masking unrelated failures.
b9a2502 to
7985fdd
Compare
Summary
Session terminatedfailures as reconnectable errors, while keeping genericterminatedtext out of the retry path.Exception, so cancellation/shutdown signals still propagate.Fixes #8681.
Evidence
mcp.shared.exceptions.McpError: Session terminatedfromastrbot/core/agent/mcp_client.pyafter restarting an MCP server.BaseExceptionhandling risks. This PR keeps the fix smaller.Verification
git diff --checkuv run pytest tests/unit/test_mcp_client_reconnect.py tests/unit/test_mcp_client_schema.py -q(7 passed)uv run ruff check .uv run ruff format --check .uv run astrbot --helpLocal full-suite note
uv run pytest tests/ -qinitially failed because pytest could not accessC:\Users\zacza\AppData\Local\Temp\pytest-of-zacza.--basetemp C:\tmp\pytest-automation-4-mcpcompleted collection/execution but exposed 21 existing Windows-path/newline/fixture failures unrelated toastrbot/core/agent/mcp_client.py.Summary by Sourcery
Handle MCP client reconnects on session-terminated errors while preserving existing retry behavior and error propagation.
Bug Fixes:
Tests: