Summary
codebase-memory-mcp hangs for 60 seconds before responding to tools/list when an MCP client sends the three standard initialization messages without artificial delays between them. This manifests as a "connecting..." state in Claude Code that resolves only after STORE_IDLE_TIMEOUT_S (60s) elapses.
Root Cause
The MCP event loop in cbm_mcp_server_run (src/mcp/mcp.c) mixes poll() on the raw file descriptor with getline() on a buffered FILE*. These two abstractions operate at different layers of the I/O stack, and the combination creates a correctness hazard:
- The client sends three messages back-to-back with no delay between them (all arrive in the kernel receive buffer simultaneously)
- `poll()` fires — data is available
- `getline()` reads `initialize` and over-reads — libc's `FILE*` buffer drains the entire kernel buffer, pulling all three messages into userspace
- `cbm_mcp_server_handle()` processes `initialize` and returns a response
- `getline()` processes `notifications/initialized` (a notification with no `id`) — `cbm_mcp_server_handle()` returns `NULL` (correct per spec), no response is written
- The loop calls `poll()` again for the next message — but the `tools/list` payload is already in libc's `FILE*` buffer, not the kernel fd
- `poll()` sees an empty kernel fd and blocks for 60 seconds
- `tools/list` never receives a response within any reasonable timeout
The bug was reliably triggered by Claude Code 2.1.80, which sends all three initialization messages as a rapid burst (no inter-message delay). Earlier client versions or clients that insert delays between messages may never observe the bug.
Reproduction:

```python
import subprocess, json, time

binary = "codebase-memory-mcp"
proc = subprocess.Popen([binary], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
msgs = [
    {"method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}},"jsonrpc":"2.0","id":0},
    {"method":"notifications/initialized","jsonrpc":"2.0"},
    {"method":"tools/list","jsonrpc":"2.0","id":1},
]
# Send all three with NO delay — triggers the hang
for m in msgs:
    proc.stdin.write(json.dumps(m) + "\n")
proc.stdin.flush()
start = time.time()
for _ in range(2):  # expect initialize response + tools/list response
    line = proc.stdout.readline()
    print(f"{time.time()-start:.2f}s: {line[:80]}")
proc.terminate()
```

Expected: both responses arrive within ~1 second.
Observed (before fix): `initialize` response arrives immediately; `tools/list` response arrives after ~60 seconds.
The comment at the original poll() call site stated: "MCP is request-response (one line at a time), so mixing poll() on the raw fd with getline() on the buffered FILE* is safe in practice." This assumption does not hold when multiple messages arrive in a single kernel receive event.
Trigger Context: Claude Code 2.1.80
Claude Code 2.1.80 changed its MCP client startup to send the three initialization messages (initialize, notifications/initialized, tools/list) in rapid succession as part of a single write burst. This is legal behavior under the MCP specification — the protocol does not require delays between messages. The server bug was latent before this client change; 2.1.80 made it reliably reproducible.
The three messages CC 2.1.80 sends on startup (captured via spy):

```
{"method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{"roots":{},"elicitation":{"form":{},"url":{}}},"clientInfo":{"name":"claude-code","version":"2.1.80"}},"jsonrpc":"2.0","id":0}
{"method":"notifications/initialized","jsonrpc":"2.0"}
{"method":"tools/list","jsonrpc":"2.0","id":1}
```

Fix
Replace the single blocking poll() call with a three-phase approach that correctly handles data already buffered in the FILE* layer:
Phase 1: Non-blocking poll(timeout=0) — fast path, catches data already in the kernel fd.
Phase 2: If Phase 1 returns 0 (no kernel data), peek one byte from the FILE* buffer using fgetc(in) + ungetc(). This detects data that a prior getline() over-read pulled into libc's buffer. If data is found, skip the blocking poll and fall through to getline().
Phase 3: Only if both Phase 1 and Phase 2 confirm no data — call blocking poll(STORE_IDLE_TIMEOUT_S * 1000) for idle eviction.
This approach is fully POSIX-portable and does not require making the fd non-blocking (which would complicate getline() error handling for EAGAIN), nor does it rely on GNU-only extensions like __fpending().
The inaccurate comment at the original call site is also corrected to document the actual hazard.
Test Coverage
- C unit test (tests/test_mcp.c): `mcp_server_run_rapid_messages` — uses `pipe()` + `alarm(5)` to verify all three init messages are processed without hanging
- Python integration test (scripts/test_mcp_rapid_init.py): sends all three messages simultaneously via `proc.communicate()`, asserts the `tools/list` response arrives within 5 seconds against the installed binary
Test results: 2043/2043 tests pass. Python integration test passes against built binary and installed binary.
Affected Versions
Triggered reliably by Claude Code ≥ 2.1.80. Latent with earlier client versions, which insert inter-message delays.