Stateless mode: per-request McpServer+Protocol allocation causes memory leak at scale #2090

@RomKadria

Description

Problem

In stateless mode (sessionIdGenerator: undefined), the recommended pattern — including the SDK's own simpleStatelessStreamableHttp.ts example — creates a full McpServer + Protocol + StreamableHTTPServerTransport on every HTTP request:

app.post('/mcp', async (req, res) => {
    const server = getServer();  // new McpServer per request
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
    res.on('close', () => { transport.close(); server.close(); });
});

Each request allocates:

  • McpServer → Server → Protocol: 9 Maps/Sets (_requestHandlers, _responseHandlers, _progressHandlers, _notificationHandlers, _requestHandlerAbortControllers, _timeoutInfo, _pendingDebouncedNotifications, _taskProgressTokens, _requestResolvers), plus a _loggingLevels Map
  • Server: new AjvJsonSchemaValidator (compiles JSON schemas)
  • StreamableHTTPServerTransport → WebStandardStreamableHTTPServerTransport: 3 Maps (_streamMapping, _requestToStreamMapping, _requestResponseMap), plus a getRequestListener from @hono/node-server

This is fine for low-traffic dev/demo scenarios. But on production HTTP servers under sustained concurrent traffic, allocation outpaces what V8's GC reclaims, causing steady memory growth until the process is OOMKilled.

Real-world impact

We run an MCP server (platform-mcp-gateway) in production on Kubernetes with 1200Mi memory limit. Using this pattern, memory grew ~1-2% per hour until hitting the limit, triggering repeated OOMKill alerts. The service has been running for months — this is a slow leak, not a burst.

Benchmark

We benchmarked the per-request McpServer approach vs. a lightweight JSON-RPC dispatcher that reuses the same handler functions (2,000 requests, --expose-gc):

| Metric                | McpServer per request | Lightweight dispatcher | Delta       |
|-----------------------|-----------------------|------------------------|-------------|
| Throughput            | 2,797 req/s           | 6,536 req/s            | 2.3x faster |
| Heap growth           | +3.78 MB              | +1.41 MB               | 2.7x less   |
| Per-request retained  | ~1,984 bytes          | ~738 bytes             | -63%        |
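For context, the "lightweight dispatcher" baseline is roughly the following shape — a sketch, not the exact benchmark code; the method names and handler bodies here are illustrative. The key property is that the handler table is built once at startup and every request reuses it, with no per-request Maps and no schema compilation:

```typescript
type JsonRpcRequest = { jsonrpc: '2.0'; id: number | string; method: string; params?: unknown };
type JsonRpcResponse =
  | { jsonrpc: '2.0'; id: number | string; result: unknown }
  | { jsonrpc: '2.0'; id: number | string; error: { code: number; message: string } };

type Handler = (params: unknown) => unknown | Promise<unknown>;

// Built once at startup; shared across all requests.
const handlers = new Map<string, Handler>([
  ['ping', () => ({})],                    // illustrative methods
  ['tools/list', () => ({ tools: [] })],
]);

async function dispatch(req: JsonRpcRequest): Promise<JsonRpcResponse> {
  const handler = handlers.get(req.method);
  if (!handler) {
    return {
      jsonrpc: '2.0',
      id: req.id,
      error: { code: -32601, message: `Method not found: ${req.method}` },
    };
  }
  return { jsonrpc: '2.0', id: req.id, result: await handler(req.params) };
}
```

Each request then costs one Map lookup and one handler call, instead of constructing a dozen Maps/Sets plus a schema validator.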

Why you can't just reuse a McpServer

The obvious fix — share one McpServer across concurrent requests — doesn't work because Protocol.connect(transport) replaces this._transport. If request A and B overlap:

  1. connect(transportA) → sets this._transport = transportA
  2. connect(transportB) → sets this._transport = transportB
  3. Request A's onmessage fires → _onrequest captures this._transport (now transportB) → response goes to wrong client
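The misrouting can be reproduced with a minimal stand-in (this is not the SDK's actual Protocol class — just a sketch of the same shared-field pattern):

```typescript
// Minimal stand-in for a transport: records what it sends.
class FakeTransport {
  sent: string[] = [];
  onmessage?: (msg: string) => void;
  send(msg: string) { this.sent.push(msg); }
}

// Mimics the shared-field pattern: connect() overwrites this._transport,
// and the message handler reads the field at dispatch time.
class SharedProtocol {
  private _transport?: FakeTransport;
  connect(t: FakeTransport) {
    this._transport = t;                      // replaced on every connect()
    t.onmessage = (msg) => {
      // Reads this._transport *now*, not the transport the message arrived on.
      this._transport!.send(`response to ${msg}`);
    };
  }
}

const proto = new SharedProtocol();
const a = new FakeTransport();  // request A's transport
const b = new FakeTransport();  // request B's transport

proto.connect(a);
proto.connect(b);               // request B overlaps before A finishes
a.onmessage!('request-from-A');

console.log(a.sent); // [] — client A never gets its response
console.log(b.sent); // [ 'response to request-from-A' ] — wrong client
```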

Suggestions

  1. Lightweight stateless mode: for stateless servers, the full Protocol/Transport stack is overkill — there's no session state, no SSE streaming needed, no server-initiated notifications. A StatelessMcpServer (or a flag on McpServer) could skip all the per-request infrastructure and just dispatch JSON-RPC directly.

  2. Fix the connect() transport race: if _onrequest captured the transport from the onmessage callback's closure (the transport that received the message) instead of from this._transport, a single McpServer could safely handle concurrent stateless requests.

  3. At minimum, document the trade-off: the stateless example should note that creating a server per request has significant overhead at scale and suggest alternatives for production deployments.
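Suggestion 2 can be sketched as follows (again a stand-in, not the SDK's code): if the response is sent via the transport captured in the onmessage closure, a later connect() has no shared field to clobber.

```typescript
class FakeTransport {
  sent: string[] = [];
  onmessage?: (msg: string) => void;
  send(msg: string) { this.sent.push(msg); }
}

// Proposed shape: bind `transport` in the closure so each response goes
// back to the transport the request actually arrived on.
class ClosureSafeProtocol {
  connect(transport: FakeTransport) {
    transport.onmessage = (msg) => {
      // Uses the captured `transport`, not a shared this._transport
      // that a concurrent connect() could have overwritten.
      transport.send(`response to ${msg}`);
    };
  }
}

const proto = new ClosureSafeProtocol();
const a = new FakeTransport();
const b = new FakeTransport();

proto.connect(a);
proto.connect(b);               // overlapping request — harmless now
a.onmessage!('request-from-A');

console.log(a.sent); // [ 'response to request-from-A' ] — correct client
console.log(b.sent); // [] — B untouched
```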

Environment

  • @modelcontextprotocol/sdk: 1.29.0
  • Node.js: 24.x
  • Runtime: Kubernetes pods (1200Mi limit, --max-old-space-size=900)
