docs(migration): close guide gaps surfaced by real v1-to-v2 migrations#2382
Conversation
🦋 Changeset detectedLatest commit: f47d25f The changes in this PR will be included in the next version bump. This PR includes changesets to release 0 packagesWhen changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@modelcontextprotocol/client
@modelcontextprotocol/codemod
@modelcontextprotocol/core
@modelcontextprotocol/server
@modelcontextprotocol/server-legacy
@modelcontextprotocol/express
@modelcontextprotocol/fastify
@modelcontextprotocol/hono
@modelcontextprotocol/node
commit: |
Behavioral changes: initialize skipped on sessionId-bearing transports, async dispatch vs immediate aborts, error-message prefix removal, encode-seam -32002 mapping for deliberate throws, advertised tool-schema wire-shape changes, protocol-version pinning via supportedProtocolVersions, first-page list escape hatch, silent issuer-drop escalation in saveTokens. Errors: duck-typed .code HTTP-status reads (SdkHttpError .status vs SseError .code), raw numeric -32000/-32001 comparisons vs string SdkErrorCode. Auth: token verifiers must throw the v2 OAuthError or invalid tokens surface as HTTP 500 behind requireBearerAuth. Types & schemas: gateway/proxy passthrough result schema via @modelcontextprotocol/core ResultSchema, toJsonSchemaCompat removal note, RegisteredTool.update schema objects, zod fix ladder incl. dropped .describe() descriptions in the <4.2.0 fallback, validate() vs .parse() semantics, ElicitResult.content narrowing, Protocol/fallbackRequestHandler note. support-2026-07-28: schema artifact location, who should not default to auto, probe envelope timing, in-process testing recipe via the handler fetch override, DiscoverResult wire-only cache fields, supportedProtocolVersions cross-reference. Also: run the codemod at the package root (it rewrites package.json), typedoc frontmatter (titles + children) so deep links resolve on the rendered site, and the Unchanged-APIs footnote no longer contradicts the Types section.
SdkHttpError constructor mapping for code that constructs the error (test stubs, custom transports); leftover v1 dependency cleanup when the codemod reports no changes on already-migrated sources; the declared-zod-range trap (installs and typechecks cleanly, fails at first registration); and the broken Protocol named-import some early codemod versions emitted.
Review follow-up, each verified against source: request() dispatches before its first await (the async pre-work is in the typed verbs on top of it); schema-less request() throws a TypeError for non-spec methods rather than resolving undefined; a missing issuer stamp is used as-is with a console warning (only a mismatched stamp triggers re-authorization); versionNegotiation consults supportedProtocolVersions in auto mode only (a pin is honored as given); the leftover-v1-dependency cause is the updater only touching the target-directory manifest, not an already-migrated skip. Protocol import note switched to present tense (the codemod fix is tracked, not shipped). Adds an empty changeset for the changeset bot.
75b4a25 to
ecbe8b2
Compare
The supportedProtocolVersions bullet now matches resolveVersionNegotiation and the legacy handshake exactly (first pre-2026 entry offered in list order, counter-offers gated on the list's pre-2026 entries, auto-mode candidates from the list's modern entries with the SDK default as fallback, pin not checked against the list) and no longer contradicts the support guide's corrected sentence. The package.json caveat now describes the real mechanism: the updater rewrites the nearest manifest walking up from the target directory, so workspace members are the leftover-dependency cause, and already-migrated sources get the v1 dependency removed without the v2 additions. index.md's package-root rationale switched to the valid reason (test/scripts/fixtures imports).
| **Zod v3 is no longer supported** (v1 peer was `^3.25 || ^4.0`). Check the **declared | ||
| range** in your `package.json`, not just the installed version: a zod-3 range that | ||
| satisfied the v1 peer installs and typechecks cleanly under v2 and only fails at | ||
| runtime, when the first registration throws — under a spawning harness that surfaces | ||
| as an opaque child exit two hops from the cause. A Zod v3 schema | ||
| hard-errors with a pointer at `fromJsonSchema()`. Zod **≥4.2.0** self-converts via |
There was a problem hiding this comment.
🔴 The reworked Zod-v3 paragraph says a zod-3 declared range "only fails at runtime, when the first registration throws — under a spawning harness that surfaces as an opaque child exit", but for the scenario it describes (codemod-produced z.object()-wrapped schemas built with the user's zod 3) registration deliberately swallows the conversion failure (mcp.ts 'warn, never throw' try/catch), the server starts and connects normally, and the fromJsonSchema()-pointing error surfaces as a JSON-RPC error on the first tools/list / client.listTools() while the process keeps running. Reword the symptom to a tools/list error response rather than a registration throw / opaque child exit (only the unwrapped raw-shape-with-zod-3-fields form throws at registration, and that message points at zod/v4, not fromJsonSchema()).
Extended reasoning...
What the doc claims. The new failure-mode sentence at docs/migration/upgrade-to-v2.md:392-397 tells migrators that a zod-3 declared range satisfying the v1 peer (^3.25) "installs and typechecks cleanly under v2 and only fails at runtime, when the first registration throws — under a spawning harness that surfaces as an opaque child exit two hops from the cause. A Zod v3 schema hard-errors with a pointer at fromJsonSchema()." That composites two distinct failure paths into a symptom the implementation does not produce for the scenario the sentence covers.
The schema shape this scenario actually produces. The codemod wraps raw shapes with z.object() using the user's own zod import, and every zod >=3.25 (the v1 peer floor) implements ~standard, so isStandardSchema() is true. normalizeRawShapeSchema (packages/core-internal/src/util/zodCompat.ts:58-80) passes such a schema straight through — its zod-3 TypeError branch fires only for an unwrapped raw shape whose field values look like zod 3, and that message points at zod/v4 / "wrap with z.object({...})", not at fromJsonSchema().
Registration does not throw. McpServer._createRegisteredTool wraps standardSchemaToJsonSchema(inputSchema, 'input') in a try/catch explicitly documented as a 'warn, never throw' contract with an empty catch (packages/server/src/server/mcp.ts:812-836), and convertOutputSchemaJson (mcp.ts:1327-1334) likewise catches and returns undefined. The zod-3 hard error ("Schema appears to be from zod 3 … wrap your JSON Schema with fromJsonSchema()", packages/core-internal/src/util/standardSchema.ts:188-192) is therefore swallowed at registration: the tool registers, the server starts, connect() succeeds, and no child-process exit occurs.
Where the failure actually surfaces. The first uncaught call to standardSchemaToJsonSchema is inside the tools/list handler (mcp.ts:189); the throw is converted by protocol dispatch into a JSON-RPC error response and the process keeps running. toolInputSchemaJson() (mcp.ts:108-114) also catches, so the SEP-2243 pre-dispatch path never crashes either. Meanwhile callTool input validation runs through zod 3's own ~standard.validate and keeps working.
Step-by-step proof. (1) A migrated project pins zod@^3.25; the codemod has wrapped its raw shapes with z.object() from that zod. (2) registerTool() runs: normalizeRawShapeSchema passes the schema through (it is a Standard Schema), the eager standardSchemaToJsonSchema call throws the zod-3 error, the empty catch swallows it. (3) The server starts under its spawning harness, connects, answers initialize — the child process is healthy, no exit. (4) The client calls listTools(): the tools/list handler converts the schema uncaught, the zod-3 error becomes a JSON-RPC error response containing the fromJsonSchema() message, and the server stays up. So the migrator's signal is a tools/list error response, not an "opaque child exit".
Why it matters. This sentence exists precisely to help migrators recognize the zod-3 failure mode quickly. As written it sends them to the wrong place — process supervision and spawn-harness debugging — when the actual signal is an error on listTools(); conversely, someone whose server starts and connects fine may wrongly conclude their zod range is safe. It also fuses the only real registration-time throw (the unwrapped raw-shape-with-zod-3-fields path, whose TypeError points at zod/v4, not fromJsonSchema()) with the conversion error's message, producing a composite that matches neither path. Notably, the doc's own later parenthetical ("they fail at registration or at the first tools/list") is closer to reality than this headline sentence.
How to fix. Reword along the lines of: "…only fails at runtime: registration accepts the schema (the conversion failure is swallowed), and the first tools/list (so client.listTools()) answers with an error pointing at fromJsonSchema() while the server keeps running; only the deprecated unwrapped raw-shape form with zod-3 field values throws at registration, with a message pointing at zod/v4." The rest of the fix-ladder paragraph is accurate and can stay.
| - **`DiscoverResult` strips its cache fields too.** `ttlMs` / `cacheScope` on | ||
| `server/discover` are wire-only — consumed by the client's response-cache layer and | ||
| absent from the public `DiscoverResult` type returned by `getDiscoverResult()`. | ||
| Tooling that displays the server's advertised cache policy must parse raw frames. |
There was a problem hiding this comment.
🔴 The "strips its cache fields too" / "wire-only" claim here is only true at the type level: nothing at runtime removes ttlMs/cacheScope — the probe path stores the wire-parse output verbatim and getDiscoverResult() returns it with both fields still present (the client's own _freshness() reads them off that runtime body). Reword to scope the bullet to the public type surface (the fields are undeclared on DiscoverResult, readable via a cast, with the .catch() 0/'private' defaults making absent-vs-default indistinguishable) — "must parse raw frames" overstates it.
Extended reasoning...
What the bullet claims vs. what the code does. The new bullet (docs/migration/support-2026-07-28.md:305-308) says DiscoverResult "strips its cache fields too", that ttlMs / cacheScope are "wire-only — consumed by the client's response-cache layer", and that "tooling that displays the server's advertised cache policy must parse raw frames." Only the type-level half holds. Unlike the resultType bullet directly above it (which describes a real runtime delete), nothing in the implementation removes the cache fields from the value getDiscoverResult() returns.
The code path. The 2026 wire DiscoverResultSchema explicitly declares ttlMs (.catch(0)) and cacheScope (.catch('private')) (packages/core-internal/src/wire/rev2026-07-28/schemas.ts:808-822), so the parse output carries both fields. The probe path stores that output verbatim: probeClassifier.ts classifyResult() returns { kind: 'modern', ..., discover: parsed.value } (packages/client/src/client/probeClassifier.ts:137-152), _connectNegotiated assigns this._discoverResult = result.discover (packages/client/src/client/client.ts:1093), and getDiscoverResult() returns it as-is (client.ts:1290-1291). The codec's decode lift consumes only resultType (delete lifted['resultType'], codec.ts:258-261); there is no equivalent strip of the cache fields anywhere. The discover() typed-verb path validates with the neutral DiscoverResultSchema, which extends the loose ResultSchema, so the runtime keys survive there too.
Why "consumed by the client's response-cache layer" is the opposite of stripping. Client._freshness() reads the cache hints off the runtime result body, and its JSDoc says exactly that: "The fields pass through the loose result schema, so they are read off the runtime body" (client.ts:~1660-1675). The SDK relies on the fields not being stripped — "consumed" here means "read", not "removed".
Step-by-step proof. (1) A client with versionNegotiation: { mode: 'auto' } connects to a 2026-capable server; the server/discover response body carries ttlMs: 30000, cacheScope: 'public'. (2) classifyResult() parses it with the 2026 wire codec — parsed.value includes both fields (the wire schema declares them; nothing deletes them). (3) client.ts:1093 stores parsed.value as this._discoverResult. (4) client.getDiscoverResult() returns that exact object. (5) (client.getDiscoverResult() as Record<string, unknown>).ttlMs reads 30000 — no raw-frame parsing needed. A v2-alpha consumer auditing tooling against this bullet would conclude such reads stopped working at runtime; in fact they keep working (only the TypeScript declaration is gone).
The accurate part, and the genuine caveat. The public neutral DiscoverResultSchema (packages/core-internal/src/types/schemas.ts:577) does not declare the cache fields, and StripWireOnly only removes resultType (types/types.ts:211-218) — so the type-level claim ("absent from the public DiscoverResult type") is correct. There is also one real reason a faithful policy display might want raw frames: on the probe path the wire schema's .catch() fills 0 / 'private' for absent or malformed hints, so a runtime read cannot distinguish "server advertised 0/private" from "server omitted the fields".
Why it matters. This is a docs-only PR whose stated bar is that every claim was verified against the SDK source, and the bullet deliberately echoes the resultType bullet above it — which is a real runtime delete — making the over-claim more misleading. All three verifiers confirmed against current main; none refuted.
How to fix. Reword to scope the claim to the type surface, e.g.: "ttlMs / cacheScope on server/discover are undeclared on the public DiscoverResult type returned by getDiscoverResult() — the client reads them off the runtime body for its response cache, and they remain present on the returned object (readable via a cast). Note the wire schema's lenient defaults mean absent or malformed hints read back as 0 / 'private'; tooling that must distinguish 'omitted' from 'advertised default' should parse raw frames." Drop the "strips" / "wire-only" wording.
| - **`connect()` skips the `initialize` handshake when the transport already exposes a | ||
| `sessionId`** — it assumes it is reconnecting to an existing session (v1 always | ||
| initialized). A custom or test transport that sets `sessionId` at construction | ||
| silently skips initialization: `getServerCapabilities()` stays `undefined` and the | ||
| list verbs return empty results. Expose `sessionId` only after the first request has | ||
| been sent. |
There was a problem hiding this comment.
🔴 The parenthetical "(v1 always initialized)" is incorrect: v1's Client.connect() had the identical skip-on-sessionId guard (and the same reconnect comment) that v2 carries at packages/client/src/client/client.ts:941-953, so this is not a v1→v2 behavioral change. Drop the parenthetical and reword the bullet as a carried-over v1 pitfall (or move it out of the "Behavioral changes" list); the actionable advice — expose sessionId only after the first send — is fine and applies to both versions.
Extended reasoning...
What the doc claims. The new "Client connection & dispatch" bullet (docs/migration/upgrade-to-v2.md:988-993) states that in v2 "connect() skips the initialize handshake when the transport already exposes a sessionId" and adds the parenthetical "(v1 always initialized)", warning migrators that a custom or test transport setting sessionId at construction now silently skips initialization. Sitting in the "Behavioral changes" section — whose explicit purpose is listing runtime differences a v1 consumer must adapt to — this presents the skip as new in v2.
What v1 actually did. v1's Client.connect() (e.g. v1.20.0, src/client/index.ts:119-125, verified against the published source at raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/1.20.0/src/client/index.ts) reads:
override async connect(transport: Transport, options?: RequestOptions): Promise<void> {
await super.connect(transport);
// When transport sessionId is already set this means we are trying to reconnect.
// In this case we don't need to initialize again.
if (transport.sessionId !== undefined) {
return;
}
// ... initialize request ...i.e. v1 already skipped the initialize handshake whenever transport.sessionId was defined — exactly the same guard, with the same comment text, that v2 carries at packages/client/src/client/client.ts:941-953 (and again in _connectNegotiated around line 1051). The parenthetical "(v1 always initialized)" is therefore false, and the behavior described is not a v1→v2 change at all.
Step-by-step proof. (1) A test/custom transport sets this.sessionId = 'fixed-id' in its constructor. (2) Under v1, client.connect(transport) calls super.connect(), hits the transport.sessionId !== undefined guard, and returns — no initialize is sent; getServerCapabilities() stays undefined. (3) Under v2, connect() hits the identical guard at client.ts:941-953 and behaves the same way. (4) The doc tells this reader their v1 code "always initialized" and that the silent skip is new — both directions of that comparison are wrong: the v1 code already skipped, and nothing changed in v2.
Why it matters. The PR's stated bar is that "every claim was verified against the SDK source", and this bullet is in the section reserved for genuine runtime differences. A reader debugging an initialization-skip on v1 is told (incorrectly) that v1 cannot exhibit it; a migrator is told the workaround ("expose sessionId only after the first request has been sent") is newly required when it applied identically to v1. It also pads the Behavioral-changes list with a non-change, diluting the items that do require adaptation.
Why nothing else prevents it. No other text in the guide qualifies the claim, and no existing review comment covers this bullet's v1-comparison — the other comments on this section target the dispatch-async and pinning bullets.
How to fix. Either delete the bullet from the "Behavioral changes" list (the behavior is unchanged), or keep it as a useful pitfall but drop "(v1 always initialized)" and reframe it, e.g. "(unchanged from v1.x — listed here because custom/test transports often hit it during migration)". The actionable advice itself can stay as-is.
Corrections verified against source after the final automated review round landed post-merge on #2382: the SEP-2352 paragraph now describes what a missing issuer stamp actually does (reads succeed, refresh keeps working, the per-AS check does not apply to that credential, a warning is logged on each read) and drops a duplicated clause; the zod-3 paragraph now matches the real failure path (registration does not throw for z.object() schemas - the server starts normally and the first tools/list answers with an error pointing at fromJsonSchema(); only the deprecated raw-shape form errors at registration, pointing at zod/v4); the DiscoverResult bullet now says the cache fields are hidden at the type level only (the runtime object carries both, a cast suffices, raw frames only distinguish omitted hints from defaulted values); and the connect() sessionId-skip bullet notes the same guard exists in v1 since 1.10.0, so it is a gotcha rather than a migration delta.
Closes documentation gaps in the v1→v2 migration guides found by dry-running the migration on a range of real v1 consumers (CLI tools, gateways/proxies, auth-heavy services, and large host applications) using only the published guidance.
Motivation and Context
Several behavioral changes and removed-API replacements were discoverable only from typings or trial-and-error. The costliest gaps were silent ones: duck-typed
.codeHTTP-status reads that stop matching, raw numeric-32000/-32001comparisons against the now-stringSdkErrorCode, token verifiers that must throw the v2OAuthError(or invalid tokens surface as HTTP 500 behindrequireBearerAuth), the dropped-.describe()consequence of the zod <4.2.0 conversion fallback, andsaveTokens()implementations that drop theissuerstamp and silently escalate refresh to full re-auth.Highlights:
connect()skipsinitializeon sessionId-bearing transports; request dispatch is async (immediate aborts no longer emitnotifications/cancelled);supportedProtocolVersionsdocumented as the pinning option.-32002mapping applies to deliberate throws, duck-typed.codeand raw-numeric-code pitfalls (with theSseErrorasymmetry).request()now enforces spec result schemas; passResultSchemafrom@modelcontextprotocol/corefor v1-identical forwarding;toJsonSchemaCompatreplacement;RegisteredTool.update()takes schema objects; advertised tool-schema wire-shape changes;ElicitResult.contentnarrowing; zod fix ladder.versionNegotiation: 'auto', probe envelope timing, in-process testing recipe via the handlerfetchoverride,DiscoverResultwire-only cache fields..) — it also rewritespackage.json.children:) so deep links resolve on the rendered site; fixed an internal contradiction in the Unchanged-APIs footnote.How Has This Been Tested?
Docs-only change.
prettier --check,pnpm sync:snippets --check, andpnpm docs:check(typedoc) all pass. Every claim was verified against the SDK source at the current main before being written.Breaking Changes
None.
Types of changes
Checklist
Additional context
A follow-up codemod fix is planned separately: the spec-method rule currently drops the result-schema argument even when the request method is not a literal, which breaks verbatim-forwarding proxies (covered in the new gateway guidance here).