Skip to content

feat(servers): move oauthClientSecret and stdio env values to OS keychain (#1356)#1360

Open
cliffhall wants to merge 5 commits into
v2/mainfrom
feat/1356-oauth-client-secret-keychain
Open

feat(servers): move oauthClientSecret and stdio env values to OS keychain (#1356)#1360
cliffhall wants to merge 5 commits into
v2/mainfrom
feat/1356-oauth-client-secret-keychain

Conversation

@cliffhall
Copy link
Copy Markdown
Member

Summary

Closes #1356.

mcp.json is designed to be tool-shareable (symlinked from Claude Desktop's config, pasted into bug reports, synced via dotfiles). Storing OAuth client secrets and stdio env values in plaintext there meant any of those flows could leak them. This PR lifts both into the OS keychain.

  • Library: @napi-rs/keyring (active replacement for the archived keytar). Service mcp-inspector; account ${serverId}:${field} with fields oauth-client-secret and env:${KEY}.
  • On disk: oauth.clientSecret removed entirely; stdio env keys preserved with empty-string placeholders so the file still documents the env interface the server expects.
  • On the wire: unchanged. GET /api/servers rehydrates values from the keychain so the browser sees the same JSON it has always seen. The OAuth handshake still runs in the browser via the MCP SDK; moving the token exchange to Node is a separate piece of work and explicitly out of scope.
  • Migration: idempotent — every GET walks the freshly-read config, lifts any plaintext secrets into the keychain, and rewrites mcp.json with the stripped shape. If the keychain already holds a value for (serverId, field), the keychain wins and the disk plaintext is dropped unread.
  • Linux without libsecret: the keyring backend raises KeychainUnavailableError; handlers translate to a 503 with a message pointing at libsecret/gnome-keyring install. macOS and Windows always have a working keychain.
  • Tests: SecretStore is an interface with an InMemorySecretStore test double that the integration suite injects via RemoteServerOptions.secretStore, so the suite doesn't need libsecret on Linux CI.

Files

  • core/auth/secret-fields.ts (new) — shared field-name constants (browser-safe).
  • core/auth/node/secret-store.ts (new) — SecretStore interface + KeyringSecretStore (production) + InMemorySecretStore (test) + KeychainUnavailableError.
  • core/mcp/serverList.ts — pure helpers extractSecretsFromStored / mergeSecretsIntoStored / expectedSecretFields.
  • core/mcp/remote/node/server.ts — keychain hooks across GET (rehydrate + migration), POST (strip + write), PUT (reconcile / rename), DELETE (sweep).
  • specification/v2_servers_file.md — documents the keychain split, migration, and Linux fallback behavior.

Test plan

  • npm run validate from clients/web/ — passes (format / lint / build / coverage).
  • npm run test:storybook — 306/306 pass.
  • New unit tests on the converter helpers (serverList.test.ts) — 14 added.
  • New unit tests on InMemorySecretStore contract (secret-store.test.ts) — 10 added.
  • New integration tests covering the keychain flow on every /api/servers route (servers-route.test.ts) — 9 added, total file now 50/50.
  • Manual: POST a server with an OAuth secret via the UI, confirm mcp.json contains no plaintext and Keychain Access on macOS shows the entry under service mcp-inspector.
  • Manual: hand-edit mcp.json to put a plaintext secret back, hit refresh, confirm the file is rewritten with the value stripped.

🤖 Generated with Claude Code

…hain (#1356)

mcp.json is designed to be tool-shareable (symlinked from Claude Desktop's
config, pasted into bug reports, synced via dotfiles), so storing OAuth
client secrets and stdio env values in plaintext there meant any of those
flows could leak them. Lift both into the OS keychain via @napi-rs/keyring
(active replacement for the archived keytar). The wire shape is unchanged:
the GET /api/servers handler rehydrates from the keychain so browser code
sees the same JSON; the on-disk file no longer contains the secret
material. Includes an idempotent migration that lifts plaintext from older
mcp.json files (or hand-edited ones) into the keychain on first read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cliffhall cliffhall added the v2 Issues and PRs for v2 label May 25, 2026
CI on Linux failed because `@napi-rs/keyring` hard-errors without
libsecret, and the route handlers were touching the keychain even for
no-secret flows (defensive `deleteAllForServer` on POST/DELETE, sweep
on PUT). The user's intent for the Linux fallback was "hard fail when a
secret is actually involved" — not on every keychain touch.

Make `KeyringSecretStore.get / delete / deleteAllForServer` silently
degrade when the keychain is unavailable (return null / no-op); `set`
remains the one operation that hard-fails with
`KeychainUnavailableError`. The migration path catches the error and
preserves the disk plaintext rather than losing the secret.

Also inject `InMemorySecretStore` in `useServers.test.tsx` and
`servers-events.test.ts` so tests never touch a developer's real
keychain even when libsecret is present.

Adds `KeyringSecretStore` tests (vi.mock'd native bindings) and an
integration suite that simulates Linux-without-libsecret to verify the
tolerance contract end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cliffhall
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 25, 2026

Claude finished @cliffhall's task in 3m 24s —— View job


Code Review

Solid, well-documented move of OAuth client secrets and stdio env values into the OS keychain. The SecretStore abstraction is clean, the migration logic is idempotent, and the test coverage is impressive (the UnavailableSecretStore suite in servers-route.test.ts:1329-1500 is especially nice). One real concurrency bug in GET /api/servers, plus a handful of smaller observations.


🐛 Concurrency: the GET migration write can clobber concurrent POST/PUT/DELETE

core/mcp/remote/node/server.ts:1271-1306 — the read + migrate happens outside the write lock; only the final write enters it. The comment at line 1288 says "the rewrite happens inside the same write lock as POST/PUT/DELETE so concurrent mutations can't race with our cleanup" — but that's only half true.

Trace:

  1. readStoreFile(mcpConfigPath) — unlocked read (line 1273)
  2. A concurrent POST /api/servers grabs the lock, reads the same baseline, adds entry alpha, writes, releases
  3. migratePlaintextSecrets(onDisk) — builds migrated from the pre-POST snapshot
  4. withWriteLock grabs the lock; writes serializeStore(migrated) — clobbers POST's alpha

Same shape affects the first-launch seed write at server.ts:1279writeMcpAndTrackMtime(serializeStore(DEFAULT_SEED_CONFIG)) runs outside any lock, so a racing POST into an empty directory can be overwritten by the seed.

The existing write lock around mutating routes was introduced precisely to prevent this class of lost-update (server.ts:1108-1126 — "avoids a real lost-update once file watching (#1345) or any remote/ multi-client usage lands"). The /api/servers/events SSE channel makes multi-tab usage first-class now, so the race is easy to hit: open two tabs, refresh one while the other is mid-POST.

Fix: put the entire read + migrate + write sequence inside withWriteLock whenever changed === true (or the seed branch fires). The re-read inside the lock would then become authoritative. Alternatively, do an unlocked read first for the common no-migration case, and re-read inside the lock only when migration is needed.

Fix this →


⚠️ POST writes file before writing keychain — half-state on partial failure

core/mcp/remote/node/server.ts:1353-1357:

const { stripped, secrets } = extractSecretsFromStored(built);
await secretStore.deleteAllForServer(id);
current.mcpServers[id] = stripped;
await writeMcpAndTrackMtime(serializeStore(current));
await writeKeychainEntriesFor(id, secrets);   // ← can throw midway

If writeKeychainEntriesFor throws KeychainUnavailableError on the second of three secrets:

  • File already has the entry with empty env values
  • Some secrets landed in the keychain, others didn't
  • POST returns 503
  • Retrying the POST hits 409 (entry exists) — user is stuck

Consider reversing to writeKeychain → writeFile. On partial keychain failure, the file doesn't get the half-configured entry, and the defensive deleteAllForServer sweep on the retry clears whatever partial keychain state existed. Applies equally to the PUT path (server.ts:1489-1500).


Minor observations

  • core/auth/node/secret-store.ts:127 — the regex /no entry|no matching|not found/i for distinguishing "absent" from "keychain down" couples behavior to @napi-rs/keyring's error text. If that changes across versions, delete falls into the silent-no-op path instead of the "treat as success" path. The outward behavior is the same (both return undefined), so the practical impact is nil, but the comment at line 127 becomes aspirational. Worth a test that pins which version of @napi-rs/keyring is verified to produce these messages.

  • core/auth/node/secret-store.ts:133-148deleteAllForServer silently swallows per-entry delete errors after a successful findCredentialsAsync. If only specific entries fail (e.g. transient keychain hiccup), we'd leak those entries. In practice the prefix-filter + per-delete silent-no-op behavior is consistent with the rest of the destructive-op contract, so this is an observation rather than a bug.

  • core/mcp/serverList.ts:342-354extractSecretsFromStored clears every stdio env value to "" unconditionally on the stripped shape, even if the entry had no env values to lift. Currently a no-op because the outer if (env) guards it, but worth noting: a server with env: {} round-trips through as env: {} (empty keys preserved). That's probably fine but gets lost in the on-disk minimization theme elsewhere.

  • core/mcp/remote/node/server.ts:1165-1175readKeychainEntriesFor sequentially awaits secretStore.get() for every field. Each keychain read on macOS can be 10-50ms; for a user with 20 stdio servers × 5 env vars each, the GET rehydrate becomes 5-10s. Probably acceptable for v2 but a candidate for Promise.all later.

  • Docs (specification/v2_servers_file.md:248) — "returns a hard error on any operation" is slightly inaccurate post-commit 8941935: per the production impl, get / delete / deleteAllForServer silently no-op on unavailability; only set throws. Consider reflecting the tolerance contract there.


✅ What's good

  • SecretStore abstraction with InMemorySecretStore for tests is exactly right — CI on Linux without libsecret works cleanly.
  • Migration idempotency (server.ts:1188-1226): keychain wins on conflict, disk plaintext is dropped unread. Handles the "user hand-edited plaintext back in" case correctly (tested at servers-route.test.ts:1290-1326).
  • Defensive sweep before new writes on POST + rename path handles orphans from prior partial failures.
  • Field-name constants split into core/auth/secret-fields.ts so the browser-safe converter in serverList.ts doesn't drag the native binding through.
  • Vite optimizeDeps.exclude + rollup onwarn entries for @napi-rs/keyring keep the browser bundle clean.
  • Test coverage: secret-store.test.ts mocks the native side and covers the full tolerance contract (get→null, delete→no-op, set→throws); the keychain unavailable describe block in servers-route.test.ts exercises the 503 path and the plaintext-preservation contract.

Summary

Severity Count
Bug (concurrency) 1
Robustness (partial-failure) 1
Minor / nits 4

The concurrency bug in GET is the only item I'd block on; everything else is nice-to-have. Approve after the read/write-lock fix lands.
· feat/1356-oauth-client-secret-keychain

@cliffhall cliffhall linked an issue May 25, 2026 that may be closed by this pull request
@cliffhall cliffhall self-assigned this May 25, 2026
…1356)

Addresses code review:

1. **Concurrency**: GET /api/servers was reading + migrating outside the
   write lock, then writing inside it — a concurrent POST/PUT/DELETE
   between the read and the lock acquisition could be clobbered by the
   migration write. Also affected the first-launch seed write. Fix:
   take a fast unlocked peek; if the file is absent (seed branch) or
   carries plaintext to migrate (migration branch), re-read inside the
   write lock and decide there. Added `hasPlaintextSecrets()` as a
   pure predicate so the common no-migration path stays lock-free.

2. **Write ordering**: POST and PUT wrote disk first, then keychain.
   A keychain `set` failure mid-write left a half-configured entry on
   disk and trapped retries at 409. Reversed to keychain-first so a
   503 leaves both stores in their pre-write state. PUT in-place now
   sets new fields, writes disk, then deletes obsolete fields (a
   failed disk write leaves orphan keychain entries — recoverable —
   rather than premature deletions of fields the user still expects
   to see).

3. **Tolerance simplification**: dropped the no-entry regex in
   `KeyringSecretStore.delete` — both "no entry" and "keychain
   unavailable" collapse to the same desired outcome (the entry isn't
   there anymore), so a uniform swallow is clearer.

4. **Parallelism**: `readKeychainEntriesFor` now `Promise.all`s the
   per-field `get` calls. macOS Keychain round-trips are 10-50ms;
   serializing 5+ env vars per server × N servers is a meaningful
   wall-clock cost on GET.

5. **Spec doc**: reflects the tolerance contract (only `set`
   hard-fails) and the keychain-before-disk write ordering rationale.

New regression test in servers-route.test.ts covers the retry-after-503
contract directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cliffhall
Copy link
Copy Markdown
Member Author

Thanks for the thorough review. Addressed in c2bd6d00:

🐛 GET migration race — fixed. The handler now takes a fast unlocked peek; only the seed branch and the migration-needed branch enter withWriteLock, and they re-read inside the lock so the decision is based on the same snapshot they're about to mutate. Added a hasPlaintextSecrets() pure predicate so the common no-migration path stays lock-free. Server.ts lines 1283–1336.

⚠️ POST/PUT write ordering — reversed. Both routes now do keychain-first, disk-second. PUT in-place additionally moves obsolete-field deletion after the disk write so a failed disk doesn't leave the user with the old shape on disk but missing keychain values. The PUT rename case keeps the old-id sweep at the end for the same reason — better orphan keychain entries than missing ones. The new POST 503 leaves no disk entry — retry is not trapped at 409 test in servers-route.test.ts is the regression test for the specific failure mode you flagged.

Minor observations:

  • Regex coupling in delete — agreed, simplified. Both error paths (NoEntry from the library + the keychain-unavailable runtime error) collapse to the same desired outcome (entry is gone), so a uniform swallow is clearer. The comment now spells that out instead of pinning the library's error messages.
  • deleteAllForServer per-entry swallow — left as-is per your acknowledgment.
  • Empty env: {} round-trip — left as-is, not a bug.
  • Sequential keychain reads — fixed with Promise.all in readKeychainEntriesFor. macOS Keychain round-trips are 10-50ms each so this is real wall-clock, not just stylistic.
  • Spec doc — updated to describe the tolerance contract accurately (only set hard-fails; get/delete/deleteAllForServer silently degrade) and to document the keychain-before-disk write ordering rationale.

Full suite green: 143 files, 1743 tests, coverage thresholds met. CI should reflect the same.

@cliffhall
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 25, 2026

Claude finished @cliffhall's task in 2m 1s —— View job


Re-review ✅

All four items from the prior review are addressed correctly in c2bd6d00. The concurrency fix in particular is cleanly structured — the fast-path peek stays lock-free while the mutating branches re-read inside the lock, so the decision is based on the same snapshot being mutated.


🐛 The GET race fix (server.ts:1298–1355)

Resolved. The structure now reads:

  unlocked peek → if file exists AND no plaintext → rehydrate & return   (fast path)
  withWriteLock:
    re-read
    if absent → seed & return
    if no plaintext (concurrent POST beat us) → return latest snapshot
    else → migrate (with set-failure tolerance) & write

The "no plaintext after taking the lock" branch at line 1341 is the detail that makes this airtight: it handles the case where a concurrent POST landed between the peek and the lock acquisition — we no-op cleanly instead of speculatively migrating a stale snapshot. The new hasPlaintextSecrets predicate keeps the fast path lock-free for the overwhelmingly common case.

⚠️ POST/PUT write ordering (server.ts:1390–1419, 1474–1580)

Resolved. Both routes now do deleteAllForServer sweep → keychain write → disk write, with obsolete-field cleanup moved to after the disk write on PUT in-place. The comment at 1542–1557 articulates the reasoning concisely: a failed set leaves both stores in their pre-write state, and a failed disk write leaves orphan keychain entries (recoverable) rather than premature deletions.

The new regression test POST 503 leaves no disk entry — retry is not trapped at 409 (servers-route.test.ts:1482–1528) exercises the exact failure mode — good regression anchor.

Minor observations — all addressed

  • Regex coupling in delete (secret-store.ts:120–133): simplified to unconditional swallow with a clearer comment. Good call — the library's error text is no longer a coupling surface.
  • Spec doc (v2_servers_file.md:248–250): accurately describes the tolerance contract (only set hard-fails) and documents the write-ordering rationale.
  • readKeychainEntriesFor (server.ts:1170–1182): now Promise.all over fields. On macOS this is real wall-clock (10-50ms per round-trip).

Two small things worth noting (not blocking)

  • writeKeychainEntriesFor (server.ts:1151–1158) is still sequential. On POST/PUT with a stdio server carrying 5–10 env vars, that's 50-500ms of serialized native calls. Not worth parallelizing yet, but the asymmetry with readKeychainEntriesFor is worth calling out in a future perf pass.
  • migratePlaintextSecrets partial-migration state (server.ts:1208–1246): when set throws mid-loop (e.g. keychain becomes unavailable between servers), some servers have already been set in the keychain but the disk plaintext is preserved for all (we return the original config with changed: false). This is the correct failure mode — the next successful GET reconciles everything via the "keychain wins" path — but it's subtle enough that a one-line comment explaining the partial-state behavior in the catch block would help future readers.

Summary

Severity Count
Blocking 0
Nice-to-have 2

Approve once CI is green — the blocking concurrency fix is in, the partial-failure path is now retry-safe, and the new regression test pins the contract. Nice work on the iteration.
· feat/1356-oauth-client-secret-keychain

…mment (#1356)

Two nice-to-have observations from the re-review:

- `writeKeychainEntriesFor` now `Promise.all`s its per-field sets,
  matching the read side. No ordering requirement among distinct
  (id, field) keys; on unavailability every set rejects and Promise.all
  surfaces the first rejection (still 503 from the route handler).
- `migratePlaintextSecrets` catch block now spells out the
  partial-migration semantics: we return the original config so the
  disk plaintext stays intact, and the next successful GET retries
  with the keychain-wins branch absorbing already-set entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cliffhall
Copy link
Copy Markdown
Member Author

Thanks for the re-review and approve. Picked up both nice-to-haves in 18012ae1:

  • writeKeychainEntriesFor parallelism: now Promise.alls its per-field sets, matching the read side. No ordering requirement among distinct (id, field) keys; on unavailability every set rejects and Promise.all surfaces the first rejection (route still translates to 503).
  • Migration partial-state comment: added an inline explanation in the catch block clarifying that we deliberately return the original config rather than the partial next, and that the next successful GET re-runs migration with the "keychain wins on conflict" branch silently absorbing already-set entries. No data loss; one wasted set on retry per already-migrated entry.

CI should be green.

@cliffhall
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 25, 2026

Claude finished @cliffhall's task in 1m 0s —— View job


Third review — 18012ae1

Both nice-to-haves from the prior round land cleanly. Nothing new surfaced; the keychain pathway is in good shape across all four routes.


writeKeychainEntriesFor parallelism (server.ts:1151-1165)

The Promise.all shape mirrors readKeychainEntriesFor (server.ts:1177-1189) exactly, which makes the symmetry easy to read. The inline rationale calls out the right two invariants:

  1. No ordering requirement among distinct (id, field) keys — true, since each set is a write to an independent keychain row keyed on ${serverId}:${field}.
  2. On unavailability every set rejects; Promise.all surfaces the first rejection — and the route's keychainErrorResponse translates to a 503. The fail-fast behavior is the right contract here: once the keychain is down, any remaining sets are guaranteed to fail too, so Promise.all's short-circuit is not a correctness loss.

✅ Migration partial-state comment (server.ts:1248-1256)

The comment makes the self-healing loop explicit: changed:false + return config (not partial next) means disk stays as the source of truth for any not-yet-migrated entries, and the next successful GET walks the same entries through the existing !== null "keychain wins" branch without duplicating work. This exactly matches the behavior I asked for — good.


One micro-observation, not blocking

deleteKeychainFields (server.ts:1295-1302) is still a sequential for … await. It's only used in one place — the PUT in-place obsolete-fields cleanup (server.ts:1586) — and only runs when the user actually removes fields from an entry, so the exposure is small. For symmetry with the new writeKeychainEntriesFor (and the same 10-50ms-per-round-trip reasoning on macOS), it could also be Promise.all'd. Noting it rather than requesting — the write-side parallelism is the hot path and that's where it matters.


Summary

Severity Count
Blocking 0
Nice-to-have 1 (deleteKeychainFields symmetry)

LGTM — approve once CI is green. The full arc across the three review rounds (race fix → write-ordering flip → parallelism + partial-state doc) leaves the keychain subsystem in a retry-safe, low-latency, well-documented state. Nice close.
· feat/1356-oauth-client-secret-keychain

Picks up the symmetry observation from the third review: `deleteKeychainFields`
now uses `Promise.all` like its read and write siblings. Distinct (id, field)
deletes have no ordering requirement, and `secretStore.delete` is already a
silent no-op on unavailability so Promise.all has no failure-mode surprise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cliffhall
Copy link
Copy Markdown
Member Author

Thanks for the approve and the symmetry catch. Picked it up in 7ed78b52deleteKeychainFields now uses Promise.all like its read/write siblings. No ordering requirement among distinct (id, field) deletes, and since secretStore.delete is already a silent no-op on unavailability, Promise.all has no failure-mode surprise to introduce.

All three keychain helpers (read / write / delete) now have the same shape.

@cliffhall
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 25, 2026

Claude finished @cliffhall's task in 1m 17s —— View job


Fourth review — 7ed78b52

Small, focused follow-up — the symmetry catch from round three is landed cleanly. Nothing new surfaces; the keychain subsystem is in good shape across every route.


deleteKeychainFields parallelism (server.ts:1295-1304)

The shape now matches its read/write siblings exactly:

// Read  (1177)  await Promise.all(fields.map(async (field) => [field, await secretStore.get(id, field)] as const))
// Write (1160)  await Promise.all(Object.entries(secrets).map(([f,v]) => secretStore.set(id, f, v)))
// Delete (1303) await Promise.all(fields.map((field) => secretStore.delete(id, field)))

The inline rationale at server.ts:1295-1298 calls out the two invariants that justify Promise.all here:

  1. No ordering requirement — distinct (id, field) keys are independent rows; deletes don't observe each other.
  2. No failure-mode surpriseKeyringSecretStore.delete already silently swallows both NoEntry and keychain-unavailability errors (secret-store.ts:120-133), so Promise.all's first-rejection short-circuit has nothing to short-circuit on. The InMemorySecretStore analogue is a Map.delete — can't throw either.

✅ Coverage stays sound

The in-place PUT regression at servers-route.test.ts:1082-1124 exercises the obsolete-fields path end-to-end: seeds two env keys in the keychain, PUTs with one removed, asserts both the kept key's updated value and the removed key's null read. The parallelism change is observationally identical to the sequential form here (final state, not interleaving, is what the test observes) — no new test needed.

✅ Final shape of the keychain helper trio

All three helpers now share the same calling convention and comment block structure, and the three write-site orderings (POST → sweep/set/disk, PUT rename → set/disk/sweep-old-id, PUT in-place → set/disk/delete-obsolete) are consistent with the "hard-failing step first, destructive step last" principle documented at server.ts:1560-1575.


Summary across all four review rounds

Round Items Status
1 — initial GET migration race (blocking), POST/PUT write ordering, 4 nits All addressed in c2bd6d00
2 — re-review 2 nice-to-haves (writeKeychainEntriesFor parallelism, migration partial-state comment) Addressed in 18012ae1
3 — third pass 1 nice-to-have (deleteKeychainFields symmetry) Addressed in 7ed78b52
4 — this review 0

LGTM — approve once CI is green. The full arc (race fix → write-ordering flip → parallel write + partial-state doc → parallel delete) leaves the keychain subsystem retry-safe, low-latency, symmetrical across read/write/delete, and well-documented in both code and the spec. Nice close.
· feat/1356-oauth-client-secret-keychain

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v2 Issues and PRs for v2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move oauthClientSecret out of mcp.json into the OS keychain

1 participant