Skip to content

fix(vault): unseal hardening for external MCP clients (issue #3)#4

Merged
ARHAEEM merged 18 commits intomainfrom
fix/vault-unseal-hardening
May 10, 2026
Merged

fix(vault): unseal hardening for external MCP clients (issue #3)#4
ARHAEEM merged 18 commits intomainfrom
fix/vault-unseal-hardening

Conversation

@ARHAEEM
Copy link
Copy Markdown
Member

@ARHAEEM ARHAEEM commented May 9, 2026

Refs #3 — external MCP clients hit "Vault locked" because (1) the extension-managed daemon never received the SecretStorage passphrase at spawn, and (2) the generated stdio-daemon-proxy launcher silently fell back to in-process stdio when daemon attach failed, then tried to read the vault directly under the client's runtime where keytar often can't load.

PR is intentionally Draft until smoke evidence lands. "Refs" not "Fixes" — Fixes #3 would auto-close the issue on merge, which is wrong before Win11 + Claude Code smoke evidence per Task R.4 of the execution plan. Issue closure happens manually after smoke proves authenticated Pro tools work.

What ships in this PR — 0.8.41

Phase 1 — daemon env injection ✓

  • packages/extension/src/auth/build-daemon-env.ts (new) — buildDaemonEnv(context) reads peekStoredVaultPassphrase and returns { PERPLEXITY_VAULT_PASSPHRASE: "..." } when present.
  • packages/extension/src/daemon/runtime.tsRuntimeConfig extended with optional buildDaemonEnv async provider; spawnBundledDaemon awaits it and merges the result after process.env and before the hard-coded overrides (ELECTRON_RUN_AS_NODE / PERPLEXITY_CONFIG_DIR / PERPLEXITY_OAUTH_CONSENT_TTL_HOURS) so a buggy provider can't clobber critical spawn env. Defensive: throwing provider, non-object return, non-string entries are all logged-and-ignored.
  • packages/extension/src/extension.ts wires buildDaemonEnv: () => buildDaemonEnv(context) into configureDaemonRuntime(...).
  • Telemetry: passphrase status logged as set / unset only, never the value. Extension host's ambient process.env is never mutated.

Phase 2 — launcher hardening ✓

  • New DaemonAttachError class in packages/mcp-server/src/daemon/attach.ts with code: "DAEMON_UNREACHABLE", remediation: readonly string[], optional cause.
  • attachToDaemon (when fallbackStdio: false, the new default) wraps both failure sites — ensureDaemon throw and post-attach transport-start throw — into DaemonAttachError with three remediation strings. attach.ts is forbidden from process.exit (asserted in tests).
  • Generated launcher (packages/extension/src/launcher/write-launcher.ts LAUNCHER_CONTENT) flips to fallbackStdio: false, drops the runStdioMain shim, and wraps attachToDaemon in a try/catch that writes the bullet remediation to stderr only (stdout is the JSON-RPC framing channel) and exits 2 (reserved for "operator-actionable misconfiguration"). The byte-comparison rewrite logic in ensureLauncher automatically migrates 0.8.40 launchers on next extension activation.
  • packages/mcp-server/src/cli.js daemon:attach subcommand catches DaemonAttachError and emits the same stderr+exit2 contract.
  • Pre-existing legacy test packages/mcp-server/test/daemon/attach.test.js updated to match the new wrapped-error contract.

Phase 5 (subset) — docs ✓

  • docs/vault-unseal.md (new) — was 404 from vault.js's "Vault locked" error message since v0.4.x. Documents the keychain → env var → TTY unseal chain, standalone-CLI vs. extension-managed paths, per-platform notes, and the recovery flow.
  • Softened docs/codex-cli-setup.md Windows-keychain "just works" claim with a "what if it fails" paragraph.
  • docs/troubleshooting/external-mcp-clients.md (new) — single canonical page indexed from both READMEs.

CI / privacy infrastructure ✓

  • chore(repo) — pre-push hook in scripts/git-hooks/pre-push refuses to publish docs/superpowers/ paths; auto-installs via npm install postinstall.
  • chore(ci) — bumped CI matrix from [20, 22] to [22, 24] (Node 20 EOL 2026-04-30) and engines.node from >=20 to ^22.0.0 || ^24.0.0. Resolved two pre-existing Node-20-specific failures (Linux tsup OOM + Windows FSWatcher leak).

Phase 0 verification — PASSED on Win11 (2026-05-10)

What is NOT shipping (deferred, not part of 0.8.41)

Test coverage

  • 1143 tests pass / 2 skipped across 128 test files (full repo npm test)
  • 5 new test files: build-daemon-env.test.ts, daemon-runtime-config.test.ts, daemon-runtime-spawn.test.ts, daemon-attach-error.test.ts, daemon-attach.test.ts, launcher-script.test.ts
  • Updated tests: write-launcher.test.ts (Task 8.3.3 cases inverted to match new contract), cli.test.js (Task 2.4 daemon:attach contract), legacy daemon/attach.test.js (Task 2.2 wrapped-error contract)

CI status

All four matrix entries green: ubuntu-latest × {22, 24}, windows-latest × {22, 24}, on both push and pull_request events.

VSIX

packages/extension/perplexity-vscode-0.8.41.vsix (24.33 MB, 3574 files) built locally — install with code --install-extension perplexity-vscode-0.8.41.vsix --force for smoke testing.

Closure gate (do not ignore)

Stays Draft until docs/smoke-tests.md records a Win11 + Claude Code (Node 24+) smoke row with perplexity_reason returning a Pro reply and daemon.log showing PERPLEXITY_VAULT_PASSPHRASE: set (or unset, depending on which unseal path the box uses). Issue #3 closed manually after merge, NOT auto-closed.

Out-of-band note

The full design + execution plans live local-only under docs/superpowers/ (gitignored by repo policy; the new pre-push hook enforces this). Ping the maintainer for plan content beyond this PR description.

🤖 Generated with Claude Code

ARHAEEM pushed a commit that referenced this pull request May 10, 2026
Phase 1 + Phase 2 + Phase 5 docs subset of the vault-unseal-hardening
plan. Closes the architectural gap where external clients silently
lost access to authenticated Pro features when the launcher's runtime
couldn't load keytar (Claude Code Node 24+, Antigravity, sandboxed
Codex CLI).

Issue #3 stays open until Win11+Claude Code smoke evidence is recorded
in docs/smoke-tests.md (Task R.4 of the execution plan). PR #4 stays
draft until that evidence lands. Phase 3 envelope v4 is deferred per
Phase 0 verification result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A.R. and others added 16 commits May 10, 2026 13:45
Adds enforcement for the existing whitelist .gitignore policy that hides
docs/superpowers/. Three pieces:

  scripts/git-hooks/pre-push   — bash hook that scans the to-push range
                                  for docs/superpowers/ paths and aborts
                                  with a clear remediation message.
                                  Override per-push with
                                    PERP_ALLOW_PRIVATE_PLAN_PUSH=1 git push
  scripts/install-git-hooks.mjs — postinstall helper that runs
                                    git config core.hooksPath scripts/git-hooks
                                  on this clone. Silent no-op when not in a
                                  git repo (consumed-as-dependency case).
  package.json                  — wires the helper as postinstall.
  .gitignore                    — adds a screaming comment block above the
                                  whitelist so future contributors (and
                                  agents) see the policy before reaching
                                  for git add -f.

Why now: an earlier session inadvertently force-added planning artifacts
under docs/superpowers/plans/ to the public remote. The branch was scrubbed
(force-pushed back to main); this hook prevents the same accident from
re-reaching the remote.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Node 20 reached End-of-Life on 2026-04-30 (last month). Node 22 and 24
are the two currently-supported LTS lines (Node 24 promoted to Active
LTS in October 2025). Node 26 became "Current" in May 2026 and is
optional canary territory until it hits LTS in October 2026.

Three changes, single commit:
  1. .github/workflows/ci.yml — matrix: [22, 24] (was [20, 22]).
     This also fixes two pre-existing CI failures that were Node-20-
     specific:
       a. ubuntu-latest: tsup DTS worker OOM during mcp-server build.
       b. windows-latest: leaked FSWatcher in launcher.test.js firing
          post-teardown with EPERM.
     Both are tsup/test bugs with Node-20-specific manifestation; both
     stop reproducing on 22/24 due to better worker GC semantics.
     The existing NODE_OPTIONS=--max-old-space-size=4096 stays as
     defense-in-depth.
  2. packages/mcp-server/package.json — engines.node: explicit LTS
     union "^22.0.0 || ^24.0.0" matching what we test. Pattern lifted
     from Vite/Vitest's engines style: honest contract that doesn't
     promise unknown future majors.
  3. packages/extension/package.json — engines.node: same pattern.
     Note: VS Code's extension host still ships Node 20.x internally
     (Electron-bundled, see microsoft/vscode .nvmrc), so the extension
     SOURCE remains Node-20-syntax-compatible (already enforced by
     tsup's target: node20). This engines bump describes our test
     surface and standalone-CLI consumer requirement.

Source for compatibility verification: @modelcontextprotocol/sdk
explicitly requires Node 24+ per its CONTRIBUTING.md as of May 2026,
which is the strongest single nudge to ensure 24 is in our matrix.
tsup, vitest, vite, keytar, patchright all verified compatible with
Node 22/24 with no version-specific regressions.

Out of scope (separate follow-ups):
  - Node 26 canary job (continue-on-error) — wait for Oct 2026 LTS.
  - actions/setup-node@v4 / actions/checkout@v4 Node 20 deprecation —
    GitHub forces migration June 2nd, 2026.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reads peekStoredVaultPassphrase(context); returns
{ PERPLEXITY_VAULT_PASSPHRASE } when present, {} otherwise.
Never mutates process.env. Empty-string SecretStorage value is treated
as absent.

Test scaffold mocks the "vscode" module (vitest's Node env has none) per
the existing pattern in vault-passphrase.test.ts.

This helper exists for the daemon-runtime wiring in a follow-up task;
nothing in the codebase calls it yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Optional async provider on the existing configureDaemonRuntime injection
seam. Returned keys will be merged AFTER process.env and BEFORE the
hard-coded ELECTRON_RUN_AS_NODE / PERPLEXITY_CONFIG_DIR overrides so a
buggy provider cannot clobber critical spawn env. Merge logic itself is
not in this commit — type-surface only — and lands in the next task that
modifies spawnBundledDaemon.

TDD red gate is `tsc --noEmit` (TS2353 on the new test file before
extending the interface), since vitest transpiles TS without strict
property checks and would pass vacuously.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Provider env is merged AFTER process.env, BEFORE the hard-coded
ELECTRON_RUN_AS_NODE / PERPLEXITY_CONFIG_DIR / PERPLEXITY_OAUTH_CONSENT_TTL_HOURS
overrides so a buggy provider cannot clobber them. A throwing provider,
a non-object return, or non-string entries are all logged-and-ignored —
the daemon spawn must never crash because of a buggy provider.

Telemetry logs only "set" / "unset" for the vault passphrase, never the
value. The extension host's ambient process.env is never mutated; the
overlay lives only in the spawn options.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final piece of Phase 1: the extension now hands the daemon a provider
that pulls the SecretStorage passphrase out at spawn time. End-to-end
flow:

  ensureBundledDaemon -> spawnBundledDaemon
    -> await config.buildDaemonEnv()           (Task 1.3 wiring)
    -> buildDaemonEnv(context)                  (Task 1.1 helper)
    -> peekStoredVaultPassphrase(context)
    -> context.secrets.get("perplexity.vault.passphrase")

Only the call site changes here; behavior verified by
daemon-runtime-spawn.test.ts (Task 1.3) and the typecheck/build gates.
A direct extension.ts test would require stubbing the full VS Code API
surface for marginal coverage; deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Exported class with:
  code: "DAEMON_UNREACHABLE" (literal)
  remediation: readonly string[]
  cause?: unknown
  name: "DaemonAttachError"

attach.ts must never call process.exit; the launcher / CLI entrypoint
owns stderr + exit semantics. This class is the throw-shape that those
entrypoints will catch and translate into structured remediation output
(Task 2.3 / 2.4). attachToDaemon itself is unchanged in this commit;
the throw site is updated in Task 2.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…=false

Both failure sites (ensureDaemon throw + post-attach transport-start
throw) now wrap the underlying error in DaemonAttachError with the
3-element remediation array and the original as cause. Adds
DEFAULT_REMEDIATION as a module-local readonly constant — three strings
naming the three operator paths: reload VS Code, switch to http-loopback,
or opt into in-process stdio with PERPLEXITY_NO_DAEMON=1 + setup-vault.

fallbackStdio: true opt-in behavior unchanged (runFallback still runs
when explicitly requested by tests / advanced transports).

attach.ts itself never calls process.exit — that semantic belongs to
the launcher / CLI entrypoint (Tasks 2.3 / 2.4), which catch this error
and translate it into stderr + exit 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The default stdio-daemon-proxy launcher no longer silently falls back
to in-process stdio when daemon attach fails. New behavior: catch
DaemonAttachError (code DAEMON_UNREACHABLE), write structured
remediation to stderr (stdout reserved for JSON-RPC framing), exit 2.

Migration: the 0.8.40 launcher on disk gets rewritten by ensureLauncher's
existing byte-comparison logic on next extension activation. Users on
working setups (Win11/macOS with daemon healthy) see no behavior change.
Users who were on the silently-broken fallback path now get an
actionable error instead of "anonymous mode".

Reserved exit codes:
  0 = clean shutdown
  1 = generic crash (Node default handler)
  2 = operator-actionable misconfiguration (daemon unreachable)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLI subcommands that proxy to the extension-managed daemon now print the
same bullet remediation and exit code 2 as the generated launcher when
the daemon is unreachable. Stdout stays clean (JSON-RPC channel reserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the keychain → env var → TTY unseal chain, the standalone-CLI
vs. extension-managed paths, per-platform notes, recovery flow, and
format-version table. Referenced from the "Vault locked" error message
in mcp-server's vault.js since v0.4.x but the file was never created.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Acknowledge that Claude Code (Node 24+), Antigravity, and other
sandboxed runtimes may fail to load the bundled keytar even on Windows.
Point users at setup-vault and the new vault-unseal.md recovery section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single canonical page that explains why external IDE clients hit
"Vault locked" before 0.8.41, what the new "DAEMON_UNREACHABLE" error
means, and the per-IDE support matrix (rows fill in as smoke evidence
lands). Links to vault-unseal.md for the unseal model and recovery flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both READMEs now point to the new troubleshooting page so users hitting
"Vault locked" or "DAEMON_UNREACHABLE" find guidance from the entry-
point docs without grepping the source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 + Phase 2 + Phase 5 docs subset of the vault-unseal-hardening
plan. Closes the architectural gap where external clients silently
lost access to authenticated Pro features when the launcher's runtime
couldn't load keytar (Claude Code Node 24+, Antigravity, sandboxed
Codex CLI).

Issue #3 stays open until Win11+Claude Code smoke evidence is recorded
in docs/smoke-tests.md (Task R.4 of the execution plan). PR #4 stays
draft until that evidence lands. Phase 3 envelope v4 is deferred per
Phase 0 verification result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pre-existing daemon/attach.test.js had a case asserting the OLD
attachToDaemon behavior (rejection IS the original error). After Task
2.2, attachToDaemon now wraps the underlying error in DaemonAttachError
with the original preserved as `cause`. Updated the assertion to match
the new contract — same shape as the new daemon-attach.test.ts but
exercised through the legacy test's setup.

Caught by `npm test` after the 0.8.41 release commit; no new code
change needed beyond aligning the existing assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ARHAEEM ARHAEEM force-pushed the fix/vault-unseal-hardening branch from c07783e to 7a0570e Compare May 10, 2026 10:49
A.R. and others added 2 commits May 10, 2026 14:06
ensureBundledDaemon gained an optional { startTimeoutMs } that flows
through to ensureDaemon's existing parameter. Production callers (3
sites) call it with no args; behavior unchanged. The spawn-merge test
suite now passes startTimeoutMs: 200, dropping per-test runtime from
~15s to ~250ms (5 tests × 1s instead of 5 tests × 15s).

Why: the slow test surfaced as a vitest worker-pool crash on Windows
+ Node 24 push-event CI runs ([vitest-pool]: Worker forks emitted
error after the 75s test file completes). The matrix slot's
pull_request run completed identically and passed; the difference was
runner state at worker teardown after a long-running file. Fixing the
test to fail-fast removes the worker stress entirely.

Coverage unchanged: spawn() still observed before ensureDaemon throws.
The 5 cases (provider env merged, empty {} → no passphrase, hard-coded
overrides win, back-compat without provider, no process.env mutation)
all still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two PASS rows on Win11 covering both unseal paths the 0.8.41 fix routes
through:

  Row 1 — Win11 + native VS Code (Code.exe internal Node 22.22.1):
  daemon spawn telemetry "[daemon] PERPLEXITY_VAULT_PASSPHRASE: unset"
  → keytar happy-path; doctor confirms vault decrypts cleanly via OS
  Credential Manager; models refresh succeeded with accountTier=Enterprise.

  Row 2 — Win11 + Windsurf running Claude Code as the MCP host:
  daemon spawn telemetry "[daemon] PERPLEXITY_VAULT_PASSPHRASE: set"
  → SecretStorage-passphrase fallback path. perplexity_reason invoked
  end-to-end from Claude Code returned a substantive Pro reply with
  15 citations through daemon pid=28768 port=10368 version=0.8.41.

Row 2 is the definitive evidence for closing issue #3: Windsurf is the
class of external-IDE-with-Node-runtime-that-mismatches-keytar-ABI that
issue #3 reported, and the MCP client inside Windsurf successfully
invoked an authenticated Pro tool — exactly what the issue said was
impossible before 0.8.41.

Linux + macOS smoke deferred (no behavior change vs. existing matrix);
cross-IDE soak (Antigravity, Cursor outside VS Code, Codex CLI) will
fold into 0.8.42 / 0.8.43 release smoke as those land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ARHAEEM ARHAEEM marked this pull request as ready for review May 10, 2026 18:04
@ARHAEEM ARHAEEM merged commit ce94c60 into main May 10, 2026
11 of 12 checks passed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1a9af3af7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +29 to +33
execFileSync("git", ["rev-parse", "--git-dir"], {
cwd: REPO_ROOT,
stdio: ["ignore", "ignore", "ignore"],
});
return true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Verify repo root before mutating core.hooksPath

Guarding with git rev-parse --git-dir is not sufficient here, because Git auto-discovers parent repositories; when this workspace is installed under another repo (for example in node_modules), isGitRepo() returns true and the script proceeds to run git config core.hooksPath scripts/git-hooks against the parent repo. That can unexpectedly rewrite the parent project's hook configuration to a path that does not exist there, effectively disabling its hooks. Check that git rev-parse --show-toplevel equals REPO_ROOT (or that REPO_ROOT/.git belongs to this repo) before calling setHooksPath().

Useful? React with 👍 / 👎.

@ARHAEEM ARHAEEM deleted the fix/vault-unseal-hardening branch May 10, 2026 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant