
fix(sessions): ancestor-chain ownership — Cursor session close / audit / worklog were dead #153

Merged: George-iam merged 2 commits into main from fix/cursor-session-ownership-20260611 on Jun 11, 2026
@George-iam

Contributor

Why

The full functional QA pass on extension 0.1.6 (today) found one real bug, and it is a launch blocker for the Cursor channel: axme_begin_close returns "No active AXME session found" on every Cursor extension install, which silently kills session close, the audit pipeline, and worklog updates for the whole channel. This is almost certainly also the root cause of the earlier report from a remote machine that a session could not be closed.

Root cause

Hooks record ownerPpid = getClaudeCodePid(), i.e. their grandparent, one level above the sh wrapper. The MCP server matched ownership with the strict check ownerPpid === process.ppid:

Claude Code:                          Cursor:
claude(A) ─┬─ sh → hook   owner=A     cursor-server(A) ─┬─ sh → hook        owner=A
           └─ server      ppid=A ✓                      └─ exthost(B) → server  ppid=B ✗

One extra process layer → strict equality never matches. The stale-adoption fallback doesn't fire either, because cursor-server is still alive. QA captured the live process tree: the hooks' owner is 1382370 (cursor-server), while the server's ppid is 1383835 (exthost).

Fix

getOwnAncestorPids(maxDepth=4) walks the server's ancestor chain, and the ownership checks now test membership in that set:

  • Linux: /proc/<pid>/stat walk (microseconds; reuses the parser getClaudeCodePid already had)
  • macOS: ps -o ppid= per level
  • Windows: the whole chain in one PowerShell CIM call (per-level spawns would cost seconds at server startup)
  • Any failure stops the walk, so the chain degrades to [process.ppid], i.e. the old strict behavior
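A minimal TypeScript sketch of the walk and the Linux reader described above. `walkAncestorPids`, `ParentReader`, `parseStatPpid` and `readParentLinux` are hypothetical names: unlike the PR's `getOwnAncestorPids(maxDepth=4)`, this sketch takes the starting pid and the per-level reader as parameters so it can be exercised without a real process tree. Stopping before pid 1 and on a repeated pid are assumptions, not behavior confirmed by the PR.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical: returns the parent pid of `pid`, or undefined on any failure.
export type ParentReader = (pid: number) => number | undefined;

// /proc/<pid>/stat looks like "pid (comm) state ppid ...". comm may contain
// spaces and ')' itself, so split after the LAST ')' instead of on whitespace.
export function parseStatPpid(stat: string): number | undefined {
  const close = stat.lastIndexOf(")");
  if (close < 0) return undefined;
  const fields = stat.slice(close + 1).trim().split(/\s+/);
  const ppid = Number(fields[1]); // fields[0] = state, fields[1] = ppid
  return Number.isInteger(ppid) && ppid > 0 ? ppid : undefined;
}

// Linux reader: one small file read per level, no child process spawned.
export const readParentLinux: ParentReader = (pid) => {
  try {
    return parseStatPpid(readFileSync(`/proc/${pid}/stat`, "utf8"));
  } catch {
    return undefined; // process exited, no /proc, permission denied, ...
  }
};

// chain[0] is always startPpid, so if every read fails the result is
// [startPpid]: exactly the old strict-equality behavior.
export function walkAncestorPids(
  startPpid: number,
  readParent: ParentReader,
  maxDepth = 4,
): number[] {
  const chain = [startPpid];
  let current = startPpid;
  while (chain.length < maxDepth) {
    const parent = readParent(current);
    // Assumption: stop on failure, before init (pid <= 1), or on a repeat.
    if (parent === undefined || parent <= 1 || chain.includes(parent)) break;
    chain.push(parent);
    current = parent;
  }
  return chain;
}

// Usage on Linux: the server's own chain, starting at its direct parent.
// const chain = walkAncestorPids(process.ppid, readParentLinux);
```

Injecting the reader keeps the platform-specific part (file read, `ps`, PowerShell) separate from the depth cap and the degrade-to-`[process.ppid]` rule, which are the same on every OS.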

chain[0] is process.ppid, so Claude Code behavior is unchanged; Cursor matches at chain[1]. Applied to all three ownership sites (getOwnedSessionIdForLogging, cleanupAndExit, auditOrphansInBackground). The stale-adoption fallback (VS Code reload recovery) is untouched.
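At each ownership site the change reduces to swapping an equality for a set-membership test. A hedged sketch using the pids QA captured; `ownsSessionStrict` and `ownsSession` are hypothetical helpers, and the chain literals stand in for what the walk would return.

```typescript
// Before: the recorded owner had to BE the server's direct parent.
export function ownsSessionStrict(ownerPpid: number, ppid: number): boolean {
  return ownerPpid === ppid;
}

// After (sketch): the owner may be any pid in the server's ancestor chain,
// where ancestors[0] is process.ppid.
export function ownsSession(ownerPpid: number, ancestors: readonly number[]): boolean {
  return ancestors.includes(ownerPpid);
}

// Cursor topology as captured by QA.
const ownerFromHooks = 1382370; // cursor-server, recorded by the hooks
const cursorChain: number[] = [1383835, 1382370]; // [exthost, cursor-server]

console.log(ownsSessionStrict(ownerFromHooks, cursorChain[0])); // false
console.log(ownsSession(ownerFromHooks, cursorChain)); // true
console.log(ownsSession(999999, cursorChain)); // false: unrelated pid (hypothetical)
console.log(ownsSession(4242, [4242, 4000])); // true: Claude Code matches at chain[0] (hypothetical pids)
```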

Verification

  • 613/613 tests (5 new in test/session-ownership.test.ts: chain[0]==ppid, depth, uniqueness, Claude-Code membership invariant), tsc clean, build clean
  • E2E against the built dist/server.js with a real bash interposer reproducing the Cursor topology:
    • mapping owned by the server's grandparent → begin_close returns the checklist
    • control with an alive unrelated owner pid → still No active AXME session found ✅ (matching is selective, not always-true)
    • dead owner pid → stale-adoption still adopts ✅
  • After release: re-run QA checks 2.6 / 5.1–5.3 in Cursor (needs the fixed CLI inside the .vsix → ships in v0.6.2 + extension-v0.1.7)
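The three E2E outcomes listed above imply an order of checks: ancestor-chain membership first, then stale adoption for a dead owner, otherwise no session. A hedged sketch of that order; `resolveSession`, `SessionMapping`, `isAlive` and the result shape are hypothetical, and the PR does not describe the real stale-adoption rules (for example which stale mapping wins). Liveness uses the common Node idiom `process.kill(pid, 0)`, which sends no signal and only checks that the pid exists.

```typescript
// Hypothetical shape of a session mapping written by the hooks.
export interface SessionMapping {
  sessionId: string;
  ownerPpid: number;
}

export type Resolution =
  | { kind: "owned"; sessionId: string }
  | { kind: "adopted"; sessionId: string }
  | { kind: "none" }; // surfaces as "No active AXME session found"

// Signal 0 performs only the existence/permission check. EPERM means the pid
// exists but belongs to another user, so it still counts as alive.
export function isAlive(pid: number): boolean {
  if (!Number.isInteger(pid) || pid <= 0) return false; // 0 / negatives target process groups
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === "EPERM";
  }
}

export function resolveSession(
  mappings: readonly SessionMapping[],
  ancestors: readonly number[],
  alive: (pid: number) => boolean = isAlive,
): Resolution {
  // 1. Ownership: the recorded owner is somewhere in our ancestor chain.
  const owned = mappings.find((m) => ancestors.includes(m.ownerPpid));
  if (owned) return { kind: "owned", sessionId: owned.sessionId };
  // 2. Stale adoption: the owner is gone (e.g. after a VS Code reload).
  const stale = mappings.find((m) => !alive(m.ownerPpid));
  if (stale) return { kind: "adopted", sessionId: stale.sessionId };
  // 3. An alive, unrelated owner is never matched.
  return { kind: "none" };
}
```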

Risk

PID-reuse window widens from one pid to ≤4 ancestor pids — bounded by: same workspace storage, mapping file must exist, and the matched pid must be an ancestor of a live MCP server. Negligible vs. the channel being completely broken.

🤖 Generated with Claude Code

…ssion close was dead

QA finding (extension 0.1.6 full functional pass, 2026-06-11):
axme_begin_close returned "No active AXME session found" on every Cursor
extension install, which silently killed session close, the audit
pipeline and worklog updates for the whole channel.

Root cause: hooks record ownerPpid = getClaudeCodePid() (their
grandparent above the sh wrapper). Under Claude Code that PID equals the
MCP server's PARENT — one claude process spawns both, so the strict
`ownerPpid === process.ppid` equality worked. Cursor adds a layer:
hooks hang off the cursor-server main process while the MCP server is a
child of the EXTENSION HOST:

  cursor-server(A) ─┬─ sh → hook            ownerPpid = A
                    └─ exthost(B) → server  process.ppid = B  ≠ A

The stale-adoption fallback never fired either — A is alive.

Fix: getOwnAncestorPids(maxDepth=4) walks the server's ancestor chain
(Linux: /proc, microseconds; macOS: ps per level; Windows: whole chain
in ONE powershell CIM call) and ownership checks now test membership in
that set. chain[0] is process.ppid, so Claude Code behavior is
bit-for-bit unchanged; Cursor matches at chain[1]. Applied to all three sites:
getOwnedSessionIdForLogging, cleanupAndExit, auditOrphansInBackground.
Stale-adoption fallback untouched.

Verification:
- 613/613 tests (5 new in test/session-ownership.test.ts), tsc, build.
- E2E against the built dist/server.js with a real bash interposer
  reproducing the Cursor topology: mapping owned by the server's
  grandparent -> begin_close returns the checklist; control with an
  ALIVE unrelated owner pid -> still "No active AXME session found"
  (matching is selective); dead owner pid -> stale-adoption still
  fires (VS Code reload recovery preserved).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The CI matrix caught a real defect in the ancestor-chain fix: the lazy
`require("node:child_process")` inside readParentPidPosix and
getOwnAncestorPidsWindows throws ReferenceError under ESM (the package
is type:module), gets swallowed by the try/catch, and silently degrades
the chain to [process.ppid] — i.e. the Cursor session-ownership fix
would have been a no-op on macOS and Windows. Linux was unaffected
(/proc path, no exec), which is why the local 613/613 run was green
while the macOS CI leg failed the new ">=2 ancestors" assertion
(chain came back as a single element).

Replaced with a top-level static import; node:child_process is
side-effect-free and cheap.
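A sketch of the static-import fix applied to the two exec-based paths this commit names (`readParentPidPosix`, `getOwnAncestorPidsWindows`), re-created here under hypothetical `*Sketch` names. The `ps -o ppid= -p <pid>` call follows the macOS bullet in the description; the PowerShell command (`Get-CimInstance Win32_Process` piped to `ConvertTo-Csv`) is an assumption about how the whole chain could be fetched in one call, not the PR's actual command. Parsing is split into pure functions so it can be checked without macOS or Windows.

```typescript
// Top-level static import, resolved at module load. A lazy
// require("node:child_process") throws ReferenceError in a "type": "module"
// package, and a surrounding try/catch would silently swallow it.
import { execFileSync } from "node:child_process";

// `ps -o ppid= -p <pid>` prints only the ppid, padded with spaces.
export function parsePsPpid(out: string): number | undefined {
  const ppid = Number.parseInt(out.trim(), 10);
  return Number.isInteger(ppid) && ppid > 0 ? ppid : undefined;
}

// macOS / other POSIX: one `ps` spawn per level of the walk.
export function readParentPidPosixSketch(pid: number): number | undefined {
  try {
    return parsePsPpid(
      execFileSync("ps", ["-o", "ppid=", "-p", String(pid)], { encoding: "utf8" }),
    );
  } catch {
    return undefined;
  }
}

// Assumed command: every (pid, ppid) pair in ONE PowerShell call.
const CIM_COMMAND =
  "Get-CimInstance Win32_Process | Select-Object ProcessId,ParentProcessId | ConvertTo-Csv -NoTypeInformation";

// Parses CSV rows like "1234","5678" (after a header row) into pid -> ppid.
export function parseCimCsv(csv: string): Map<number, number> {
  const parents = new Map<number, number>();
  for (const line of csv.split(/\r?\n/).slice(1)) {
    const [pid, ppid] = line.replace(/"/g, "").split(",").map(Number);
    if (Number.isInteger(pid) && Number.isInteger(ppid)) parents.set(pid, ppid);
  }
  return parents;
}

// Windows: fetch the table once, then walk the chain in memory.
export function getOwnAncestorPidsWindowsSketch(startPpid: number, maxDepth = 4): number[] {
  const chain = [startPpid];
  try {
    const parents = parseCimCsv(
      execFileSync("powershell.exe", ["-NoProfile", "-Command", CIM_COMMAND], { encoding: "utf8" }),
    );
    let current = startPpid;
    while (chain.length < maxDepth) {
      const parent = parents.get(current);
      if (parent === undefined || parent <= 0 || chain.includes(parent)) break;
      chain.push(parent);
      current = parent;
    }
  } catch {
    // Any failure leaves [startPpid]: the old strict behavior.
  }
  return chain;
}
```

With the static import, a missing `ps` or `powershell.exe` still degrades gracefully through the catch blocks, but a module-system error can no longer masquerade as "no ancestors found".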

Note: the parallel `ensureAxmeSessionForClaude` E2E in audit-dedup
flaked once during the full local run; it passed 3x in isolation and a
full-suite rerun had 0 failures. It is a pre-existing flake (also seen
during the v0.6.0 release prep), unrelated to this change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
George-iam merged commit b4c5173 into main on Jun 11, 2026
12 checks passed