Ride out daemon restarts in MCP daemon clients#179

Merged
ScriptedAlchemy merged 2 commits into master from codex/update-daemon-reconnect
Jul 2, 2026

@ScriptedAlchemy

Summary

tracedecay update replaces the binary and (via post-update) restarts the daemon service with systemctl --user restart tracedecay.service. The old daemon unlinks its socket on shutdown, and the new one rebinds it a moment later. Because Cursor's tracedecay serve stdio proxy opens a fresh socket connection per request (send_daemon_request_line), any MCP request landing in that restart window failed instantly with a hard JSON-RPC error (No such file or directory / Connection refused) even though the daemon was back milliseconds later.

This PR makes live MCP sessions ride out a self-update without breaking:

  • connect_with_restart_grace: retries transient connect failures (NotFound, ConnectionRefused) for up to 8s (200ms poll) before erroring. Retrying at connect time is safe — nothing has been written yet, so requests can't be duplicated. Non-transient errors (e.g. permission denied) still fail immediately.
  • Wired into both daemon client paths: the serve stdio proxy (send_daemon_request_line) and the CLI daemon tool-call path (call_tool).
  • When a request still fails, the error now tells the agent why and what to do: connect failures hint that the daemon may be restarting after tracedecay update, and the mid-request "daemon closed the connection" error now says the daemon may have restarted and the request can be retried.
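
The retry behaviour in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from src/daemon.rs: the signature, the helper name `is_transient_connect_error`, and the exact error wording are assumptions. Only the function name `connect_with_restart_grace`, the two transient error kinds, and the 8s / 200ms figures come from the PR description.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Connect errors expected while the daemon socket is briefly missing or not
/// yet listening. Hypothetical helper; the classification follows the summary.
fn is_transient_connect_error(kind: io::ErrorKind) -> bool {
    matches!(kind, io::ErrorKind::NotFound | io::ErrorKind::ConnectionRefused)
}

/// Retry transient connect failures until `grace` has elapsed, sleeping `poll`
/// between attempts. Nothing has been written to the socket at this point, so
/// retrying cannot duplicate a request.
fn connect_with_restart_grace(
    socket: &Path,
    grace: Duration,
    poll: Duration,
) -> io::Result<UnixStream> {
    let deadline = Instant::now() + grace;
    loop {
        match UnixStream::connect(socket) {
            Ok(stream) => return Ok(stream),
            // Transient failure inside the grace window: wait and try again.
            Err(err) if is_transient_connect_error(err.kind()) && Instant::now() < deadline => {
                thread::sleep(poll);
            }
            // Transient failure after the deadline: give up, with a hint (assumed wording).
            Err(err) if is_transient_connect_error(err.kind()) => {
                return Err(io::Error::new(
                    err.kind(),
                    format!(
                        "could not connect to daemon socket {}: {err}; the daemon may be \
                         restarting after `tracedecay update`, retry shortly",
                        socket.display()
                    ),
                ));
            }
            // Anything else (for example permission denied) fails immediately.
            Err(err) => return Err(err),
        }
    }
}
```

With the values from the summary, a caller would pass `Duration::from_secs(8)` as `grace` and `Duration::from_millis(200)` as `poll`.
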

Not changed (assessment): the daemon restart itself already works — tracedecay update runs the new binary's post-update, which rewrites the systemd unit and restarts the service, and the per-request reconnect design means the proxy transparently talks to the new daemon afterwards. The old serve process keeps running the old binary as a thin line proxy, which is compatible since all MCP logic lives daemon-side.
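
To make the per-request design concrete, here is a rough sketch of a thin line proxy that opens a fresh connection for each request. The function name `proxy_request_line`, its signature, and the error wording are hypothetical and only loosely modelled on `send_daemon_request_line`; the real function may frame requests differently.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Forward one request line over a fresh connection and return the reply line.
/// Hypothetical stand-in for the proxy path described above.
fn proxy_request_line(socket: &Path, request: &str) -> io::Result<String> {
    // A new connection per request means the next request automatically
    // reaches whichever daemon currently owns the socket path.
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(request.trim_end().as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    let mut reply = String::new();
    let read = BufReader::new(&stream).read_line(&mut reply)?;
    if read == 0 {
        // EOF before any reply: the daemon went away mid-request.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "daemon closed the connection before replying; it may have restarted, \
             and the request can be retried",
        ));
    }
    Ok(reply.trim_end().to_string())
}
```

Because the proxy holds no connection between requests, it carries no state that a daemon restart could invalidate, which is why an old serve process can keep forwarding lines to a newer daemon.
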

Test plan

  • New unit tests in src/daemon.rs:
    • connect_with_restart_grace_reconnects_once_daemon_rebinds — socket absent at first, daemon binds 300ms later, connect succeeds.
    • connect_with_restart_grace_gives_up_with_restart_hint — no daemon ever binds; error names the socket and hints at the update/restart cause.
    • proxied_request_survives_daemon_restart_window — full send_daemon_request_line round trip against a fake daemon that only starts listening mid-request.
    • transient_daemon_connect_errors_cover_restart_window_only — error-kind classification.
  • cargo test --lib daemon:: (29 passed)
  • cargo test --test mcp_cli_serve_test --test tool_daemon_test (31 passed)
  • cargo clippy --all-targets and cargo fmt --check clean

@changeset-bot

changeset-bot Bot commented Jul 1, 2026

⚠️ No Changeset found

Latest commit: 2410265

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9e6a2a687

Comment thread src/daemon.rs Outdated
ScriptedAlchemy added 2 commits July 2, 2026 00:18
`tracedecay update` restarts the daemon service; between the old daemon
unlinking its socket and the new one binding it, client connects fail
with NotFound/ConnectionRefused. Retry transient connect errors for up
to 8s (200ms poll) in the serve stdio proxy and CLI call_tool paths so
live MCP sessions survive a self-update. Non-transient errors still
fail fast, and retries only happen before any bytes are written.

Also drop the listener and unlink the socket before draining engine
state on shutdown, so clients connecting during the drain window get a
retryable error instead of a connection that is never served, and add
restart-aware hints to connect/close errors.
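
The shutdown ordering described in the second paragraph of this commit message might look something like the sketch below. The function `shutdown_daemon` and the `drain` closure are hypothetical stand-ins for the daemon's real shutdown path and engine-state drain; only the ordering (drop the listener, unlink the socket, then drain) is taken from the message.

```rust
use std::fs;
use std::io;
use std::os::unix::net::UnixListener;
use std::path::Path;

/// Stop accepting, remove the socket path, and only then run the slow drain.
/// `drain` is a hypothetical closure standing in for draining engine state.
fn shutdown_daemon<F: FnOnce()>(
    listener: UnixListener,
    socket: &Path,
    drain: F,
) -> io::Result<()> {
    // 1. Close the listening socket so no new connection can queue up unserved.
    drop(listener);
    // 2. Unlink the path. Later connects now fail with NotFound, which clients
    //    can treat as a retryable restart-window error.
    match fs::remove_file(socket) {
        Ok(()) => {}
        Err(err) if err.kind() == io::ErrorKind::NotFound => {}
        Err(err) => return Err(err),
    }
    // 3. Only now do the potentially slow drain.
    drain();
    Ok(())
}
```

If the drain ran first, a client could connect to a socket that still exists but whose connections are never accepted, and it would wait on a reply that never arrives instead of retrying.
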

ScriptedAlchemy force-pushed the codex/update-daemon-reconnect branch from 1788694 to 2410265 on July 2, 2026 00:26
ScriptedAlchemy merged commit e898814 into master on Jul 2, 2026
18 checks passed