Ride out daemon restarts in MCP daemon clients by ScriptedAlchemy · Pull Request #179 · ScriptedAlchemy/tracedecay

ScriptedAlchemy · 2026-07-01T23:53:15Z

Summary

`tracedecay update` replaces the binary and (via `post-update`) restarts the daemon service with `systemctl --user restart tracedecay.service`. The old daemon unlinks its socket on shutdown, and the new one rebinds it a moment later. Because Cursor's `tracedecay serve` stdio proxy opens a fresh socket connection per request (`send_daemon_request_line`), any MCP request landing in that restart window failed instantly with a hard JSON-RPC error (`No such file or directory` / `Connection refused`), even though the daemon was back milliseconds later.
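The two errors quoted above map onto two states of the socket path. The sketch below reproduces both with the Rust standard library. `connect_error_kind` and `observe_restart_window` are hypothetical helpers written for illustration, not functions from the tracedecay codebase, and the stale-socket-file case is an assumption about one way `Connection refused` can arise.

```rust
use std::fs;
use std::io;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;

/// Hypothetical helper: the error kind of a failed connect, or None if the connect succeeded.
fn connect_error_kind(socket: &Path) -> Option<io::ErrorKind> {
    UnixStream::connect(socket).err().map(|err| err.kind())
}

/// Hypothetical helper: probe the socket path in the two states a restart can leave it in.
fn observe_restart_window(
    socket: &Path,
) -> io::Result<(Option<io::ErrorKind>, Option<io::ErrorKind>)> {
    // State 1: the old daemon has unlinked the socket and nothing has rebound it yet.
    match fs::remove_file(socket) {
        Ok(()) => {}
        Err(err) if err.kind() == io::ErrorKind::NotFound => {}
        Err(err) => return Err(err),
    }
    let while_unlinked = connect_error_kind(socket);

    // State 2 (assumed scenario): the socket file exists but no process is listening.
    // Dropping a std UnixListener closes it without removing the file.
    drop(UnixListener::bind(socket)?);
    let while_stale = connect_error_kind(socket);

    fs::remove_file(socket)?;
    Ok((while_unlinked, while_stale))
}
```

On Linux the first probe is expected to report `NotFound` and the second `ConnectionRefused`, which are the two kinds this PR treats as transient.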

This PR makes live MCP sessions ride out a self-update without breaking:

- `connect_with_restart_grace`: retries transient connect failures (`NotFound`, `ConnectionRefused`) for up to 8s (200ms poll) before erroring. Retrying at connect time is safe: nothing has been written yet, so requests can't be duplicated. Non-transient errors (e.g. permission denied) still fail immediately.
- Wired into both daemon client paths: the `serve` stdio proxy (`send_daemon_request_line`) and the CLI daemon tool-call path (`call_tool`).
- When a request still fails, the error now tells the agent why and what to do: connect failures hint that the daemon may be restarting after `tracedecay update`, and the mid-request "daemon closed the connection" error now says the daemon may have restarted and the request can be retried.
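The retry behaviour described in the list above can be sketched as follows. This is a minimal illustration, not the code from `src/daemon.rs`: the signature (explicit `grace` and `poll` parameters), the helper name `is_transient_connect_error`, and the exact wording of the hint are assumptions. Only the error kinds, the 8s/200ms budget, and the fail-fast rule come from the description.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Error kinds expected between the old daemon unlinking its socket and the new one binding it.
fn is_transient_connect_error(kind: io::ErrorKind) -> bool {
    matches!(kind, io::ErrorKind::NotFound | io::ErrorKind::ConnectionRefused)
}

/// Sketch: keep retrying transient connect failures until `grace` has elapsed.
/// The PR describes a grace of 8s with a 200ms poll; both are parameters here (assumption).
fn connect_with_restart_grace(
    socket: &Path,
    grace: Duration,
    poll: Duration,
) -> io::Result<UnixStream> {
    let deadline = Instant::now() + grace;
    loop {
        match UnixStream::connect(socket) {
            Ok(stream) => return Ok(stream),
            // Non-transient errors (e.g. permission denied) fail immediately.
            Err(err) if !is_transient_connect_error(err.kind()) => return Err(err),
            // Grace period exhausted: keep the error kind, add a restart-aware hint.
            Err(err) if Instant::now() >= deadline => {
                return Err(io::Error::new(
                    err.kind(),
                    format!(
                        "could not connect to daemon socket {}: {err}; the daemon may be \
                         restarting after `tracedecay update`, retry shortly",
                        socket.display()
                    ),
                ));
            }
            // Nothing has been written yet, so waiting and reconnecting cannot duplicate a request.
            Err(_) => thread::sleep(poll),
        }
    }
}
```

A caller following the PR's numbers would pass `Duration::from_secs(8)` and `Duration::from_millis(200)`. Because the loop only wraps the connect step, a failure after the request has been written is never retried here.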

Not changed (assessment): the daemon restart itself already works. `tracedecay update` runs the new binary's `post-update`, which rewrites the systemd unit and restarts the service, and the per-request reconnect design means the proxy transparently talks to the new daemon afterwards. The old `serve` process keeps running the old binary as a thin line proxy, which is compatible since all MCP logic lives daemon-side.

Test plan

New unit tests in `src/daemon.rs`:
- `connect_with_restart_grace_reconnects_once_daemon_rebinds`: socket absent at first, daemon binds 300ms later, connect succeeds.
- `connect_with_restart_grace_gives_up_with_restart_hint`: no daemon ever binds; error names the socket and hints at the update/restart cause.
- `proxied_request_survives_daemon_restart_window`: full `send_daemon_request_line` round trip against a fake daemon that only starts listening mid-request.
- `transient_daemon_connect_errors_cover_restart_window_only`: error-kind classification.

Commands run:
- `cargo test --lib daemon::` (29 passed)
- `cargo test --test mcp_cli_serve_test --test tool_daemon_test` (31 passed)
- `cargo clippy --all-targets` and `cargo fmt --check` clean
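For readers who want to see the shape of the per-request round trip these tests exercise, here is a minimal sketch of a one-line request/response exchange that turns a mid-request close into a restart-aware error. `exchange_line` is a hypothetical name, not the real `send_daemon_request_line`, and the newline-delimited framing and error text are assumptions based on the "thin line proxy" and "daemon closed the connection" wording in the summary.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Sketch: send one request line over an already-connected stream and read one response line.
fn exchange_line(mut stream: UnixStream, request: &str) -> io::Result<String> {
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    let mut reader = BufReader::new(stream);
    let mut response = String::new();
    // read_line returns Ok(0) when the peer closes without sending anything.
    if reader.read_line(&mut response)? == 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "daemon closed the connection before replying; the daemon may have \
             restarted and the request can be retried",
        ));
    }
    Ok(response.trim_end().to_owned())
}
```

Unlike a connect failure, this error is only reported, not retried automatically, since the request bytes may already have reached the daemon. `UnixStream::pair()` is a convenient way to exercise a function like this without a socket file.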

changeset-bot · 2026-07-01T23:53:20Z

⚠️ No Changeset found

Latest commit: 2410265

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9e6a2a687

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

`tracedecay update` restarts the daemon service; between the old daemon unlinking its socket and the new one binding it, client connects fail with NotFound/ConnectionRefused. Retry transient connect errors for up to 8s (200ms poll) in the serve stdio proxy and CLI call_tool paths so live MCP sessions survive a self-update. Non-transient errors still fail fast, and retries only happen before any bytes are written. Also drop the listener and unlink the socket before draining engine state on shutdown, so clients connecting during the drain window get a retryable error instead of a connection that is never served, and add restart-aware hints to connect/close errors.
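The shutdown ordering in the last sentence of this commit message can be sketched as below. `shutdown_then_drain` is a hypothetical function for illustration, and the `drain` closure stands in for whatever "draining engine state" involves in the real daemon, which this page does not show.

```rust
use std::fs;
use std::io;
use std::os::unix::net::UnixListener;
use std::path::Path;

/// Sketch: stop accepting and unlink the socket first, then run the (possibly slow) drain.
fn shutdown_then_drain<T>(
    listener: UnixListener,
    socket: &Path,
    drain: impl FnOnce() -> T,
) -> io::Result<T> {
    // 1. Close the listening socket so no new connection is queued and left unserved.
    drop(listener);
    // 2. Unlink the path so connects during the drain fail fast with NotFound.
    match fs::remove_file(socket) {
        Ok(()) => {}
        Err(err) if err.kind() == io::ErrorKind::NotFound => {}
        Err(err) => return Err(err),
    }
    // 3. Only now drain state; clients see a retryable error for the whole drain window.
    Ok(drain())
}
```

With this ordering, a client that connects while the drain is running gets `NotFound`, which the client-side retry treats as transient, rather than an accepted connection that never receives a reply.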


chatgpt-codex-connector Bot reviewed Jul 1, 2026


Comment thread src/daemon.rs Outdated

ScriptedAlchemy added 2 commits July 2, 2026 00:18

Merge remote-tracking branch 'origin/master' into codex/update-daemon…

2410265

…-reconnect

ScriptedAlchemy force-pushed the codex/update-daemon-reconnect branch from 1788694 to 2410265 Compare July 2, 2026 00:26

ScriptedAlchemy merged commit e898814 into master Jul 2, 2026
18 checks passed

ScriptedAlchemy mentioned this pull request Jul 2, 2026

feat(daemon): version handshake, skew warnings, restart-aware serve #182

Merged

4 tasks

ScriptedAlchemy merged 2 commits into master from codex/update-daemon-reconnect
