Skip to content

fix(sessions): durably write-ahead the start prompt so it can't be lost#2666

Open
pauldambra wants to merge 4 commits into
mainfrom
posthog-code/durable-prompt-outbox
Open

fix(sessions): durably write-ahead the start prompt so it can't be lost#2666
pauldambra wants to merge 4 commits into
mainfrom
posthog-code/durable-prompt-outbox

Conversation

@pauldambra

Copy link
Copy Markdown
Member

Problem

Starting a local task run can lose the user's prompt. The prompt only exists in renderer memory until the session has fully initialized and session/prompt has been delivered (a separate step after agent.start). If session init exceeds the 30s SESSION_VALIDATION_TIMEOUT_MS — common in large monorepos, where loading skills/commands blows the budget — or the app is reloaded/quit/crashes during the auto-retry window, the prompt is gone and the run is left stranded in queued.

This is the root cause behind "I lost my prompt while starting a task" reports in the monorepo: session init repeatedly times out and the only copy of the prompt is in-memory state tied to a session that never came up.

Why: the prompt is the one thing in the start-a-task flow a user can't cheaply reproduce, yet nothing durable owned it before delivery.

Changes

Persist the prompt as a durable write-ahead record before any session/agent work, and clear it only once it has actually been delivered.

  • New PendingPromptStore contract in @posthog/core and an electronStorage-backed zustand store in @posthog/ui (keyed by taskId, mirrors the existing sessionConfigStore).
  • createNewLocalSession now: writes the prompt ahead before createTaskRun/agent.start; records the taskRunId once the run exists; recovers a saved prompt when a later connect arrives without one; clears the record after sendPrompt resolves.
  • A persisted record means "this prompt is owed delivery" — the hook for a future server-side reconciler to re-drive orphaned queued runs.

Out of scope (follow-ups): bumping/adapting the 30s init timeout, and the server-side reconciler + startup sweep that extends the guarantee to the case where the device never returns.

How did you test this?

  • pnpm --filter @posthog/core typecheck and pnpm --filter @posthog/ui typecheck — clean.
  • biome check on all touched files — clean.
  • New unit tests for the store (pendingPromptStore.test.ts).
  • Full suites pass: @posthog/core 1504 tests, @posthog/ui 726 tests (including both modified host suites).

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Created with PostHog Code from a Slack thread

Starting a local task run can lose the user's prompt: it only exists in
memory until the session has fully initialized and `session/prompt` has
been delivered. If `agent.start` exceeds the 30s `SESSION_VALIDATION_TIMEOUT_MS`
(common in large monorepos) or the app is reloaded/quit/crashes during the
retry window, the prompt is gone and the run is stranded in `queued`.

Persist the prompt as a durable write-ahead record BEFORE any session/agent
work, and clear it only once it has actually been delivered:

- New `PendingPromptStore` contract in core and an `electronStorage`-backed
  zustand store in ui (keyed by taskId, mirrors sessionConfigStore).
- `createNewLocalSession` writes the prompt ahead before `createTaskRun` /
  `agent.start`, records the taskRunId once the run exists, recovers a
  saved prompt when a later connect arrives without one, and clears the
  record after `sendPrompt` resolves.

A persisted record means "this prompt is owed delivery", which is also the
hook for a future server-side reconciler to re-drive orphaned runs.

Generated-By: PostHog Code
Task-Id: 22d61751-8d18-4d56-a79e-4741045320c1
@github-actions

github-actions Bot commented Jun 14, 2026

Copy link
Copy Markdown

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit 9ec4d33.

@pauldambra pauldambra marked this pull request as ready for review June 14, 2026 13:29
@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/core/src/sessions/sessionService.ts:1085-1130
**Duplicate save object and drifting `createdAt`**

All nine fields in the first `pendingPrompts.save` call are repeated verbatim in the second — the only difference is `taskRunId`. Because both calls evaluate `recovered?.createdAt ?? Date.now()` independently, the second call's `createdAt` is captured *after* the `await client.createTaskRun(taskId)` round-trip, so the value that actually lands in persistent storage is later than "when the prompt was first written ahead". If this timestamp is ever used for staleness checks or a future reconciler, it will undercount age. Building the base record once before both saves fixes the duplication and pins the timestamp correctly.

### Issue 2 of 2
packages/ui/src/features/sessions/sessionServiceHost.test.ts:189-201
**Write-ahead behaviour is not asserted in the host integration tests**

The `pendingPromptStore` mock is wired in but its methods are never checked in any `connectToTask` test. The new critical-path behaviours — `save` called before `createTaskRun`, `save` called again with `taskRunId` after the run exists, `remove` called only after `sendPrompt` resolves, and the recovery path where `get` returns a saved record and re-drives the prompt — are all exercised by the production code under test but invisible to the suite. A future regression here would pass every existing test.

Reviews (1): Last reviewed commit: "fix(sessions): durably write-ahead the s..." | Re-trigger Greptile

Comment thread packages/core/src/sessions/sessionService.ts Outdated
Comment thread packages/ui/src/features/sessions/sessionServiceHost.test.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e9cc0082f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/core/src/sessions/sessionService.ts
Comment thread packages/core/src/sessions/sessionService.ts Outdated

@pauldambra pauldambra left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

QA Swarm multi-perspective review (paul-reviewer, xp-reviewer, security-audit). qa-team was unavailable and skipped. See inline comments + the summary comment.

Comment thread packages/core/src/sessions/sessionService.ts Outdated
Comment thread packages/core/src/sessions/sessionService.ts Outdated
Comment thread packages/core/src/sessions/sessionService.ts
Comment thread packages/ui/src/features/sessions/sessionServiceHost.recovery.integration.test.ts Outdated
Comment thread packages/ui/src/features/sessions/pendingPromptStore.ts Outdated
Comment thread packages/ui/src/features/sessions/pendingPromptStore.ts

Copy link
Copy Markdown
Member Author

Note

🤖 Automated comment by QA Swarm — not written by a human

Multi-perspective review: paul-reviewer, xp-reviewer, security-audit (qa-team unavailable, skipped)

Verdict: 💬 APPROVE WITH NITS

Solid, well-motivated change that slots cleanly into the existing sessionConfigStore/adapterStore shape. No correctness or security blockers — all findings are non-blocking improvements. Both voice reviewers independently said "ship as you see fit".

Key findings

  • 🟡 MEDIUM — duplicated write-ahead save block (convergent: paul + xp): the PendingPromptRecord is assembled twice, differing only by taskRunId. Build it once and re-save with { ...pending, taskRunId }.
  • 🟡 MEDIUM — recovery is silent: no log/track when a prompt is actually recovered, so we can't tell the safety net is catching anything (or whether the 30s-timeout theory holds in the wild).
  • 🟡 MEDIUM — recovery path untested: get() is mocked to undefined everywhere, so createNewLocalSession's recovery branch never executes in a test. Add one where get() returns a record and assert deliver + remove.
  • 🟢 LOW — "impossible to lose" oversells it: persist is async fire-and-forget and can be dropped pre-hydration; soften the wording.
  • 🟢 LOW — YAGNI: listPrompts() / taskRunId have no consumer until the future reconciler.
  • NIT: removePrompt guard diverges from the sibling removeConfigOptions.

Convergence

  • The duplicated save block was flagged independently by paul and xp → highest-confidence finding.

Reviewer summaries

Reviewer Assessment
👤 paul approve with comments — add a recovery log line + an integration test, soften "impossible", dedupe the save block
📐 xp sound write-ahead design; main concern is the say-it-once duplication in an already-long method; minor YAGNI on list()/taskRunId
🛡 security-audit no findings — key is a server-generated task UUID (no cross-tenant adoption), persisted via the same encrypted electronStorage the existing config store already uses

Automated by QA Swarm — not a human review

…overy, soften wording

From qa-swarm review of this PR:

- Build the PendingPromptRecord once and re-save with `{ ...pending, taskRunId }`
  instead of assembling the 9-field payload twice (convergent paul + xp).
- Log when a written-ahead prompt is actually recovered, so the safety net is
  observable in the wild.
- Soften the "impossible to lose" wording in the doc + comment: persistence is
  async/best-effort, so it's "very unlikely", not guaranteed.

Deferred to a follow-up: an integration test exercising the recovery branch of
createNewLocalSession (currently the store mock returns undefined everywhere).

Generated-By: PostHog Code
Task-Id: 22d61751-8d18-4d56-a79e-4741045320c1

Copy link
Copy Markdown
Member Author

Note

🤖 Automated triage by PR Shepherd — not written by a human

Actioned the qa-swarm findings in bc252db:

  • Dedupe the write-ahead save (convergent paul + xp) — record built once, re-saved with { ...pending, taskRunId }.
  • Recovery is observablelog.info("Recovered a written-ahead prompt…", { taskId, ageMs }) when the safety net catches a prompt.
  • Softened "impossible to lose" → "very unlikely" in both the doc comment and the contract, noting persistence is async/best-effort.

Deferred (author's call):

  • Integration test for the recovery branch — worth pinning, but it means making the store mock return a record in the 950-line host harness; left for a focused follow-up rather than auto-applied.
  • YAGNI on listPrompts()/taskRunId and the removePrompt guard divergence — both are intentional groundwork for the server-side reconciler; keeping as-is.

Tests stay green: core 162 (sessions), ui host+store suites 122.

@pauldambra pauldambra added the Stamphog This will request an autostamp by stamphog on small changes label Jun 14, 2026 — with PostHog
…elivering

From Codex review of this PR: sendPrompt can resolve without actually
delivering — "queued" (session busy) or "rate_limited" (usage-limit error
swallowed by sendLocalPrompt) — but we cleared the write-ahead record
unconditionally after it resolved, reintroducing a narrow loss window. Only
clear the record on confirmed delivery.

Generated-By: PostHog Code
Task-Id: 22d61751-8d18-4d56-a79e-4741045320c1

Copy link
Copy Markdown
Member Author

Note

🤖 Automated triage by PR Shepherd — not written by a human

Triaged the Codex + Greptile findings (they converged with qa-swarm on the duplicate save, already fixed):

  • Codex P2 — clear only on real delivery (ebc75e6): sendPrompt can resolve without delivering ("queued" when the session is busy, "rate_limited" when sendLocalPrompt swallows the usage-limit error). We now only remove() the write-ahead record when stopReason is neither — otherwise the prompt is kept. Good catch; this was a real (narrow) loss window I'd reintroduced.
  • Greptile P2 — duplicate save / drifting createdAt (bc252db): record built once, so Date.now() is evaluated once too.
  • Codex P1 — recover owed prompts on the resume-existing route (deferred): correct and important — routeLocalConnect returns resume-existing whenever a run id + log URL exist, so a run stranded by a timed-out first attempt routes to reconnectToLocalSession, which doesn't consult pendingPrompts. The client-side recovery in this PR only covers the create-new route. Full coverage of the resume path belongs with the server-side reconciler / startup-sweep follow-up already called out in the PR description (it needs "resumed run that never received its prompt" detection + tests). Tracking it there rather than widening this PR.
  • Greptile P2 / paul — integration test for the recovery branch (deferred): same as noted above; follow-up.

Net: two real fixes applied, the deeper resume-path recovery consciously deferred to the reconciler follow-up. Re-pushed at ebc75e6.

@stamphog stamphog Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two unresolved bot inline comments raise concrete behavioral gaps that remain in the current diff: (1) on a rate-limit response, sendPrompt resolves without throwing but the prompt was never delivered — the unconditional pendingPrompts.remove() afterwards silently discards the write-ahead record the PR exists to preserve; (2) recovery logic only runs in createNewLocalSession, so if the app restarts after createTaskRun succeeds but before sendPrompt delivers (the task then has a latest_run.id), doConnect takes the reconnectToLocalSession path and the pending-prompt store is never consulted. Neither issue is a regression, but both undercut the stated safety guarantee and warrant human review before merge.

@stamphog stamphog Bot removed the Stamphog This will request an autostamp by stamphog on small changes label Jun 14, 2026

@stamphog stamphog Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR adds write-ahead durability for prompts, but the recovery logic is only wired into createNewLocalSession. If the app restarts after createTaskRun succeeds but before sendPrompt delivers, the task already has a run ID and doConnect takes the resume-existing route into reconnectToLocalSession — which never checks pendingPrompts. The pending record sits in storage unrecovered. Bot reviewers raised this as P1 and it visibly remains unaddressed in the current diff. This is the crash scenario the PR most needs to handle.

…th tests

Resolves the remaining review findings on this PR:

- Codex P1: a run stranded before its prompt was delivered routes to the
  resume-existing path on the next connect, not createNewLocalSession. Deliver
  the owed write-ahead prompt there too, once the resumed log is confirmed to
  carry no prompt yet. Extracted a shared deliverWrittenAheadPrompt() helper so
  the deliver-then-clear-on-confirmed-delivery logic lives once.
- xp YAGNI: dropped the speculative list()/taskRunId surface (and the second
  save that only existed to populate taskRunId) — the reconciler that would
  consume them isn't in this PR. This also removes the duplicated save entirely.
- Tests: cover the recovery branch end-to-end against the real store — delivery
  on the create-new route, delivery on the resume-existing route, and no
  delivery when the resumed run already has a prompt event.

removePrompt's guard divergence from sessionConfigStore is intentional (the
guard is the safer side), left as-is.

Generated-By: PostHog Code
Task-Id: 22d61751-8d18-4d56-a79e-4741045320c1

Copy link
Copy Markdown
Member Author

Note

🤖 Automated triage by PR Shepherd — not written by a human

Closed out the remaining review threads in 9ec4d33:

  • Codex P1 — recover owed prompts on the resume-existing route: a run stranded before delivery routes to reconnectToLocalSession on the next connect, so it now delivers the owed write-ahead prompt there too — guarded on the resumed log carrying no prompt event yet. Pulled the deliver-then-clear-on-confirmed-delivery logic into a shared deliverWrittenAheadPrompt() so both routes share it.
  • xp YAGNI — listPrompts() / taskRunId: dropped the speculative surface (and the second save that only existed to populate taskRunId) since the reconciler that would consume them isn't in this PR. Bonus: this removes the duplicated save entirely rather than just deduping it.
  • recovery now under test: three e2e tests against the real store — delivery on create-new, delivery on resume-existing, and no delivery when the resumed run already has a prompt event.
  • removePrompt guard divergence (NIT): intentional — the guard is the safer side; left as-is.

All review threads are now actioned or resolved-by-design. Local: core 162 + ui sessions 224 green; core + ui typecheck clean.

@pauldambra pauldambra added the Stamphog This will request an autostamp by stamphog on small changes label Jun 14, 2026 — with PostHog

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All P1/P2 bot concerns are addressed in the current diff: recovery on the reconnect route is present, deliverWrittenAheadPrompt correctly retains the write-ahead record on rate-limit and queued outcomes, and the key recovery paths are covered by new integration tests. Architecture follows the platform-interface-in-core / implementation-in-ui / wired-at-composition-seam pattern correctly.

@pauldambra pauldambra requested a review from a team June 14, 2026 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Stamphog This will request an autostamp by stamphog on small changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant