Conversation
Production HTTP-mode tenants stayed at /health.onboarding="pending"
forever and could re-DM the installer on every process restart. Both
behaviours trace to the in-memory firstDmSent flag in SlackHttpChannel
plus the fact that markOnboardingStarted only runs from the Socket Mode
flow (gated on a channels.yaml that is never baked in HTTP-mode rootfs).
Add an optional IntroductionLedger surface on SlackHttpChannel that
production wires against the SQLite onboarding_state table. The
receiver short-circuits the intro DM when the ledger reports sent and
stamps it only AFTER a successful Slack send, so:
- /health.onboarding flips to "complete" on the first successful DM
- a process restart never re-DMs the installer
- a transient Slack failure (rate limit, no ts returned, throw) leaves
the row clear so the next process start retries
The ledger is optional; tests and single-tenant dev fall back to the
existing in-memory flag with the prior semantics. The SlackHttpChannel
constructor stays compatible with all existing callers.
Tests: 7 new ledger-path cases on slack-http-receiver plus a
slack-channel-factory regression that pins the introductionLedger
forward, all green. Full suite 2316 pass / 0 fail.
Refs: real-vs-fake audit v2 finding H3
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a98d91cd69
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| metadataBaseUrl: process.env.METADATA_BASE_URL, | ||
| metrics: slackMetrics, | ||
| introductionLedger: { | ||
| isIntroSent: () => getOnboardingStatus(db).status === "complete", |
There was a problem hiding this comment.
Key intro-DM ledger by installer identity
The new ledger wiring treats any onboarding_state.status === "complete" as proof that the intro DM was already sent, but that state is global and not tied to identity.slack.installerUserId. After a Slack reinstall/identity rotation (which this codebase supports via metadata update + restart), SlackHttpChannel.connect() will skip sendIntroductionDm for the new installer because the old install already marked complete. This regresses first-contact behavior for reinstall flows and leaves the new installer without the onboarding DM.
Useful? React with 👍 / 👎.
Summary
Closes finding H3 from the 2026-05-01 real-vs-fake audit v2: two reachable HTTP-mode tenants (gilded-hearth, gogo-aim) sat at
/health.onboarding="pending"for 49-65h with no intro DM ever fired. Root cause is two layered bugs in this repo:firstDmSentflag is in-memory only. On a transient Slack send failure (rate limit, notsreturned, conversations.open failure), the flag staysfalse, but the next opportunity to retry is the next process restart. Production tenants with continuous uptime get exactly one shot.markOnboardingStarted/markOnboardingCompleteonly run from the Socket ModestartOnboardingpath insrc/index.ts, which is gated onchannelsConfig?.slack?.{owner_user_id,default_user_id,default_channel_id}. None of those exist in HTTP-mode production because nochannels.yamlis baked into the rootfs. So/health.onboardingstays at the default"pending"forever even when the HTTP-mode intro DM did go out.Reproduction
curl https://gilded-hearth.phantom.ghostwright.dev/health->"onboarding":"pending"after 65h uptime,evolution.generation: 0,scheduler.total: 0. Same forgogo-aim(49h, generation 1).channels.slack: true, memory healthy) but inert because no first user contact has ever happened.Root cause and fix
src/channels/slack-http-receiver.tsconnect()does the intro DM viasendIntroductionDmand gates on the in-memoryfirstDmSentinstance flag. Both gating and observability need to be persistent.This PR adds an optional
IntroductionLedgersurface onSlackHttpChannel:src/index.tswires it against the existing SQLiteonboarding_statetable:Behaviour the ledger pins:
/health.onboardingflips from"pending"to"complete"on the first successful intro DM, so audits and dashboards see the right state.tsreturned empty,conversations.openerror, exception thrown) leaves the row clear so the next process start retries naturally. The stamp happens only AFTERresult.sent === true.The ledger is optional; tests and single-tenant dev keep the in-memory
firstDmSentfallback with the same semantics it had before. The constructor stays back-compat with every existing caller.Architect note
This fix lands BEFORE the rootfs rebuild (audit M2) so the next rebuild ships a tenant whose first DM correctly stamps
/healthand is idempotent across restarts. It rides cleanly with phantom #116 (Phase 12 user research enrichment) which addsfirstboot_statefor the Socket Mode path; the two ledgers cover different transports and do not overlap.Live tenants stuck at
pendingtoday still need a one-shot operator action when bare-metal SSH is restored: stamponboarding_statetocompleteon the existing in-VM SQLite (or send the intro DM manually). Documented as an operator action in the H3 root-cause note (see local doc in phantom-cloud-deploy).Test plan
bun test src/channels/__tests__/slack-http-receiver.test.ts- 64 pass (7 new ledger-path cases)bun test src/channels/__tests__/slack-channel-factory.test.ts- 20 pass (1 new factory-forward case)bun test(full suite) - 2316 pass / 10 skip / 1 todo / 0 failbun run typecheck- cleanbun run lint- cleanNew regression tests pin: ledger short-circuits intro DM on a true read; ledger stamps only after a successful send; transient Slack errors keep the row clear; the stamp survives reconnect; a throwing
markIntroSentdoes not derailconnect(); the in-memory fallback still works without a ledger; the factory forwards the ledger to the channel.