Conversation
…ecretNamesMirror
R7 Phase 8a (multi-agent Slack flow, dated 2026-04-30) hardens the
existing Socket Mode receiver with Prometheus-grade observability and
locks the cross-repo allowlist mirror against phantomd's
AllowedSecretNames.
Metrics surface (new /metrics endpoint):
- phantom_slack_socket_state{state} (gauge, 7 series). Exactly one
series is 1.0 at any instant. Powers "tenant offline" alerts.
- phantom_slack_socket_reconnects_total (counter). Bolt's auto-
reconnect runs under the hood; this measures wobble rate.
- phantom_slack_socket_connection_seconds (histogram). Connection
lifetime from connect to disconnect; long-tail SLO.
- phantom_slack_event_dispatch_seconds{event_type} (histogram).
End-to-end Bolt middleware time. Slack ack deadline is 3s.
Implementation:
- New SocketModeReceiver hookup (vs. socketMode: true shorthand) to
reach receiver.client and subscribe lifecycle events. The State
enum is verbatim from @slack/socket-mode SocketModeClient.js:
connecting, authenticated, connected, reconnecting, disconnecting,
disconnected.
- The unrecoverable start() error path maps to a synthetic "error"
state on the gauge so the fleet view does not silently flatline
when the receiver never gets past handshake.
- Dispatch timing via app.use() global middleware: try/await
next() / finally observe(). Captures handler completion including
any async LLM/memory work the agent does in response.
- prom-client backed; the metrics module owns its own Registry so
Telegram/email can plug in via parallel registries without name
collisions on the global registry.
AllowedSecretNamesMirror cross-repo invariant:
- New frozen const exported from slack-channel-factory.ts listing
slack_bot_token + slack_app_token + slack_gateway_signing_secret.
- The phantomd side ships in the symmetric PR (phantomd #28,
TestIsAllowedName_AcceptsSlackAppToken pins the assertion).
- Drift on either side breaks tenant boot with HTTP 404 (the
metadata gateway maps ErrInvalidName to 404 to defeat name
enumeration). The factory test pins the Mirror against the
SECRET_RESPONSES test fixture so a future production-side rename
fails loud in CI.
Tests (29 net-new):
- 24 in slack-metrics.test.ts: each metric family, the noop emitter,
the dispatch middleware including throw-path, the lifecycle hooks
including disconnect-without-prior-connect.
- 5 in slack-channel-factory.test.ts: AllowedSecretNamesMirror
contains all three names, matches SECRET_RESPONSES, is frozen.
- All 2111 existing tests still green; full suite: 2111 pass / 0
fail / 0 errors.
Bun-mock cross-suite hygiene: every test that previously mocked
@slack/bolt now also exports SocketModeReceiver in its mock object.
Without this, a test file that ran AFTER one of those partial
mocks would import the real module and hit "Export named
'SocketModeReceiver' not found" at module load time. Caught by
running the full channels/ suite together.
Net diff: 11 files, ~390 LOC. Follows R7 §3.1's "150 net-new" floor;
the rest is test mirror coverage and documentation.
|
Ready for Codex review (feedback_review_protocol_codex_first). Failure-path + refresh-survival track to focus on (feedback_reviewer_failure_path_track):
Beyond Codex, the cross-repo P1 question: the test file mocks the SocketModeReceiver as a no-op object with |
|
@codex review Per feedback_reviewer_failure_path_track. Specific risks for review:
Architect: phantom-cloud-deploy/local/2026-04-30-research-r7-slack-multi-agent-impl.md (Phase 8a hardening pass, 4-place cross-repo invariant). |
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Phase 8a of the multi-agent Slack flow (R7 spec dated 2026-04-30) hardens the existing Socket Mode receiver with Prometheus-grade observability and locks the cross-repo allowlist mirror against phantomd's
AllowedSecretNames.The Socket Mode receiver itself was already in place (R7 §3 confirmed the lift was hardening, not implementation). What this PR adds:
/metricsPrometheus endpoint, hooked via the underlyingSocketModeClientevent emitter.AllowedSecretNamesMirrorconstant pinning the three Slack-related secret names that traverse the metadata gateway, paired 1-1 with phantomd'sAllowedSecretNames.Metrics families
phantom_slack_socket_state{state}(gauge, 7 series). Exactly one series is 1.0 at any instant. Powers1 - max_over_time(...{state="connected"}[1m])"tenant offline" alerts.phantom_slack_socket_reconnects_total(counter). Bolt auto-reconnects under the hood; this measures wobble rate. Alert at sustained > 5/min.phantom_slack_socket_connection_seconds(histogram, custom buckets 1s -> 24h). Lifetime of a single Socket Mode connection. p99 should hold above 1 hour.phantom_slack_event_dispatch_seconds{event_type}(histogram, custom buckets 5ms -> 30s). End-to-end Bolt middleware time. Slack ack deadline is 3s.The
/metricsendpoint follows the existing/healthprecedent: unauthenticated, content-type from prom-client. Per-tenant isolation is at the Caddy URL boundary, not per-route.Implementation notes
new App({socketMode: true, ...})to explicitnew SocketModeReceiver({...})so we can reachreceiver.clientand subscribe lifecycle events. The State enum is verbatim from@slack/socket-mode/dist/src/SocketModeClient.js:connecting,authenticated,connected,reconnecting,disconnecting,disconnected. R7 §3.3 mentionedunable_to_socket_mode_start; the SDK does not actually emit that event (it surfaces asUnrecoverableSocketModeStartErrorthrown fromstart()), so we map the unrecoverable path to a syntheticerrorstate on the gauge inconnect().app.use()global middleware:try/await next() / finally observe(). Captures handler completion including async LLM/memory work the agent triggers in response to a Slack event.Cross-repo invariant
slack-channel-factory.tsexports a frozenAllowedSecretNamesMirror = ["slack_bot_token", "slack_app_token", "slack_gateway_signing_secret"]. The phantomd side lands in PR #28 (TestIsAllowedName_AcceptsSlackAppToken+_RejectsSlackAppTokenVariants). Drift on either side breaks tenant boot with HTTP 404 (the metadata gateway mapsErrInvalidNameto 404 to defeat name enumeration). The factory test pins the Mirror against theSECRET_RESPONSEStest fixture so a future production-side rename fails loud in CI on both repos.R7 §3 documents the four-way invariant; this PR closes phantom's two of the four.
Tests
29 net-new:
slack-metrics.test.ts: every metric family, the noop emitter, dispatch middleware including throw-path, lifecycle hooks including disconnect-without-prior-connect.slack-channel-factory.test.ts:AllowedSecretNamesMirrorcontains all three names, matchesSECRET_RESPONSES, is frozen.Bun-mock hygiene: every test file that mocks
@slack/boltnow also exportsSocketModeReceiverin its mock object. Without this, partial mocks leaked across test files and broke any subsequent test that importedslack.ts(caught by running the fullchannels/suite together).Verification
bun run typecheckcleanbun run lintcleanbun testgreen: 2111 pass / 0 fail / 0 errors / 4769 expects across 156 filesslack_app_tokenappears in 4 places (phantomd types.go + phantomd types_test.go + phantom slack-channel-factory.ts + phantom slack-channel-factory.test.ts).Sequencing
The phantomd half (PR #28) is the pre-req: without phantomd's allowlist update, a Phase 8b sidekick that calls
PutSecret(name="slack_app_token", ...)is rejected withErrInvalidName. Recommended merge order: phantomd #28 first (after Codex-clean), then this PR.Test plan
<slug>.phantom.ghostwright.dev/metrics, confirm the four metric families render with non-zero data.iptables -A OUTPUT -p tcp --dport 443 -j DROPfor 30s), verifyphantom_slack_socket_reconnects_totalincrements andphantom_slack_socket_connection_secondsobserves the lifetime.phantom_slack_event_dispatch_seconds{event_type="app_mention"}_countincrements.PutSecret(slack_app_token)against phantomd built with PR Replace Qdrant memory stack with ClawMem #28 (cross-repo smoke).Net diff: 11 files, ~390 LOC. Follows R7 §3.1's "150 net-new" floor; the rest is test mirror coverage and observability documentation in CLAUDE.md.