Skip to content

feat(slack): add Slack integration with @mention-based task submission#42

Open
isadeks wants to merge 1 commit intoaws-samples:mainfrom
isadeks:main
Open

feat(slack): add Slack integration with @mention-based task submission#42
isadeks wants to merge 1 commit intoaws-samples:mainfrom
isadeks:main

Conversation

@isadeks
Copy link
Copy Markdown

@isadeks isadeks commented Apr 20, 2026

Summary

  • Adds full Slack integration: submit tasks via @Shoof mentions or DMs, receive threaded notifications with emoji reaction progress
  • OAuth multi-workspace install flow with bgagent slack setup CLI wizard for zero-friction onboarding
  • Account linking via /bgagent link slash command
  • Cancel button with instant feedback, session message cleanup on completion

What's included

CDK Constructs: SlackIntegration, SlackInstallationTable, SlackUserMappingTable

Lambda Handlers: slack-events, slack-commands, slack-command-processor, slack-interactions, slack-oauth-callback, slack-link, slack-notify

Shared Utilities: slack-verify (HMAC signature verification), slack-blocks (Block Kit renderer)

CLI: bgagent slack setup (interactive wizard), bgagent slack link <code> (account linking)

Docs: SLACK_SETUP_GUIDE.md with screenshots, Developer Guide updated

UX Flow

  1. User mentions @Shoof fix the bug in org/repo#42
  2. 👀 reaction appears instantly
  3. Task created → ⏳ reaction, threaded notification with Cancel button
  4. Agent completes → ✅ reaction, "Task completed" with View PR button, session message deleted

Other changes (non-Slack)

  • Blueprint repo via context/env var: cdk deploy -c blueprintRepo=org/repo or BLUEPRINT_REPO=org/repo — no longer requires editing agent.ts directly. Developer Guide updated with the new method.
  • X-Ray tracing commented out: Requires account-level UpdateTraceSegmentDestination setup that blocks first-time deploys. Disabled to unblock new users; re-enable once the prerequisite is documented more clearly.

Test plan

  • bgagent slack setup — full flow from scratch (deploy, create app, credentials, install, link)
  • @Shoof mention in channel — happy path with reactions and threaded notifications
  • Cancel button mid-task — instant feedback and cleanup
  • Error cases: no repo, repo not onboarded, unlinked user
  • DM to Shoof — private task submission
  • /bgagent help and /bgagent link
  • Multi-workspace: second workspace installs via OAuth URL
  • -c blueprintRepo=org/repo deploys with custom repo onboarded

@isadeks isadeks force-pushed the main branch 3 times, most recently from aef74c1 to 833c770 Compare April 20, 2026 17:03
Adds full Slack integration enabling users to submit coding tasks by
mentioning @shoof in any channel or DM, with real-time emoji reactions
and threaded notifications showing task progress.

Key features:
- @mention task submission with natural language repo extraction
- Emoji reaction progression: 👀 → ⌛ → ✅
- Threaded notifications (created, started, completed/failed)
- Cancel button with instant feedback
- DM support for private task submissions
- OAuth multi-workspace install flow
- `bgagent slack setup` CLI wizard for zero-friction onboarding
- Account linking via /bgagent link

New files:
- CDK constructs: SlackIntegration, SlackInstallationTable, SlackUserMappingTable
- Lambda handlers: events, commands, command-processor, interactions,
  oauth-callback, link, notify
- Shared utilities: slack-verify, slack-blocks
- CLI: slack.ts (setup, link commands)
- Docs: SLACK_SETUP_GUIDE.md with screenshots
@isadeks isadeks marked this pull request as ready for review April 20, 2026 17:31
@isadeks isadeks requested a review from a team as a code owner April 20, 2026 17:31
@krokoko
Copy link
Copy Markdown
Contributor

krokoko commented Apr 23, 2026

Thank you @isadeks ! Great addition

From a first pass review:

What's Done Well

  • HMAC verification (slack-verify.ts): timingSafeEqual, replay protection, proper error logging
  • Secret upsert lifecycle (slack-oauth-callback.ts): correctly handles create/update/restore with specific exception matching
  • CDK construct design: clean separation, proper cdk-nag suppressions, least-privilege IAM (with minor exceptions noted)
  • UX design: reaction progression, threaded notifications, DM support, cancel confirmation dialog, message cleanup on completion
  • Blueprint repo configurability: clean env ?? context ?? default pattern
  • Shared utility test quality: slack-verify.test.ts covers valid/invalid signatures, stale timestamps, length mismatches; slack-blocks.test.ts covers all event types

The inbound path (task creation) is well-architected. The existing createTaskCore() function is a genuinely clean seam — it accepts a TaskCreationContext with a channelSource tag and
an opaque channelMetadata: Record<string, string> bag, and never inspects their contents. The Slack handler correctly follows the same pattern as the API and webhook handlers:
authenticate in its own way, extract a userId, build metadata, delegate to the shared core. Adding 'slack' to the channelSource union is a one-line change that doesn't contaminate shared code.

My architectural problem: This PR hardwires Slack as the only notification consumer, bypassing the opportunity to build a proper event fan-out layer. Today on main, the TaskEventsTable is a pull-only audit log — no streams, no push notifications. When this PR adds stream: dynamodb.StreamViewType.NEW_IMAGE to the events table, it's
a meaningful infrastructure change. But then it wires the Slack notify Lambda directly as the sole stream consumer, with Slack-specific filtering baked in. This creates two problems for adding additional channels later:

  1. Each new channel needs its own DynamoDB Stream consumer on the same table. DynamoDB Streams supports a maximum of 2 consumers per table (without Kinesis). By the third adapter, you
    hit a hard AWS limit.
  2. The filtering logic (channel_source === 'slack') is duplicated inside each consumer. Every adapter handler would independently load the task record, check the channel source, and decide whether to act. This is the "notification routing" concern scattered across N handlers rather than centralized. I think there is an opportunity for a notification dispatcher pattern (maybe using eventbridge ?). This gives you:
  • Single stream consumer — no DynamoDB Streams consumer limit concern
  • Centralized routing — the dispatcher loads the task once, resolves the channel, and routes
  • Decoupled adapters — each notifier receives a pre-filtered, normalized event; it only cares about rendering and posting
  • Easy addition — a new channel is: one Lambda + one EventBridge rule, no changes to the dispatcher

The current slack-notify.ts (333 lines) mixes three concerns that should be separated: Event routing, Message rendering, and Delivery + state management. If you separate the routing concern, each future adapter only needs to implement rendering + delivery. The slack-notify.ts handler currently does all three, which means the next adapter would duplicate the routing and deduplication logic.

ALso, the opaque bag pattern (channel_metadata: Record<string, string>) works because each adapter writes and reads its own keys. But it means the task record accumulates state for the active channel's notification lifecycle (message timestamps, thread context). This couples the task record to the notification system. A cleaner approach would store notification state in a separate table or in the channel adapter's own state, not in the task's channel_metadata.

I don't want to block the PR for the notification architecture alone since this is the first channel we are adding, so premature abstractions is a real risk. What I would like to see in the current PR:

  1. Add a // TODO: Extract to notification dispatcher when adding a second channel comment on the DynamoDB Stream wiring in slack-integration.ts, making the architectural intent explicit.
  2. Move the deduplication logic out of slack-notify.ts into a shared utility now. The slack_notified_terminal conditional update pattern is channel-agnostic and will be needed by
    every adapter. Rename it to something generic like channel_notified_terminal.
  3. Don't store notification lifecycle state in channel_metadata — or at least document that this is a conscious tradeoff for V1. The slack_session_msg_ts, slack_created_msg_ts fields are Slack-specific delivery state, not channel context. A separate "notification state" map or table would be cleaner.
  4. Extract the channelSource type into a shared ChannelSource type in types.ts used by both TaskCreationContext and TaskRecord. This is a 3-line change that makes adding channels a type-safe, grep-able operation.
  5. Document the adapter contract — even informally. Future engineers adding a new adapter need to know: "implement inbound auth + call createTaskCore(), implement outbound rendering + delivery, wire a CDK construct." The PR adds 4,100 lines but no ADR or design doc explaining the pattern for the next person.

Some captured issues in the current code, please revise once the recommendations above are implemented to see if they are still relevant:

    1. Dead code: handleStatus and handleCancel never called

    File: cdk/src/handlers/slack-command-processor.ts

    The functions handleStatus (~40 lines) and handleCancel (~55 lines) are fully implemented but never wired into the switch(subcommand) in the handler function. Users running /bgagent
    status or /bgagent cancel get the default fallback message instead. Either add the case 'status': and case 'cancel': branches, or remove the dead code.

    1. url_verification challenge bypasses signature verification

    File: cdk/src/handlers/slack-events.ts

    The handler returns the challenge before verifying the Slack signing secret. Slack does sign url_verification requests, so verification should be attempted first. Only skip it if the
    signing secret is not yet populated (initial setup). As-is, an unauthenticated caller can confirm the endpoint is a Slack event handler.

    1. Retry dedup drops ALL retried events unconditionally

    File: cdk/src/handlers/slack-events.ts

    Any request with X-Slack-Retry-Num is immediately acked with 200 — no event-type discrimination. If the first delivery of app_uninstalled or tokens_revoked fails transiently, the
    retry is silently dropped and the revocation is permanently lost (bot token stays active). At minimum, allow security-critical event retries through — they are idempotent.

    1. DynamoDB Streams handler never fails the batch

    File: cdk/src/handlers/slack-notify.ts

    Every error is caught per-record and logged as warn, but never thrown. The construct configures retryAttempts: 3 and bisectBatchOnError: true, but these are useless because Lambda
    never sees a failure. Infrastructure errors (DynamoDB throttling, Secrets Manager outage) should be rethrown so Lambda retries the batch; only Slack API errors should be swallowed.

    1. OAuth callback missing state parameter (CSRF)

    File: cdk/src/handlers/slack-oauth-callback.ts

    The OAuth flow lacks a state parameter, the standard CSRF countermeasure (RFC 6749 §10.12). Practical risk is mitigated since client_secret is required for code exchange, but this is
    a well-known OAuth best practice. At minimum, document the tradeoff explicitly.

    1. revokeInstallation deletes secret even after DynamoDB failure

    File: cdk/src/handlers/slack-events.ts

    If the DynamoDB update to mark the installation as revoked fails, execution falls through to delete the bot token secret anyway. Result: installation record still shows status:
    'active' but the token is gone — every subsequent Slack API call fails with "secret not found."

    1. Six .catch(() => {}) calls with zero logging

    Files: slack-events.ts, slack-command-processor.ts

    Five+ Slack API calls (reactions.add/remove, chat.postMessage) use .catch(() => {}). Network failures, revoked tokens, rate limiting — all silently disappear. The slack-notify.ts
    handlers already log properly in equivalent helpers. Replace with .catch((err) => logger.warn(...)).

    1. handleAppMention Lambda invocation failure gives no user feedback

    File: cdk/src/handlers/slack-events.ts

    When the async Lambda invocation fails, the error is logged but the user sees the 👀 reaction and then... nothing. No reaction swap to ❌, no error thread. The "no repo found"
    path in the same function already handles this — apply the same pattern.

    1. Secret cache causes total outage after rotation

    File: cdk/src/handlers/shared/slack-verify.ts

    The 5-minute in-memory cache has no invalidation. After the signing secret is rotated, all warm Lambda instances reject every request for up to 5 minutes. Implement verify-then-retry:
    if verification fails, clear the cache, re-fetch, and verify once more.

    1. checkChannelAccess fails open on all errors

    File: cdk/src/handlers/slack-command-processor.ts

    Returns { ok: true } when bot token is missing, Slack API returns unknown errors, or fetch throws. Tasks are created for channels where notifications can never be delivered, with no
    user warning.

    1. CLI getStackOutput empty catch hides auth/permission errors

    File: cli/src/commands/slack.ts

    Catches all CloudFormation errors and returns null. An expired session or missing IAM permissions shows "Stack has not been deployed yet" instead of the real error. Only catch
    ValidationError (stack not found); surface other errors.

    1. Coverage
      The shared utilities (slack-verify, slack-blocks) and CDK constructs have solid tests. However, all 7 Lambda handlers have zero unit tests — a significant departure from the existing
      codebase pattern where every handler has a dedicated test file.
    1. MentionPayload extends SlackCommandPayload creates misleading type contract

    File: cdk/src/handlers/slack-command-processor.ts

    Mentions inherit response_url: string (required) but always set it to ''. Five other fields are similarly phantom. The handler branches on source === 'mention' at runtime, but the types don't express this. Refactor into a discriminated union:

    type CommandProcessorEvent = SlashCommandPayload | MentionPayload;
    // where each variant only has the fields it actually uses

    1. SlackBlock type requires as unknown as SlackBlock cast

    File: cdk/src/handlers/shared/slack-blocks.ts

    The actions() helper casts through unknown because SlackBlock doesn't model action blocks correctly. Button helpers return Record<string, unknown>. A discriminated union of
    SectionBlock | ActionsBlock with typed button elements would eliminate the cast and catch typos at compile time.

    1. channelSource union not used in TaskRecord

    File: cdk/src/handlers/shared/create-task-core.ts / types.ts

    TaskCreationContext.channelSource is 'api' | 'webhook' | 'slack' but TaskRecord.channel_source is plain string. Extract type ChannelSource = 'api' | 'webhook' | 'slack' and use it in
    both places.

    1. Duplicate utilities in slack-blocks.ts,slack-command-processor.ts -> truncate() and formatDuration() are identical in both files. Export from shared.
    1. Dedup ordering in slack-notify.ts. Terminal dedup write runs before checking channel_source === 'slack', writing to non-Slack task records unnecessarily.
    1. Excess IAM grant. slack-integration.ts. slackCommandsFn gets readSlackSecretsPolicy (bot token access) but only needs the signing secret, already granted separately.
    1. Inconsistent Number() in slack-blocks.ts. taskTimedOutMessage calls formatDuration(task.duration_s) without Number(), unlike taskCompletedMessage.
    1. safeJsonParse empty in slack-notify.ts. Parse errors are swallowed with no logging — users see "Unknown error" instead of the actual failure reason.

Thank you ! Also, @scoropeza since you are working on the related task, could you please have a look to see if this is aligned with your work ? Thank you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants