lisa/feat/notification-health-api#402
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds admin notification troubleshooting endpoints and OpenAPI schemas, a troubleshooting Service, a POST test-notification endpoint, and threads River source job IDs through digest and notification dispatch paths with helpers and tests. ChangesNotification Troubleshooting, Testing, and Source Job ID Tracking
Sequence Diagram(s)sequenceDiagram
participant Admin
participant HealthEndpoint as /admin/notifications/health
participant TroubleshootingService
participant RiverJobs as River Job Table
participant Database
Admin->>HealthEndpoint: GET /admin/notifications/health
HealthEndpoint->>TroubleshootingService: Health()
TroubleshootingService->>RiverJobs: Query job state counts & metrics
TroubleshootingService->>Database: Query subscriptions & system destinations
TroubleshootingService->>TroubleshootingService: Aggregate warnings
TroubleshootingService-->>HealthEndpoint: HealthResponse
HealthEndpoint-->>Admin: 200 JSON
sequenceDiagram
participant Worker
participant RequestBuilder as buildXxxRequest()
participant SourceJobHelper as requestWithSourceJobID()
participant Dispatcher as notifier.Dispatch()
Worker->>RequestBuilder: Build notification.Request
RequestBuilder-->>Worker: notification.Request
Worker->>SourceJobHelper: Wrap with riverJobID(job)
SourceJobHelper-->>Worker: Annotated request (Options.SourceJobID/SourceJobKind/CorrelationID)
Worker->>Dispatcher: Dispatch annotated request
Dispatcher-->>Worker: Dispatch result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/docs.go`:
- Around line 30780-30853: The JobDetail schema's timestamp properties
(attemptedAt, createdAt, finalizedAt, scheduledAt) are plain strings—update the
OpenAPI generation by adding "format: date-time" for these properties and also
annotate the Go struct fields (e.g., AttemptedAt, CreatedAt, FinalizedAt,
ScheduledAt) with swaggo format tags (format:"date-time") in their json tags so
the generated docs include the date-time format and enable client validation.
- Around line 30410-30427: The JSON schema for handler.testNotificationRequest
lacks validation on providerType, mode, and destinationTarget; update the
corresponding Go request struct (fields ProviderType, Mode, DestinationTarget)
to add binding/validation tags: set ProviderType to require only allowed values
(oneof=email slack), set Mode to limit allowed values (e.g., oneof=immediate
scheduled) and mark optional/required as appropriate, and add a format/email
binding or custom validator for DestinationTarget when providerType == "email";
ensure these changes propagate so the generated schema includes enum constraints
for providerType and mode and a format for destinationTarget.
- Around line 187-237: Update the OpenAPI parameter schemas for the query params
in docs.go: add an enum constraint for the "provider" parameter (values: email,
slack) and for the "state" parameter (use your River job state enum), add
numeric bounds for "limit" (minimum: 1, maximum: 200), and mark "since" with
format: date-time (RFC3339); ensure these changes are applied to the parameter
definitions named "provider", "state", "limit", and "since" so the generated
docs reflect the enum/format/maximum constraints.
- Around line 553-598: The OpenAPI fragment for
"/admin/notifications/{notificationName}/diagnostics" is missing a 404 response
for unknown notification names; update the Swagger docs to add a "404" response
referencing api.Error (matching other responses) and, in the handler that serves
notification diagnostics (look for the method that returns
handler.GenericDataResponse-notificationtroubleshooting_DiagnosticsResponse and
that reads the path param "notificationName"), add a runtime check that returns
a 404 when the name is unrecognized and add the Swagger annotation // `@Failure`
404 {object} api.Error "Not Found" to the handler so the generated docs include
the 404 response.
In `@docs/swagger.json`:
- Around line 182-212: The OpenAPI spec for the query parameters "queue" and
"state" defines them as array types but is missing maxItems constraints; add an
appropriate "maxItems" value for both parameters (matching the backend limit) to
the parameter objects for "queue" and "state" so clients and docs can validate
array length; if the project uses swagger/swag annotations, update the
corresponding handler annotations (the query param definitions that generate
"queue" and "state") rather than editing docs/swagger.json directly so the
generated spec includes the maxItems constraint.
In `@docs/swagger.yaml`:
- Around line 1057-1091: The description "Items from the list response" in the
generic wrapper definitions (e.g.,
handler.GenericDataResponse-handler_testNotificationResponse,
handler.GenericDataResponse-notificationtroubleshooting_DiagnosticsResponse,
handler.GenericDataResponse-notificationtroubleshooting_HealthResponse,
handler.GenericDataResponse-notificationtroubleshooting_JobDetail and
handler.GenericDataResponse-handler_threatIDResponse) is misleading; update the
description on each property's data to reflect that these wrappers contain a
single-object response (e.g., "Single object response" or "Wrapped single
<definition> response") so the docs correctly describe that data is a single
object from the referenced definition rather than a list item.
In `@internal/api/handler/notifications.go`:
- Around line 176-180: The code currently maps every error from
h.troubleshooting.Jobs(ctx.Request().Context(), query) to HTTP 400; change this
to distinguish client/validation errors from internal failures by inspecting the
returned error (e.g. using errors.Is / type assertions against your domain error
types such as validationErr or NotFoundErr) and only return
ctx.JSON(http.StatusBadRequest, api.NewError(err)) for known client errors,
otherwise log the error and return ctx.JSON(http.StatusInternalServerError,
api.NewError(err)); keep the h.sugar.Warnw call but ensure internal errors use
StatusInternalServerError when calling ctx.JSON.
In `@internal/service/notificationtroubleshooting/service.go`:
- Around line 589-600: The DB query errors for the three calls using
s.db.WithContext(...) (the Count into summary.Completed24h, Count into
summary.Discarded24h, and Select("min(scheduled_at)").Scan(&oldest) that sets
summary.OldestAvailableAt) are being ignored; change each to capture the
returned error, handle or propagate it (e.g., return fmt.Errorf("computing queue
health: %w", err) from the enclosing function) instead of discarding with “_ =
…”, so failures are not swallowed and the caller receives/logs the real DB
error.
- Line 1337: The call to latestJobForKind in the schedule diagnostics path
currently ignores its returned error (last, ok, _ := s.latestJobForKind(ctx,
def.JobKind)); modify this to capture the error (last, ok, err := ...) and
handle it: if err is non-nil, mark the diagnostic as failed/unknown and include
the error details in the diagnostic message or log instead of proceeding as if
the lookup succeeded. Update any surrounding logic that uses last/ok to
early-return or set an error state when err != nil so DB read failures no longer
produce misleading success diagnostics.
In `@internal/service/worker/notification_definition_helpers_test.go`:
- Around line 27-36: Add a new unit test (or extend
TestRequestWithSourceJobIDSetsDispatchMetadata) to cover the non-positive jobID
branch of requestWithSourceJobID: call requestWithSourceJobID with jobID == 0
and a negative jobID (e.g., -1) using the same
buildWorkflowTaskDigestNotificationRequest fixture, then assert that
request.Options.SourceJobID is unchanged/empty for these cases; ensure the test
references requestWithSourceJobID and verifies the no-op behavior when jobID is
non-positive.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: b5697773-8abd-4c29-a3e3-c68c377e8ee2
📒 Files selected for processing (24)
cmd/run.godocs/docs.godocs/swagger.jsondocs/swagger.yamlinternal/api/handler/api.gointernal/api/handler/notifications.gointernal/api/handler/notifications_integration_test.gointernal/service/digest/notifications.gointernal/service/digest/service.gointernal/service/notificationtroubleshooting/service.gointernal/service/notificationtroubleshooting/service_test.gointernal/service/worker/due_soon_checker.gointernal/service/worker/jobs.gointernal/service/worker/notification_definition_helpers.gointernal/service/worker/notification_definition_helpers_test.gointernal/service/worker/poam_deadline_reminder_worker.gointernal/service/worker/poam_digest_worker.gointernal/service/worker/poam_milestone_overdue_worker.gointernal/service/worker/poam_overdue_transition_worker.gointernal/service/worker/risk_digest_worker.gointernal/service/worker/risk_workers.gointernal/service/worker/workflow_execution_failed_worker.gointernal/service/worker/workflow_notification_jobs.gointernal/service/worker/workflow_task_digest_worker.go
There was a problem hiding this comment.
Pull request overview
Adds an admin-focused notification troubleshooting API surface (health, job listing/detail, diagnostics, and test enqueue) and improves traceability by propagating the upstream River source job ID into downstream notification/provider jobs.
Changes:
- Propagate River
job.IDintonotification.DispatchOptions.SourceJobIDacross workflow/risk/POAM/digest dispatch paths. - Add
internal/service/notificationtroubleshootingto compute health/diagnostics and query recent River jobs with sanitized metadata. - Expose new admin routes under
/admin/notifications/*(and update Swagger) including a fixed test-notification enqueue endpoint.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/service/worker/workflow_task_digest_worker.go | Wraps digest dispatch requests with source River job ID. |
| internal/service/worker/workflow_notification_jobs.go | Propagates source job ID through workflow task assignment/due-soon dispatch paths. |
| internal/service/worker/workflow_execution_failed_worker.go | Adds source job ID to execution-failed notifications (including system fanout). |
| internal/service/worker/risk_workers.go | Adds source job ID to risk reminder notification requests. |
| internal/service/worker/risk_digest_worker.go | Adds source job ID to risk digest notification requests. |
| internal/service/worker/poam_overdue_transition_worker.go | Adds source job ID to POAM overdue notification requests. |
| internal/service/worker/poam_milestone_overdue_worker.go | Adds source job ID to POAM milestone overdue reminder requests. |
| internal/service/worker/poam_digest_worker.go | Adds source job ID to POAM digest notification requests. |
| internal/service/worker/poam_deadline_reminder_worker.go | Adds source job ID to POAM deadline reminder requests. |
| internal/service/worker/notification_definition_helpers.go | Introduces requestWithSourceJobID helper to set dispatch metadata. |
| internal/service/worker/notification_definition_helpers_test.go | Adds unit test for requestWithSourceJobID. |
| internal/service/worker/jobs.go | Adds source-aware digest dispatch interface and uses it in SendGlobalDigestWorker. |
| internal/service/worker/due_soon_checker.go | Adds source job ID to due-soon checker dispatch requests. |
| internal/service/notificationtroubleshooting/service.go | New troubleshooting service: health, jobs list/detail, diagnostics, sanitization, pagination. |
| internal/service/notificationtroubleshooting/service_test.go | Unit tests for sanitization, staleness thresholds, query validation, cron parsing. |
| internal/service/digest/service.go | Adds SendGlobalDigestWithSourceJobID and funnels through shared implementation. |
| internal/service/digest/notifications.go | Adds source job ID into digest dispatch options. |
| internal/api/handler/notifications.go | Adds admin troubleshooting endpoints + test enqueue; wires troubleshooting service/enqueuer. |
| internal/api/handler/notifications_integration_test.go | Adds integration coverage for troubleshooting auth + /health operational state. |
| internal/api/handler/api.go | Wires NotificationWorkerEnqueuer into notifications handler. |
| cmd/run.go | Populates NotificationWorkerEnqueuer in handler services wiring. |
| docs/swagger.yaml | Documents new admin troubleshooting and test endpoints + new schemas. |
| docs/swagger.json | Generated Swagger JSON updates for new endpoints/schemas. |
| docs/docs.go | Generated embedded Swagger template updates. |
Comments suppressed due to low confidence (1)
internal/api/handler/notifications.go:126
- New admin troubleshooting endpoints are added, but the integration tests currently only cover unauthorized access and the /health success path. Consider adding success-path integration tests for /admin/notifications/jobs (filters + pagination), /admin/notifications/jobs/:id (200/404), /admin/notifications/{name}/diagnostics (200/400), and /admin/notifications/test (202/503) to ensure these routes remain stable.
func (h *NotificationsHandler) Register(api *echo.Group) {
api.GET("/health", h.GetTroubleshootingHealth)
api.GET("/jobs", h.ListTroubleshootingJobs)
api.GET("/jobs/:id", h.GetTroubleshootingJob)
api.POST("/test", h.SendTestNotification)
api.GET("/:notificationName/diagnostics", h.GetNotificationDiagnostics)
api.GET("", h.ListSystemNotifications)
api.GET("/providers", h.ListNotificationProviders)
api.POST("/:notificationName/destinations", h.CreateSystemNotificationDestination)
api.DELETE("/:notificationName/destinations", h.DeleteSystemNotificationDestination)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
gusfcarvalho
left a comment
There was a problem hiding this comment.
Reviewed the notification troubleshooting API end-to-end against a live environment. Found 5 bugs that need to be fixed before merging.
Summary
| # | Severity | Description |
|---|---|---|
| 1 | 🔴 High | supportsSystemDestination is too broad — emits false missing_destination warnings for user-fanout-only types |
| 2 | 🟡 Medium | DiagnosticsResponse is missing subscriberCounts and configuredDestinations fields |
| 3 | 🟡 Medium | Test-send returns correlationId instead of jobIds — deviates from spec |
| 4 | 🟠 Low | POAM diagnostics subscriber check has wrong code — copy-paste bug with risk gate |
| 5 | 🟠 Low | Workflow subscriber checks share an identical label — two distinct checks both say "Subscribed users" |
| // @Param notificationName path string true "Notification name or family" | ||
| // @Success 200 {object} handler.GenericDataResponse[notificationtroubleshooting.DiagnosticsResponse] | ||
| // @Failure 400 {object} api.Error | ||
| // @Failure 401 {object} api.Error | ||
| // @Failure 404 {object} api.Error "Not Found" | ||
| // @Failure 500 {object} api.Error | ||
| // @Security OAuth2Password |
| var notificationJobKinds = []string{ | ||
| worker.JobTypeSendEmail, | ||
| worker.JobTypeSendEmailFrom, | ||
| worker.JobTypeSendSlackChannel, | ||
| worker.JobTypeSendSlackDM, | ||
| worker.JobTypeSendGlobalDigest, | ||
| worker.JobTypeWorkflowTaskAssigned, | ||
| worker.JobTypeWorkflowTaskDueSoon, | ||
| worker.JobTypeWorkflowTaskDigest, | ||
| worker.JobTypeWorkflowExecutionFailed, | ||
| worker.JobTypeWorkflowDueSoonChecker, | ||
| "workflow_task_digest_checker", | ||
| "schedule_workflows", | ||
| worker.JobTypeRiskReviewDeadlineReminderScanner, |
| return []scheduleDefinition{ | ||
| {"EVIDENCE_DIGEST", worker.JobTypeSendGlobalDigest, "", "digest", digestEnabled, digestSchedule}, | ||
| {"WORKFLOW_DUE_SOON", worker.JobTypeWorkflowDueSoonChecker, worker.JobTypeWorkflowTaskDueSoon, "email", workflowCfg.DueSoonEnabled, workflowCfg.DueSoonSchedule}, | ||
| {"WORKFLOW_TASK_DIGEST", "workflow_task_digest_checker", worker.JobTypeWorkflowTaskDigest, "digest", workflowCfg.TaskDigestEnabled, workflowCfg.TaskDigestSchedule}, |
| case "evidence": | ||
| return []string{worker.JobTypeSendGlobalDigest}, []string{worker.JobTypeSendGlobalDigest} | ||
| case "workflow": | ||
| return []string{worker.JobTypeWorkflowDueSoonChecker, "workflow_task_digest_checker"}, []string{ | ||
| worker.JobTypeWorkflowTaskAssigned, | ||
| worker.JobTypeWorkflowTaskDueSoon, | ||
| worker.JobTypeWorkflowTaskDigest, | ||
| worker.JobTypeWorkflowExecutionFailed, | ||
| } |
automated implementation by lisa.
Summary by CodeRabbit
New Features
Documentation
Refactor
Tests