Skip to content

feat: add recovery middleware to handle panic gracefully#1537

Open
hhc7 wants to merge 1 commit into
apache:mainfrom
hhc7:feat/add-recovery-middleware
Open

feat: add recovery middleware to handle panic gracefully#1537
hhc7 wants to merge 1 commit into
apache:mainfrom
hhc7:feat/add-recovery-middleware

Conversation

@hhc7
Copy link
Copy Markdown

@hhc7 hhc7 commented May 28, 2026

Fixes #1536

Proposed Changes

  • Add middleware.Recovery() that catches panics from any subsequent middleware or handler, logs the panic message with full stack trace via log.Errorf + debug.Stack(), and returns a unified 500 JSON response (reason: base.unknown) via handler.NewRespBody + TrMsg — consistent with how other errors are handled in internal/base/handler/handler.go.
  • Mount Recovery() as the first middleware in internal/base/server/http.go so it covers all subsequent middleware (brotli, accept-language, short-id, auth, etc.) and handlers.
  • Add unit tests in recovery_test.go covering both the panic path (verifies 500 + base.unknown reason) and the no-panic path (verifies normal requests pass through unaffected).

@LinkinStars LinkinStars self-requested a review May 30, 2026 06:00
@LinkinStars
Copy link
Copy Markdown
Member

I agree that adding a recovery layer makes sense. gin.New() does not include one by default, so today a panic can indeed end up as a dropped connection instead of a controlled response.

That said, I do not think this implementation is quite correct yet, because it assumes every panic can be turned into the same JSON 500 response.

There are at least two cases where that is not very elegant:

  1. SSE / streaming endpoints
    For /chat/completions, the handler starts an SSE response very early: it sets Content-Type: text/event-stream, sends status 200, flushes headers, and may already write streamed chunks before later logic runs. If a panic happens after that point,
    recovery cannot reliably convert the response into a JSON 500. In practice it may just append JSON into an already-started event stream, which produces an invalid/mixed response rather than a clean error.

  2. HTML routes
    This server also handles UI / HTML routes, not only JSON APIs. With a global recovery mounted at the engine level, a panic in an HTML route would now also return the API-style JSON body. That is probably not the intended behavior for page requests,
    and it is not really “consistent” for the whole server, only for JSON endpoints.

So I think the underlying concern is valid, but the current fix is a bit too broad. A more robust approach would be to either:

  • scope JSON recovery only to API routes, or
  • make recovery response-aware, for example checking whether headers/body were already written and whether the request is an API, SSE, or HTML route.

In short: the direction is reasonable, but handling all panics with the same AbortWithStatusJSON(500, ...) is not especially elegant for SSE and HTML paths.

@LinkinStars LinkinStars self-assigned this May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The Gin engine is initialized without a recovery middleware, causing connection drops on panic

2 participants