fix(rivetkit): add lifecycle error retry and gateway HTTP routing#4666

Closed
NathanFlurry wants to merge 1 commit into sleep-cleanup/18-remove-legacy-timeouts from 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 15, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude Bot commented Apr 15, 2026

PR Review: fix(rivetkit): add lifecycle error retry and gateway HTTP routing

Overview

This PR adds three distinct improvements:

  1. Lifecycle error retry logic — a new lifecycle-errors.ts module classifies errors at actor lifecycle boundaries (restarting, stopping) and wraps action calls with an automatic retry loop. Connection open failures that are non-retryable now abort pRetry immediately instead of exhausting retries.
  2. Gateway HTTP routing fix — test URLs updated to include a /request/ path prefix, along with a new test asserting that non-/request paths are not routed to onRequest.
  3. WebSocket listener ordering fix — close/error listeners are now attached before onOpen to avoid losing close events from actors that immediately reject connections.

Specific Issues

lifecycle-errors.ts: continue vs early return in classifyLifecycleBoundaryError

When classifyActorError returns undefined for the "database accessed after actor stopped" case, the outer loop calls continue, which proceeds to check the next error in the cause chain. If a retryable error exists deeper in that chain, it would be incorrectly classified as retryable — even though the definitive signal was "do not retry." The intent here appears to be suppression, so this branch should return undefined from the outer function, not continue.
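The difference between `continue` and an early return can be sketched as follows. This is only an illustration under assumptions: the `"suppress"` marker, `MAX_CAUSE_DEPTH`, the `classifyOne` helper, and the "Actor is restarting" message are hypothetical and not taken from the PR; only the "Actor is stopping" and "database accessed after actor stopped" strings come from the review above.

```typescript
// Hypothetical types and names; the PR's real lifecycle-errors.ts is not shown here.
type RetryKind = "restarting" | "stopping";

// A single error yields a retry kind, an explicit "do not retry" signal,
// or null when it carries no lifecycle information at all.
type Classification = RetryKind | "suppress" | null;

const MAX_CAUSE_DEPTH = 8; // assumed depth limit for the cause-chain walk

function classifyOne(err: unknown): Classification {
  if (!(err instanceof Error)) return null;
  if (err.message.includes("database accessed after actor stopped")) return "suppress";
  if (err.message.includes("Actor is restarting")) return "restarting"; // assumed message
  if (err.message.includes("Actor is stopping")) return "stopping";
  return null;
}

function classifyChain(err: unknown): RetryKind | undefined {
  let current: unknown = err;
  for (let depth = 0; depth < MAX_CAUSE_DEPTH && current != null; depth++) {
    const result = classifyOne(current);
    // Early return: a suppression signal ends the walk, so a retryable
    // error deeper in the chain cannot override it.
    if (result === "suppress") return undefined;
    if (result !== null) return result;
    current = (current as { cause?: unknown }).cause;
  }
  return undefined;
}
```

With a `continue` in place of the early return, an outer "database accessed after actor stopped" error wrapping an inner "Actor is stopping" error would be classified as retryable; with the return, it is not.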

actor-conn.ts: misleading log level for non-retryable errors

In the .catch((err) => ...) handler, non-retryable connection open errors (e.g., auth failure) fall into the same branch as intentional aborts and log at info level with the message "connection retry aborted". These should log at warn or error level with a distinct message so they are diagnosable in production.
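One way to separate the two cases is to decide the log level and message from the failure reason before logging. The sketch below is hypothetical: `NonRetryableOpenError`, `logDecisionForOpenFailure`, and the warn-level message texts are illustrative names, not identifiers from actor-conn.ts.

```typescript
type LogLevel = "info" | "warn";

interface LogDecision {
  level: LogLevel;
  msg: string;
}

// Hypothetical marker for errors that must not be retried (for example, an auth failure).
class NonRetryableOpenError extends Error {
  readonly nonRetryable = true;
}

function isNonRetryable(err: unknown): err is NonRetryableOpenError {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { nonRetryable?: unknown }).nonRetryable === true
  );
}

// Picks a level and message so intentional aborts and real failures are distinguishable.
function logDecisionForOpenFailure(err: unknown, abortedByCaller: boolean): LogDecision {
  if (abortedByCaller) {
    return { level: "info", msg: "connection retry aborted" };
  }
  if (isNonRetryable(err)) {
    return { level: "warn", msg: "connection open failed with non-retryable error" };
  }
  return { level: "warn", msg: "connection open failed" };
}
```

The caller would then pass the decision to whatever logger the module already uses, keeping the info-level message reserved for deliberate aborts.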

retryOnLifecycleBoundary: no jitter in exponential backoff

The delay calculation is Math.min(initialDelayMs * 2 ** attempt, maxDelayMs). When many clients hit a restarting actor simultaneously, this produces a synchronized thundering herd on restart. Adding random jitter is the standard fix.
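A minimal sketch of the suggested change, using the "full jitter" variant (a uniformly random delay between 0 and the capped exponential value). The function name `backoffDelayMs` and the injectable `random` parameter are assumptions made for illustration and testing; the parameter names `initialDelayMs` and `maxDelayMs` follow the expression quoted above.

```typescript
// Full-jitter backoff: pick a random delay in [0, ceiling), where ceiling is
// the capped exponential delay the PR currently uses as-is.
function backoffDelayMs(
  attempt: number,
  initialDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(initialDelayMs * 2 ** attempt, maxDelayMs);
  return Math.floor(random() * ceiling);
}
```

Full jitter spreads clients across the whole interval but can produce very short delays. If a minimum wait matters, an "equal jitter" variant (half the ceiling plus a random half) is a common alternative.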

actor-handle.ts: fragile string comparison for the destroy guard

if (opts.name === "destroy") skips the retry wrapper. This works today but would silently break if the destroy action is ever renamed or aliased. A typed constant or an explicit option on the call site would be more robust.
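Both suggestions can be combined, as in the hedged sketch below. `DESTROY_ACTION_NAME`, `ActionCallOptions`, and the `retryOnLifecycleBoundary` flag are hypothetical names chosen for illustration; the actual option shape in actor-handle.ts is not shown in this review.

```typescript
// Single source of truth for the action name, so a rename is a one-line change.
const DESTROY_ACTION_NAME = "destroy" as const;

interface ActionCallOptions {
  name: string;
  args: unknown[];
  // Hypothetical explicit override; when set, it wins over the name-based default.
  retryOnLifecycleBoundary?: boolean;
}

function shouldRetryOnLifecycleBoundary(opts: ActionCallOptions): boolean {
  if (opts.retryOnLifecycleBoundary !== undefined) {
    return opts.retryOnLifecycleBoundary;
  }
  // Default: retry everything except the destroy action.
  return opts.name !== DESTROY_ACTION_NAME;
}
```

The explicit flag lets an aliased or renamed teardown action opt out at the call site without relying on its name.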

Missing unit tests for lifecycle-errors.ts

The classification logic covers multiple legacy string patterns, a depth-limited error chain walk, and two distinct retry kinds. This is the highest-risk new code in the PR — a unit test file exercising each classification case (actorRestarting, legacy internal_error with "Actor is stopping", actor.stopped, transport string matches, chain walking) would catch regressions without needing full integration tests.
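A table-driven layout keeps such a test file short. The sketch below runs against a stand-in classifier, since the real `classifyActorError` signature is not shown here: the `ActorErrorLike` shape, the `group`/`code` split, and the mapping of each case to a retry kind are all assumptions. In a real test file the cases would feed the project's own test runner.

```typescript
// Assumed error shape and retry kinds; not the PR's actual types.
interface ActorErrorLike {
  group?: string;
  code: string;
  message: string;
}
type RetryKind = "restarting" | "stopping";

// Stand-in for the function under test.
function classifyStandIn(err: ActorErrorLike): RetryKind | undefined {
  if (err.code === "actorRestarting") return "restarting";
  if (err.code === "internal_error" && err.message.includes("Actor is stopping")) return "stopping";
  if (err.group === "actor" && err.code === "stopped") return "stopping";
  return undefined;
}

interface Case {
  name: string;
  input: ActorErrorLike;
  expected: RetryKind | undefined;
}

const cases: Case[] = [
  { name: "actorRestarting", input: { code: "actorRestarting", message: "" }, expected: "restarting" },
  { name: "legacy internal_error", input: { code: "internal_error", message: "Actor is stopping" }, expected: "stopping" },
  { name: "actor.stopped", input: { group: "actor", code: "stopped", message: "" }, expected: "stopping" },
  { name: "unrelated internal_error", input: { code: "internal_error", message: "boom" }, expected: undefined },
];

// Returns the names of failing cases; an empty array means all passed.
function runCases(classify: (err: ActorErrorLike) => RetryKind | undefined): string[] {
  return cases.filter((c) => classify(c.input) !== c.expected).map((c) => c.name);
}
```

Adding a new legacy string pattern then means adding one row to `cases` rather than a new test body.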


Minor Notes

  • The actor.stopped case in classifyActorError is tagged legacy: true but lacks a TODO(RVT-XXXX) tracking issue unlike the other legacy cases.
  • maxAttempts = 0 would cause throw lastError to throw undefined. While unlikely in practice, guarding or initializing lastError to a sentinel would be safer.
  • The WebSocket listener reordering in actor-driver.ts is a clean, correct fix for the race condition.
  • The gateway routing test split (into "does route" / "does not route" subtests) is a clear improvement in test coverage.
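The maxAttempts note above can be addressed by validating the argument up front, so the loop always runs at least once and `lastError` is always assigned before it is thrown. This is a sketch under assumptions: `validateMaxAttempts` and `retryWithAttempts` are hypothetical names, and the real retryOnLifecycleBoundary also applies delays and error classification that are omitted here.

```typescript
// Rejects values that would let the retry loop finish without ever calling fn.
function validateMaxAttempts(maxAttempts: number): number {
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError(`maxAttempts must be a positive integer, got ${maxAttempts}`);
  }
  return maxAttempts;
}

async function retryWithAttempts<T>(fn: () => Promise<T>, maxAttempts: number): Promise<T> {
  const attempts = validateMaxAttempts(maxAttempts);
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // remember the most recent failure
    }
  }
  // attempts >= 1, so at least one failure was recorded before reaching here.
  throw lastError;
}
```

Because the function is `async`, an invalid `maxAttempts` surfaces as a rejected promise carrying a RangeError rather than as a thrown `undefined`.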

Summary

The design intent is sound and the error classification approach is well-structured. The main concerns before merge are: the continue/return suppression logic for the "database accessed" guard in classifyLifecycleBoundaryError, the misleading log level for non-retryable connection open errors, and the absence of unit tests for the classification logic given the number of edge-case string patterns being matched.

@github-actions
Contributor

Preview packages published to npm

Install with:

npm install rivetkit@pr-4666

All packages published as 0.0.0-pr.4666.c672e53 with tag pr-4666.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-c672e53
docker pull rivetdev/engine:full-c672e53

Individual packages

npm install rivetkit@pr-4666
npm install @rivetkit/react@pr-4666
npm install @rivetkit/rivetkit-native@pr-4666
npm install @rivetkit/workflow-engine@pr-4666

@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from ef47c7e to cde8213 Compare April 27, 2026 07:30
@NathanFlurry NathanFlurry force-pushed the 04-24-replay/core-docs-and-guidance branch from e7dee2a to 9c8c5fb Compare April 27, 2026 07:31
@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from cde8213 to 8cf4223 Compare April 27, 2026 07:53
@NathanFlurry NathanFlurry marked this pull request as ready for review April 27, 2026 08:06
@NathanFlurry NathanFlurry changed the base branch from 04-24-replay/core-docs-and-guidance to graphite-base/4666 April 27, 2026 08:31
@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from 8cf4223 to 6987ecc Compare April 27, 2026 08:31
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4666 to sleep-cleanup/18-remove-legacy-timeouts April 27, 2026 08:32
@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from 6987ecc to d17056c Compare April 27, 2026 17:35
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/18-remove-legacy-timeouts branch from 0eb8697 to f5ff8fb Compare April 27, 2026 19:06
@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from d17056c to df32e78 Compare April 27, 2026 19:06
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/18-remove-legacy-timeouts branch from f5ff8fb to 65bf5d3 Compare April 27, 2026 19:30
@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from df32e78 to 2a5bc46 Compare April 27, 2026 19:30
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/18-remove-legacy-timeouts branch from 65bf5d3 to 645a697 Compare April 27, 2026 19:40
@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from 2a5bc46 to 0d9c403 Compare April 27, 2026 19:40
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/18-remove-legacy-timeouts branch from 645a697 to 731cecf Compare April 27, 2026 20:48
@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from 0d9c403 to 110fc1e Compare April 27, 2026 20:48
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/18-remove-legacy-timeouts branch from 731cecf to a5ee22e Compare April 27, 2026 21:38
@NathanFlurry NathanFlurry force-pushed the 04-14-fix_rivetkit_add_lifecycle_error_retry_and_gateway_http_routing branch from 110fc1e to 41c83ab Compare April 27, 2026 21:38
@NathanFlurry
Member Author

Landed in main via stack-merge fast-forward push. Commits are in main; closing to match.
