fix(pegboard-envoy): subscribe before registering envoy #4802

Closed
NathanFlurry wants to merge 1 commit into engine-stabilize/envoy-client-command-dedup from engine-stabilize/envoy-subscribe-before-register
@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 27, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude Bot commented Apr 27, 2026

Code Review

Overview

This PR fixes a race condition in pegboard-envoy by reordering the subscription step to happen before init_conn registers the envoy in the load balancer. Previously, if a pending actor retried as soon as the envoy became eligible, it could send a start command to an envoy that had not yet subscribed, causing the message to be missed.

The fix is logically correct and the updated comment accurately explains the invariant.
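
The ordering problem described above can be reproduced with a toy model. The sketch below is not the pegboard-envoy code: `PubSub`, `LoadBalancer`, and both `*_then_*` functions are hypothetical stand-ins, and it assumes a pub/sub layer that drops messages published to a subject with no subscriber.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Toy in-memory pub/sub (hypothetical). A message published to a subject
/// with no subscriber is silently dropped.
#[derive(Default)]
struct PubSub {
    subscribers: HashMap<String, Sender<String>>,
}

impl PubSub {
    fn subscribe(&mut self, subject: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.subscribers.insert(subject.to_string(), tx);
        rx
    }

    /// Returns true only if a subscriber existed and accepted the message.
    fn publish(&self, subject: &str, msg: &str) -> bool {
        match self.subscribers.get(subject) {
            Some(tx) => tx.send(msg.to_string()).is_ok(),
            None => false,
        }
    }
}

/// Toy load balancer (hypothetical): envoys become eligible once registered.
#[derive(Default)]
struct LoadBalancer {
    eligible: Vec<String>,
}

impl LoadBalancer {
    fn register(&mut self, envoy_key: &str) {
        self.eligible.push(envoy_key.to_string());
    }

    /// Models a pending actor retrying: it sends a start command to every
    /// eligible envoy and returns how many commands were delivered.
    fn dispatch_start(&self, pubsub: &PubSub) -> usize {
        self.eligible
            .iter()
            .filter(|key| pubsub.publish(key.as_str(), "start"))
            .count()
    }
}

/// Old ordering: the envoy is eligible before it is listening.
fn register_then_subscribe(envoy_key: &str) -> Option<String> {
    let mut pubsub = PubSub::default();
    let mut lb = LoadBalancer::default();
    lb.register(envoy_key);
    lb.dispatch_start(&pubsub); // the actor retries inside the window
    let rx = pubsub.subscribe(envoy_key);
    rx.try_recv().ok()
}

/// New ordering: the envoy is listening before it becomes eligible.
fn subscribe_then_register(envoy_key: &str) -> Option<String> {
    let mut pubsub = PubSub::default();
    let mut lb = LoadBalancer::default();
    let rx = pubsub.subscribe(envoy_key);
    lb.register(envoy_key);
    lb.dispatch_start(&pubsub);
    rx.try_recv().ok()
}
```

In this model, `register_then_subscribe` loses the start command, while `subscribe_then_register` receives it, which is the invariant the reordering is meant to protect.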


Issue: Double Namespace Resolution

The main concern is that the PR introduces a redundant resolve_for_name_global call. The namespace is now resolved in both places:

  1. In lib.rs (new code) to get namespace.namespace_id for the PubSub subscriptions.
  2. Inside conn::init_conn in conn.rs (lines 53–59), which also calls resolve_for_name_global independently.

This means every new envoy connection now pays two DB round-trips to resolve the same namespace. The cleaner fix would be to refactor init_conn to accept the already-resolved namespace_id: Id (or the full namespace struct) directly, eliminating the duplicate lookup. The namespace resolution could be lifted entirely to lib.rs and passed through.
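
A minimal sketch of the suggested refactor, under stated assumptions: `Db`, `NamespaceId`, `UrlData`, `Conn`, `handle_connection`, and the subject format are all hypothetical, and this `resolve_for_name_global` is a counting stand-in rather than the real operation. It only illustrates resolving once in the caller and threading the ID into `init_conn`.

```rust
use std::cell::Cell;

/// Hypothetical namespace ID type standing in for the real `Id`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NamespaceId(u64);

/// Hypothetical database handle that counts lookups.
struct Db {
    lookups: Cell<u32>,
}

impl Db {
    /// Stand-in for the real lookup; each call models one DB round-trip.
    fn resolve_for_name_global(&self, name: &str) -> Result<NamespaceId, String> {
        self.lookups.set(self.lookups.get() + 1);
        if name.is_empty() {
            Err("namespace not found".to_string())
        } else {
            // Fake ID derived from the name, for illustration only.
            Ok(NamespaceId(name.len() as u64))
        }
    }
}

struct UrlData {
    namespace: String,
    envoy_key: String,
}

struct Conn {
    namespace_id: NamespaceId,
    envoy_key: String,
}

/// Accepts the already-resolved ID instead of resolving it again.
fn init_conn(namespace_id: NamespaceId, url_data: UrlData) -> Conn {
    let UrlData { envoy_key, .. } = url_data;
    Conn { namespace_id, envoy_key }
}

/// Caller resolves the namespace once, uses it for the subscription
/// subject, then passes it through to `init_conn`.
fn handle_connection(db: &Db, url_data: UrlData) -> Result<(String, Conn), String> {
    let namespace_id = db.resolve_for_name_global(&url_data.namespace)?;
    // Hypothetical subject format; the subscription happens before init_conn.
    let subject = format!("envoy.{}.{}", namespace_id.0, url_data.envoy_key);
    let conn = init_conn(namespace_id, url_data);
    Ok((subject, conn))
}
```

With this shape, one connection costs one lookup, and `init_conn` no longer needs database access for namespace resolution.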


Minor Observations

  • Correctness of field usage: url_data.envoy_key.clone() is used for subscriptions instead of conn.envoy_key. These are equivalent since init_conn stores url_data.envoy_key into conn.envoy_key, so no behavior change there.
  • url_data ownership: url_data.namespace and url_data.envoy_key are cloned before passing url_data to init_conn, which destructures it. This is correct — no fields are moved prematurely.
  • Comment on // Create the connection.: Accurate but fairly obvious from the call; consider dropping it per the project's minimal-comment convention.
  • Error messages: The with_context messages on the new namespace resolution are consistent with how init_conn phrases equivalent errors.
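
The ownership point in the observations above can be illustrated with a small sketch. The types, the `connect` function, and the subject format here are hypothetical; only the clone-before-move pattern reflects what the review describes.

```rust
struct UrlData {
    namespace: String,
    envoy_key: String,
}

struct Conn {
    envoy_key: String,
}

/// Takes `url_data` by value and destructures it, as the review says
/// `init_conn` does.
fn init_conn(url_data: UrlData) -> Conn {
    let UrlData { envoy_key, .. } = url_data;
    Conn { envoy_key }
}

/// Hypothetical subject format, for illustration only.
fn subscription_subject(namespace: &str, envoy_key: &str) -> String {
    format!("pegboard.envoy.{namespace}.{envoy_key}")
}

fn connect(url_data: UrlData) -> (String, Conn) {
    // Clone the fields needed for the subscription while `url_data`
    // is still owned here.
    let namespace = url_data.namespace.clone();
    let envoy_key = url_data.envoy_key.clone();
    let subject = subscription_subject(&namespace, &envoy_key);

    // Only now is `url_data` moved; the clones above remain usable.
    let conn = init_conn(url_data);
    (subject, conn)
}
```

Because `conn.envoy_key` is built from the same `url_data.envoy_key`, subscribing with the clone targets the same key that the connection later stores.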

Summary

The race-condition fix is correct. The main follow-up is eliminating the duplicate resolve_for_name_global call by threading the resolved namespace ID into init_conn instead of re-resolving it there.

@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-client-lifecycle branch from cf29c73 to 2cdf843 Compare April 27, 2026 07:31
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 66a6724 to 090e234 Compare April 27, 2026 07:31
@NathanFlurry NathanFlurry marked this pull request as ready for review April 27, 2026 08:08
@NathanFlurry NathanFlurry changed the base branch from engine-stabilize/envoy-client-lifecycle to graphite-base/4802 April 27, 2026 08:31
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 090e234 to 7ca2768 Compare April 27, 2026 08:31
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4802 to engine-stabilize/envoy-client-command-dedup April 27, 2026 08:32
@NathanFlurry NathanFlurry changed the base branch from engine-stabilize/envoy-client-command-dedup to graphite-base/4802 April 27, 2026 09:56
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 7ca2768 to bdd460a Compare April 27, 2026 09:56
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4802 to 04-27-fix_envoy-client_clear_processed_command_idx_after_ack_send April 27, 2026 09:56
Contributor

@MasterPtato MasterPtato left a comment

Should remove the duplicate namespace resolve_for_name_global call in conn.rs.

@NathanFlurry NathanFlurry changed the base branch from 04-27-fix_envoy-client_clear_processed_command_idx_after_ack_send to graphite-base/4802 April 27, 2026 19:30
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from d5733bb to 80a124a Compare April 27, 2026 19:30
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4802 to engine-stabilize/envoy-client-command-dedup April 27, 2026 19:31
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 80a124a to c491f97 Compare April 27, 2026 19:40
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-client-command-dedup branch from e4a53d3 to a6d7cb6 Compare April 27, 2026 19:40
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from c491f97 to 9b2fdc5 Compare April 27, 2026 20:48
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-client-command-dedup branch from a6d7cb6 to 93e48e9 Compare April 27, 2026 20:48
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 9b2fdc5 to e8f7ae6 Compare April 27, 2026 21:29
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-client-command-dedup branch from 93e48e9 to 4ef7f4e Compare April 27, 2026 21:29
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-client-command-dedup branch from 4ef7f4e to 71ea925 Compare April 27, 2026 21:38
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from e8f7ae6 to 51ba630 Compare April 27, 2026 21:38
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-subscribe-before-register branch from 51ba630 to f8eb8a6 Compare April 27, 2026 22:03
@NathanFlurry
Member Author

Landed in main via stack-merge fast-forward push. Commits are in main; closing to match.

@github-actions
Contributor

Preview packages published to npm

Install with:

npm install rivetkit@pr-4802

All packages published as 0.0.0-pr.4802.89b4f34 with tag pr-4802.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-89b4f34
docker pull rivetdev/engine:full-89b4f34
Individual packages
npm install rivetkit@pr-4802
npm install @rivetkit/react@pr-4802
npm install @rivetkit/rivetkit-napi@pr-4802
npm install @rivetkit/workflow-engine@pr-4802
