Skip to content

fix(pegboard): add threshold parameters to runner protocol metadata#4247

Merged
NathanFlurry merged 4 commits intomainfrom
02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata
Feb 27, 2026
Merged

fix(pegboard): add threshold parameters to runner protocol metadata#4247
NathanFlurry merged 4 commits intomainfrom
02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata

Conversation

@MasterPtato
Copy link
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4247 February 20, 2026 02:36 Destroyed
@railway-app
Copy link

railway-app bot commented Feb 20, 2026

🚅 Deployed to the rivet-pr-4247 environment in rivet-frontend

Service Status Web Updated (UTC)
website 😴 Sleeping (View Logs) Web Feb 23, 2026 at 11:24 am
ladle ❌ Build Failed (View Logs) Web Feb 21, 2026 at 1:41 am
frontend-inspector ❌ Build Failed (View Logs) Web Feb 20, 2026 at 10:19 pm
frontend-cloud ❌ Build Failed (View Logs) Web Feb 20, 2026 at 10:18 pm
mcp-hub ✅ Success (View Logs) Web Feb 20, 2026 at 2:37 am

Copy link
Contributor Author

MasterPtato commented Feb 20, 2026

@pkg-pr-new
Copy link

pkg-pr-new bot commented Feb 20, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4247

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4247

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4247

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4247

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4247

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4247

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4247

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4247

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4247

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4247

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4247

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4247

commit: 209ff35

@claude
Copy link

claude bot commented Feb 20, 2026

test comment - please ignore

@claude
Copy link

claude bot commented Feb 20, 2026

simple test 2

@claude
Copy link

claude bot commented Feb 20, 2026

test with gt sign and lt sign

@claude
Copy link

claude bot commented Feb 20, 2026


PR Review: fix(pegboard): add threshold parameters to runner protocol metadata

Overview

This PR bumps the runner protocol to v6, adding actor_stop_threshold and serverless_drain_grace_period to ProtocolMetadata. It replaces the hardcoded DRAIN_GRACE_PERIOD constant with a config-driven value, moves runner config lookup to connection init, and applies protocol-derived limits to TypeScript actor configs. The performance improvements (HashMap cache lookup, parallelized DB queries in allocate_actor_v2) are welcome additions.


Critical Bug

Missing closing brace in engine/packages/config/src/config/pegboard.rs (lines 383–389)

The runner_event_demuxer_max_last_seen_ms function is missing its closing }. The new serverless_drain_grace_period function was inserted before the close of the existing function:

// Current (broken):
pub fn runner_event_demuxer_max_last_seen_ms(&self) -> u64 {
    self.runner_event_demuxer_max_last_seen_ms.unwrap_or(30_000)

pub fn serverless_drain_grace_period(&self) -> u64 {
    self.serverless_drain_grace_period.unwrap_or(10_000)
}
}

// Should be:
pub fn runner_event_demuxer_max_last_seen_ms(&self) -> u64 {
    self.runner_event_demuxer_max_last_seen_ms.unwrap_or(30_000)
}

pub fn serverless_drain_grace_period(&self) -> u64 {
    self.serverless_drain_grace_period.unwrap_or(10_000)
}

This will fail to compile as pub fn is not valid inside a function body.


Issues

1. Shared definition mutation in actor-driver.ts

lookupInRegistry returns the shared definition object directly from the registry (config.use[name]). Mutating definition.config.options in place permanently modifies the shared definition — every subsequent actor instantiation of the same type will inherit the already-capped timeouts, even if the protocol metadata changes between reconnections. The limits should be applied to a copy, or passed through to instantiate().

// Lines 509–527: mutates the shared definition
definition.config.options.onSleepTimeout = Math.min(...);
definition.config.options.onDestroyTimeout = Math.min(...);

2. Missing documentation update

Per the project's CLAUDE.md, config changes in engine/packages/config/ require updating website/src/content/docs/self-hosting/configuration.mdx. The new serverless_drain_grace_period config option is not reflected there.


Minor Notes

  • v6.bare header comment says # Runner Protocol v1 — should be v6.
  • Grammar: The comment // Check if runner is an serverless runner (runtime.rs:286) should be // Check if the runner is a serverless runner.
  • Behavioral change in allocation: The old has_valid_serverless check required max_runners \!= 0 (i.e. a valid config) before triggering serverless behavior. The new for_serverless only checks if the runner config variant is Serverless — a runner with max_runners == 0 would now also be treated as serverless in the allocation fallback path. If this is intentional, worth a comment; if not, it may be a regression.

Positives

  • Protocol versioning: The v5→v6 bump is clean. Correctly identifying that ToServer/ToRunner formats are unchanged between v5 and v6 and sharing the deserializer is a nice optimization.
  • HashMap cache for last_ping_cache: Good improvement from O(n) Vec scan to O(1) lookup.
  • Parallelized DB queries in allocate_actor_v2 with tokio::join\!: Good use of concurrent I/O.
  • Extracting #startRunnerLostTimeout: Removing the duplicated timeout logic into a helper method is a clean refactor.
  • Moving runner config fetch to handle_init: Parallelizing it with the missed-commands DB query via tokio::try_join\! is a sensible improvement.

@MasterPtato MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 3008d9c to 13b1948 Compare February 20, 2026 22:14
@MasterPtato MasterPtato marked this pull request as ready for review February 20, 2026 22:14
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4247 February 20, 2026 22:15 Destroyed
@MasterPtato MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 13b1948 to 76e52b6 Compare February 20, 2026 22:16
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4247 February 20, 2026 22:16 Destroyed
@MasterPtato MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 76e52b6 to ce7323c Compare February 21, 2026 01:40
@MasterPtato MasterPtato force-pushed the 02-19-chore_get_docker_compose_working_again branch from 121bf70 to 3da1f9e Compare February 21, 2026 01:40
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4247 February 21, 2026 01:40 Destroyed
@MasterPtato MasterPtato force-pushed the 02-19-chore_get_docker_compose_working_again branch from 3da1f9e to 1cefffe Compare February 26, 2026 01:12
@MasterPtato MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from ce7323c to 30accfe Compare February 26, 2026 01:12
@MasterPtato MasterPtato force-pushed the 02-19-chore_get_docker_compose_working_again branch from 1cefffe to f33dd50 Compare February 26, 2026 02:14
@MasterPtato MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 30accfe to 0bb287d Compare February 26, 2026 02:14
@MasterPtato MasterPtato force-pushed the 02-19-chore_get_docker_compose_working_again branch from f33dd50 to a49f2da Compare February 26, 2026 19:42
@MasterPtato MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 0bb287d to 9c03cf4 Compare February 26, 2026 19:42
@claude
Copy link

claude bot commented Feb 26, 2026

PR Review: fix(pegboard): add threshold parameters to runner protocol metadata

Summary

This PR introduces protocol v6, adding actor_stop_threshold and serverless_drain_grace_period to ProtocolMetadata, makes the drain grace period configurable (replacing a hardcoded constant), and adds a per-instance override mechanism in the TypeScript runner so drivers can enforce protocol-level limits without mutating shared actor configs.

Overall this is a solid, well-structured change. A few items worth checking:


Issues

1. BigInt vs number comparison in #startRunnerLostTimeout

runnerLostThreshold is typed as i64 (BigInt in JavaScript), but the comparison in the new #startRunnerLostTimeout method uses a bare number literal 0:

if (
    \!this.#runnerLostTimeout &&
    this.#protocolMetadata &&
    this.#protocolMetadata.runnerLostThreshold > 0   // ← BigInt > number throws TypeError
)

The old code explicitly converted to Number first (this.#runnerLostThreshold = Number(init.metadata.runnerLostThreshold)), so the comparison was safe. The new code skips that conversion. In JavaScript, bigint > 0 (number) throws TypeError: Cannot mix BigInt and other types, use explicit conversions. The fix:

this.#protocolMetadata.runnerLostThreshold > 0n

The division in the log message already uses 1000n (BigInt literal), confirming this is indeed a BigInt field.

2. Wrong comment in v6.bare schema file

engine/sdks/schemas/runner-protocol/v6.bare opens with:

# Runner Protocol v1

This is a copy-paste artifact from v1.bare. Should read # Runner Protocol v6.

3. Behavioral change in allocate_actor_v2: has_valid_serverlessfor_serverless

The old code checked max_runners \!= 0 as part of determining serverless eligibility:

// old
let has_valid_serverless = runner_config_res
    .first()
    .map(|runner| match &runner.config.kind {
        RunnerConfigKind::Serverless { max_runners, .. } => *max_runners \!= 0,
        _ => false,
    })
    .unwrap_or_default();

The new code only checks if the ByVariantKey exists in the database — it no longer validates max_runners \!= 0. If this is intentional, it's worth a brief comment explaining why the max_runners check was dropped.

4. for_serverless in the CrashPolicy::Sleep arm

Old code:

(CrashPolicy::Sleep, false, false) => Ok(AllocateActorOutputV2 {
    serverless: for_serverless,  // could be true
    status: AllocateActorStatus::Sleep,
}),

New code:

(CrashPolicy::Sleep, false, false) => Ok(AllocateActorOutputV2 {
    serverless: false,  // always false
    status: AllocateActorStatus::Sleep,
}),

This is a behavioral change — when a Sleep-policy actor falls into the sleep path, it's now never treated as serverless. If intentional, a comment would help clarify why.

5. serverlessDrainGracePeriod truthiness check in TypeScript

if (protocolMetadata.serverlessDrainGracePeriod) {
    const drainMax = Math.max(Number(protocolMetadata.serverlessDrainGracePeriod) - 1000, 0);
    ...
}

If the drain grace period is explicitly configured to 0 ms (unlikely but valid since the config accepts it), the truthiness check will silently skip setting the overrides. Consider \!= null instead of a truthy check:

if (protocolMetadata.serverlessDrainGracePeriod \!= null) {

Minor / Nits

6. Repeated inline ternary in instance/mod.ts

The override min-clamping pattern is repeated four times:

this.overrides.runStopTimeout \!== undefined
    ? Math.min(this.#config.options.runStopTimeout, this.overrides.runStopTimeout)
    : this.#config.options.runStopTimeout

A small private helper (e.g., #effectiveTimeout(key: keyof typeof this.overrides, base: number)) would reduce duplication and improve readability. Not a blocker.

7. overrides is a public property

overrides on ActorInstance has no access modifier, making it publicly accessible. Since it's only intended to be set by the driver, a comment documenting its intended usage (internal/driver-only) would help future contributors.


Positives

  • HashMap refactor in gasoline/src/db/kv/mod.rs is a clean O(n) → O(1) improvement for the ping cache.
  • Parallelizing the for_serverless and queue-existence checks in allocate_actor_v2 with tokio::join\! is good for latency.
  • Extracting #startRunnerLostTimeout() in the TypeScript runner eliminates duplicated logic.
  • The v5→v6 conversion design (ToServer/ToRunner have identical binary formats, only ToClient's ProtocolMetadata changed) is clean and the backward compat path is sound.
  • Replacing the hardcoded DRAIN_GRACE_PERIOD with a configurable value is the right call.


// Create actor instance
const definition = lookupInRegistry(this.#config, actorConfig.name);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MasterPtato MasterPtato force-pushed the 02-19-chore_get_docker_compose_working_again branch from a49f2da to cc4d89e Compare February 27, 2026 00:12
@MasterPtato MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 9c03cf4 to 209ff35 Compare February 27, 2026 00:12
Base automatically changed from 02-19-chore_get_docker_compose_working_again to main February 27, 2026 00:29
@NathanFlurry NathanFlurry merged commit f976ae4 into main Feb 27, 2026
15 of 34 checks passed
@NathanFlurry NathanFlurry deleted the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch February 27, 2026 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants