fix(pb): fix pool config defaults by MasterPtato · Pull Request #4774 · rivet-dev/rivet

MasterPtato · 2026-04-25T01:08:10Z

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

MasterPtato · 2026-04-25T01:08:29Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-04-25T01:20:18Z

Code Review

This PR makes three targeted bug fixes to pool config defaults. All changes are correct and address real issues. Some notes below.

Change 1: `drain_on_version_upgrade` default: `false` → `true`

This is a behavioral change that affects any deployment relying on the implicit default. Draining on version upgrade is the safer operational behavior (avoids routing traffic to stale runners), so true is the better default. Worth noting in a changelog or migration note since existing configs that omitted the field will now behave differently after a redeploy.

Change 2: `drain_grace_period` default: `10_000` → `10`

Clear bug fix. The field is documented in seconds, so the old default of 10_000 was ~2.8 hours rather than the intended 10 seconds. The inline comment claims this mirrors the deprecated config.pegboard.serverless_drain_grace_period value — if that deprecated field was in milliseconds (10,000 ms = 10 s), the comment is now misleading and should be updated or removed to avoid re-introducing the confusion.

Change 3: Division-by-zero guard in `runner_pool.rs`

The fix prevents a panic when slots_per_runner == 0, but silently falls back to treating every slot as its own runner. Since the field is not Option<u32>, the caller is required to supply a value but there is no upstream validation rejecting zero. Consider whether a zero value should be rejected at config parse time with an explicit error rather than silently over-provisioning. The current fix is still an improvement over panicking.

Minor observations

No tests were added or updated to pin the new default values. Given these are config defaults that previously caused silent operational bugs, a unit test asserting the Into conversion produces the correct defaults (10 s grace period, drain_on_version_upgrade = true) would prevent regression.
The Serverless fields slots_per_runner, min_runners, max_runners, and runners_margin are all marked deprecated in the struct. If they are no longer the canonical way to configure pool sizing, it may be worth adding a validation warning when they are supplied to nudge users to migrate.

Overall the fixes are correct and the PR is safe to merge. The stale comment on drain_grace_period and the lack of regression tests are the main things worth addressing before or after merging.

claude · 2026-04-25T01:21:50Z

Code Review: fix(pb): fix pool config defaults

Overview

This PR fixes three bugs in pool configuration defaults:

drain_on_version_upgrade default changed from false → true
drain_grace_period default changed from 10_000 → 10
Division-by-zero protection when slots_per_runner is 0

Analysis

✅ `drain_on_version_upgrade` default: `false` → `true`

Correct change. Draining on version upgrade is the safer default — it avoids killing in-flight actors during a deployment. Opting into disruptive upgrades should be explicit.

✅ Division-by-zero guard: `slots_per_runner.max(1)`

(adjusted_desired_slots as u32).div_ceil(slots_per_runner.max(1))

Clear defensive fix. slots_per_runner = 0 would panic on integer division, and .max(1) keeps the semantics sensible (0 slots → behave as 1 to avoid division error). Good.

⚠️ `drain_grace_period` default: `10_000` → `10`

The struct documents the field as seconds:

/// Seconds.
drain_grace_period: Option<u32>,

The old default of 10_000 (seconds) = ~2.77 hours is obviously wrong. The deprecated global fallback (config.pegboard.serverless_drain_grace_period) defaults to 10_000 milliseconds = 10 seconds, so the correct equivalent in seconds is 10. Fix is correct.

However, there is a related inconsistency not addressed in this PR:

In engine/sdks/rust/data/src/versioned/namespace_runner_config.rs (v4 → v5 migration), the hardcoded value is still 10_000:

// Default to deprecated config value (config.pegboard.serverless_drain_grace_period)
drain_grace_period: 10_000,  // line 312 — not changed in this PR

If the v5 BARE schema field is also in seconds (matching the live RunnerConfigKind::Serverless type), existing migrated configs will carry a 10,000-second (~2.77 hour) drain grace period. This should be 10 to match the fix applied here.

Also worth noting: in engine/packages/pegboard/src/workflows/serverless/conn.rs, the runner config's drain_grace_period is currently skipped via .. and the deprecated global config is used instead — so this fix does not affect runtime behavior yet. When this field is eventually wired in, the migration value at line 312 in versioned/namespace_runner_config.rs will become load-bearing.

Summary

Change	Assessment
`drain_on_version_upgrade` default → `true`	✅ Correct, safer default
`drain_grace_period` default → `10`	✅ Correct unit fix
`slots_per_runner.max(1)` div-by-zero guard	✅ Correct defensive fix
v4→v5 migration `drain_grace_period: 10_000` (line 312 in `namespace_runner_config.rs`)	⚠️ Not updated — likely the same unit bug

Suggested follow-up: Update the v4→v5 migration default at engine/sdks/rust/data/src/versioned/namespace_runner_config.rs:312 from 10_000 to 10 to match this fix, or document why it should remain different. The comment on that line references the same deprecated config as justification.

MasterPtato requested a review from NathanFlurry April 25, 2026 01:08

This was referenced Apr 25, 2026

fix(outbound): implement drain grace period #4775

Open

fix(pb-envoy): hard abort ws to tunnel on ping error #4776

Open

fix(pb): fix pool config defaults

ac730ee

MasterPtato force-pushed the 04-24-fix_pb_remove_actor_sleep_ts_when_allocated branch from 1afa22c to fd614fc Compare April 25, 2026 01:14

MasterPtato force-pushed the 04-24-fix_pb_fix_pool_config_defaults branch from a37d543 to ac730ee Compare April 25, 2026 01:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(pb): fix pool config defaults#4774

fix(pb): fix pool config defaults#4774
MasterPtato wants to merge 1 commit into04-24-fix_pb_remove_actor_sleep_ts_when_allocatedfrom
04-24-fix_pb_fix_pool_config_defaults

MasterPtato commented Apr 25, 2026

Uh oh!

MasterPtato commented Apr 25, 2026 •

edited

Loading

Uh oh!

claude Bot commented Apr 25, 2026 •

edited

Loading

Uh oh!

claude Bot commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MasterPtato commented Apr 25, 2026

Description

Type of change

How Has This Been Tested?

Checklist:

Uh oh!

MasterPtato commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Change 1: drain_on_version_upgrade default: false → true

Change 2: drain_grace_period default: 10_000 → 10

Change 3: Division-by-zero guard in runner_pool.rs

Minor observations

Uh oh!

claude Bot commented Apr 25, 2026

Code Review: fix(pb): fix pool config defaults

Overview

Analysis

✅ drain_on_version_upgrade default: false → true

✅ Division-by-zero guard: slots_per_runner.max(1)

⚠️ drain_grace_period default: 10_000 → 10

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MasterPtato commented Apr 25, 2026 •

edited

Loading

claude Bot commented Apr 25, 2026 •

edited

Loading

Change 1: `drain_on_version_upgrade` default: `false` → `true`

Change 2: `drain_grace_period` default: `10_000` → `10`

Change 3: Division-by-zero guard in `runner_pool.rs`

✅ `drain_on_version_upgrade` default: `false` → `true`

✅ Division-by-zero guard: `slots_per_runner.max(1)`

⚠️ `drain_grace_period` default: `10_000` → `10`