Skip to content

fix(pb): remove actor sleep ts when allocated#4772

Open
MasterPtato wants to merge 1 commit into04-24-refactor_gateway_clean_up_and_fix_lost_tunnel_msgsfrom
04-24-fix_pb_remove_actor_sleep_ts_when_allocated
Open

fix(pb): remove actor sleep ts when allocated#4772
MasterPtato wants to merge 1 commit into04-24-refactor_gateway_clean_up_and_fix_lost_tunnel_msgsfrom
04-24-fix_pb_remove_actor_sleep_ts_when_allocated

Conversation

@MasterPtato
Copy link
Copy Markdown
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Code Review: fix(pb): remove actor sleep ts when allocated (#4772)

File: engine/packages/pegboard/src/workflows/actor2/runtime.rs

Summary

This PR fixes a real bug in the allocate activity in actor2/runtime.rs. Previously, SleepTsKey was deleted unconditionally at the start of every allocation attempt — even when the allocation failed (e.g., due to ConcurrentActorLimitReached or NoEnvoys). Now the deletion is gated on a successful allocation (allocation.is_some()). The fix is correct and minimal.


What the Bug Was

Before this change, the following was placed unconditionally inside the FDB transaction before any allocation decision was made:

// Set not sleeping
tx.delete(&keys::actor::SleepTsKey::new(actor_id));

This meant that a sleeping actor that failed to allocate (no envoys available, concurrent actor limit reached) would have its SleepTsKey deleted and state.sleep_ts cleared — even though the actor remained in a sleeping/reallocating state. This could cause the gateway (ops/actor/get_for_gateway.rs) to report sleeping: false for an actor that was actually still sleeping, potentially preventing proper wake-on-request behavior.

After the fix, tx.delete and state.sleep_ts = None are only executed when allocation.is_some().


Correctness Analysis

The fix is logically sound. The set_sleeping activity writes SleepTsKey and sets state.sleep_ts = Some(now). The only correct time to clear them is when the actor is actually being woken up, which requires a successful allocation. The v1 actor workflow in actor/runtime.rs correctly gates its SleepTsKey deletion inside the allocated-status branch, consistent with this fix.

Both the UDB key deletion and state.sleep_ts = None are correctly gated together on allocation.is_some(), keeping the FDB key and in-memory state in sync.


Potential Concerns

1. No regression test added

The PR does not add a test for this scenario. A test covering "sleeping actor fails to allocate, sleep_ts remains set" would prevent regressions. Non-blocking given existing test infrastructure, but worth a follow-up.

2. PR description is unfilled

The PR body does not describe what the bug was, what caused it, or how it was found/tested. A short description of the scenario would help with future archeology.


Code Style

No issues. Logging follows tracing::* structured-field convention, no println!/eprintln! introduced, comment style matches surrounding code, hard tabs used correctly.


Verdict

Approve. The fix is correct, minimal, and matches the pattern used in the v1 actor workflow.

@MasterPtato MasterPtato force-pushed the 04-24-refactor_gateway_clean_up_and_fix_lost_tunnel_msgs branch from 2245d28 to 9681ff0 Compare April 25, 2026 01:07
@MasterPtato MasterPtato force-pushed the 04-24-fix_pb_remove_actor_sleep_ts_when_allocated branch from ce212a2 to 1afa22c Compare April 25, 2026 01:07
@MasterPtato MasterPtato force-pushed the 04-24-fix_pb_remove_actor_sleep_ts_when_allocated branch from 1afa22c to fd614fc Compare April 25, 2026 01:14
@MasterPtato MasterPtato force-pushed the 04-24-refactor_gateway_clean_up_and_fix_lost_tunnel_msgs branch from 9681ff0 to b08bae3 Compare April 25, 2026 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant