Skip to content

feat: state migration benchmark#6922

Open
LesnyRumcajs wants to merge 3 commits intomainfrom
state-migration-benchmark
Open

feat: state migration benchmark#6922
LesnyRumcajs wants to merge 3 commits intomainfrom
state-migration-benchmark

Conversation

@LesnyRumcajs
Copy link
Copy Markdown
Member

@LesnyRumcajs LesnyRumcajs commented Apr 15, 2026

Summary of changes

Changes introduced in this pull request:

  • I failed at improving the perf for now but this might be useful

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

Summary by CodeRabbit

  • New Features

    • Added a new migrate CLI command to run a single state migration against a snapshot, accepting snapshot path, target height, and backend selection.
    • Command reports whether a migration produced a new state and logs the parent→new state transition with elapsed time.
    • Height parameter parsing now accepts case-insensitive string input.
  • Performance / Behavior

    • Snapshot CAR imports now use a write buffer during loading to improve write behavior.

@LesnyRumcajs LesnyRumcajs requested a review from a team as a code owner April 15, 2026 14:34
@LesnyRumcajs LesnyRumcajs requested review from akaladarshi and hanabi1224 and removed request for a team April 15, 2026 14:35
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a4105bd7-a657-4014-af4b-f2132323e3d2

📥 Commits

Reviewing files that changed from the base of the PR and between aa3a16b and bc52df8.

📒 Files selected for processing (1)
  • src/utils/db/car_util.rs

Walkthrough

Adds a new MigrateCommand CLI subcommand to run a single state migration on the head of a snapshot CAR (or .car.zst) at a specified height; creates a temporary ParityDb (writable or read-only overlay), detects network from genesis CID, loads actor bundles, and runs migrations for the computed epoch.

Changes

Cohort / File(s) Summary
New MigrateCommand
src/dev/subcommands/migrate_cmd.rs
Adds MigrateCommand CLI (--snapshot, --height, optional --backend) with pub async fn run(self). Creates temporary on-disk ParityDb, imports or overlays snapshot by backend, derives network from snapshot genesis CID, constructs ChainConfig, validates epoch, loads actor bundles, runs run_state_migrations, and logs parent->new state or errors.
Subcommand Integration
src/dev/subcommands/mod.rs
Adds mod migrate_cmd;, new Subcommand::Migrate(migrate_cmd::MigrateCommand) variant and dispatches cmd.run().await; adds user-facing subcommand docs.
CLI Parsing: Height
src/networks/mod.rs
Adds EnumString derive and #[strum(ascii_case_insensitive)] to pub enum Height to enable case-insensitive CLI parsing of height identifiers.
CAR Import Buffering
src/utils/db/car_util.rs
Wraps provided db: &impl Blockstore with BlockstoreWithWriteBuffer::new_with_capacity(db, 10000) in load_car, so CAR block writes go through a write-buffered blockstore.

Sequence Diagram

sequenceDiagram
    participant CLI as User/CLI
    participant Migrate as MigrateCommand
    participant Parity as ParityDb (temp)
    participant Snapshot as Snapshot (CAR / CAR.ZST)
    participant NetDetect as Network Detection
    participant Bundles as Actor Bundles
    participant Migrator as run_state_migrations

    CLI->>Migrate: invoke with --snapshot and --height
    Migrate->>Parity: create temporary on-disk ParityDb
    alt backend = Db (import)
        Migrate->>Parity: import snapshot into writable Db
    else backend = Car (overlay)
        Migrate->>Snapshot: attach snapshot as read-only overlay
        Snapshot-->>Parity: provide read-only view
    end
    Migrate->>Snapshot: read head tipset & parent state
    Migrate->>NetDetect: derive network from genesis CID
    NetDetect-->>Migrate: network + ChainConfig
    Migrate->>Bundles: load actor bundles for network
    Bundles-->>Migrate: bundles loaded
    Migrate->>Migrator: run_state_migrations(epoch, config, store, parent_state)
    Migrator-->>Migrate: migration result (new_state or none)
    Migrate-->>CLI: log parent_state -> new_state or error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • akaladarshi
  • sudo-shashank
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: state migration benchmark' aligns with the main objective of the PR, which introduces a new CLI subcommand for state migration benchmarking on snapshot heads.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch state-migration-benchmark
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch state-migration-benchmark

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/dev/subcommands/migrate_cmd.rs (2)

40-63: Add context on the setup failure paths.

The raw ?s here leave very little signal when temp DB creation, head discovery, genesis lookup, or bundle loading fails. Please wrap those boundaries with .context(...).

Suggested patch
-        let temp_dir = tempfile::Builder::new()
-            .prefix("forest-migrate-")
-            .tempdir()?;
+        let temp_dir = tempfile::Builder::new()
+            .prefix("forest-migrate-")
+            .tempdir()
+            .context("failed to create temporary ParityDb directory")?;
         let paritydb_path = temp_dir.path().join("paritydb");
-        let paritydb = Db::open(&paritydb_path, &DbConfig::default())?;
+        let paritydb = Db::open(&paritydb_path, &DbConfig::default())
+            .with_context(|| format!("failed to open ParityDb at {}", paritydb_path.display()))?;
@@
-        let head = store.heaviest_tipset()?;
-        let genesis = head.genesis(&store)?;
+        let head = store
+            .heaviest_tipset()
+            .context("failed to read snapshot head tipset")?;
+        let genesis = head
+            .genesis(&store)
+            .context("failed to resolve snapshot genesis tipset")?;
@@
-        load_actor_bundles(store.writer(), &network).await?;
+        load_actor_bundles(store.writer(), &network)
+            .await
+            .with_context(|| format!("failed to load actor bundles for {network}"))?;
As per coding guidelines "Use `anyhow::Result` for most operations and add context with `.context()` when errors occur".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dev/subcommands/migrate_cmd.rs` around lines 40 - 63, Wrap all fallible
setup calls with `.context(...)` to provide clear error messages: add context to
tempfile::Builder::tempdir(), Db::open(paritydb_path, ...),
ManyCar::new(paritydb) usage and the store.read_only_file(&snapshot) call, the
store.heaviest_tipset() and head.genesis(&store) calls,
NetworkChain::from_genesis(genesis.cid()) (or its `.context(...)` already
present but extend messages if needed), ChainConfig::from_chain(&network) if it
can fail, and the await call to load_actor_bundles(store.writer(), &network).
For each, call `.with_context(|| "clear descriptive message about which step
failed and the relevant variables (e.g., snapshot path, paritydb path, genesis
cid)")` or `.context("...")` so failures include actionable diagnostics tied to
the function names above.

76-77: Move the migration off the Tokio worker thread.

run_state_migrations() is synchronous and CPU-heavy; running it directly inside this async command can pin a runtime worker for the whole benchmark. Wrap it in tokio::task::spawn_blocking and await the join handle instead.

Suggested patch
-        let new_state = run_state_migrations(epoch, &chain_config, &store, &parent_state)?;
+        let chain_config = chain_config.clone();
+        let store = Arc::clone(&store);
+        let new_state = tokio::task::spawn_blocking(move || {
+            run_state_migrations(epoch, &chain_config, &store, &parent_state)
+        })
+        .await
+        .context("state migration task panicked")??;
As per coding guidelines "Use `tokio::task::spawn_blocking` for CPU-intensive work such as VM execution and cryptography".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dev/subcommands/migrate_cmd.rs` around lines 76 - 77, Replace the direct
synchronous call to run_state_migrations(...) with a tokio::task::spawn_blocking
wrapper so the CPU-bound work doesn't run on a Tokio worker thread; e.g. call
tokio::task::spawn_blocking(move || run_state_migrations(epoch,
chain_config_clone, store_clone, parent_state_clone)).await and propagate errors
(handle the JoinError and the inner Result with ?), making sure to clone or Arc
the captured variables (chain_config, store, parent_state) into the closure as
needed and keep the timing semantics (start = Instant::now() remains outside so
you still measure wall time of the spawned task).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/dev/subcommands/migrate_cmd.rs`:
- Around line 52-77: Compare the requested migration epoch/height to the
snapshot head epoch returned by store.heaviest_tipset() and reject or resolve
appropriately before calling run_state_migrations: if head.epoch() > epoch
(i.e., the snapshot is later than the requested migration), resolve the tipset
corresponding to the requested epoch and extract its parent state (instead of
using head.parent_state()), or return an error; if head.epoch() < epoch, return
an error indicating the snapshot is too old; finally pass that resolved
parent_state into run_state_migrations(epoch, &chain_config, &store,
&parent_state).

---

Nitpick comments:
In `@src/dev/subcommands/migrate_cmd.rs`:
- Around line 40-63: Wrap all fallible setup calls with `.context(...)` to
provide clear error messages: add context to tempfile::Builder::tempdir(),
Db::open(paritydb_path, ...), ManyCar::new(paritydb) usage and the
store.read_only_file(&snapshot) call, the store.heaviest_tipset() and
head.genesis(&store) calls, NetworkChain::from_genesis(genesis.cid()) (or its
`.context(...)` already present but extend messages if needed),
ChainConfig::from_chain(&network) if it can fail, and the await call to
load_actor_bundles(store.writer(), &network). For each, call `.with_context(||
"clear descriptive message about which step failed and the relevant variables
(e.g., snapshot path, paritydb path, genesis cid)")` or `.context("...")` so
failures include actionable diagnostics tied to the function names above.
- Around line 76-77: Replace the direct synchronous call to
run_state_migrations(...) with a tokio::task::spawn_blocking wrapper so the
CPU-bound work doesn't run on a Tokio worker thread; e.g. call
tokio::task::spawn_blocking(move || run_state_migrations(epoch,
chain_config_clone, store_clone, parent_state_clone)).await and propagate errors
(handle the JoinError and the inner Result with ?), making sure to clone or Arc
the captured variables (chain_config, store, parent_state) into the closure as
needed and keep the timing semantics (start = Instant::now() remains outside so
you still measure wall time of the spawned task).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 04aed58c-2826-4db0-ad85-2c6717bbac68

📥 Commits

Reviewing files that changed from the base of the PR and between 101befd and a44294e.

📒 Files selected for processing (3)
  • src/dev/subcommands/migrate_cmd.rs
  • src/dev/subcommands/mod.rs
  • src/networks/mod.rs

Comment thread src/dev/subcommands/migrate_cmd.rs Outdated
Comment on lines +52 to +77
let head = store.heaviest_tipset()?;
let genesis = head.genesis(&store)?;
let network = NetworkChain::from_genesis(genesis.cid()).context(
"snapshot genesis does not match any known mainnet/calibnet/butterflynet genesis; custom devnets are not supported",
)?;
let chain_config = ChainConfig::from_chain(&network);

// The migration reads the target-height actor bundle from the
// blockstore; load it into the writable layer so it's visible through
// the ManyCar.
load_actor_bundles(store.writer(), &network).await?;

let epoch = chain_config.epoch(height);
anyhow::ensure!(
epoch > 0,
"no epoch configured for height {height} on {network}"
);

let parent_state = *head.parent_state();
tracing::info!(
"Running {height} migration on {network} (epoch {epoch}); head epoch {head_epoch}, parent state {parent_state}",
head_epoch = head.epoch(),
);

let start = std::time::Instant::now();
let new_state = run_state_migrations(epoch, &chain_config, &store, &parent_state)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate that the snapshot head matches the requested migration boundary.

run_state_migrations() uses the supplied parent_state as-is. This command always passes head.parent_state(), so if the snapshot head is later than the requested upgrade, the migration runs against the wrong state tree and the benchmark result is meaningless. Reject mismatched heads or resolve the tipset at the requested epoch before extracting parent_state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dev/subcommands/migrate_cmd.rs` around lines 52 - 77, Compare the
requested migration epoch/height to the snapshot head epoch returned by
store.heaviest_tipset() and reject or resolve appropriately before calling
run_state_migrations: if head.epoch() > epoch (i.e., the snapshot is later than
the requested migration), resolve the tipset corresponding to the requested
epoch and extract its parent state (instead of using head.parent_state()), or
return an error; if head.epoch() < epoch, return an error indicating the
snapshot is too old; finally pass that resolved parent_state into
run_state_migrations(epoch, &chain_config, &store, &parent_state).

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 1.26582% with 78 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.02%. Comparing base (101befd) to head (bc52df8).
⚠️ Report is 24 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/dev/subcommands/migrate_cmd.rs 0.00% 77 Missing ⚠️
src/dev/subcommands/mod.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
src/networks/mod.rs 88.81% <ø> (ø)
src/utils/db/car_util.rs 88.98% <100.00%> (+0.09%) ⬆️
src/dev/subcommands/mod.rs 72.46% <0.00%> (-1.07%) ⬇️
src/dev/subcommands/migrate_cmd.rs 0.00% <0.00%> (ø)

... and 72 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 101befd...bc52df8. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/dev/subcommands/migrate_cmd.rs (1)

138-139: ⚠️ Potential issue | 🟠 Major

Validate head epoch before deriving migration input state.

At Line 139, parent_state is always taken from the snapshot head, but epoch is derived from --height. If head epoch and migration epoch differ, this benchmarks the wrong pre-migration state.

Suggested fix
 fn bench<DB: Blockstore + Send + Sync>(
     store: &Arc<DB>,
     chain_config: &ChainConfig,
     network: &NetworkChain,
     head: Tipset,
     height: Height,
 ) -> anyhow::Result<()> {
     let epoch = chain_config.epoch(height);
+    anyhow::ensure!(
+        head.epoch() == epoch,
+        "snapshot head epoch {} does not match requested migration epoch {} for height {height}; use a snapshot cut at the migration boundary",
+        head.epoch(),
+        epoch
+    );
     let parent_state = *head.parent_state();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dev/subcommands/migrate_cmd.rs` around lines 138 - 139, The code derives
epoch = chain_config.epoch(height) but always uses parent_state =
*head.parent_state(), which can belong to a different epoch; update migrate_cmd
to validate that the snapshot head epoch matches the migration epoch (compare
head.epoch() or head.height() to epoch/height) and return an error if they
differ, or alternatively load the snapshot/state corresponding to the requested
epoch/height before setting parent_state; reference the variables/functions
epoch, chain_config.epoch(height), parent_state, and head to locate where to add
the check or alternative state lookup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/dev/subcommands/migrate_cmd.rs`:
- Around line 138-139: The code derives epoch = chain_config.epoch(height) but
always uses parent_state = *head.parent_state(), which can belong to a different
epoch; update migrate_cmd to validate that the snapshot head epoch matches the
migration epoch (compare head.epoch() or head.height() to epoch/height) and
return an error if they differ, or alternatively load the snapshot/state
corresponding to the requested epoch/height before setting parent_state;
reference the variables/functions epoch, chain_config.epoch(height),
parent_state, and head to locate where to add the check or alternative state
lookup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a207f1e0-3b9d-4d3d-95fd-754d5618a805

📥 Commits

Reviewing files that changed from the base of the PR and between a44294e and aa3a16b.

📒 Files selected for processing (1)
  • src/dev/subcommands/migrate_cmd.rs

hanabi1224
hanabi1224 previously approved these changes Apr 22, 2026
Comment thread src/utils/db/car_util.rs
where
R: AsyncBufRead + AsyncSeek + Unpin,
{
let db = crate::db::BlockstoreWithWriteBuffer::new_with_capacity(db, 10000);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter db could be BlockstoreWithWriteBuffer already, we should somehow avoid recursive BlockstoreWithWriteBuffer wrappers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants