
fix(db): avoid long flush stall on restart #211

Open

EddieHouston wants to merge 2 commits into Blockstream:new-index from EddieHouston:fix/restart-flush-stall

Conversation


EddieHouston (Collaborator) commented Apr 23, 2026

Summary

Fix a flush stall that occurs on restart of a mature electrs DB built from 91b883a or later.

On restart, DB::open applied bulk-load L0 triggers (64/256/512) unconditionally. update(),
which runs once at startup rather than in the main loop, then had enable_auto_compaction()
reset them to RocksDB defaults (4/20/36). If L0 held more than 36 files at that point —
normal after a bulk load that was interrupted or followed by a restart — the tightened
level0_stop_writes_trigger immediately put the DB into pre-flush stall territory. The
subsequent db.flush() parked inside WaitUntilFlushWouldNotStallWrites until background
compaction drained L0 below 36.

On testnet with 173 L0 files this caused over 1 hour of indexer freeze on restart.
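
For context, a minimal sketch of the stalling sequence, using the rust-rocksdb crate (the function name old_restart_path is illustrative; only set_options and flush are real API calls):

    use rocksdb::DB;

    // Illustrative sketch of the pre-fix sequence, not the actual electrs code.
    // If L0 currently holds more than 36 SST files, the set_options call below
    // arms RocksDB's write-stall condition, and the following flush waits inside
    // WaitUntilFlushWouldNotStallWrites until compaction drains L0.
    fn old_restart_path(db: &DB) -> Result<(), rocksdb::Error> {
        // enable_auto_compaction() used to tighten the bulk-load triggers
        // (64/256/512) back to the RocksDB defaults (4/20/36) at runtime:
        db.set_options(&[
            ("level0_slowdown_writes_trigger", "20"),
            ("level0_stop_writes_trigger", "36"),
            ("disable_auto_compactions", "false"),
        ])?;
        // The end-of-batch flush then parks until L0 holds fewer than 36 files.
        db.flush()
    }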

Fix

  • DB::open() checks the F sentinel before choosing triggers (see the sketch
    after this list). When F is absent (initial sync incomplete), it widens L0
    triggers for bulk load (64/256/512). When F is present (normal restart), it
    leaves RocksDB defaults (4/20/36) in place — no runtime set_options() needed.

  • apply_bulk_load_triggers() — new method that widens L0 triggers and disables
    pending-compaction-bytes stalls for initial sync throughput.

  • apply_steady_state_triggers() — restores RocksDB defaults after full_compaction()
    drains L0. Called once inside the F sentinel gate in start_auto_compactions().

  • start_auto_compactions() replaces the old enable_auto_compaction() trigger-reset
    logic. On first completion of initial sync (F absent), it runs a one-time full compaction,
    restores steady-state triggers, and writes the F sentinel. On subsequent calls (F
    present), it only enables auto-compaction.

  • Operator logging at info level for which trigger profile is active and whether full
    compaction runs or is skipped.
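
A condensed sketch of how these pieces fit, assuming a thin wrapper over the rust-rocksdb crate. The method roles come from this PR; the wrapper, bodies, and names choose_trigger_profile/SENTINEL here are illustrative, not the actual patch:

    use rocksdb::DB as RocksDB;

    const SENTINEL: &[u8] = b"F"; // written once initial sync has been fully compacted

    struct DB(RocksDB);

    impl DB {
        // Open-time gate: widen L0 triggers only while initial sync is incomplete.
        fn choose_trigger_profile(&self) {
            if self.0.get(SENTINEL).expect("db get").is_none() {
                // Bulk-load profile: 64/256/512, pending-bytes stalls disabled.
                self.0
                    .set_options(&[
                        ("level0_file_num_compaction_trigger", "64"),
                        ("level0_slowdown_writes_trigger", "256"),
                        ("level0_stop_writes_trigger", "512"),
                        ("soft_pending_compaction_bytes_limit", "0"),
                        ("hard_pending_compaction_bytes_limit", "0"),
                    ])
                    .expect("set_options");
            }
            // Otherwise do nothing: electrs builds its Options from scratch on
            // every open, so RocksDB defaults (4/20/36) are already in effect.
        }

        // One-time transition out of bulk-load mode; a no-op on later calls.
        fn start_auto_compactions(&self) {
            if self.0.get(SENTINEL).expect("db get").is_none() {
                // Draining L0 first means tightening the stop trigger afterwards
                // cannot arm a write stall.
                self.0.compact_range::<&[u8], &[u8]>(None, None);
                self.0
                    .set_options(&[
                        ("level0_file_num_compaction_trigger", "4"),
                        ("level0_slowdown_writes_trigger", "20"),
                        ("level0_stop_writes_trigger", "36"),
                    ])
                    .expect("set_options");
                self.0.put(SENTINEL, b"").expect("db put");
            }
            self.0
                .set_options(&[("disable_auto_compactions", "false")])
                .expect("set_options");
        }
    }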

Test plan

  • cargo check clean
  • cargo test --lib — 8/8 pass, including new_index::db::tests::*
  • cargo test --test electrum — 4/4 pass
  • cargo test --test rest — 22/22 pass
  • Deploy to testnet and verify restart completes flush in under a second
  • Confirm 'Manual flush start' → 'Manual flush finished' in the RocksDB LOG are within milliseconds
  • Verify synced tip resumes advancing from ZMQ notifications immediately after restart

EddieHouston self-assigned this Apr 23, 2026
EddieHouston force-pushed the fix/restart-flush-stall branch from d102699 to bef02e3 on April 23, 2026 12:10
  enable_auto_compaction() was lowering level0_stop_writes_trigger from the
  bulk-load value (512) to the RocksDB default (36) on every update() call.
  At DB open the bulk-load triggers are applied by DB::open, so on any restart
  L0 can legitimately hold more than 36 files. When the first post-restart
  update() called enable_auto_compaction(), the trigger tightening instantly
  put the DB into pre-flush stall territory, and the end-of-batch
  db.flush() that follows parked inside WaitUntilFlushWouldNotStallWrites
  waiting for background compaction to bring L0 below 36.

  On production testnet this reliably cost 77 minutes of indexer freeze per
  restart (verified by 'Manual flush start' → 'Manual flush finished' in the
  RocksDB LOG). The actual memtable flush took 62 ms once unblocked; the rest
  was wait.

  Split enable_auto_compaction() into the minimal flag-flip and a new
  apply_steady_state_triggers() that holds the L0 trigger / pending-bytes-
  limit reset. Invoke the latter exactly once per DB lifetime, inside the
  F-sentinel gate in start_auto_compactions(), immediately after
  full_compaction() has drained L0. On DBs where F is already set (steady-
  state restart), triggers stay at bulk-load values — the comment in
  DB::open already argues that configuration is fine for steady-state reads
  given the prefix bloom filters.

  Open the DB with RocksDB-default L0 triggers (4/20/36) and only widen
  to bulk-load values (64/256/512) when the full-compaction sentinel 'F'
  is absent. Previously every restart used bulk-load triggers permanently,
  leaving read amplification unnecessarily high after initial sync.

  Add info-level logging to DB::open and start_auto_compactions so
  operators can see which trigger profile is active and whether full
  compaction runs or is skipped.
EddieHouston force-pushed the fix/restart-flush-stall branch from 1ebdfd6 to 5e80660 on April 28, 2026 12:50
Comment thread src/new_index/schema.rs
    fn start_auto_compactions(&self, db: &DB) {
        let key = b"F".to_vec();
        if db.get(&key).is_none() {
            info!("full-compaction sentinel 'F' not found — running one-time full compaction and tightening triggers");
split this log message across the two following statements, as they are separate... full compaction happens, then the triggers are tightened. there may be a significant time between them.

Comment thread src/new_index/schema.rs
        db.apply_steady_state_triggers();
        db.put_sync(&key, b"");
        assert!(db.get(&key).is_some());
        info!("full-compaction sentinel 'F' set — future restarts will skip full compaction");
the log message should just say 'full compaction sentinel F set'. whether full compaction happens or not is outside the scope of this block
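
One way to fold in both review suggestions (a sketch; the new message wording is assumed):

    fn start_auto_compactions(&self, db: &DB) {
        let key = b"F".to_vec();
        if db.get(&key).is_none() {
            info!("full-compaction sentinel 'F' not found — running one-time full compaction");
            db.full_compaction();
            // Possibly a long time later:
            info!("tightening triggers to steady-state values");
            db.apply_steady_state_triggers();
            db.put_sync(&key, b"");
            assert!(db.get(&key).is_some());
            info!("full-compaction sentinel 'F' set");
        }
        // ... enable auto-compaction as before ...
    }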

Comment thread src/new_index/db.rs

    fn apply_bulk_load_triggers(&self) {
        let opts = [
            ("level0_file_num_compaction_trigger", "64"),
we've replaced the L0_BULK_TRIGGER with magic numbers, losing some context
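
A sketch of keeping the values behind named constants, in the spirit of the old L0_BULK_TRIGGER (the constant names here are assumptions):

    // Constant names are illustrative; the values are this PR's bulk-load triggers.
    const L0_BULK_COMPACTION_TRIGGER: &str = "64";
    const L0_BULK_SLOWDOWN_TRIGGER: &str = "256";
    const L0_BULK_STOP_TRIGGER: &str = "512";

    fn apply_bulk_load_triggers(&self) {
        let opts = [
            ("level0_file_num_compaction_trigger", L0_BULK_COMPACTION_TRIGGER),
            ("level0_slowdown_writes_trigger", L0_BULK_SLOWDOWN_TRIGGER),
            ("level0_stop_writes_trigger", L0_BULK_STOP_TRIGGER),
        ];
        // ... pass to set_options() as before ...
    }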

Author: updating

philippem (Collaborator) commented:

the description is misleading because update() is called once on initial startup, not in the main loop. The steady state compaction parameters are applied once regardless of the number of L0 files that happen to be on disk.

The PR code no longer applies the steady state options that we had previously, which means we end up with defaults after a restart, not the steady state. They are written to an options file on disk but it needs to be explicitly loaded and reapplied.

https://github.com/facebook/rocksdb/wiki/RocksDB-Options-File

Are you able to repro L0 accumulation locally? I attempted by repeatedly killing electrs after startup but L0 did not increase (it was loading a full bitcoin mainnet).
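
On the options-file point, explicitly reloading persisted options would look roughly like the sketch below. This assumes a rust-rocksdb version that exposes Options::load_latest (a wrapper for RocksDB's LoadLatestOptions); the exact signature may differ by version:

    use rocksdb::{Cache, Env, Options, DB};

    // Assumption: rust-rocksdb with Options::load_latest. electrs does NOT do
    // this today; it builds Options from scratch on every open.
    fn open_with_persisted_options(path: &str) -> Result<DB, rocksdb::Error> {
        let cache = Cache::new_lru_cache(64 << 20);
        let (opts, cf_descriptors) = Options::load_latest(path, Env::new()?, true, cache)?;
        DB::open_cf_descriptors(&opts, path, cf_descriptors)
    }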

EddieHouston (Collaborator, Author) commented May 5, 2026

> the description is misleading because update() is called once on initial startup, not in the main loop. The steady state compaction parameters are applied once regardless of the number of L0 files that happen to be on disk.

Updating description.

> The PR code no longer applies the steady state options that we had previously, which means we end up with defaults after a restart, not the steady state. They are written to an options file on disk but it needs to be explicitly loaded and reapplied.
>
> https://github.com/facebook/rocksdb/wiki/RocksDB-Options-File

The DB always opens with Options::default(), not from the OPTIONS file. RocksDB still writes an OPTIONS file to disk automatically (it does that on every open and set_options call), but electrs never reads it back; every startup builds options from scratch in code.

So the steady-state values (4/20/36, 64 GiB/256 GiB) are already in effect on every startup. apply_steady_state_triggers() only needs to run after apply_bulk_load_triggers() within the same process.

> Are you able to repro L0 accumulation locally? I attempted by repeatedly killing electrs after startup but L0 did not increase (it was loading a full bitcoin mainnet).

TBD, looking into how to create a bunch of L0 files locally to reproduce this.
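
One way to manufacture L0 files locally (a sketch; build_deep_l0 is a made-up helper, but the RocksDB behavior is standard: with auto-compaction disabled, each explicit flush writes one L0 SST and nothing drains it):

    use rocksdb::{Options, DB};

    // Sketch: accumulate `files` SSTs in L0 for a local repro.
    fn build_deep_l0(path: &str, files: usize) -> Result<DB, rocksdb::Error> {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        opts.set_disable_auto_compactions(true);
        let db = DB::open(&opts, path)?;
        for i in 0..files {
            db.put(format!("key-{i}").as_bytes(), b"x")?;
            db.flush()?; // one L0 file per flush
        }
        // Sanity check: L0 should now hold `files` SSTs.
        let l0 = db.property_int_value("rocksdb.num-files-at-level0")?;
        assert_eq!(l0, Some(files as u64));
        Ok(db)
    }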

