Add more metrics for snapshot and state sync by yzang2019 · Pull Request #2879 · sei-protocol/sei-chain

yzang2019 · 2026-02-12T16:39:47Z

Describe your changes and provide context

This PR adds more visibility into MemIAVL snapshot creation, replay, and pruning, as well as the state sync snapshot creation process.

With these metrics, we should be better able to correlate timing and performance changes with snapshot behavior.

Testing performed to validate your change

Tested locally and verified that the metrics work.

* main:
  chore: remove wasm dir on unsafe-reset (#2875)
  fix: respect existing genesis file (#2868)
  fix to halt due to reconstructing block from bad proposal (backported #2823) (#2873)
  chore(refactor): drop unused code (#2811)
  made the peer dialing less aggressive (backported #2799) (#2872)
  perf(store): lazy-init `sortedCache` in `cachekv.Store` (#2804)
  feat: embed genesis for well-known chains (#2835)
  fix: use MADV_RANDOM during loadtree (#2857)

github-actions · 2026-02-12T16:49:52Z

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`✅ passed`	`✅ passed`	`✅ passed`	Feb 27, 2026, 8:25 AM

masih · 2026-02-12T16:49:03Z

sei-db/state_db/sc/memiavl/db.go

 			db.logger.Error("failed to prune snapshot", "err", err)
+		} else {
+			db.logger.Info("successfully pruned snapshot", "name", name)
+			otelMetrics.SnapshotPruneCount.Add(context.Background(), 1)


We probably want to measure the failure rate too, right?

In that case, you could reuse the same metric and tag it by status.
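As a rough illustration of the suggestion above, the sketch below counts prune attempts on a single metric and separates outcomes with a status label. `statusCounter`, `recordPrune`, and `errPruneFailed` are hypothetical names used only here; the counter is a stdlib stand-in, not the `otelMetrics.SnapshotPruneCount` instrument from the diff. With the OpenTelemetry Go API, the same idea would presumably pass a status attribute through `metric.WithAttributes` on the `Add` call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// statusCounter is a hypothetical stand-in for a metrics counter that
// accepts a "status" label; it is not part of memiavl or OpenTelemetry.
type statusCounter struct {
	mu     sync.Mutex
	counts map[string]int64
}

func newStatusCounter() *statusCounter {
	return &statusCounter{counts: make(map[string]int64)}
}

// add increments the series identified by the given status label.
func (c *statusCounter) add(status string, n int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[status] += n
}

// get returns the current value for one status label.
func (c *statusCounter) get(status string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[status]
}

// errPruneFailed is an illustrative error used only in this sketch.
var errPruneFailed = errors.New("prune failed")

// recordPrune counts one prune attempt under "success" or "failure",
// so both outcomes are reported on the same metric name.
func recordPrune(c *statusCounter, err error) {
	status := "success"
	if err != nil {
		status = "failure"
	}
	c.add(status, 1)
}

func main() {
	c := newStatusCounter()
	recordPrune(c, nil)
	recordPrune(c, errPruneFailed)
	recordPrune(c, nil)
	fmt.Println("success:", c.get("success"), "failure:", c.get("failure"))
}
```

Keeping one metric with a status label lets a dashboard derive the failure rate as a ratio of two series, instead of joining two separately named counters.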

yzang2019 · 2026-02-12T16:55:02Z

sei-db/state_db/sc/memiavl/snapshot.go


 // writeLeaf sends leaf and KV write operations to the pipeline
 func (w *snapshotWriter) writeLeaf(version uint32, key, value, hash []byte) error {
-	// Track channel fill metrics for all channels


Removing these since they do not appear to be used.

codecov · 2026-02-12T16:56:32Z

Codecov Report

❌ Patch coverage is 73.07692% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.43%. Comparing base (f748419) to head (c3601a6).
⚠️ Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
sei-db/state_db/sc/memiavl/db.go	75.00%	4 Missing and 2 partials ⚠️
sei-cosmos/storev2/rootmulti/store.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #2879       +/-   ##
===========================================
+ Coverage   58.13%   68.43%   +10.29%     
===========================================
  Files        2111       24     -2087     
  Lines      173463     3817   -169646     
===========================================
- Hits       100847     2612    -98235     
+ Misses      63665      921    -62744     
+ Partials     8951      284     -8667

Flag	Coverage Δ
sei-chain	`68.27% <73.07%> (+10.17%)`	⬆️
sei-db	`69.50% <ø> (ø)`

Files with missing lines	Coverage Δ
sei-db/state_db/sc/memiavl/metrics.go	`50.00% <ø> (ø)`
sei-db/state_db/sc/memiavl/multitree.go	`79.22% <100.00%> (+0.06%)`	⬆️
sei-db/state_db/sc/memiavl/snapshot.go	`59.37% <ø> (-0.93%)`	⬇️
sei-cosmos/storev2/rootmulti/store.go	`46.11% <0.00%> (-0.45%)`	⬇️
sei-db/state_db/sc/memiavl/db.go	`66.06% <75.00%> (-0.04%)`	⬇️

... and 2087 files with indirect coverage changes

sei-db/state_db/sc/memiavl/db.go

 		}

 		// catchup the remaining entries in rlog
+		startTime := time.Now()


sei-db/state_db/sc/memiavl/db.go

-		cloned.logger.Info("snapshot rewrite process completed", "duration_sec", totalElapsed, "duration_min", totalElapsed/60)
-		otelMetrics.SnapshotCreationLatency.Record(
+		totalRewriteElapsed := time.Since(startTime).Seconds()
+		cloned.logger.Info("snapshot rewrite process completed", "duration_sec", totalRewriteElapsed, "duration_min", totalRewriteElapsed/60)
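The timing pattern in these hunks (capture a start time, then report the elapsed seconds once the work finishes) can be sketched roughly as follows. `timeStage`, `simulatedWork`, and the `record` callback are hypothetical names for illustration; `record` stands in for whatever latency instrument the real code uses, and the sleep is a placeholder for the rewrite work.

```go
package main

import (
	"fmt"
	"time"
)

// timeStage runs fn, then reports the elapsed wall-clock seconds to record.
// record is a hypothetical hook standing in for a latency histogram.
// The duration is reported whether or not fn returns an error.
func timeStage(record func(seconds float64), fn func() error) error {
	start := time.Now()
	err := fn()
	record(time.Since(start).Seconds())
	return err
}

// simulatedWork is a placeholder for the snapshot rewrite; it only sleeps.
func simulatedWork() error {
	time.Sleep(20 * time.Millisecond)
	return nil
}

func main() {
	var observed []float64
	err := timeStage(
		func(s float64) { observed = append(observed, s) },
		simulatedWork,
	)
	// Mirror the log fields from the diff: seconds and minutes.
	fmt.Printf("err=%v duration_sec=%.3f duration_min=%.5f\n",
		err, observed[0], observed[0]/60)
}
```

Wrapping the measurement in one helper keeps the start time next to the work it measures, which makes it harder to accidentally reuse a start time from an earlier stage.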


blindchaser · 2026-02-13T15:41:53Z

sei-db/state_db/sc/memiavl/metrics.go

 		)),
+		SnapshotRewriteCount: must(meter.Int64Counter(
+			"memiavl_snapshot_rewrite_count",
+			metric.WithDescription("Total num of times memiavl snapshot rewrite attempts"),


nit: grammar seems a little weird, how about: "Total number of memiavl snapshot rewrite attempts"

Yup, that looks better!

* main: (66 commits)
  feat(flatkv): include legacyDB in ApplyChangeSets, LtHash, and read path (#2978)
  Deflake mempool tests with Eventually-based block waits (#2983)
  Demote noisy gasless classification log to debug level (#2982)
  Harden `TestStateLock_NoPOL` against proposal/timeout race (#2980)
  added a config parameter to limit outbound p2p connections. (#2974)
  merged unconditional and persistent peers status (#2977)
  Fix race between file pruning and in-flight parquet queries (#2975)
  fix(giga): don't migrate balance on failed txs (#2961)
  Fix hanging upgrade tests by adding timeouts to wait_for_height (#2976)
  Add snapshot import for Giga Live State (#2970)
  Fix Rocksdb MVCC read timestamp lifetime for iterators (#2971)
  Reduce exposed tendermint RPC endpoint (#2968)
  Deflake `TestStateLock_NoPOL` by widening propose timeout in test (#2969)
  go bench read + write receipts/logs for parquet vs pebble (#2794)
  [giga] clear up cache after Write (#2827)
  fix: use correct EVM storage key prefix in benchmark key generation (#2966)
  Harden staking precompile test against CI flakiness (#2967)
  Don't sync flatKV DBs when committing (#2964)
  Fix flaky `TestStateLock_POLSafety1` (#2962)
  Add metrics for historical proof success/failure rate (#2958)
  ...

yzang2019 added 2 commits February 12, 2026 00:24

Add one more metric for state sync

f4412bf

Add more metrics for snapshot creation

b26dca2

yzang2019 requested review from Kbhat1, blindchaser and masih February 12, 2026 16:39

yzang2019 added 3 commits February 12, 2026 08:41

Add prune latency metrics

d629834

Add pruning count metrics

3a9ac25

masih approved these changes Feb 12, 2026

yzang2019 added the non-app-hash-breaking label Feb 12, 2026

yzang2019 commented Feb 12, 2026

github-advanced-security bot found potential problems Feb 12, 2026

sei-db/state_db/sc/memiavl/db.go Fixed

Address comments

c52b76a

github-advanced-security bot found potential problems Feb 12, 2026

blindchaser reviewed Feb 13, 2026

blindchaser approved these changes Feb 13, 2026

yzang2019 added 2 commits February 13, 2026 11:02

Address comments

ab38167

yzang2019 enabled auto-merge (squash) February 27, 2026 08:24

yzang2019 merged commit 5ad8d47 into main Feb 27, 2026
38 checks passed

yzang2019 deleted the yang/add-metrics-statesync branch February 27, 2026 08:39

yzang2019 merged 8 commits into main from yang/add-metrics-statesync

3 participants
