Use DeferredChainMonitor for non-VSS storage backends #782
Conversation
This change uses LDK's DeferredChainMonitor for local storage backends (SQLite, filesystem) instead of the regular ChainMonitor. The deferred variant queues watch_channel and update_channel operations for later flushing, enabling a safe persistence ordering in which the ChannelManager is persisted before the channel monitors. This ensures crash safety. VSS storage backends continue to use the regular ChainMonitor, since VSS handles its own persistence ordering.

The implementation:
- Adds a ChainMonitor enum that wraps both Regular and Deferred variants
- Implements all required traits (Watch, Listen, Confirm, AChainMonitor, BaseMessageHandler, SendOnlyMessageHandler, EventsProvider) for the enum
- Adds a use_deferred_chain_monitor parameter to build_with_store_internal
- Updates VSS build methods to use the regular ChainMonitor (false)
- Updates non-VSS build methods to use DeferredChainMonitor (true)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
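The enum-dispatch pattern the bullet list describes could be sketched roughly as follows. All types here (Watch, RegularMonitor, DeferredMonitor) are simplified stand-ins for illustration, not LDK's actual ChainMonitor/DeferredChainMonitor or their real trait signatures:

```rust
// Toy trait standing in for the real Watch trait.
trait Watch {
    fn watch_channel(&mut self, channel_id: u64) -> Result<(), ()>;
}

// Stand-in for the regular ChainMonitor: applies operations immediately.
struct RegularMonitor {
    watched: Vec<u64>,
}

impl Watch for RegularMonitor {
    fn watch_channel(&mut self, channel_id: u64) -> Result<(), ()> {
        self.watched.push(channel_id);
        Ok(())
    }
}

// Stand-in for the deferred variant: queues operations for a later flush,
// after the ChannelManager has been persisted.
struct DeferredMonitor {
    queued: Vec<u64>,
}

impl Watch for DeferredMonitor {
    fn watch_channel(&mut self, channel_id: u64) -> Result<(), ()> {
        self.queued.push(channel_id);
        Ok(())
    }
}

// The wrapper enum implements each required trait by matching on the
// variant and delegating, so the rest of the node can stay generic over
// both backends.
enum ChainMonitor {
    Regular(RegularMonitor),
    Deferred(DeferredMonitor),
}

impl Watch for ChainMonitor {
    fn watch_channel(&mut self, channel_id: u64) -> Result<(), ()> {
        match self {
            ChainMonitor::Regular(m) => m.watch_channel(channel_id),
            ChainMonitor::Deferred(m) => m.watch_channel(channel_id),
        }
    }
}
```

In the actual PR the same delegation would be repeated for every listed trait (Listen, Confirm, etc.); the enum only adds a branch per call, so runtime cost is negligible.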
👋 Hi! I see this is a draft PR.
Hmm, that's unfortunate. I imagine especially mobile and VSS-driven nodes would benefit the most from any change improving the ChannelManager/ChainMonitor inconsistency situation?
Yea, I think it's an open question what we should do. On the one hand, nodes with remote persistence are going to be the most impacted by the increase in sending latency (which is probably something where we're currently in an unacceptably bad state, given how single-threaded some of LDK's logic around the BP is!). On the other hand, they are also somewhat more likely to hit the FC-due-to-out-of-sync issues because they have high-latency persistence. I've mentioned this to Joost, but another option we have is to do the chanman and monitor writes at the same time but spawn them in order, which will at least give us likely protection. We should maybe discuss live which option we want to go with. In any case, since this now uses the async pipeline for monitor persistence anyway, we should probably switch to actual async persistence for monitors at the same time.
Parallel writes started in order still don't fully close the gap, though. We'd remain in "mostly works" territory, where the race window is smaller but not eliminated. As discussed offline, for high-latency backends an option to avoid unnecessary round trips is batched writes. The batch doesn't need to be atomic (which would require all KV stores to support transactions), just ordered: write the chanman first, then the monitors, but send them together. This would fix the FC problem without being unnecessarily slow for remote storage. The downside is extending the KVStore interface with a batch-write method, but we could provide a blanket implementation for existing KV stores that simply iterates through the writes sequentially. For VSS specifically, we'd implement actual batch sending to get the latency benefit.
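The batched-write extension suggested above could look roughly like this. The trait, the write_batch method name, and MemoryStore are hypothetical stand-ins for illustration, not ldk-node's actual KVStore API:

```rust
use std::collections::HashMap;

// Hypothetical KV-store trait with an ordered (not atomic) batch write.
trait KVStore {
    fn write(&mut self, key: &str, value: &[u8]) -> Result<(), String>;

    // Blanket default: iterate sequentially, preserving write order, so
    // existing stores need no changes. A remote backend such as VSS could
    // override this to send the whole batch in a single round trip.
    fn write_batch(&mut self, writes: &[(&str, &[u8])]) -> Result<(), String> {
        for (key, value) in writes {
            self.write(key, value)?;
        }
        Ok(())
    }
}

// Minimal in-memory store that only implements the single-write method
// and inherits the sequential batch default.
struct MemoryStore {
    data: HashMap<String, Vec<u8>>,
}

impl MemoryStore {
    fn new() -> Self {
        MemoryStore { data: HashMap::new() }
    }
}

impl KVStore for MemoryStore {
    fn write(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        self.data.insert(key.to_string(), value.to_vec());
        Ok(())
    }
}
```

A caller persisting a payment would then submit one ordered batch, chanman entry first, monitor entries after, and the store decides whether that becomes N sequential writes or one round trip.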
Illustration of the code changes for batch writes: lightningdevkit/rust-lightning#4379 |
PoC branch to evaluate integrating DeferredChainMonitor (lightningdevkit/rust-lightning#4345) into ldk-node.
For local storage backends (SQLite, filesystem), this uses DeferredChainMonitor which defers monitor operations until explicitly flushed.
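The defer-then-flush ordering can be illustrated with a toy model. Everything here (Store, DeferredMonitor, persist_round) is an invented stand-in, not ldk-node's or rust-lightning's actual code:

```rust
// Hypothetical in-memory store that records writes in arrival order.
#[derive(Default)]
struct Store {
    writes: Vec<String>,
}

// Toy deferred monitor: operations only accumulate in a queue until an
// explicit flush drains them to storage.
#[derive(Default)]
struct DeferredMonitor {
    queued: Vec<String>,
}

impl DeferredMonitor {
    // An update_channel-style operation; nothing hits storage yet.
    fn queue_update(&mut self, update: &str) {
        self.queued.push(update.to_string());
    }

    // Flushing drains the queue into the store, in order.
    fn flush(&mut self, store: &mut Store) {
        for update in self.queued.drain(..) {
            store.writes.push(update);
        }
    }
}

// Persist the ChannelManager first, then flush the queued monitor
// operations, matching the ordering described above: a crash before the
// flush leaves the monitor updates unwritten, never written ahead of the
// ChannelManager.
fn persist_round(store: &mut Store, monitor: &mut DeferredMonitor) {
    store.writes.push("channel_manager".to_string());
    monitor.flush(store);
}
```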
VSS continues to use the regular ChainMonitor. While this isn't safe against force-closes, it avoids introducing potentially high-latency channel manager writes into the critical path. Currently this provides no practical benefit, since the background processor loop isn't sufficiently parallelized: payment latency wouldn't actually increase if we also used deferred writing for VSS. This is primarily a forward-looking optimization for when that parallelization is addressed.