From 86804192e5c32d0759af11bae0f052779e31b106 Mon Sep 17 00:00:00 2001 From: grunch Date: Thu, 28 May 2026 13:01:11 -0300 Subject: [PATCH 1/3] fix(pow): surface explicit PoW rejection instead of generic timeout When mostrod requires NIP-13 PoW and the client sends an event without satisfying it, the daemon silently drops the event (logs "Not POW verified event!" on its side) and never replies. The CLI was hiding that behind a generic timeout message ("deadline has elapsed" / "Timeout waiting for DM or gift wrap event"), which can mislead users into suspecting network failures, relay issues, or daemon downtime. After timing out, wait_for_dm now consults the daemon's kind-38385 info event ("pow" tag) and, if the required difficulty exceeds the configured POW, returns the new PowRequirementUnmet error with an actionable message pointing at --pow / POW=. Genuine timeouts keep their existing WaitForDmTimeout error, so the add-bond-invoice happy-path branch (which only matches WaitForDmTimeout) is unaffected. Closes #172 Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/pow_error_handling.md | 246 +++++++++++++++++++++++++++++++++++++ src/util/events.rs | 120 +++++++++++++++--- src/util/messaging.rs | 82 ++++++++++++- src/util/mod.rs | 5 +- 4 files changed, 433 insertions(+), 20 deletions(-) create mode 100644 docs/pow_error_handling.md diff --git a/docs/pow_error_handling.md b/docs/pow_error_handling.md new file mode 100644 index 0000000..b67e955 --- /dev/null +++ b/docs/pow_error_handling.md @@ -0,0 +1,246 @@ +# Spec — Surface explicit PoW rejection error + +Tracking issue: [MostroP2P/mostro-cli#172](https://github.com/MostroP2P/mostro-cli/issues/172) + +## 1. Problem + +When a user sends an event from `mostro-cli` to a `mostrod` instance that +**requires NIP‑13 Proof‑of‑Work** without providing the required PoW, the +daemon silently drops the event (logs `Not POW verified event!` on its side) +and never replies. The CLI eventually times out and surfaces a generic +timeout error to the user: + +```text +deadline has elapsed +``` + +(or, in current `main`, the slightly less awful but still ambiguous +`Timeout waiting for DM or gift wrap event`). + +This is misleading. From the user perspective the message can mean almost +anything: the relay is slow, mostrod is down, the network is broken, the CLI +is buggy, or the command was wrong. The real cause — PoW not satisfied — is +hidden. + +## 2. Root cause + +The flow that breaks today: + +1. `mostro-cli` mines PoW on the outer GiftWrap based on the `POW` env var + (default `0`) — see `src/util/messaging.rs::parse_pow_env` and the + `WrapOptions { pow, .. }` plumbing. +2. The wrapped event is published to relays via `client.send_event(...)`. +3. `wait_for_dm` opens a subscription on the trade key and waits for + `FETCH_EVENTS_TIMEOUT` (15 s) for an inbound GiftWrap. +4. If `mostrod` requires PoW above what the client provided, mostrod + silently drops the event in + [`mostro/src/app.rs`](https://github.com/MostroP2P/mostro) — the relay + accepted it (so there's no NIP‑01 `OK false` from the relay either) and + no reply is ever produced. +5. `tokio::time::timeout` elapses → `wait_for_dm` returns `WaitForDmTimeout` + → the user sees the generic timeout message. + +The CLI has **no way to distinguish** "daemon silently dropped my event for +PoW reasons" from "transient relay or network issue". + +## 3. Signal we can use + +`mostrod` already publishes its required PoW in its **kind‑38385 info event** +under the tag `["pow", ""]`. See +[`mostro/src/nip33.rs`](https://github.com/MostroP2P/mostro) (`new_info_event`) +where the daemon writes: + +```rust +Tag::custom( + TagKind::Custom(Cow::Borrowed("pow")), + vec![mostro_settings.pow.to_string()], +), +``` + +Reading that tag on the CLI side lets us answer the only question we care +about: *did this Mostro instance reject our event because we didn't provide +enough PoW?* + +We already fetch kind‑38385 elsewhere in the CLI — see +`src/util/events.rs::fetch_bond_claim_window_days` — so this is a small +extension of an existing pattern. + +## 4. Design + +### 4.1 Error variant + +Add a typed error so callers (and tests) can match the PoW failure mode +distinctly from a generic timeout: + +```rust +#[derive(Debug)] +pub struct PowRequirementUnmet { + pub required: u8, + pub configured: u8, +} +``` + +The user-facing `Display` will be explicit: + +```text +This Mostro instance requires NIP-13 proof of work of N bits, but the +client sent the event with M bits. Re-run with `--pow N` or set +`POW=N` and try again. +``` + +The error lives next to `WaitForDmTimeout` in `src/util/messaging.rs` (same +module, same idiom: small zero/struct error wrapped via `anyhow`). + +### 4.2 Helper — fetch required PoW from kind‑38385 + +Add a helper in `src/util/events.rs`: + +```rust +/// Best-effort: fetch the Mostro instance's kind-38385 info event and read +/// the `pow` tag. Returns `None` when no info event is available or the tag +/// is missing/unparseable. Mirrors `fetch_bond_claim_window_days`. +pub async fn fetch_required_pow(ctx: &crate::cli::Context) -> Option; +``` + +Implementation details: + +- Author filter on `ctx.mostro_pubkey`, kind `NOSTR_INFO_EVENT_KIND`. +- Pick the **newest** revision by `created_at` (replaceable but lagging + relays can still serve old copies; same caveat as the bond-window helper). +- Scan tags for `["pow", ""]`, parse as `u8`. +- Any error (fetch failure, missing tag, unparseable value) → `None`. + +Returning `None` is the "I don't know" signal; the caller treats it as +*"can't blame PoW"* and falls back to the generic timeout error. This keeps +the helper non-fatal: older daemons that don't publish the tag, or unreachable +relays, never break flows that worked before. + +### 4.3 Where to plug it in — `wait_for_dm` + +`wait_for_dm` is the single chokepoint that every request/reply flow goes +through (`add_invoice`, `take_order`, `take_dispute`, `send_msg`, `new_order`, +`rate_user`, `orders_info`, `restore`, `last_trade_index`, `add_bond_invoice`). +Centralizing the fix here covers every command in one place. + +Postflight check (chosen — see Alternatives below): + +```rust +let event = match waited { + Ok(inner) => inner?, + Err(_elapsed) => { + // Before declaring this a generic timeout, check whether the daemon + // advertises a PoW requirement we didn't meet — that's the real + // cause "deadline has elapsed" was hiding. + if let Some(required) = fetch_required_pow(ctx).await { + let configured = parse_pow_env().unwrap_or(0); + if required > configured { + return Err(PowRequirementUnmet { required, configured }.into()); + } + } + return Err(WaitForDmTimeout.into()); + } +}; +``` + +Add an `&Context` parameter? Look at the signature today — +`wait_for_dm(ctx, order_trade_keys, sent_message)` — `ctx` is already +passed. We just need to grant the helper access to `ctx.client` and +`ctx.mostro_pubkey` (already does). + +### 4.4 `add_bond_invoice` interplay + +`add_bond_invoice` treats `WaitForDmTimeout` as the happy path (Mostro pays +the invoice without acking over Nostr). The new variant must **not** be +caught by that branch — a PoW failure on `add-bond-invoice` is *not* a +successful submission. Concretely: + +- The existing `Err(e) if e.downcast_ref::().is_some()` + arm continues to match only `WaitForDmTimeout`. +- A `PowRequirementUnmet` falls through to the generic `Err(e) => return + Err(e)` arm and bubbles up to the user, which is the desired behavior. + +No code change needed in `add_bond_invoice.rs` — the existing pattern match +already handles the right split. We assert this with a regression note. + +## 5. Alternatives considered + +### 5.1 Preflight check (fail before sending) + +Run `fetch_required_pow` *before* `send_dm`. Advantages: fail fast (no +15‑second wait). Disadvantages: + +- Adds a relay roundtrip to the happy path for every command. +- Doubles "you didn't set PoW" errors for daemons that publish the tag but + also accept PoW=0 (no real check on that side today). +- Couples every command to the info event being reachable. + +→ Rejected. Postflight has the right cost model: zero overhead when things +work, and *only* pays for the info-event fetch in the failure path where the +user is already waiting anyway. + +### 5.2 Auto-mine the required PoW + +Bump `WrapOptions.pow` to `max(configured, required)` instead of erroring. +Tempting, but high PoW targets (e.g. 28+ bits) can take minutes to mine on +a laptop, and the user wouldn't see *why* the CLI is suddenly chewing CPU. +Better to fail explicitly with an actionable hint and let the user opt in +by setting `POW` themselves. + +→ Rejected. Defer to explicit user opt-in via `--pow` / `POW` env. + +### 5.3 Read NIP‑01 `OK false` from the relay + +Won't help: mostrod drops the event *after* the relay accepts it, so the +relay returns `OK true`. There is no rejection signal on the publish path. + +→ Rejected. + +## 6. Acceptance criteria + +(Reproducing the issue's checklist, with concrete bindings.) + +- Sending an event from `mostro-cli` to a PoW‑enabled `mostrod` without + satisfying PoW must surface an error that mentions PoW. Specifically the + `PowRequirementUnmet { required, configured }` variant rendered as: + `"This Mostro instance requires NIP-13 proof of work of N bits, but the + client sent the event with M bits. Re-run with --pow N or set POW=N and + try again."` +- Genuine timeouts (daemon reachable, no PoW requirement, just slow / no + reply) keep returning `WaitForDmTimeout` — its existing message stays. +- `add_bond_invoice`'s "timeout is the happy path" arm only matches + `WaitForDmTimeout`; a `PowRequirementUnmet` propagates and is shown to + the user. +- Older daemons that don't publish `["pow", ...]` in kind‑38385 behave + exactly as before (helper returns `None` → generic timeout). + +## 7. Test plan + +Unit tests in `src/util/messaging.rs` and `src/util/events.rs`: + +- `pow_requirement_unmet_display_mentions_required_and_configured` — + formatting check on the error message. +- `fetch_required_pow_returns_none_when_tag_missing` — synthesize a + kind-38385 event with no `pow` tag, assert `None`. +- `fetch_required_pow_picks_newest_revision` — two events, newer with a + different pow value; helper returns the newer value. Mirrors the existing + `fetch_bond_claim_window_days` pattern. +- `fetch_required_pow_parses_pow_tag` — single event with `["pow", "12"]`, + assert `Some(12)`. + +A manual end-to-end check (out of the unit-test scope) is to run against a +PoW‑enabled local `mostrod`: + +```sh +POW=0 mostro-cli neworder ... +# expected: PowRequirementUnmet error, not "deadline has elapsed" +POW= mostro-cli neworder ... +# expected: normal flow, no error +``` + +## 8. Out of scope + +- Auto-mining the missing PoW (see 5.2). +- Removing or restructuring the `POW` env var / `--pow` flag. +- Surfacing the PoW requirement during `list-orders` / informational + flows. The fix is scoped to the request/reply path where the missing + PoW silently kills the flow. diff --git a/src/util/events.rs b/src/util/events.rs index 63bad26..717dd26 100644 --- a/src/util/events.rs +++ b/src/util/events.rs @@ -83,15 +83,26 @@ pub fn create_filter( } } -/// Fetch the Mostro instance's kind-38385 info event and read the -/// `bond_payout_claim_window_days` tag. +/// Fetch the Mostro instance's kind-38385 info event and read a tag whose +/// first slot equals `tag_name`. Returns the second slot as a string. /// -/// Returns `None` when the node publishes no info event, the tag is absent -/// (older daemon or bonds disabled), or the value can't be parsed. Used to -/// render the forfeit deadline for an `add-bond-invoice` request locally, per -/// the protocol's "Bond payout invoice" / "Other events" docs. Best-effort: -/// any relay error degrades to `None` rather than failing the caller. -pub async fn fetch_bond_claim_window_days(ctx: &crate::cli::Context) -> Option { +/// Best-effort: any relay error or missing tag degrades to `None` so callers +/// keep working against older daemons that don't publish the tag. +/// Pure: read the second slot of the first tag whose first slot equals +/// `tag_name`. Split out so the kind-38385 tag-parsing logic stays unit +/// testable without spinning up a relay. +fn read_info_tag_from_event(event: &nostr_sdk::Event, tag_name: &str) -> Option { + event.tags.iter().find_map(|tag| { + let slice = tag.as_slice(); + if slice.first().map(String::as_str) == Some(tag_name) { + slice.get(1).cloned() + } else { + None + } + }) +} + +async fn fetch_info_tag(ctx: &crate::cli::Context, tag_name: &str) -> Option { let filter = Filter::new() .author(ctx.mostro_pubkey) .kind(nostr_sdk::Kind::Custom(NOSTR_INFO_EVENT_KIND)); @@ -104,16 +115,38 @@ pub async fn fetch_bond_claim_window_days(ctx: &crate::cli::Context) -> Option().ok()); - } - } - None + read_info_tag_from_event(event, tag_name) +} + +/// Fetch the Mostro instance's kind-38385 info event and read the +/// `bond_payout_claim_window_days` tag. +/// +/// Returns `None` when the node publishes no info event, the tag is absent +/// (older daemon or bonds disabled), or the value can't be parsed. Used to +/// render the forfeit deadline for an `add-bond-invoice` request locally, per +/// the protocol's "Bond payout invoice" / "Other events" docs. Best-effort: +/// any relay error degrades to `None` rather than failing the caller. +pub async fn fetch_bond_claim_window_days(ctx: &crate::cli::Context) -> Option { + // A stale claim window would render the wrong, very user-facing forfeit + // deadline — same newest-revision caveat as the rest of the info event. + fetch_info_tag(ctx, "bond_payout_claim_window_days") + .await + .and_then(|v| v.parse::().ok()) +} + +/// Fetch the Mostro instance's required NIP-13 proof-of-work difficulty from +/// the kind-38385 info event (`["pow", ""]` tag). +/// +/// Returns `None` when the daemon doesn't publish the tag (older versions), +/// when the value is unparseable, or when the info event can't be fetched. +/// Used by [`crate::util::messaging::wait_for_dm`] to distinguish a real +/// timeout from a silent PoW rejection — see `docs/pow_error_handling.md`. +pub async fn fetch_required_pow(ctx: &crate::cli::Context) -> Option { + fetch_info_tag(ctx, "pow") + .await + .and_then(|v| v.parse::().ok()) } #[allow(clippy::too_many_arguments)] @@ -206,3 +239,56 @@ pub async fn fetch_events_list( } } } + +#[cfg(test)] +mod tests { + use super::*; + + async fn make_info_event(keys: &Keys, tags: Vec) -> nostr_sdk::Event { + EventBuilder::new(nostr_sdk::Kind::Custom(NOSTR_INFO_EVENT_KIND), "") + .tags(tags) + .sign(keys) + .await + .expect("sign info event") + } + + fn pow_tag(value: &str) -> Tag { + Tag::parse(["pow", value]).expect("build pow tag") + } + + #[tokio::test] + async fn read_info_tag_finds_pow_value() { + let keys = Keys::generate(); + let event = make_info_event( + &keys, + vec![ + Tag::parse(["fee", "0.006"]).unwrap(), + pow_tag("12"), + Tag::parse(["fiat_currencies_accepted", "USD,EUR"]).unwrap(), + ], + ) + .await; + assert_eq!(read_info_tag_from_event(&event, "pow"), Some("12".into())); + } + + #[tokio::test] + async fn read_info_tag_returns_none_when_missing() { + let keys = Keys::generate(); + let event = make_info_event(&keys, vec![Tag::parse(["fee", "0.006"]).unwrap()]).await; + assert_eq!(read_info_tag_from_event(&event, "pow"), None); + } + + #[tokio::test] + async fn pow_tag_parses_as_u8() { + // u8 parse is what fetch_required_pow chains after the helper. + // Lock in that the daemon's stringified u8 round-trips cleanly, + // and that garbage values degrade to None. + let parse = |s: &str| s.parse::().ok(); + assert_eq!(parse("12"), Some(12)); + assert_eq!(parse("0"), Some(0)); + assert_eq!(parse("nope"), None); + // Out of range for u8 → None, which is the right "ignore this + // weird value, fall back to generic timeout" behavior. + assert_eq!(parse("999"), None); + } +} diff --git a/src/util/messaging.rs b/src/util/messaging.rs index f55f645..d7cb5c4 100644 --- a/src/util/messaging.rs +++ b/src/util/messaging.rs @@ -275,6 +275,33 @@ impl std::fmt::Display for WaitForDmTimeout { impl std::error::Error for WaitForDmTimeout {} +/// Returned by [`wait_for_dm`] when the wait timed out *and* the Mostro +/// instance's kind-38385 info event advertises a NIP-13 PoW difficulty +/// strictly above what the client sent. Lets callers (and tests) +/// distinguish a silent PoW rejection — the daemon dropped the event after +/// the relay accepted it — from a generic timeout. +/// +/// See `docs/pow_error_handling.md` for the design rationale. +#[derive(Debug)] +pub struct PowRequirementUnmet { + pub required: u8, + pub configured: u8, +} + +impl std::fmt::Display for PowRequirementUnmet { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!( + f, + "This Mostro instance requires NIP-13 proof of work of {} bits, \ + but the client sent the event with {} bits. \ + Re-run with `--pow {}` or set `POW={}` and try again.", + self.required, self.configured, self.required, self.required + ) + } +} + +impl std::error::Error for PowRequirementUnmet {} + pub async fn wait_for_dm( ctx: &crate::cli::Context, order_trade_keys: Option<&Keys>, @@ -314,7 +341,24 @@ where // a notification-channel error so callers can treat them differently. let event = match waited { Ok(inner) => inner?, - Err(_elapsed) => return Err(WaitForDmTimeout.into()), + Err(_elapsed) => { + // mostrod silently drops events whose outer GiftWrap doesn't meet + // its NIP-13 PoW requirement (relay accepts → daemon discards → + // no reply ever comes). Before declaring this a generic timeout, + // ask the daemon's kind-38385 info event whether that's actually + // the cause we're hiding behind "no reply". + if let Some(required) = super::events::fetch_required_pow(ctx).await { + let configured = parse_pow_env().unwrap_or(0); + if required > configured { + return Err(PowRequirementUnmet { + required, + configured, + } + .into()); + } + } + return Err(WaitForDmTimeout.into()); + } }; let mut events = Events::default(); @@ -441,6 +485,42 @@ pub async fn print_dm_events( mod tests { use super::*; + #[test] + fn pow_requirement_unmet_display_mentions_required_and_configured() { + let rendered = PowRequirementUnmet { + required: 12, + configured: 4, + } + .to_string(); + assert!( + rendered.contains("12") && rendered.contains("4"), + "message should surface both PoW values, got: {rendered}", + ); + assert!( + rendered.contains("proof of work"), + "message should call out PoW explicitly, got: {rendered}", + ); + assert!( + rendered.contains("--pow") && rendered.contains("POW="), + "message should hint at both the CLI flag and env var, got: {rendered}", + ); + } + + #[test] + fn pow_requirement_unmet_is_downcastable_through_anyhow() { + // The add-bond-invoice flow only catches WaitForDmTimeout via + // downcast_ref::(). Verify our new error stays + // distinct so PoW failures don't get misreported as a successful + // bond-invoice submission. + let err: anyhow::Error = PowRequirementUnmet { + required: 8, + configured: 0, + } + .into(); + assert!(err.downcast_ref::().is_some()); + assert!(err.downcast_ref::().is_none()); + } + fn leading_zero_bits_in_hex(hex: &str) -> u32 { let mut bits = 0_u32; for ch in hex.chars() { diff --git a/src/util/mod.rs b/src/util/mod.rs index b4b0bc6..7d36c8b 100644 --- a/src/util/mod.rs +++ b/src/util/mod.rs @@ -7,12 +7,13 @@ pub mod types; // Re-export commonly used items to preserve existing import paths pub use events::{ - create_filter, fetch_bond_claim_window_days, fetch_events_list, FETCH_EVENTS_TIMEOUT, + create_filter, fetch_bond_claim_window_days, fetch_events_list, fetch_required_pow, + FETCH_EVENTS_TIMEOUT, }; pub use messaging::{ derive_shared_key_hex, derive_shared_keys, keys_from_shared_hex, print_dm_events, send_admin_chat_message_via_shared_key, send_dm, send_plain_text_dm, wait_for_dm, - WaitForDmTimeout, + PowRequirementUnmet, WaitForDmTimeout, }; pub use misc::{get_mcli_path, uppercase_first}; pub use net::connect_nostr; From 72ced796105afd409828a3e495607693c7250111 Mon Sep 17 00:00:00 2001 From: grunch Date: Thu, 28 May 2026 13:17:32 -0300 Subject: [PATCH 2/3] fix(pow): bound the post-timeout PoW probe to avoid doubling latency MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit fetch_required_pow uses FETCH_EVENTS_TIMEOUT (15s) internally, so a slow/unreachable relay would let wait_for_dm wait 15s for the reply plus another 15s for the probe before producing an error. Wrap the probe in a short tokio::time::timeout (3s) and fall through to WaitForDmTimeout if it doesn't return in time — Ok(Some(required)) is the only case that escalates to PowRequirementUnmet. Addresses review feedback on #173. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/pow_error_handling.md | 12 ++++++++++-- src/util/messaging.rs | 16 +++++++++++++++- 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/docs/pow_error_handling.md b/docs/pow_error_handling.md index b67e955..3906472 100644 --- a/docs/pow_error_handling.md +++ b/docs/pow_error_handling.md @@ -130,8 +130,12 @@ let event = match waited { Err(_elapsed) => { // Before declaring this a generic timeout, check whether the daemon // advertises a PoW requirement we didn't meet — that's the real - // cause "deadline has elapsed" was hiding. - if let Some(required) = fetch_required_pow(ctx).await { + // cause "deadline has elapsed" was hiding. Bounded by + // POW_PROBE_TIMEOUT so a slow/unreachable relay can't double the + // user-visible wait; if the probe doesn't return in time we fall + // through to the generic timeout error instead of hanging. + let probe = tokio::time::timeout(POW_PROBE_TIMEOUT, fetch_required_pow(ctx)).await; + if let Ok(Some(required)) = probe { let configured = parse_pow_env().unwrap_or(0); if required > configured { return Err(PowRequirementUnmet { required, configured }.into()); @@ -142,6 +146,10 @@ let event = match waited { }; ``` +`POW_PROBE_TIMEOUT` is a small constant (currently 3 s) — well below +`FETCH_EVENTS_TIMEOUT` (15 s). Worst-case user-visible wait stays at one +`FETCH_EVENTS_TIMEOUT` plus the probe budget instead of doubling. + Add an `&Context` parameter? Look at the signature today — `wait_for_dm(ctx, order_trade_keys, sent_message)` — `ctx` is already passed. We just need to grant the helper access to `ctx.client` and diff --git a/src/util/messaging.rs b/src/util/messaging.rs index d7cb5c4..da3b241 100644 --- a/src/util/messaging.rs +++ b/src/util/messaging.rs @@ -256,6 +256,12 @@ pub async fn send_plain_text_dm( .await } +/// Upper bound on the post-timeout PoW probe inside [`wait_for_dm`]. Kept +/// well below `FETCH_EVENTS_TIMEOUT` so a slow/unreachable relay can't +/// double the user-visible wait — if the kind-38385 info event isn't back +/// within this budget we fall through to [`WaitForDmTimeout`] instead. +const POW_PROBE_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(3); + /// Distinguishable error returned by [`wait_for_dm`] when no reply arrives /// within [`FETCH_EVENTS_TIMEOUT`]. /// @@ -347,7 +353,15 @@ where // no reply ever comes). Before declaring this a generic timeout, // ask the daemon's kind-38385 info event whether that's actually // the cause we're hiding behind "no reply". - if let Some(required) = super::events::fetch_required_pow(ctx).await { + // + // The probe is bounded by `POW_PROBE_TIMEOUT` instead of the full + // `FETCH_EVENTS_TIMEOUT` so a slow/unreachable relay can't double + // the user-visible wait. If the probe doesn't return in time, fall + // through to the generic timeout error rather than hanging. + let probe = + tokio::time::timeout(POW_PROBE_TIMEOUT, super::events::fetch_required_pow(ctx)) + .await; + if let Ok(Some(required)) = probe { let configured = parse_pow_env().unwrap_or(0); if required > configured { return Err(PowRequirementUnmet { From 2050955e919731722bbf50a0d70dc9b3a9413fa2 Mon Sep 17 00:00:00 2001 From: grunch Date: Thu, 28 May 2026 13:29:39 -0300 Subject: [PATCH 3/3] fix(pow): run PoW probe concurrently with the DM wait MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previous attempt (commit 72ced79) capped the postflight info-event lookup at 3s, but it still ran sequentially after the 15s DM wait timed out, so the user-visible failure path could take up to 18s when the relay was slow. Run the probe concurrently with the DM wait instead: by the time the wait elapses the probe's answer is typically already in hand, so the timeout branch consumes a resolved JoinHandle with ~0s added latency. POW_PROBE_TIMEOUT is kept as a safety net for pathological relays that outlive the 15s wait. The probe needs a 'static future for tokio::spawn, so the work moves into a new fetch_required_pow_with(client, mostro_pubkey) — the existing fetch_required_pow(ctx) becomes a thin wrapper around it. On the happy path (DM arrives in time) the spawned probe is aborted so we don't leak a stray relay request. Addresses review feedback on #173. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/pow_error_handling.md | 41 ++++++++++++++++++++++++++------------ src/util/events.rs | 19 ++++++++++++++++-- src/util/messaging.rs | 38 +++++++++++++++++++++++------------ 3 files changed, 70 insertions(+), 28 deletions(-) diff --git a/docs/pow_error_handling.md b/docs/pow_error_handling.md index 3906472..42bb453 100644 --- a/docs/pow_error_handling.md +++ b/docs/pow_error_handling.md @@ -122,20 +122,31 @@ through (`add_invoice`, `take_order`, `take_dispute`, `send_msg`, `new_order`, `rate_user`, `orders_info`, `restore`, `last_trade_index`, `add_bond_invoice`). Centralizing the fix here covers every command in one place. -Postflight check (chosen — see Alternatives below): +Concurrent probe (chosen — see Alternatives below): ```rust +// Kick off the PoW probe alongside the DM wait so its answer is in hand +// the moment the wait times out. The probe is cheap to start and cheap to +// cancel via JoinHandle::abort() on the happy path. +let pow_probe = tokio::spawn(fetch_required_pow_with( + ctx.client.clone(), + ctx.mostro_pubkey, +)); + +let waited = tokio::time::timeout(FETCH_EVENTS_TIMEOUT, /* notification loop */).await; + let event = match waited { - Ok(inner) => inner?, + Ok(inner) => { + pow_probe.abort(); + inner? + } Err(_elapsed) => { - // Before declaring this a generic timeout, check whether the daemon - // advertises a PoW requirement we didn't meet — that's the real - // cause "deadline has elapsed" was hiding. Bounded by - // POW_PROBE_TIMEOUT so a slow/unreachable relay can't double the - // user-visible wait; if the probe doesn't return in time we fall - // through to the generic timeout error instead of hanging. - let probe = tokio::time::timeout(POW_PROBE_TIMEOUT, fetch_required_pow(ctx)).await; - if let Ok(Some(required)) = probe { + // Probe has been running for FETCH_EVENTS_TIMEOUT alongside the + // wait; it should already be done. POW_PROBE_TIMEOUT is a safety + // net for pathological relays — if the answer isn't in by then, + // fall through to the generic timeout error. + let probe_result = tokio::time::timeout(POW_PROBE_TIMEOUT, pow_probe).await; + if let Ok(Ok(Some(required))) = probe_result { let configured = parse_pow_env().unwrap_or(0); if required > configured { return Err(PowRequirementUnmet { required, configured }.into()); @@ -146,9 +157,13 @@ let event = match waited { }; ``` -`POW_PROBE_TIMEOUT` is a small constant (currently 3 s) — well below -`FETCH_EVENTS_TIMEOUT` (15 s). Worst-case user-visible wait stays at one -`FETCH_EVENTS_TIMEOUT` plus the probe budget instead of doubling. +The probe lives in `events::fetch_required_pow_with(client, mostro_pubkey)` +— an owned-args sibling of `fetch_required_pow(ctx)`, used so the spawned +future is `'static`. The 3 s `POW_PROBE_TIMEOUT` is now a safety net rather +than the typical wait: in the common timeout case the probe is already +resolved when we look at it, so the user-visible wait stays at +`FETCH_EVENTS_TIMEOUT` (15 s) plus ~0 s, instead of doubling to 30 s as the +naive sequential version would. Add an `&Context` parameter? Look at the signature today — `wait_for_dm(ctx, order_trade_keys, sent_message)` — `ctx` is already diff --git a/src/util/events.rs b/src/util/events.rs index 717dd26..40435e4 100644 --- a/src/util/events.rs +++ b/src/util/events.rs @@ -144,9 +144,24 @@ pub async fn fetch_bond_claim_window_days(ctx: &crate::cli::Context) -> Option Option { - fetch_info_tag(ctx, "pow") + fetch_required_pow_with(ctx.client.clone(), ctx.mostro_pubkey).await +} + +/// Owned-args variant of [`fetch_required_pow`], suitable for `tokio::spawn`. +/// +/// `wait_for_dm` kicks the probe off concurrently with the DM wait so the +/// answer is already in hand by the time the wait times out (zero added +/// latency in the timeout path, instead of a sequential second fetch). +pub async fn fetch_required_pow_with(client: Client, mostro_pubkey: PublicKey) -> Option { + let filter = Filter::new() + .author(mostro_pubkey) + .kind(nostr_sdk::Kind::Custom(NOSTR_INFO_EVENT_KIND)); + let events = client + .fetch_events(filter, FETCH_EVENTS_TIMEOUT) .await - .and_then(|v| v.parse::().ok()) + .ok()?; + let event = events.iter().max_by_key(|e| e.created_at)?; + read_info_tag_from_event(event, "pow").and_then(|v| v.parse::().ok()) } #[allow(clippy::too_many_arguments)] diff --git a/src/util/messaging.rs b/src/util/messaging.rs index da3b241..f6936d2 100644 --- a/src/util/messaging.rs +++ b/src/util/messaging.rs @@ -329,6 +329,17 @@ where // Send message here after opening notifications to avoid missing messages. sent_message.await?; + // Kick off the PoW probe concurrently with the DM wait. By running the + // kind-38385 lookup alongside the 15s `FETCH_EVENTS_TIMEOUT` instead of + // *after* it, the timeout branch doesn't pay a second sequential + // `fetch_events` round-trip — by then the probe has typically already + // returned. `JoinHandle` lets us `abort()` the probe cheaply on the happy + // path (DM arrives in time) without leaking the task. + let pow_probe = tokio::spawn(super::events::fetch_required_pow_with( + ctx.client.clone(), + ctx.mostro_pubkey, + )); + // Wait for the DM or gift wrap event let waited = tokio::time::timeout(super::events::FETCH_EVENTS_TIMEOUT, async move { loop { @@ -346,22 +357,23 @@ where // Keep a genuine timeout (the only "no reply" outcome) distinguishable from // a notification-channel error so callers can treat them differently. let event = match waited { - Ok(inner) => inner?, + Ok(inner) => { + // Happy path: DM arrived. Cancel the probe; the answer is no + // longer needed and we don't want a stray relay request lingering. + pow_probe.abort(); + inner? + } Err(_elapsed) => { // mostrod silently drops events whose outer GiftWrap doesn't meet // its NIP-13 PoW requirement (relay accepts → daemon discards → - // no reply ever comes). Before declaring this a generic timeout, - // ask the daemon's kind-38385 info event whether that's actually - // the cause we're hiding behind "no reply". - // - // The probe is bounded by `POW_PROBE_TIMEOUT` instead of the full - // `FETCH_EVENTS_TIMEOUT` so a slow/unreachable relay can't double - // the user-visible wait. If the probe doesn't return in time, fall - // through to the generic timeout error rather than hanging. - let probe = - tokio::time::timeout(POW_PROBE_TIMEOUT, super::events::fetch_required_pow(ctx)) - .await; - if let Ok(Some(required)) = probe { + // no reply ever comes). The probe has already been running for + // `FETCH_EVENTS_TIMEOUT` alongside the wait, so it is almost + // certainly done. Cap the await with `POW_PROBE_TIMEOUT` as a + // safety net so a pathological relay can't keep us hanging — if + // the probe isn't back by then, fall through to the generic + // timeout error instead of waiting any longer. + let probe_result = tokio::time::timeout(POW_PROBE_TIMEOUT, pow_probe).await; + if let Ok(Ok(Some(required))) = probe_result { let configured = parse_pow_env().unwrap_or(0); if required > configured { return Err(PowRequirementUnmet {