diff --git a/docs/pow_error_handling.md b/docs/pow_error_handling.md new file mode 100644 index 0000000..42bb453 --- /dev/null +++ b/docs/pow_error_handling.md @@ -0,0 +1,269 @@ +# Spec — Surface explicit PoW rejection error + +Tracking issue: [MostroP2P/mostro-cli#172](https://github.com/MostroP2P/mostro-cli/issues/172) + +## 1. Problem + +When a user sends an event from `mostro-cli` to a `mostrod` instance that +**requires NIP‑13 Proof‑of‑Work** without providing the required PoW, the +daemon silently drops the event (logs `Not POW verified event!` on its side) +and never replies. The CLI eventually times out and surfaces a generic +timeout error to the user: + +```text +deadline has elapsed +``` + +(or, in current `main`, the slightly less awful but still ambiguous +`Timeout waiting for DM or gift wrap event`). + +This is misleading. From the user perspective the message can mean almost +anything: the relay is slow, mostrod is down, the network is broken, the CLI +is buggy, or the command was wrong. The real cause — PoW not satisfied — is +hidden. + +## 2. Root cause + +The flow that breaks today: + +1. `mostro-cli` mines PoW on the outer GiftWrap based on the `POW` env var + (default `0`) — see `src/util/messaging.rs::parse_pow_env` and the + `WrapOptions { pow, .. }` plumbing. +2. The wrapped event is published to relays via `client.send_event(...)`. +3. `wait_for_dm` opens a subscription on the trade key and waits for + `FETCH_EVENTS_TIMEOUT` (15 s) for an inbound GiftWrap. +4. If `mostrod` requires PoW above what the client provided, mostrod + silently drops the event in + [`mostro/src/app.rs`](https://github.com/MostroP2P/mostro) — the relay + accepted it (so there's no NIP‑01 `OK false` from the relay either) and + no reply is ever produced. +5. `tokio::time::timeout` elapses → `wait_for_dm` returns `WaitForDmTimeout` + → the user sees the generic timeout message. + +The CLI has **no way to distinguish** "daemon silently dropped my event for +PoW reasons" from "transient relay or network issue". + +## 3. Signal we can use + +`mostrod` already publishes its required PoW in its **kind‑38385 info event** +under the tag `["pow", ""]`. See +[`mostro/src/nip33.rs`](https://github.com/MostroP2P/mostro) (`new_info_event`) +where the daemon writes: + +```rust +Tag::custom( + TagKind::Custom(Cow::Borrowed("pow")), + vec![mostro_settings.pow.to_string()], +), +``` + +Reading that tag on the CLI side lets us answer the only question we care +about: *did this Mostro instance reject our event because we didn't provide +enough PoW?* + +We already fetch kind‑38385 elsewhere in the CLI — see +`src/util/events.rs::fetch_bond_claim_window_days` — so this is a small +extension of an existing pattern. + +## 4. Design + +### 4.1 Error variant + +Add a typed error so callers (and tests) can match the PoW failure mode +distinctly from a generic timeout: + +```rust +#[derive(Debug)] +pub struct PowRequirementUnmet { + pub required: u8, + pub configured: u8, +} +``` + +The user-facing `Display` will be explicit: + +```text +This Mostro instance requires NIP-13 proof of work of N bits, but the +client sent the event with M bits. Re-run with `--pow N` or set +`POW=N` and try again. +``` + +The error lives next to `WaitForDmTimeout` in `src/util/messaging.rs` (same +module, same idiom: small zero/struct error wrapped via `anyhow`). + +### 4.2 Helper — fetch required PoW from kind‑38385 + +Add a helper in `src/util/events.rs`: + +```rust +/// Best-effort: fetch the Mostro instance's kind-38385 info event and read +/// the `pow` tag. Returns `None` when no info event is available or the tag +/// is missing/unparseable. Mirrors `fetch_bond_claim_window_days`. +pub async fn fetch_required_pow(ctx: &crate::cli::Context) -> Option; +``` + +Implementation details: + +- Author filter on `ctx.mostro_pubkey`, kind `NOSTR_INFO_EVENT_KIND`. +- Pick the **newest** revision by `created_at` (replaceable but lagging + relays can still serve old copies; same caveat as the bond-window helper). +- Scan tags for `["pow", ""]`, parse as `u8`. +- Any error (fetch failure, missing tag, unparseable value) → `None`. + +Returning `None` is the "I don't know" signal; the caller treats it as +*"can't blame PoW"* and falls back to the generic timeout error. This keeps +the helper non-fatal: older daemons that don't publish the tag, or unreachable +relays, never break flows that worked before. + +### 4.3 Where to plug it in — `wait_for_dm` + +`wait_for_dm` is the single chokepoint that every request/reply flow goes +through (`add_invoice`, `take_order`, `take_dispute`, `send_msg`, `new_order`, +`rate_user`, `orders_info`, `restore`, `last_trade_index`, `add_bond_invoice`). +Centralizing the fix here covers every command in one place. + +Concurrent probe (chosen — see Alternatives below): + +```rust +// Kick off the PoW probe alongside the DM wait so its answer is in hand +// the moment the wait times out. The probe is cheap to start and cheap to +// cancel via JoinHandle::abort() on the happy path. +let pow_probe = tokio::spawn(fetch_required_pow_with( + ctx.client.clone(), + ctx.mostro_pubkey, +)); + +let waited = tokio::time::timeout(FETCH_EVENTS_TIMEOUT, /* notification loop */).await; + +let event = match waited { + Ok(inner) => { + pow_probe.abort(); + inner? + } + Err(_elapsed) => { + // Probe has been running for FETCH_EVENTS_TIMEOUT alongside the + // wait; it should already be done. POW_PROBE_TIMEOUT is a safety + // net for pathological relays — if the answer isn't in by then, + // fall through to the generic timeout error. + let probe_result = tokio::time::timeout(POW_PROBE_TIMEOUT, pow_probe).await; + if let Ok(Ok(Some(required))) = probe_result { + let configured = parse_pow_env().unwrap_or(0); + if required > configured { + return Err(PowRequirementUnmet { required, configured }.into()); + } + } + return Err(WaitForDmTimeout.into()); + } +}; +``` + +The probe lives in `events::fetch_required_pow_with(client, mostro_pubkey)` +— an owned-args sibling of `fetch_required_pow(ctx)`, used so the spawned +future is `'static`. The 3 s `POW_PROBE_TIMEOUT` is now a safety net rather +than the typical wait: in the common timeout case the probe is already +resolved when we look at it, so the user-visible wait stays at +`FETCH_EVENTS_TIMEOUT` (15 s) plus ~0 s, instead of doubling to 30 s as the +naive sequential version would. + +Add an `&Context` parameter? Look at the signature today — +`wait_for_dm(ctx, order_trade_keys, sent_message)` — `ctx` is already +passed. We just need to grant the helper access to `ctx.client` and +`ctx.mostro_pubkey` (already does). + +### 4.4 `add_bond_invoice` interplay + +`add_bond_invoice` treats `WaitForDmTimeout` as the happy path (Mostro pays +the invoice without acking over Nostr). The new variant must **not** be +caught by that branch — a PoW failure on `add-bond-invoice` is *not* a +successful submission. Concretely: + +- The existing `Err(e) if e.downcast_ref::().is_some()` + arm continues to match only `WaitForDmTimeout`. +- A `PowRequirementUnmet` falls through to the generic `Err(e) => return + Err(e)` arm and bubbles up to the user, which is the desired behavior. + +No code change needed in `add_bond_invoice.rs` — the existing pattern match +already handles the right split. We assert this with a regression note. + +## 5. Alternatives considered + +### 5.1 Preflight check (fail before sending) + +Run `fetch_required_pow` *before* `send_dm`. Advantages: fail fast (no +15‑second wait). Disadvantages: + +- Adds a relay roundtrip to the happy path for every command. +- Doubles "you didn't set PoW" errors for daemons that publish the tag but + also accept PoW=0 (no real check on that side today). +- Couples every command to the info event being reachable. + +→ Rejected. Postflight has the right cost model: zero overhead when things +work, and *only* pays for the info-event fetch in the failure path where the +user is already waiting anyway. + +### 5.2 Auto-mine the required PoW + +Bump `WrapOptions.pow` to `max(configured, required)` instead of erroring. +Tempting, but high PoW targets (e.g. 28+ bits) can take minutes to mine on +a laptop, and the user wouldn't see *why* the CLI is suddenly chewing CPU. +Better to fail explicitly with an actionable hint and let the user opt in +by setting `POW` themselves. + +→ Rejected. Defer to explicit user opt-in via `--pow` / `POW` env. + +### 5.3 Read NIP‑01 `OK false` from the relay + +Won't help: mostrod drops the event *after* the relay accepts it, so the +relay returns `OK true`. There is no rejection signal on the publish path. + +→ Rejected. + +## 6. Acceptance criteria + +(Reproducing the issue's checklist, with concrete bindings.) + +- Sending an event from `mostro-cli` to a PoW‑enabled `mostrod` without + satisfying PoW must surface an error that mentions PoW. Specifically the + `PowRequirementUnmet { required, configured }` variant rendered as: + `"This Mostro instance requires NIP-13 proof of work of N bits, but the + client sent the event with M bits. Re-run with --pow N or set POW=N and + try again."` +- Genuine timeouts (daemon reachable, no PoW requirement, just slow / no + reply) keep returning `WaitForDmTimeout` — its existing message stays. +- `add_bond_invoice`'s "timeout is the happy path" arm only matches + `WaitForDmTimeout`; a `PowRequirementUnmet` propagates and is shown to + the user. +- Older daemons that don't publish `["pow", ...]` in kind‑38385 behave + exactly as before (helper returns `None` → generic timeout). + +## 7. Test plan + +Unit tests in `src/util/messaging.rs` and `src/util/events.rs`: + +- `pow_requirement_unmet_display_mentions_required_and_configured` — + formatting check on the error message. +- `fetch_required_pow_returns_none_when_tag_missing` — synthesize a + kind-38385 event with no `pow` tag, assert `None`. +- `fetch_required_pow_picks_newest_revision` — two events, newer with a + different pow value; helper returns the newer value. Mirrors the existing + `fetch_bond_claim_window_days` pattern. +- `fetch_required_pow_parses_pow_tag` — single event with `["pow", "12"]`, + assert `Some(12)`. + +A manual end-to-end check (out of the unit-test scope) is to run against a +PoW‑enabled local `mostrod`: + +```sh +POW=0 mostro-cli neworder ... +# expected: PowRequirementUnmet error, not "deadline has elapsed" +POW= mostro-cli neworder ... +# expected: normal flow, no error +``` + +## 8. Out of scope + +- Auto-mining the missing PoW (see 5.2). +- Removing or restructuring the `POW` env var / `--pow` flag. +- Surfacing the PoW requirement during `list-orders` / informational + flows. The fix is scoped to the request/reply path where the missing + PoW silently kills the flow. diff --git a/src/util/events.rs b/src/util/events.rs index 63bad26..40435e4 100644 --- a/src/util/events.rs +++ b/src/util/events.rs @@ -83,6 +83,43 @@ pub fn create_filter( } } +/// Fetch the Mostro instance's kind-38385 info event and read a tag whose +/// first slot equals `tag_name`. Returns the second slot as a string. +/// +/// Best-effort: any relay error or missing tag degrades to `None` so callers +/// keep working against older daemons that don't publish the tag. +/// Pure: read the second slot of the first tag whose first slot equals +/// `tag_name`. Split out so the kind-38385 tag-parsing logic stays unit +/// testable without spinning up a relay. +fn read_info_tag_from_event(event: &nostr_sdk::Event, tag_name: &str) -> Option { + event.tags.iter().find_map(|tag| { + let slice = tag.as_slice(); + if slice.first().map(String::as_str) == Some(tag_name) { + slice.get(1).cloned() + } else { + None + } + }) +} + +async fn fetch_info_tag(ctx: &crate::cli::Context, tag_name: &str) -> Option { + let filter = Filter::new() + .author(ctx.mostro_pubkey) + .kind(nostr_sdk::Kind::Custom(NOSTR_INFO_EVENT_KIND)); + + let events = ctx + .client + .fetch_events(filter, FETCH_EVENTS_TIMEOUT) + .await + .ok()?; + + // kind-38385 is replaceable, but pick the newest revision by `created_at` + // explicitly: a lagging relay (or several relays at once) can still surface + // an older copy. + let event = events.iter().max_by_key(|e| e.created_at)?; + read_info_tag_from_event(event, tag_name) +} + /// Fetch the Mostro instance's kind-38385 info event and read the /// `bond_payout_claim_window_days` tag. /// @@ -92,28 +129,39 @@ pub fn create_filter( /// the protocol's "Bond payout invoice" / "Other events" docs. Best-effort: /// any relay error degrades to `None` rather than failing the caller. pub async fn fetch_bond_claim_window_days(ctx: &crate::cli::Context) -> Option { + // A stale claim window would render the wrong, very user-facing forfeit + // deadline — same newest-revision caveat as the rest of the info event. + fetch_info_tag(ctx, "bond_payout_claim_window_days") + .await + .and_then(|v| v.parse::().ok()) +} + +/// Fetch the Mostro instance's required NIP-13 proof-of-work difficulty from +/// the kind-38385 info event (`["pow", ""]` tag). +/// +/// Returns `None` when the daemon doesn't publish the tag (older versions), +/// when the value is unparseable, or when the info event can't be fetched. +/// Used by [`crate::util::messaging::wait_for_dm`] to distinguish a real +/// timeout from a silent PoW rejection — see `docs/pow_error_handling.md`. +pub async fn fetch_required_pow(ctx: &crate::cli::Context) -> Option { + fetch_required_pow_with(ctx.client.clone(), ctx.mostro_pubkey).await +} + +/// Owned-args variant of [`fetch_required_pow`], suitable for `tokio::spawn`. +/// +/// `wait_for_dm` kicks the probe off concurrently with the DM wait so the +/// answer is already in hand by the time the wait times out (zero added +/// latency in the timeout path, instead of a sequential second fetch). +pub async fn fetch_required_pow_with(client: Client, mostro_pubkey: PublicKey) -> Option { let filter = Filter::new() - .author(ctx.mostro_pubkey) + .author(mostro_pubkey) .kind(nostr_sdk::Kind::Custom(NOSTR_INFO_EVENT_KIND)); - - let events = ctx - .client + let events = client .fetch_events(filter, FETCH_EVENTS_TIMEOUT) .await .ok()?; - - // kind-38385 is replaceable, but pick the newest revision by `created_at` - // explicitly: a lagging relay (or several relays at once) can still surface - // an older copy, and a stale claim window would render the wrong, very - // user-facing forfeit deadline. let event = events.iter().max_by_key(|e| e.created_at)?; - for tag in event.tags.iter() { - let slice = tag.as_slice(); - if slice.first().map(String::as_str) == Some("bond_payout_claim_window_days") { - return slice.get(1).and_then(|v| v.parse::().ok()); - } - } - None + read_info_tag_from_event(event, "pow").and_then(|v| v.parse::().ok()) } #[allow(clippy::too_many_arguments)] @@ -206,3 +254,56 @@ pub async fn fetch_events_list( } } } + +#[cfg(test)] +mod tests { + use super::*; + + async fn make_info_event(keys: &Keys, tags: Vec) -> nostr_sdk::Event { + EventBuilder::new(nostr_sdk::Kind::Custom(NOSTR_INFO_EVENT_KIND), "") + .tags(tags) + .sign(keys) + .await + .expect("sign info event") + } + + fn pow_tag(value: &str) -> Tag { + Tag::parse(["pow", value]).expect("build pow tag") + } + + #[tokio::test] + async fn read_info_tag_finds_pow_value() { + let keys = Keys::generate(); + let event = make_info_event( + &keys, + vec![ + Tag::parse(["fee", "0.006"]).unwrap(), + pow_tag("12"), + Tag::parse(["fiat_currencies_accepted", "USD,EUR"]).unwrap(), + ], + ) + .await; + assert_eq!(read_info_tag_from_event(&event, "pow"), Some("12".into())); + } + + #[tokio::test] + async fn read_info_tag_returns_none_when_missing() { + let keys = Keys::generate(); + let event = make_info_event(&keys, vec![Tag::parse(["fee", "0.006"]).unwrap()]).await; + assert_eq!(read_info_tag_from_event(&event, "pow"), None); + } + + #[tokio::test] + async fn pow_tag_parses_as_u8() { + // u8 parse is what fetch_required_pow chains after the helper. + // Lock in that the daemon's stringified u8 round-trips cleanly, + // and that garbage values degrade to None. + let parse = |s: &str| s.parse::().ok(); + assert_eq!(parse("12"), Some(12)); + assert_eq!(parse("0"), Some(0)); + assert_eq!(parse("nope"), None); + // Out of range for u8 → None, which is the right "ignore this + // weird value, fall back to generic timeout" behavior. + assert_eq!(parse("999"), None); + } +} diff --git a/src/util/messaging.rs b/src/util/messaging.rs index f55f645..f6936d2 100644 --- a/src/util/messaging.rs +++ b/src/util/messaging.rs @@ -256,6 +256,12 @@ pub async fn send_plain_text_dm( .await } +/// Upper bound on the post-timeout PoW probe inside [`wait_for_dm`]. Kept +/// well below `FETCH_EVENTS_TIMEOUT` so a slow/unreachable relay can't +/// double the user-visible wait — if the kind-38385 info event isn't back +/// within this budget we fall through to [`WaitForDmTimeout`] instead. +const POW_PROBE_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(3); + /// Distinguishable error returned by [`wait_for_dm`] when no reply arrives /// within [`FETCH_EVENTS_TIMEOUT`]. /// @@ -275,6 +281,33 @@ impl std::fmt::Display for WaitForDmTimeout { impl std::error::Error for WaitForDmTimeout {} +/// Returned by [`wait_for_dm`] when the wait timed out *and* the Mostro +/// instance's kind-38385 info event advertises a NIP-13 PoW difficulty +/// strictly above what the client sent. Lets callers (and tests) +/// distinguish a silent PoW rejection — the daemon dropped the event after +/// the relay accepted it — from a generic timeout. +/// +/// See `docs/pow_error_handling.md` for the design rationale. +#[derive(Debug)] +pub struct PowRequirementUnmet { + pub required: u8, + pub configured: u8, +} + +impl std::fmt::Display for PowRequirementUnmet { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!( + f, + "This Mostro instance requires NIP-13 proof of work of {} bits, \ + but the client sent the event with {} bits. \ + Re-run with `--pow {}` or set `POW={}` and try again.", + self.required, self.configured, self.required, self.required + ) + } +} + +impl std::error::Error for PowRequirementUnmet {} + pub async fn wait_for_dm( ctx: &crate::cli::Context, order_trade_keys: Option<&Keys>, @@ -296,6 +329,17 @@ where // Send message here after opening notifications to avoid missing messages. sent_message.await?; + // Kick off the PoW probe concurrently with the DM wait. By running the + // kind-38385 lookup alongside the 15s `FETCH_EVENTS_TIMEOUT` instead of + // *after* it, the timeout branch doesn't pay a second sequential + // `fetch_events` round-trip — by then the probe has typically already + // returned. `JoinHandle` lets us `abort()` the probe cheaply on the happy + // path (DM arrives in time) without leaking the task. + let pow_probe = tokio::spawn(super::events::fetch_required_pow_with( + ctx.client.clone(), + ctx.mostro_pubkey, + )); + // Wait for the DM or gift wrap event let waited = tokio::time::timeout(super::events::FETCH_EVENTS_TIMEOUT, async move { loop { @@ -313,8 +357,34 @@ where // Keep a genuine timeout (the only "no reply" outcome) distinguishable from // a notification-channel error so callers can treat them differently. let event = match waited { - Ok(inner) => inner?, - Err(_elapsed) => return Err(WaitForDmTimeout.into()), + Ok(inner) => { + // Happy path: DM arrived. Cancel the probe; the answer is no + // longer needed and we don't want a stray relay request lingering. + pow_probe.abort(); + inner? + } + Err(_elapsed) => { + // mostrod silently drops events whose outer GiftWrap doesn't meet + // its NIP-13 PoW requirement (relay accepts → daemon discards → + // no reply ever comes). The probe has already been running for + // `FETCH_EVENTS_TIMEOUT` alongside the wait, so it is almost + // certainly done. Cap the await with `POW_PROBE_TIMEOUT` as a + // safety net so a pathological relay can't keep us hanging — if + // the probe isn't back by then, fall through to the generic + // timeout error instead of waiting any longer. + let probe_result = tokio::time::timeout(POW_PROBE_TIMEOUT, pow_probe).await; + if let Ok(Ok(Some(required))) = probe_result { + let configured = parse_pow_env().unwrap_or(0); + if required > configured { + return Err(PowRequirementUnmet { + required, + configured, + } + .into()); + } + } + return Err(WaitForDmTimeout.into()); + } }; let mut events = Events::default(); @@ -441,6 +511,42 @@ pub async fn print_dm_events( mod tests { use super::*; + #[test] + fn pow_requirement_unmet_display_mentions_required_and_configured() { + let rendered = PowRequirementUnmet { + required: 12, + configured: 4, + } + .to_string(); + assert!( + rendered.contains("12") && rendered.contains("4"), + "message should surface both PoW values, got: {rendered}", + ); + assert!( + rendered.contains("proof of work"), + "message should call out PoW explicitly, got: {rendered}", + ); + assert!( + rendered.contains("--pow") && rendered.contains("POW="), + "message should hint at both the CLI flag and env var, got: {rendered}", + ); + } + + #[test] + fn pow_requirement_unmet_is_downcastable_through_anyhow() { + // The add-bond-invoice flow only catches WaitForDmTimeout via + // downcast_ref::(). Verify our new error stays + // distinct so PoW failures don't get misreported as a successful + // bond-invoice submission. + let err: anyhow::Error = PowRequirementUnmet { + required: 8, + configured: 0, + } + .into(); + assert!(err.downcast_ref::().is_some()); + assert!(err.downcast_ref::().is_none()); + } + fn leading_zero_bits_in_hex(hex: &str) -> u32 { let mut bits = 0_u32; for ch in hex.chars() { diff --git a/src/util/mod.rs b/src/util/mod.rs index b4b0bc6..7d36c8b 100644 --- a/src/util/mod.rs +++ b/src/util/mod.rs @@ -7,12 +7,13 @@ pub mod types; // Re-export commonly used items to preserve existing import paths pub use events::{ - create_filter, fetch_bond_claim_window_days, fetch_events_list, FETCH_EVENTS_TIMEOUT, + create_filter, fetch_bond_claim_window_days, fetch_events_list, fetch_required_pow, + FETCH_EVENTS_TIMEOUT, }; pub use messaging::{ derive_shared_key_hex, derive_shared_keys, keys_from_shared_hex, print_dm_events, send_admin_chat_message_via_shared_key, send_dm, send_plain_text_dm, wait_for_dm, - WaitForDmTimeout, + PowRequirementUnmet, WaitForDmTimeout, }; pub use misc::{get_mcli_path, uppercase_first}; pub use net::connect_nostr;