fix(workflows): in-place updates, deletion-forgery fix, webhook-secret & channel immutability by AaronGoldsmith · Pull Request #1340 · block/buzz

AaronGoldsmith · 2026-06-29T01:52:18Z

Summary

Workflows that were edited or deleted kept running concurrently. The root cause: the relay called create_workflow, which generated a random DB primary key on every publish, so each edit inserted a new active row (the old one kept firing) and client-side deletions never matched a row.

This PR makes the client-supplied UUID (the Nostr parameterized-replaceable-event d tag) the stable primary key, so edits update in place and deletions resolve correctly. It also fixes three security/stability issues found in review:

In-place updates — handle_workflow_def resolves the existing row by UUID and calls update_workflow instead of unconditionally inserting. No more duplicate concurrent cron jobs.
Deletion forgery (security) — NIP-09 a-tag deletion previously only checked the actor matched the coordinate pubkey, then deleted by UUID without confirming the DB row's owner. A forged 30620:<attacker>:<victim_uuid> could delete another user's workflow. Now the row is loaded and owner_pubkey is verified before deleting.
Webhook secret preservation — editing a webhook workflow regenerated its _webhook_secret, breaking existing callers. The existing secret is now extracted from the stored row and re-injected on update.
Channel immutability & info-leak prevention — channel membership is checked before any DB lookup (so unauthorized users can't probe for valid UUIDs via differing errors), and record.channel_id != Some(channel_id) rejects moving a workflow between channels, including the legacy NULL-channel case.
NIP-33 stale-write protection on the command path — workflow-def events are routed to the command executor and never reached replace_parameterized_event, so an older event delivered after a newer one (reconnect/retry) could overwrite the row with stale YAML. The lock + (created_at, id) dominance check + soft-delete is now extracted into a shared buzz_db::nip33_stale_write_guard, called by both replace_parameterized_event and persist_command_event so the two paths can't drift. Dominated events become idempotent duplicates and skip the mutation.

Changes

buzz-db/src/workflow.rs — create_workflow now takes id: Uuid instead of generating one internally.
buzz-db/src/lib.rs — extracted nip33_stale_write_guard(&mut conn, …); replace_parameterized_event now calls it (behavior unchanged) so the command path can share it.
buzz-relay/src/handlers/command_executor.rs — handle_workflow_def parses the d-tag UUID, looks up any existing row (after the membership gate), enforces owner + channel-immutability, preserves the webhook secret, and routes to update_workflow vs create_workflow. persist_command_event applies the shared NIP-33 stale-write guard for parameterized-replaceable command kinds.
buzz-relay/src/handlers/side_effects.rs — handle_a_tag_deletion loads the workflow and asserts owner_pubkey before deleting; missing rows are logged and treated as a no-op.
buzz-test-client/tests/e2e_workflows.rs — test_workflow_update_and_delete expanded to assert in-place update, webhook-secret stability, channel-change rejection, and deletion-forgery rejection (asserting the row persists).

Verification

cargo fmt --check                       # clean
cargo clippy (relay, db, test-client)   # no warnings
just test-unit                          # all passed

# Integration (against a relay built from THIS branch):
BUZZ_BIND_ADDR=0.0.0.0:3001 RELAY_URL=ws://localhost:3001 \
  BUZZ_HEALTH_PORT=8081 BUZZ_METRICS_PORT=9103 cargo run -p buzz-relay
RELAY_URL=ws://localhost:3001 \
  cargo test -p buzz-test-client --test e2e_workflows \
  test_workflow_update_and_delete -- --ignored
# test test_workflow_update_and_delete ... ok

Manual before/after: a 1-minute cron edited to v2/v3 previously left multiple duplicate jobs firing and could not be deleted; after the fix only the latest definition fires and a NIP-09 deletion stops it immediately.

Reviewer note — pre-existing duplication left out of scope

While extracting nip33_stale_write_guard, I deduped the two NIP-33 replacement copies the reviewer flagged (replace_parameterized_event + the new command path). A third copy of the FNV-1a advisory-lock-key snippet still lives in Db::replace_addressable_event (relay-signed NIP-29 group metadata, kinds 39000–39002). It predates this PR (present on main) and was intentionally left untouched here:

It keys on channel_id (the relay is the author, so pubkey isn't the natural key), so it isn't a drop-in for the same helper.
Folding it in would widen this PR's blast radius beyond the workflow fix.

Flagging it as a good standalone follow-up (factor the lock-key into a shared fnv_coordinate_lock_key and have all three call it).

…ent UUID Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

…nd enforce channel immutability on updates Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

…hannel_id update scenario Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

AaronGoldsmith

Self-review: flagging the non-obvious / higher-blast-radius bits so they're easy to scrutinize.

AaronGoldsmith · 2026-06-29T01:52:51Z

+                match state.db.get_workflow(wf_id).await {
+                    Ok(wf) => {
+                        if wf.owner_pubkey != owner_bytes {
+                            return Err(anyhow::anyhow!("forbidden: deletion owner mismatch"));


🔒 Security (deletion forgery). NIP-09 a-tag deletion previously only checked the actor matched the coordinate pubkey, then deleted by UUID. A forged 30620:<attacker>:<victim_uuid> could delete another user's workflow. We now load the row and assert owner_pubkey before deleting. This extra lookup is the ownership gate, not a redundant query.

AaronGoldsmith · 2026-06-29T01:52:51Z

    }

+    // Check if workflow already exists to perform update or create checks
+    let existing = state.db.get_workflow(workflow_id).await;


🔍 Ordering is load-bearing (info-leak). The is_member_cached gate above (line 580) runs before this lookup on purpose, so unauthorized users can't probe for valid workflow UUIDs via differing error responses. Please don't hoist this lookup earlier for efficiency — it silently reintroduces the leak.

AaronGoldsmith · 2026-06-29T01:52:51Z

+                    "forbidden: cannot update a workflow owned by another user".into(),
+                ));
+            }
+            if record.channel_id != Some(channel_id) {


⛓️ Channel immutability + legacy NULL. record.channel_id != Some(channel_id) is deliberate: it rejects both some→some channel moves and the legacy NULL→some case. A naive unwrap/compare would let legacy global workflows be silently re-scoped.

AaronGoldsmith · 2026-06-29T01:52:51Z

-        let secret = webhook_secret::generate_webhook_secret();
-        webhook_secret::inject_secret(&mut definition_json, &secret);
-        Some(secret)
+        let existing_secret = existing_record


🔑 Webhook secret preservation. On update we re-inject the existing secret from the stored row (returning None so it isn't rotated/re-surfaced). Previously every edit regenerated _webhook_secret, breaking existing webhook callers.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e34ed4b9e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-29T01:58:00Z

+        state
+            .db
+            .update_workflow(workflow_id, &workflow_name, &definition_json_final, &hash)
+            .await
+            .map_err(|e| IngestError::Internal(format!("error: db update_workflow: {e}")))?;


Ignore stale workflow definition replays

When an older kind:30620 event for the same workflow is delivered after a newer one (for example after a reconnect/retry, while still inside the relay's timestamp window), this unconditional update_workflow overwrites the workflows table with the stale YAML. The command path uses persist_command_event rather than the NIP-33 replace_parameterized_event stale-write check, so clients will still see the newer event as the latest while the workflow engine reads and runs the older definition from the DB. Please compare the incoming event's (created_at, id) against the current live coordinate before applying the update, or route workflow defs through the same replacement logic.

Useful? React with 👍 / 👎.

🤖 Addressed in efa13ed.

The command path now goes through the same NIP-33 replacement logic as the normal store path. Rather than duplicate the check, I extracted the lock + (created_at, id) dominance test + soft-delete into a shared buzz_db::nip33_stale_write_guard(&mut conn, …) and call it from both Db::replace_parameterized_event and persist_command_event, so the two paths cannot drift.

A dominated (older/retried) event now returns PersistResult::Duplicate, so the handler skips update_workflow entirely — the stale YAML never reaches the DB, and clients no longer see a newer event than the engine runs.

Among command kinds, only KIND_WORKFLOW_DEF (30620) is parameterized-replaceable (DM/approval kinds are 41xxx/46xxx), so behavior is unchanged for the rest.

Covered by test_workflow_update_and_delete (passing against a branch-built relay). Note: Nostr created_at is second-granularity, so same-second edits are resolved by the id tie-break — documented inline where the test sleeps 1s to stay deterministic.

👍 Useful catch — thanks.

Workflow def events (kind 30620) are routed to the command executor and never reach Db::replace_parameterized_event, so persist_command_event inserted them with plain ON CONFLICT DO NOTHING and no (created_at, id) replacement check. An older event delivered after a newer one (reconnect /retry within the timestamp window) was inserted and applied, overwriting the workflows row with stale YAML while clients still saw the newer event as latest. Extract the lock + dominance check + soft-delete into a single shared buzz_db::nip33_stale_write_guard(&mut conn, ...) and call it from both Db::replace_parameterized_event and persist_command_event, so the command and normal store paths apply identical NIP-33 replacement rules and cannot drift. A dominated event is reported as an idempotent duplicate, so the handler skips its mutation. Only KIND_WORKFLOW_DEF among command kinds is parameterized-replaceable (DM/approval kinds are 41xxx/46xxx), so behavior is unchanged for the rest. Document in the e2e test why a 1s sleep is required: created_at is second-granularity and same-second edits are resolved by the id tie-break. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

AaronGoldsmith added 3 commits June 28, 2026 18:26

fix(workflows): update workflows in-place and support deletion by cli…

67f8c53

…ent UUID Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

fix(workflows): resolve deletion forgery, preserve webhook secrets, a…

e6956dd

…nd enforce channel immutability on updates Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

security(workflows): prevent information leak and cover legacy NULL c…

9e34ed4

…hannel_id update scenario Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

AaronGoldsmith commented Jun 29, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 29, 2026

View reviewed changes

AaronGoldsmith force-pushed the bugfix/workflow-update-delete branch from 8889232 to efa13ed Compare June 29, 2026 02:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(workflows): in-place updates, deletion-forgery fix, webhook-secret & channel immutability#1340

fix(workflows): in-place updates, deletion-forgery fix, webhook-secret & channel immutability#1340
AaronGoldsmith wants to merge 4 commits into
block:mainfrom
AaronGoldsmith:bugfix/workflow-update-delete

AaronGoldsmith commented Jun 29, 2026 •

edited

Loading

Uh oh!

AaronGoldsmith left a comment

Uh oh!

AaronGoldsmith Jun 29, 2026

Uh oh!

AaronGoldsmith Jun 29, 2026

Uh oh!

AaronGoldsmith Jun 29, 2026

Uh oh!

AaronGoldsmith Jun 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 29, 2026

Uh oh!

AaronGoldsmith Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

AaronGoldsmith commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Verification

Uh oh!

AaronGoldsmith left a comment

Choose a reason for hiding this comment

Uh oh!

AaronGoldsmith Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

AaronGoldsmith Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

AaronGoldsmith Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

AaronGoldsmith Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

AaronGoldsmith Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

AaronGoldsmith commented Jun 29, 2026 •

edited

Loading