Skip to content

fix(aztec-cli): release TXE port and kill bb children on shutdown#22730

Merged
dbanks12 merged 4 commits intomerge-train/fairiesfrom
db/fix-aztec-test-hang
Apr 24, 2026
Merged

fix(aztec-cli): release TXE port and kill bb children on shutdown#22730
dbanks12 merged 4 commits intomerge-train/fairiesfrom
db/fix-aztec-test-hang

Conversation

@dbanks12
Copy link
Copy Markdown
Contributor

@dbanks12 dbanks12 commented Apr 22, 2026

Summary

Ctrl-C (or any SIGINT/SIGTERM) during aztec test could leave the TXE node still listening on its port and in-flight bb children still running, blocking subsequent runs with EADDRINUSE. Fixes #22725.

Why it happens

Two gaps in the TXE shutdown path:

  1. startTXE (yarn-project/aztec/src/cli/cmds/start_txe.ts) bound the HTTP listener but registered no shutdown callback with the outer signalHandlers array that aztecStart passes to installSignalHandlers (yarn-project/aztec/src/cli/aztec_start_action.ts:94). Every other start* function in that file (startNode, startBot, startArchiver, startP2PBootstrap, startProverAgent, startProverBroker) already wires itself in this way — TXE was the odd one out. The listener could outlive process.exit under adversarial timing and hold the port.
  2. executeBB (yarn-project/bb-prover/src/bb/execute.ts:90) spawned bb with no signal plumbing. If the node parent received SIGINT/SIGTERM, those signals were not propagated to bb, so the child became an orphan under init.

Changes

  • startTXE: accept signalHandlers, push () => httpServer.close() so the port is released cleanly on SIGINT/SIGTERM. Brings TXE in line with the other start* functions.
  • aztec_start_action: pass signalHandlers into startTXE.
  • executeBB: forward SIGINT/SIGTERM/SIGHUP to the spawned bb child, removing the handlers on close so they don't leak across calls.

Test plan

  • yarn build passes for touched packages.
  • yarn format + yarn lint clean for aztec and bb-prover.
  • Manual: run aztec test --workspace, Ctrl-C mid-run, confirm no orphaned node/bb processes and that re-running starts cleanly.
  • Manual: pkill -TERM -f 'aztec start --txe' mid-run, same check.

Ctrl-C during `aztec test` left the TXE node and any in-flight `bb` child
running, keeping port 8081 bound and blocking subsequent runs. Fix at two
layers:

- `aztec.sh`: enable job control so the backgrounded TXE gets its own
  process group, then signal the whole group on cleanup (trap EXIT/INT/
  TERM/HUP, SIGTERM then SIGKILL escalation, `wait`).
- `startTXE`: push an `httpServer.close()` callback into `signalHandlers`
  so the port is released before `process.exit` under normal shutdown.
- `executeBB`: forward SIGINT/SIGTERM/SIGHUP to the spawned `bb` child so
  it doesn't survive as an orphan when the parent exits.

Fixes #22725.
Drop the SIGTERM → grace → SIGKILL escalation loop in favor of the single
`kill -TERM` + `wait` pattern used by ci3/run_test_cmd and ci3/parallelize_strict.
The TS-layer signal forwarding in executeBB covers the "bb ignores SIGTERM"
case, making the shell-level SIGKILL escalation redundant.
@dbanks12 dbanks12 requested review from Thunkar and charlielye April 23, 2026 20:11
@ludamad ludamad added the claude-review Triggers an automated Claude code review label Apr 23, 2026
Comment thread yarn-project/aztec/scripts/aztec.sh Outdated
export LOG_LEVEL="${LOG_LEVEL:-"error;trace:contract"}"
# Enable job control so the backgrounded TXE gets its own process group.
# Signalling the group on cleanup reaches node + any bb descendants, which
# otherwise get orphaned to init and keep the port bound after Ctrl-C.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need its own process group for graceful cleanup?

Copy link
Copy Markdown
Contributor Author

@dbanks12 dbanks12 Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not honestly. This should never be hit if the TS signal handling is all sane. Edit: should never be a concern here in bash*

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TS changes are less controversial to me

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed bash changes

@ludamad ludamad removed the claude-review Triggers an automated Claude code review label Apr 23, 2026
@dbanks12 dbanks12 changed the title fix(aztec-cli): kill TXE subtree cleanly on Ctrl-C in aztec test fix(aztec-cli): release TXE port and kill bb children on shutdown Apr 23, 2026
Copy link
Copy Markdown
Contributor

@Thunkar Thunkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TS side lgtm!

Copy link
Copy Markdown
Contributor

@charlielye charlielye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully the bb part of this becomes obsolete with this following PR, as in this path bb will kill itself when it parent dies (dark). but this looks fine for quick fix.

#21564

@dbanks12 dbanks12 enabled auto-merge (squash) April 24, 2026 14:24
@dbanks12 dbanks12 merged commit 13b47e5 into merge-train/fairies Apr 24, 2026
27 checks passed
@dbanks12 dbanks12 deleted the db/fix-aztec-test-hang branch April 24, 2026 14:24
@AztecBot
Copy link
Copy Markdown
Collaborator

✅ Successfully backported to backport-to-v4-staging #22768.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants