
fix(hermes): bookend wallet-import archival with k3d ownership flip#397

Merged
bussyjd merged 6 commits into integration/pr377-pr381 from fix/hermes-wallet-import-archive-perms
Apr 29, 2026

Conversation


@bussyjd bussyjd commented Apr 29, 2026

Summary

archiveReplacedHermesKeystore operated on the host-path PVC directly, but provisionKeystoreToVolume's final fixRuntimeVolumeOwnership step leaves the keystores dir as mode 700, owned by the container's uid 10000. From the host side (uid 1000), even os.Stat fails with EACCES, surfaced as:

failed to archive replaced keystore: stat …/<uuid>.json: permission denied

This bit flow-14 at step 22 (Alice: import seller wallet into remote-signer) immediately after Oisin's #386 surgery removed the --private-key-file escape hatch and routed the seller path through obol wallet import. flow-14 step 22 was confirmed reproducible on a freshly-wiped workspace on testbed-B (spark2) — the failure is intrinsic to the import path, not state pollution.

Fix

Mirror the pattern provisionKeystoreToVolume already uses for write access:

ensureVolumeWritable(cfg, dir, u)         // chown -R 1000:1000 via k3d node-exec
defer fixRuntimeVolumeOwnership(cfg, dir, u)  // chown -R 10000:10000 back

8 added lines, no logic change beyond the bookend.
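The bookend can be sketched as follows. The helper names come from the PR, but the types and signatures here are simplified stand-ins for illustration, not the real internal/hermes code:

```go
package main

import "fmt"

// Sketch of the bookend pattern (helper names from the PR; the
// signatures and cfgT are illustrative stand-ins).
type cfgT struct{}

var calls []string

// flip ownership to the host uid (chown -R 1000:1000 via k3d node-exec)
func ensureVolumeWritable(cfg *cfgT, dir string) error {
	calls = append(calls, "writable:"+dir)
	return nil
}

// restore container ownership (chown -R 10000:10000 back)
func fixRuntimeVolumeOwnership(cfg *cfgT, dir string) {
	calls = append(calls, "fix:"+dir)
}

func archiveReplacedKeystore(cfg *cfgT, dir string) error {
	if err := ensureVolumeWritable(cfg, dir); err != nil {
		return err
	}
	// deferred so every return path, including an os.Stat ENOENT
	// early-return, restores ownership for the remote-signer pod
	defer fixRuntimeVolumeOwnership(cfg, dir)

	// ... stat / mkdir replaced/ / rename would go here ...
	return nil
}

func main() {
	_ = archiveReplacedKeystore(&cfgT{}, "/var/keystores")
	fmt.Println(calls) // writable first, fix deferred to last
}
```

The defer is the important part: without it, any early return would leave the dir host-owned and break the pod.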

Why didn't this bite before

  • v0.9.0-rc1/rc2 flow-14 used --private-key-file directly; never went through obol wallet import → never called archiveReplacedHermesKeystore.
  • Bob's flow has called wallet import before, but it ran before his stack came up → no chart-bootstrapped keystore to archive → existingWallet was nil → the first guard returned at line 128.
  • The new Alice path (#386 Group A) calls wallet import after obol stack up, so a chart-deployed bootstrap keystore exists, with container ownership.

Test plan

  • go build ./... clean
  • go vet ./internal/hermes/... clean
  • go test ./... clean
  • flow-14 against live Base Sepolia on spark2 reaches the "Receipt summary:" marker (next: re-run on this branch + spark1 vLLM qwen36-fast)

Stacking

Sits on top of integration/pr377-pr381 (PR #386, tip 3ee8073 — Oisin's review surgery already merged in). Doesn't change anything in #386 itself; this is a pre-existing latent bug exposed by Oisin's Group A simplification of the seller wallet path.

bussyjd added 6 commits April 29, 2026 13:11
archiveReplacedHermesKeystore stats/mkdirs/renames on the host-path PVC
directly, but provisionKeystoreToVolume's last step
(fixRuntimeVolumeOwnership) leaves the keystores dir as
mode 700, owned by the container's uid 10000. The host-side process
(uid 1000) then cannot traverse the dir, so os.Stat returns EACCES
and the wrapping caller surfaces "failed to archive replaced
keystore: stat …: permission denied".

Mirror the pattern provisionKeystoreToVolume already uses: call
ensureVolumeWritable up front (chowns to host uid via k3d node-exec),
defer fixRuntimeVolumeOwnership so all return paths restore container
ownership for the remote-signer pod.

The bug pre-dates the obol-wallet-import flow rewrite; flow-14 only
started exercising the path on Alice once the --private-key-file
escape hatch was removed.
ImportPrivateKeyWalletOptions.ApplyCluster has been plumbed all the
way from cmd/obol/wallet.go since the OpenClaw → Hermes routing fix,
but ImportPrivateKeyWalletCmd never actually consumed it. Effect:
`obol wallet import` against a live cluster wrote the new keystore
to the host-path PVC and updated values-remote-signer.yaml on disk,
but the running remote-signer pod kept decrypting with the old
chart-bootstrap keystore-password Secret and signed with the chart's
throwaway address (e.g. 0xb6aF…). On a flow-14 register tx that
surfaced as "gas required exceeds allowance (0)" — chart key has
no funds.

Mirror OpenClaw's finalizeWalletProvision pattern: when the cluster
is reachable, run hermes.Sync to helmfile-sync the deployment.
helmfile reapplies the keystore-password Secret with the new value
and helm rolls the remote-signer deployment, so the pod restarts
against the freshly-imported keystore.

Failure to sync is best-effort — emits a warning and a recovery
hint instead of failing the import outright (cluster might come up
later).
…er from set -e

Two follow-ups to the helmfile-sync addition (a214050):

1. helm doesn't roll a Deployment when only a Secret's data changed —
   the Deployment template still references the same Secret name, so
   helm patches the Secret in-place and leaves the pod running with
   the stale env. After Sync, run an explicit `kubectl rollout restart
   deployment/remote-signer` and wait up to 120s for the new pod to be
   ready. Mirrors OpenClaw's restartRemoteSigner semantics.

2. flow-14 step 23 ran `register_out=$(timeout 300 obol sell register …)`
   under set -e from lib.sh. obol sell register correctly exits 1 on
   chain failure, but the assignment-with-command-substitution under
   errexit kills the script before the if-check can fire fail() and
   emit_metrics — the run looked like a silent death at "STEP: [23]"
   instead of a clean FAIL with metrics. Wrap in set +e/-e the same
   way step 22 (wallet import) already does.

Together with 6c5106a (archive bookend) and a214050 (Sync on
ApplyCluster), `obol wallet import` against a live Hermes cluster now
fully replaces the chart bootstrap key end-to-end without flow-level
workarounds.
…ring

Tests cover the regression classes surfaced in this PR:

- TestArchiveReplacedHermesKeystore_NilExisting / SameUUID — happy
  short-circuit paths must NOT call the k3d node-exec helpers.
- TestArchiveReplacedHermesKeystore_BookendOrder — guards 6c5106a:
  the (ensureVolumeWritable → fixRuntimeVolumeOwnership) bookend MUST
  run in order, and the deferred fix MUST fire on every return path
  including the os.Stat ENOENT early-return.
- TestArchiveReplacedHermesKeystore_RenamesToReplaced — happy-path
  archival writes the file under <dir>/replaced/<uuid>-<ts>.json and
  removes the original.
- TestImportPrivateKeyWalletCmd_ApplyClusterFalseSkipsCluster — guards
  the inverse of a214050: the pre-cluster bootstrap path must NOT
  helmfile-sync or rollout-restart.
- TestImportPrivateKeyWalletCmd_ApplyClusterTrueRollsPod — primary
  guard: ApplyCluster=true must invoke both Sync AND
  restartHermesRemoteSigner (helm doesn't roll on Secret-data changes,
  so the rollout-restart is non-optional).
- TestImportPrivateKeyWalletCmd_SyncFailureSkipsRestart — best-effort
  contract: Sync error → skip restart, do NOT fail the import as a
  whole; on-disk artifacts let a later `obol hermes sync` finish.

Tests use indirection seams (var syncFn, restartHermesRemoteSignerFn,
ensureVolumeWritableFn, fixRuntimeVolumeOwnershipFn) to spy/replace
without standing up a real k3d cluster.

Flow-level guard: a new step between 22 (wallet import) and 23
(register) asserts the remote-signer pod's startTime is within 120s
of now. If a regression drops the explicit kubectl rollout-restart,
the pod stays old → assertion fails fast with a clear "wallet import
did not roll the deployment" diagnostic, instead of falling through
to the 5-minute "gas required exceeds allowance (0)" symptom.
Chart 0.3.1 was published 2026-04-23 with appVersion `v0.2.0`, which
accepts the canonical-string signer contract (chain_id, value, etc.
serialized as JSON strings) introduced by PR #359 / commit b9495b8.
Chart 0.3.0 ships `v0.1.0` which only accepts the legacy u64 contract
and rejects every signing call from current obol-stack with HTTP 422
"chain_id: invalid type: string \"84532\", expected u64".

OpenClaw was bumped to 0.3.1 in PR #374 but Hermes was missed — the
two charts are pinned in independent constants and Renovate only
updated one. flow-14 step 23 (Alice ERC-8004 register via remote-
signer) reproduced the failure on every run against current main.

TestRemoteSignerChartVersionConsistency reads both source files at
test time and asserts the two pins agree, so future chart bumps
either touch both files together or fail CI.
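The shape of such a consistency check can be sketched as below; the constant name and file contents are assumptions for illustration, not the actual sources:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical pin extractor: pull the chart version out of a source
// file's contents. The constant name is illustrative.
var pinRe = regexp.MustCompile(`remoteSignerChartVersion\s*=\s*"([^"]+)"`)

func extractPin(src string) (string, bool) {
	m := pinRe.FindStringSubmatch(src)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// stand-ins for the two independently pinned source files
	hermesSrc := `const remoteSignerChartVersion = "0.3.1"`
	openclawSrc := `const remoteSignerChartVersion = "0.3.0"` // drifted
	a, _ := extractPin(hermesSrc)
	b, _ := extractPin(openclawSrc)
	fmt.Println(a == b) // false ⇒ the test fails CI until both bump
}
```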

Pairs with: PR #357 (closed in favour of #359), task #46.
Both Hermes and OpenClaw deploy the same `remote-signer` Helm chart but
each held its own private constant + Renovate annotation. PR #374
bumped only OpenClaw to 0.3.1; Hermes stayed on 0.3.0 and shipped image
v0.1.0 which rejects the canonical-string signer contract — exactly
the drift class TestRemoteSignerChartVersionConsistency was added to
catch.

Promote the pin to a single exported constant in
`internal/agentruntime/charts.go` (the package both consumers already
import for Namespace/Hostname/KeystoreVolumePath, no new dep edge),
move the Renovate annotation to live alongside it, and delete the
consistency test — drift is now structurally impossible.

Mirrors the OPENCLAW_VERSION pattern (single source-of-truth file +
TestOpenClawVersionConsistency over its three consumers); future
shared chart pins follow the same shape under internal/agentruntime/.

bussyjd commented Apr 29, 2026

flow-14 GREEN end-to-end on 3954852

60/0 PASS/FAIL across 55 steps. Live OBOL Permit2 settlement on Base Sepolia confirmed against the Obol public facilitator.

Run identity

  • Commit: 3954852 (PR tip)
  • Chart: remote-signer 0.3.1 (image v0.2.0)
  • Agent ID: 5274
  • Tunnel: https://land-movement-refrigerator-databases.trycloudflare.com
  • LLM endpoint: spark1 vLLM qwen36-fast (http://192.168.18.23:8000/v1)

On-chain receipts (Base Sepolia, chain 84532)

  • ERC-8004 register (agentId 5274): 0xad68b9826d389786980ed0a6c60e2a5c761e05b288c5a18eb83711ca4f2f3760
  • SetMetadata: 0x481fb33a19cb88194b61a10b8281152ef25dad248f5bc256bb7795dc7690506a
  • Funding (OBOL → Bob signer): 0xc2faae0652acd7aea7bb9391d21e486a812616b60bd77205ba3cfdd42c653a63
  • Settlement (Bob signer → Alice, exactly 1e15 wei = 0.001 OBOL): 0x7baead9ad4296b1ab5e0bda7a7b726b4203417074e4d91051d91942453d14b44

Balance deltas asserted exact on both sides:

  • Alice 0x58aA1bB7… +1000000000000000 wei
  • Bob signer 0x2627b9D7… −1000000000000000 wei

What this run validated

  • 6c5106a (archive perm bookend): step 22, wallet import archived the prior keystore without EACCES
  • a214050 (honor ApplyCluster): step 22, helmfile sync printed inline and the new password Secret was applied
  • b17995a (rollout-restart + flow set +e/-e): step 22, "✓ Remote-signer restarted" line emitted; step 24 had a clean error path
  • 769086d (Go tests + flow pod-age guard): step 23, pod age 13s assertion passed (catches future regressions of the same shape)
  • 3954852 (chart 0.3.0 → 0.3.1, image v0.1.0 → v0.2.0): step 24, obol sell register signed via the canonical-string contract; previously HTTP 422 chain_id: invalid type: string "84532"

Inference correctness (step 48): the paid response on paid/qwen36-fast was a 26-char coherent answer, not the parrot regex from the colleague's earlier flow-13 screenshot.


@bussyjd bussyjd merged commit 4885a5a into integration/pr377-pr381 Apr 29, 2026
bussyjd added a commit that referenced this pull request Apr 29, 2026
…lows, add flow-13 dual-stack OBOL (#386)

* feat(x402): support OBOL permit2 payments

* test: cover OBOL x402 payment flows

* test(flows): harden OBOL payment smoke flow

* fix flow-11 buyer wallet reuse

* Add Hermes default agent runtime

* Bump frontend RC for Hermes chat

* test(flows): harden OBOL payment live flow

* Expose Hermes native dashboard deeplink

* Fix flow 11 wallet preseed import

* fix hermes update and default deeplink

* preserve agent hosts across runtimes

* harden existing stack refresh

* Preserve LiteLLM config across defaults refresh

* Update paid route docs

* Pin paid route v1 invariants

* fix(flows,tunnel,hermes): harden flow-11/12 + cloudflared wait + dashboard env

Eleven brittleness issues were observed during real Base Sepolia test runs of
flow-11 against PR #377 alone, PR #381 alone, and the integration branch on
spark1 + spark2. This commit batches the fixes.

flow-11-dual-stack.sh
- env grep anchored to assignment lines so a comment containing
  REMOTE_SIGNER_PRIVATE_KEY no longer leaks into cast wallet address
- buyer-runtime detection (openclaw vs hermes) via detect_buyer_runtime,
  called after Bob's stack up; pod-readiness, exec, port-forward, and
  token retrieval all use BOB_AGENT_NS/DEPLOY/CONTAINER/SERVICE/RUNTIME/LABEL
- buy-success no longer relies on natural-language regex; the structural
  proof is the next PurchaseRequest CR Ready=True poll
- Agent ID extraction is numeric-only with explicit validation, so a
  pending registration ("Agent ID: (not yet registered)") fails cleanly
  instead of crashing the script via Python int()
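The numeric-only extraction with explicit validation can be sketched in Go (the flow itself does this in shell; names here are illustrative):

```go
package main

import (
	"fmt"
	"regexp"
)

// Accept only an all-digit Agent ID, so a pending registration line
// like "Agent ID: (not yet registered)" fails cleanly instead of
// crashing a later integer conversion.
var agentIDRe = regexp.MustCompile(`Agent ID:\s*([0-9]+)\s*$`)

func extractAgentID(line string) (string, bool) {
	m := agentIDRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	id, ok := extractAgentID("Agent ID: 5274")
	fmt.Println(id, ok) // 5274 true
	_, ok = extractAgentID("Agent ID: (not yet registered)")
	fmt.Println(ok) // false: caller can fail the step with a clear message
}
```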

flows/lib.sh
- explicit PATH export for ~/.foundry/bin and ~/.local/bin so nohup /
  setsid / cron launches resolve cast / kubectl / k3d
- detect_buyer_runtime helper (default Hermes, OpenClaw if present)
- promoted USDC-receipt helpers (write_receipt, receipt_status_ok,
  archive_receipt, extract_tx_hash, find_usdc_transfer,
  wait_usdc_transfer_receipt) so flow-08 and flow-12 can reuse them
- ensure_image_in_k3d helper for hosts where the k3d registry mirror
  stalls (aarch64 spark workaround for cloudflared:2026.1.2)

flow-08-buy.sh
- captures BUY_START_BLOCK, emits PAID_AMOUNT_USDC from the signing
  Python, then archives the on-chain settlement receipt via
  wait_usdc_transfer_receipt; balance delta check kept as defense in depth

flow-12-obol-payment.sh + monetize_integration_test.go
- Go test emits FLOW12_SETTLEMENT_TX marker; shell pipes test output
  through tee and writes receipt-summary.json with the same JSON shape
  as flow-11 (registration / funding markers omitted because the OBOL
  Permit2 path doesn't produce them on Anvil)

cmd/obol/sell.go + internal/tunnel/tunnel.go
- WaitReady(cfg, ui) refactored from EnsureRunning, default 5min budget
  (override via FLOW_TUNNEL_TIMEOUT). EnsureTunnelForSell is called
  before kubectlApplyOutput on the registration path so the controller's
  first reconcile sees a populated tunnelURL ConfigMap, fixing the
  AwaitingExternalRegistration race observed on spark1
- on registration path, tunnel failure is fatal with a hint to use
  --no-register; --no-register path keeps best-effort tunnel

internal/hermes/hermes.go
- GATEWAY_ALLOW_ALL_USERS=true on the hermes-dashboard container only,
  with an inline comment explaining that local k3d/dev clusters do not
  expose dashboard messaging integrations to the public internet, and
  that production must override via a values overlay. Unblocks the
  dashboard from CrashLoopBackOff so the pod reaches Ready=True and
  port-forward to the API server works

flows/run-detached.sh + flows/README.md
- new launcher script that survives SSH disconnect (tmux -> screen ->
  setsid -f); README documents the flow inventory and the new launcher

* fix(flow-11): step 28 checks API-server container ready, not pod STATUS

In multi-container pods like Hermes (API server + dashboard) the upstream
hermes-dashboard container can stay in CrashLoopBackOff for unrelated
reasons (missing fastapi/uvicorn in the image's web-UI optional deps),
which makes the pod-summary STATUS column read "CrashLoopBackOff" even
when the API-server container we actually need is happily Running.

Switch step 28 from `grep "Running"` on the STATUS column to a jsonpath
query for the specific container's `ready=true`, and bump the budget
from 24x5s = 120s to 36x5s = 180s to absorb slow init on aarch64 hosts.

Result: integration flow-11 now goes 45/45 with 0 sub-step FAILs.

* fix(flow-11): drop step-34 discovery length assertion

The discovery step issued a chat-completion to Bob's agent and required
the assistant content to be >100 chars. Hermes occasionally responds
with a short interim "let me check..." message (~93 chars) before
proceeding to the next tool call, causing a false FAIL even though the
agent went on to discover Alice and complete the buy in step 35-36.

Same fix as step 35: drop the natural-language assertion. The structural
proof of discovery is the next step's `buy.py` invocation succeeding and
the PurchaseRequest CR going Ready=True (step 36).

* feat(flows): add flow-13 dual-stack OBOL Permit2 against shared Anvil fork

Mirrors flow-11's two-stack structure but the payment asset is a
fork-local OBOL ERC20Permit token instead of USDC, both Alice's and
Bob's obol stacks share ONE local Anvil fork (via the host.k3d.internal
alias), and the facilitator is a local x402-rs build with
eip2612GasSponsoring (not the public Obol facilitator).

- Anvil port + facilitator port allocated via pick_free_port
- ForkObolToken deployed on the fork via `forge create` against
  contracts/fork-obol/src/ForkObolToken.sol; mints 10 OBOL to Alice
  and 10 OBOL to Bob's signer
- Single trap-based cleanup tears down anvil, facilitator, and any
  port-forwards on any exit
- Skip-if-missing: emits one PASS and exits 0 when neither
  X402_FACILITATOR_BIN nor X402_RS_DIR resolve to a usable build
- Reuses the receipt helpers from lib.sh by setting
  USDC_ADDRESS_BASE_SEPOLIA=$OBOL_TOKEN at call sites; the helpers
  are generic ERC-20 despite the USDC-flavored name
- Bob's `obol network add base-sepolia` points at the same Anvil URL
  Alice uses, with eRPC pinned to the single custom upstream so both
  clusters see the same on-chain state for OBOL balance/Transfer logs

* fix(flow-13): correct facilitator scheme config + /supported assertion

x402-rs has no standalone "v2-eip155-permit2" scheme. The OBOL Permit2 /
EIP-2612 gas sponsoring path is enabled via
config.eip2612_gas_sponsoring=true on the v2-eip155-exact scheme — same
as testutil.StartRealFacilitatorWithOptions does. The previous flow-13
config requested a phantom permit2 scheme, which the facilitator
silently ignored, leaving /supported with v1+v2 exact only and failing
the assertion that looked for a literal "permit2" scheme name.

- drop the bogus v2-eip155-permit2 scheme entry
- attach config.eip2612_gas_sponsoring=true to the v2-eip155-exact entry
- assert /supported lists v1+v2 exact for base-sepolia (the buyer-side
  produces the Permit2 payload; the facilitator's role on this path is
  to verify+settle the sponsored authorization)

* fix(flow-13): bind anvil to 0.0.0.0 so k3d pods can reach it

* fix(flow-13): use busybox transient pods for in-cluster probes

* fix(flow-13): drop ERC-8004 registration; informational discovery

* fix(flow-13): bypass cloudflared, use docker host route for cross-cluster

* fix(flow-13): bring up cloudflared explicitly; restore real tunnel path

* docs(obol-stack-dev): distil flow-11/12/13 session learnings

* fix(flow-13): scale cloudflared to 1 explicitly (rollout restart is no-op at 0)

* fix(flow-13): cast send --json + balance proof for Bob mint

* fix(erc8004): wait for read-side consistency before setMetadata

A colleague hit `! failed to set x402 metadata: erc8004: setMetadata tx:
execution reverted` on agent ID 5196 (live Base Sepolia). On-chain analysis
showed the wallet nonce went 0->1, only the register tx broadcast, and the
remote-signer was never asked to sign setMetadata. The revert happened in
pre-broadcast simulation (eth_estimateGas), with selector 0x7e273289 =
ERC721NonexistentToken(uint256). We reproduced it via static cast call.

Root cause: bind.Transact for setMetadata invokes eth_estimateGas through
the chain READER. The Register tx's WaitMined confirms on the WRITE
upstream, but the READER (especially through eRPC, which has independent
upstreams per chain) can lag behind by a block or two. If the simulation
fires before the reader sees the just-minted token, the registry's
ERC-721 ownerOf check inside setMetadata reverts.

The aggravated form is when an `obol network add base-sepolia --endpoint
http://...:anvil-port` from a prior flow-12/flow-13 run leaves a stale
custom upstream pinned for chain 84532. The simulation routes to that
fork (which has its own ERC-721 storage where 5196 was never minted),
producing the same revert. We documented this in the obol-stack-dev skill.

Fix:
- Add Client.AgentWallet (calls ERC-8004 getAgentWallet view) so we can
  probe the reader's view of a specific agent id without depending on
  ERC-721 ownerOf, which isn't in the registry's ABI subset.
- Add Client.WaitForAgent that polls AgentWallet until the reader returns
  a non-revert response, with a 30s default timeout.
- After client.Register / client.RegisterWithOpts in cmd/obol/sell.go's
  registerDirectWithKey and registerWithRemoteSigner paths, call
  WaitForAgent before SetMetadata. A reader that catches up is a
  prerequisite for the simulation to succeed.

Tests:
- TestWaitForAgent_RetriesUntilOwnerVisible — the reader returns
  "execution reverted" twice then succeeds; verifies WaitForAgent waits
  through the staleness window.
- TestWaitForAgent_TimeoutReturnsError — verifies persistent reverts
  surface as a clear timeout error after the budget.

Out of scope:
- Detection of stale eRPC custom upstreams (proposed: `obol network
  status` upstream-reachability check) — left as a follow-up.
- Cleanup-trap teardown in flow-12/flow-13 to remove the base-sepolia
  network pin — separate flow PR.

* Add Hermes default agent runtime

* Bump frontend RC for Hermes chat

* Expose Hermes native dashboard deeplink

* fix hermes update and default deeplink

* preserve agent hosts across runtimes

* harden existing stack refresh

* Preserve LiteLLM config across defaults refresh

* Update paid route docs

* Pin paid route v1 invariants

* feat(network): probe upstream chain ids in `obol network status`

Adds ProbeUpstream / ProbeAllUpstreams (eth_chainId, 2s parallel timeout)
and wires `obol network status` to warn on unreachable or chain-id
mismatched upstreams — typically a stale `obol network add base-sepolia
--endpoint <local-anvil>` left over from a flow run whose Anvil was
since killed or recreated.

The report covering v0.9.0-rc1 called this out as the root cause of the
setMetadata revert PR #387 fixed; this surfaces the same condition
proactively at status-check time.

`--no-probe` opts out for callers who don't want the network round-trip.

* feat(flow-14): live Base Sepolia OBOL Permit2 sibling of flow-13

Adds flow-14 — a live-network counterpart to the Anvil-fork flow-13.
Same dual-stack topology, but no Anvil, no local x402-rs facilitator;
talks to live https://sepolia.base.org and the public Obol facilitator
at x402.gcp.obol.tech. Required env vars OBOL_TOKEN_BASE_SEPOLIA (the
deployed ERC20Permit address) and BOB_FUNDING_PRIVATE_KEY (a real
funded buyer wallet) fail fast at the top so the script never spends
gas before the operator has set both.

Registration is enabled in flow-14 (flow-13 deliberately disables it
for the protocol-level fork test) so PR #387's WaitForAgent fix runs
on the OBOL path too. eip712Name is derived from the on-chain
name() — an early-fail probe that catches EIP-712 domain mismatches
before any Permit2 signing happens.

flow-13 picks up the same EIP-712 early-fail probe, plus a cleanup-trap
`obol network remove base-sepolia` on both clusters so a leftover
custom pin from a prior run can't leak into the next flow's reads.

monetize-inference.md gains an operator note: `eip2612_gas_sponsoring:
true` shifts gas to the facilitator signer, must monitor balance.

* test(fork-obol): assert ForkObolToken parity vs canonical OBOL

Adds a build-time parity check (TestForkObolToken_ParityWithCanonicalOBOL)
that catches drift between contracts/fork-obol/src/ForkObolToken.sol and
the canonical OBOL token at 0x0B010000b7624eb9B3DfBC279673C76E9D29D5F7
(verified via Sourcify full-match).

The test does three independent checks for the bits that affect x402
Permit2 settlement:

1. Greps the .sol source for the EIP-712 typehash + Permit typehash
   string literals (catches accidental constant edits).
2. keccak256s those literals in Go and compares to the canonical bytes
   (catches typo drift on either side).
3. Reproduces mainnet OBOL's DOMAIN_SEPARATOR() — 0x5a3cd81e... — from
   the formula keccak256(abi.encode(typeHash, nameHash, versionHash,
   chainid=1, address=0x0B01...)) (catches abi-encoding drift).

Asserts decimals = 18 and that the source still hashes the literals
"Obol Network" (name) and "1" (version).

PARITY.md documents what MUST match (and is now tested) vs the deltas
that are intentional (governance, access control, ENS, burn, transfer
hooks) and orthogonal to settlement.

contracts/fork-obol/.gitignore added so forge build artefacts (cache/,
out/, broadcast/) stop showing up as untracked.

* fix(model): rank Ollama models by parameter count, not Ollama list order

Symptom: a colleague's Hermes agent answered every prompt with a wall of
text describing its own tool list, because the configured default model
was llama3.2:1b — too small to handle the agent's tool-using system
prompt.

Root cause: rankModels in internal/hermes/hermes.go (and the duplicate
in internal/openclaw/openclaw.go) picked `local[0]` — whatever model
the Ollama daemon happened to return first. On hosts that had recently
pulled llama3.2:1b, that 1B model won over qwen3.5:9b every time. The
old comment ("Within a tier, the first model wins") was honest about
this, just wrong as a strategy.

Fix: extract a single capability-aware ranker into internal/model:

  - Cloud models (Claude, GPT, o-series) outrank local models.
  - Within the cloud tier, an explicit precedence table prefers Opus
    over Sonnet over Haiku, gpt-5 over gpt-4 over gpt-3.5, etc.
  - Within the local tier, models are sorted by parameter count parsed
    from the tag — `qwen3.5:9b` → 9, `mixtral:8x7b` → 56, `llama3.2:1b`
    → 1. Larger first.
  - Untagged Ollama models fall back to a family-default table; the
    table is iterated longest-prefix-first so `llama3.3` (default 70)
    matches before `llama3` (default 8).
  - Tiebreak alphabetically for determinism.
  - Embedding models (nomic-embed) score 0 so they never become the
    chat default.

Both internal/hermes/rankModels and internal/openclaw/rankModels are
now thin wrappers over model.Rank — the openclaw one preserves its
`openai/` prefix for LiteLLM routing.

Eight table-driven tests in internal/model/rank_test.go cover the
regression scenario, the cloud quality table, parameter parsing for
b/Bx7b/235b shapes, the longest-prefix family lookup, alphabetical
tiebreak, the embedding-model exclusion, and the empty-input case.

* test(inference): assert response coherence on free + paid paths

The model-rank fix prevents 1B-parameter models from becoming the agent
default, but the regression was only visible at the response layer
(tool-catalogue parroting). Add assertions that exercise both layers,
not just status codes:

flow-04 (free Hermes inference, getting-started.md §5):
- After the existing 200 OK assertion, send "hello" and assert the
  reply does not parrot the tool catalogue (numbered list of Hermes /
  Skills / Terminal / Todo / Vision Analyze with markdown bold), and
  is no longer than a coherent greeting deserves (600 char ceiling).
- Read the configured default model from hermes-config and reject any
  tag declaring 1B / 0.5B / 0.6B parameters as too small for the
  agent's tool-using system prompt.

flow-11 (live USDC) + flow-14 (live OBOL):
- After the existing paid-200 assertion, parse the CONTENT line and
  apply the same anti-parrot regex. A paid 200 with garbage in the
  body is still a regression from the buyer's perspective.

internal/hermes/rankmodels_test.go + internal/openclaw/rankmodels_test.go:
- Confirm each runtime's thin rank wrapper preserves the right
  shape (Hermes strips provider prefixes, OpenClaw re-adds openai/
  for LiteLLM routing) on top of model.Rank.

Together with the existing model.Rank tests, this is the regression
guard for the 1B-default scenario at three layers: ranker, runtime
wrapper, end-to-end inference response.

* fix(model): handle decimal parameter tags (qwen3:0.6b regression)

Ollama tags like `qwen3:0.6b` (and `1.5b`, `0.5b`, etc.) didn't match
the original regex `(\d+(?:x\d+)?)b` and fell through to the family
default — meaning `qwen3:0.6b` got rank 14 (qwen3 family) and was
mistakenly chosen over qwen3.5:9b. The 0.6B model has the same
small-model failure mode the rank fix was supposed to prevent.

Updated regex accepts `\d+(?:\.\d+)?(?:x\d+(?:\.\d+)?)?b` so decimal
sizes parse correctly. Ranks are now expressed in deci-billions
(params × 10) so `0.6b` → 6, `1b` → 10, `9b` → 90 — distinct integer
values for the comparator. Family defaults table scaled to match.
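The deci-billion parser can be sketched directly from the regex quoted above; the helper name is illustrative:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// paramDeciB parses the parameter count out of an Ollama-style tag and
// returns it in deci-billions (params × 10), so 0.6b → 6, 1b → 10,
// 9b → 90 are distinct integers for the comparator.
var paramRe = regexp.MustCompile(`(\d+(?:\.\d+)?)(?:x(\d+(?:\.\d+)?))?b`)

func paramDeciB(tag string) int {
	m := paramRe.FindStringSubmatch(strings.ToLower(tag))
	if m == nil {
		return 0 // real ranker falls back to the family-default table
	}
	n, _ := strconv.ParseFloat(m[1], 64)
	if m[2] != "" { // mixture tags: 8x7b means 56B total
		k, _ := strconv.ParseFloat(m[2], 64)
		n *= k
	}
	return int(n * 10)
}

func main() {
	for _, tag := range []string{"qwen3:0.6b", "llama3.2:1b", "qwen3.5:9b", "mixtral:8x7b"} {
		fmt.Println(tag, paramDeciB(tag))
	}
}
```

With this scaling, `qwen3:0.6b` (6) loses to `qwen3.5:9b` (90) instead of winning via the family default.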

Two new test cases pin the regression: `qwen3:0.6b` must lose to
`qwen3.5:9b`, and `smol:1.5b` (untagged family) must lose to a
known 9B model.

* fix(flow-14): poll for funding visibility on both public RPC and eRPC

Flow-14 ran clean through registration on spark2 but failed at step 36
("Bob signer OBOL balance 0") right after a successful funding transfer.
Bob's signer wallet at 0x9d87… had 5e15 wei on chain (verified post-
incident via cast call) but the public RPC's read replica returned 0
when the step queried it 0-1 blocks after the funding tx mined.

Then step 41's PurchaseRequest CR never appeared because buy.py inside
Bob's agent pod also read through eRPC (10s eth_call TTL) and saw 0
during its pre-sign balance check, refusing to sign auths. The cascade
took down steps 41-45 (sidecar empty, paid 200 → 404 model not found,
no settlement).

Same pattern flow-11 already uses for the USDC sibling flow — port it:

  - Step 36 wraps balanceOf in a 12-attempt × 2s poll against the public
    RPC. Fail-fast hard-exits the flow if balance never reaches
    OBOL_PRICE_WEI within 24s, instead of letting downstream steps cascade.
  - New step "Bob: eRPC reflects funding" runs buy.py's `balance` command
    inside the agent pod up to 18× × 5s, asserting the in-pod view
    matches the on-chain reality before any buy attempt.

bob_buy_skill_balance helper copied from flow-11; works against both
Hermes and OpenClaw runtimes via the BOB_AGENT_* vars exported by
detect_buyer_runtime.

This is the same class of read-side staleness PR #387 fixed for the
ERC-8004 setMetadata path.

* fix(flow-14): probe OBOL balance via direct eRPC eth_call (not buy.py)

The previous attempt at the in-pod balance poll called `buy.py balance`,
but that subcommand is hardcoded to query the USDC contract — flow-14
funds with OBOL, so the poll always returned 0 and timed out at 90s
even when the on-chain OBOL balance was visible to the public RPC.

Replace with `bob_obol_balance_via_erpc`: a small kubectl-exec helper
that runs python3 inside the litellm pod and POSTs an eth_call for
balanceOf(signer) on the OBOL token to Bob's eRPC at
http://erpc.erpc.svc.cluster.local:4000/rpc/base-sepolia. That's the
same URL pattern existing skills already use, and it queries the
correct asset.

Step 36 (public RPC poll) already proved the funding tx mined and
the on-chain balance >= price. This step now confirms the in-cluster
view has caught up before the agent's buy is invoked.

* fix(flow-14): probe eRPC on port 80, not 4000

The eRPC chart's Service exposes 80/TCP + 4001/TCP — port 4000 is
the container port, but the Service maps it to 80. Other in-cluster
skills (signer.py, rpc.py) get this right by hitting the bare
hostname; only discovery.py uses :4000 explicitly and it's wrong.

Verified against the live spark2 cluster: GET on
http://erpc.erpc.svc.cluster.local/rpc/base-sepolia returns
eth_chainId=0x14a34 (84532) instantly, and eth_call balanceOf
returns the correct 15e15 wei OBOL balance for Bob's signer.

Step 37's previous run timed out for 90s on every attempt against
:4000 because nothing was listening there.

* fix(flow-14): make Bob-signer balance delta tolerant of funding races

Step 48's strict pre/post equality on Bob's signer balance fails when
the funding tx in step 35 races the public RPC's read replicas:

  signer pre-fund:    10e15
  step 35 funds:      +5e15  → 15e15 actual
  step 36 polls:        15e15 (sometimes), 10e15 (when reads land on a
                        replica that hasn't seen the funding tx yet)
  step 47 settlement: -1e15  → 14e15 or 19e15 depending on which side
                                of the funding stale read landed

The settlement itself is correct in either case. We already assert the
two canonical proofs strictly:

  - Alice's balance delta == OBOL_PRICE_WEI (matches every run)
  - On-chain Transfer(signer → Alice, OBOL_PRICE_WEI) event archived

Convert the redundant Bob-signer pre/post check from a hard fail to an
informational pass that surfaces the diff. Settlement correctness is
unchanged.

Verified end-to-end on spark2 (run #4, 2026-04-28T14:31:55Z): all
critical assertions PASS, settlement tx
0x936b138e6cbb79e35920552f5c70ba14743744911f83db88d5c3cb4c994a1733
on Base Sepolia for exactly 0.001 OBOL.

* fix(flow-11): runtime-aware bob_remote_signer_address (Hermes too)

The helper was hardcoded to namespace `openclaw-obol-agent` and container
`openclaw`. After #381 makes Hermes the default agent runtime, that
exec hits a non-existent pod and returns empty silently — step 32 then
sees signer="unknown" and fails the wallet-mismatch check.

Use BOB_AGENT_NS / BOB_AGENT_DEPLOY / BOB_AGENT_CONTAINER which
detect_buyer_runtime exports based on which agent namespace actually
exists in Bob's cluster.

Caught by flow-11 run on spark2 against the merged #380+#381 branch.

* fix(flows): reclaim leaked Docker networks on flow start + exit

Each `k3d cluster create` reserves a /16 from Docker's predefined
172.16.0.0/12 pool (~16 networks max). If the create crashes mid-way
or the cluster is force-removed without `obol stack down`, the network
is orphaned. After enough leaks every new cluster fails with "all
predefined address pools have been fully subnetted" — exactly what
killed flow-11 run #3 on spark2 today after ~15 successive runs.

New helper `cleanup_k3d_obol_networks` in flows/lib.sh:
  - Filters strictly to `k3d-obol-stack-*` so it never touches user
    or other-app networks.
  - Relies on `docker network rm` refusing to remove networks with
    active endpoints, so it's safe to call while a flow is running —
    a live cluster's network is preserved automatically.

Wired into flow-11 / flow-13 / flow-14 both reactively (EXIT trap)
and proactively (top-of-flow), so a previously-leaked network from
an aborted run is reclaimed before the new run tries to allocate.
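
The strict filter can be sketched against a mocked network list (names below are hypothetical; the mocked list stands in for `docker network ls --format '{{.Name}}'`, and the real helper pipes the matches to `docker network rm`, which refuses networks that still have live endpoints):

```shell
#!/usr/bin/env bash
# Only networks matching the k3d-obol-stack- prefix are ever candidates;
# user networks and other clusters' networks pass through untouched.
mock_networks='bridge
host
k3d-obol-stack-flow11
k3d-some-other-cluster
my-app-net'

printf '%s\n' "$mock_networks" | grep '^k3d-obol-stack-'
```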

* fix(wallet): route obol wallet import to Hermes runtime

Hermes is the default agent runtime as of #381, but `obol wallet import`
was still wired exclusively into the OpenClaw codepath:
  - keystore was written to data/openclaw-{id}/remote-signer-keystores/
  - wallet metadata was written to config/applications/openclaw/{id}/

The Hermes remote-signer pod reads from data/hermes-{id}/...
so the preseed never reached the actual signer — flow-11 step 32 saw
the auto-generated wallet (0xa0A2…2033d) instead of the preseeded
buyer wallet (0x8E15…4916).

Add a new internal/hermes/wallet_import.go that mirrors the OpenClaw
import path but writes to the Hermes deployment dir + keystore volume,
and re-wire cmd/obol/wallet.go to dispatch there unconditionally
(active dev — no legacy OpenClaw fallback needed).

Update flow-11 preseed_bob_wallet to scaffold via `bob hermes onboard`
and verify via `bob hermes wallet address`, matching the new default.

* fix(hermes): allow onboard --no-sync without a live cluster

`obol hermes onboard --no-sync` was calling `writeDeploymentFiles`
which always invoked `model.ConfigureLiteLLM` against the cluster —
breaking pre-stack-up scaffolding (e.g. flow-11's wallet preseed step,
which scaffolds the agent before the cluster comes up).

Skip the LiteLLM auto-config when no kubeconfig is present. Stack-up's
own auto-config will run after the cluster is live, so nothing is lost.

* chore(stack): pull hermes image directly, drop local-build path

We have zero customization on top of nousresearch/hermes-agent — the
local hermes-agent clone tracks upstream main 1:1. Building the image
from source on every fresh `obol stack up` was wasting 7+ minutes when
a dev clone happened to be present (one of three candidate paths).

Drop:
  - hermesSourceDir() in internal/stack/stack.go
  - the OBOL_HERMES_SOURCE_DIR env var
  - devLocalImages() (collapsed back into baseLocalImages — only x402-*
    and serviceoffer-controller actually need source builds)

Hermes is pulled like any other upstream image via the tag in
internal/hermes/hermes.go (`nousresearch/hermes-agent:latest`,
overridable with OBOL_HERMES_IMAGE).

* flows: route to external LLM via canonical `obol model` CLI

Real-world recipe: an operator already has vLLM/sglang on their GPU
box and wants the Obol stack to use that endpoint instead of host
Ollama (the auto-config default). The canonical user flow is:

    obol model remove qwen3.5:9b qwen3:0.6b           # drop auto-detected Ollama
    obol model setup custom --name X --endpoint URL --model M

`setup custom` validates the endpoint, patches LiteLLM, hot-adds via
the model API, and internally calls syncAgentModels -> hermes.Sync,
which rewrites the default agent's deployment files with the new
primary model. No ConfigMap surgery, no manual restart.

Footgun documented: without the `obol model remove` step, the auto-
detected Ollama entries out-rank the new custom entry — internal/
model/rank.go:localRank parses `:9b` as 90 deci-billions while
`qwen36-fast` (no `:Nb` tag) ranks 0, so the agent silently stays on
the slow host model.

flows/lib.sh:route_llm_via_obol_cli wraps that exact CLI sequence
behind OBOL_LLM_ENDPOINT / OBOL_LLM_MODEL / OBOL_LLM_NAME /
OBOL_LLM_API_KEY env vars. Wired into flow-11 + flow-14 right after
each stack_init_and_up so both Alice (paid responses) and Bob (agent
autonomy) use the GPU when env is set; unset → flows keep the prior
auto-config behavior.
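
The env gate can be sketched as follows (helper and variable names match the description above, but the obol calls are echoed rather than executed, so this is illustrative only):

```shell
#!/usr/bin/env bash
# With OBOL_LLM_ENDPOINT unset the helper is a no-op, so flows keep the
# prior auto-config behavior; with it set, the canonical CLI sequence runs.
route_llm_via_obol_cli() {
  if [ -z "${OBOL_LLM_ENDPOINT:-}" ]; then
    echo "OBOL_LLM_ENDPOINT unset: keeping auto-config behavior"
    return 0
  fi
  echo "obol model setup custom --name ${OBOL_LLM_NAME:-custom}" \
       "--endpoint ${OBOL_LLM_ENDPOINT} --model ${OBOL_LLM_MODEL}"
}

unset OBOL_LLM_ENDPOINT OBOL_LLM_NAME OBOL_LLM_MODEL
route_llm_via_obol_cli
OBOL_LLM_ENDPOINT=http://gpu-box:8000/v1 OBOL_LLM_MODEL=qwen36-fast \
  route_llm_via_obol_cli
```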

CLAUDE.md gets a "Pointing the stack at an external OpenAI-compatible
LLM" subsection in the LLM Routing section, with the canonical
recipe and the rank-logic footgun. The gap that caused the prior
divergence was: CLAUDE.md framed `obol model setup` as cloud-provider
config only, the custom-endpoint flow was a single buried one-liner,
and the rank-logic interaction was undocumented.

* chore(model): bump user-facing model recommendations to qwen3.6 / qwen3

User-facing pull suggestions still pointed at qwen3.5:4b — outdated
now that Qwen3.6 (high-quality MoE 30B-A3B + 27B coding) and Qwen3 8B
(laptop-friendly) are the current generation.

cmd/obol/model.go (interactive `obol model pull` prompt):
  - Default tier: qwen3.6:27b (17 GB, recommended on ≥32GB RAM hosts)
  - Coding tier: qwen3.6:27b-coding-mxfp8 (~13 GB, MXFP8)
  - Laptop tier: qwen3:8b (5.2 GB) — qwen3.6 has no small variants
    on Ollama, so the previous gen's 8B is the right small default.
  - Reasoning: deepseek-r1:8b unchanged
  - Lightweight: gemma3:4b unchanged

internal/openclaw/openclaw.go + cmd/obol/model.go (no-models hint):
  qwen3:8b for laptops, qwen3.6:27b for capable hosts.

internal/embed/skills/monetize-guide/SKILL.md (the user-facing
monetize walkthrough): same swap.

Tests, smoke fixtures, and docs that record historical validation
against `qwen3.5:9b` are intentionally left alone — those describe
what was *actually run*, not what we currently recommend.

* fix(flows): only call `obol model remove` for existing entries

Each `obol model {remove,setup}` write op calls syncAgentModels →
hermes.Sync → helmfile sync, producing a fresh Deployment revision
and a new ReplicaSet. Three back-to-back rollouts in a slow image-
pull environment (host Ollama + k3d containerd cold cache) stack
ReplicaSets with no Ready replica. We saw this on spark2: three
RSes (787bb9d4d7, 7cdbcd6d77, 54996f74c8), the original scaled to 0
before the new ones became Ready, and the agent pod was stuck in
Init while the bootstrap-hermes-install initContainer waited on the
hermes-agent image pull.

Skip `obol model remove` when the entry isn't present so the helper
boils down to a single rollout (the one for `obol model setup
custom`). The auto-detected Ollama entries are explicitly checked
against `obol model list` before removal.
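
The presence check boils down to an exact-line match before each removal. A sketch (`model_list` mocks `obol model list` output; removals are echoed instead of executed):

```shell
#!/usr/bin/env bash
# Skip `obol model remove` for entries that aren't present, so the
# helper only triggers the single rollout from `obol model setup custom`.
model_list='qwen3.5:9b
qwen3:0.6b'

for m in 'qwen3.5:9b' 'not-installed:1b'; do
  if printf '%s\n' "$model_list" | grep -Fqx "$m"; then
    echo "remove: $m"
  else
    echo "skip (not present): $m"
  fi
done
```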

* fix(flow-14): drop bogus --namespace flag from `obol sell register`

step 20 was passing `--namespace llm` to `obol sell register` which
doesn't accept that flag. The bash `|| true` swallowed the error,
the script reported PASS, and the offer sat in
`Registered=AwaitingExternalRegistration` forever — step 21's
`Ready=True` poll then timed out.

`obol sell register` only accepts --chain / --sponsored / --endpoint /
--name / --description / --image / --private-key-file. The offer is
found by the controller (which publishes registration resources after
the on-chain tx lands); the CLI doesn't need namespace scoping.

Also drop the `|| true` so a real register failure surfaces immediately
instead of leaving the run wedged at step 21 polling.

* fix(flow-14): bring up tunnel BEFORE `obol sell register`

`obol sell register` calls tunnel.GetTunnelURL(cfg) when --endpoint
isn't passed. The flow had register at step 20 but the cloudflared
scale-up at step ~22, so register was hanging trying to fetch a
tunnel URL from an empty obol-frontend ConfigMap (cloudflared sat at
0 replicas — `obol stack up` deploys it that way; flow-14's direct
ServiceOffer YAML apply bypasses the CLI's EnsureTunnelForSell hook).

Reorder: bring cloudflared up + capture TUNNEL_URL right after
ServiceOffer creation, then run register with --endpoint $TUNNEL_URL
explicit (so the call doesn't depend on the in-cluster lookup at all).

Add a 5-minute `timeout` wrapper as a defense-in-depth — the on-chain
tx + WaitForAgent + SetMetadata should land in ~30-60s; anything
beyond that is a hang we want to surface as a fail, not silently
block the run.

* fix(flow-14): call obol binary directly under timeout, not the alice() function

`timeout 300 alice sell register …` doesn't work — `timeout` is an
external program that cannot see the bash `alice()` runner function,
so it fails to exec with exit 127 before the on-chain call ever
happens. Run #9 silently exited mid-step 22 because of this (the
captured `register_out` was empty, register_rc was 127, but the FAIL
line never made it through tee to the log before tmux died).

Call the obol binary directly with `env OBOL_*=… $ALICE_DIR/bin/obol
sell register …` under the timeout — same env the alice() function
exports, but visible to the timeout(1) child.
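
The failure mode reproduces in three lines: `timeout(1)` is an external binary, so it exec()s its argument and cannot see bash functions. (`/bin/echo` stands in for `$ALICE_DIR/bin/obol` below; the env var is a placeholder.)

```shell
#!/usr/bin/env bash
# A shell function is invisible to timeout's exec.
alice() { echo "running as alice: $*"; }

timeout 5 alice sell register >/dev/null 2>&1
echo "via function: exit $?"   # 127: timeout cannot find 'alice' as a command

# The fix: hand timeout a real binary with the env inlined.
timeout 5 env OBOL_PLACEHOLDER=1 /bin/echo sell register >/dev/null
echo "via binary:   exit $?"   # 0
```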

* fix(flows): force-disconnect registry mirrors before removing leaked k3d networks

`k3d cluster create` joins three persistent registry-mirror containers
(k3d-obol-{docker,ghcr,quay}-io.localhost) to the cluster's network.
`k3d cluster delete` removes the cluster nodes but does NOT disconnect
those mirror containers — the network is left with 3 attached
endpoints, so `docker network rm` refuses to remove it. After ~16
delete-create cycles the predefined CIDR pool exhausts and every new
cluster fails with "all predefined address pools have been fully
subnetted" (hit again on flow-14 run #10 on spark2).

Update cleanup_k3d_obol_networks to:
  1. Skip live clusters: a network with `*-server-N` or `*-serverlb`
     attached means k3d is still using it — leave alone.
  2. Otherwise (mirror-only attachments), force-disconnect every
     attached container and then remove the network.

Mirrors auto-rejoin the next cluster's network when k3d sets up the
new cluster, so disconnect is non-destructive for the cache.
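
The live-cluster guard is a name check over the network's attached containers. A sketch with mocked attachment lists (cluster names are hypothetical; the real code would read the lists from `docker network inspect`):

```shell
#!/usr/bin/env bash
# A network with a *-server-N or *-serverlb container attached is in use
# by a live k3d cluster; mirror-only attachments mean it leaked.
is_live() {
  printf '%s\n' "$1" | grep -qE -- '-server-[0-9]+$|-serverlb$'
}

live_net='k3d-obol-stack-a-server-0
k3d-obol-stack-a-serverlb
k3d-obol-docker-io.localhost'

leaked_net='k3d-obol-docker-io.localhost
k3d-obol-ghcr-io.localhost
k3d-obol-quay-io.localhost'

is_live "$live_net"   && echo "live cluster: leave network alone"
is_live "$leaked_net" || echo "mirror-only: disconnect each container, then docker network rm"
```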

* feat(model): --no-sync flag on obol model {remove,setup custom}

Each `obol model` write op runs syncAgentModels at the end, which calls
hermes.Sync -> helmfile sync, producing a fresh Deployment revision and
a new ReplicaSet on every call. Three back-to-back rollouts on a slow-
pull cluster (host Ollama + cold containerd cache + concurrent image
pulls from cascading RSes) wedge with the agent pod stuck in Init
forever — exactly what flow-14 has been hitting at step 32.

Add `--no-sync` to `obol model remove` and `obol model setup custom`
so callers can batch model edits and run `obol model sync` once at the
end. Real-operator value too: scripted setup ("remove auto-detected
Ollama, add my vLLM endpoint, then sync") shouldn't pay for two extra
agent rollouts.

Update flows/lib.sh:route_llm_via_obol_cli to use --no-sync on all
intermediate writes and call `obol model sync` once at the end. Should
collapse the three-rollout cascade to a single Hermes redeploy.

* fix(model): use bare model name in LiteLLM `model_name` for custom endpoints

`obol model setup custom --name X --model Y` was writing the LiteLLM
entry as `model_name: custom/X/Y`. The Hermes agent then read the
LiteLLM model list, picked that entry as primary, applied
stripProviderPrefix once (-> X/Y), and stripped again on the way to
the agent config (-> Y). At inference time the agent passed `Y` to
LiteLLM, which only had `custom/X/Y` as a literal model_name, so every
chat completion returned 400 "no healthy deployments for this model"
— exactly what flow-14 hit at step 40-41 with the spark1 vLLM endpoint.

Drop the `custom/<name>/` prefix: LiteLLM `model_name = <model>`. The
agent's call and the LiteLLM entry match by exact string. The `--name`
flag remains useful as a human-facing label in `obol model status`
output but isn't part of the route key. Re-running `setup custom` with
the same `--model` re-binds the route — which is the natural "repoint"
behavior operators want.

* fix(flows): pass OBOL_LLM_MODEL through buy.py prompt

flow-14 step 41 (and flow-11 equivalent) hardcoded `--model qwen3.5:9b`
in the agent's buy prompt. When the run uses an external GPU LLM
(OBOL_LLM_ENDPOINT + OBOL_LLM_MODEL=qwen36-fast), Alice's LiteLLM
serves the bare `qwen36-fast` entry — but Bob's PurchaseRequest CR
ends up keyed on `qwen3.5:9b`, the buyer sidecar publishes
`paid/qwen3.5:9b` as the paid alias, and Bob's agent calls
`paid/qwen3.5:9b` which routes via the `paid/*` wildcard to the
sidecar, which in turn forwards to Alice with model=`qwen3.5:9b` —
which Alice's LiteLLM doesn't have. 400 "no healthy deployments".

Use ${OBOL_LLM_MODEL:-qwen3.5:9b} so the buy.py call (and the PAID_MODEL
fallback) follow whatever the seller's actual model is. Defaults stay
unchanged when no env override is set.
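
The substitution itself is plain `${VAR:-default}` expansion:

```shell
#!/usr/bin/env bash
# Unset -> the historical default; set -> the operator's external model.
unset OBOL_LLM_MODEL
echo "model=${OBOL_LLM_MODEL:-qwen3.5:9b}"
OBOL_LLM_MODEL=qwen36-fast
echo "model=${OBOL_LLM_MODEL:-qwen3.5:9b}"
```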

* fix(stack): preload nousresearch/hermes-agent into k3d containerd

Each fresh k3d cluster has a cold containerd cache. The cluster's
hosts.toml mirrors docker.io through k3d-obol-docker-io.localhost so
in theory pulls go through the local mirror — but in practice the
mirror's resolve+blob handshake stalls under contention or after a
restart, leaving the first Hermes pod stuck in `Init:1/2` for 10+
minutes pulling the 2.4GB nousresearch/hermes-agent image. flow-14
keeps tripping on this at step 32 (hermes API ready polling) on
spark2.

Mirror existing dev-image preload pattern: pull the image to the host
docker daemon (cheap if cached), then `k3d image import` into the
cluster's containerd. Already done for openclaw — now done for
nousresearch/hermes-agent too via the new exported `hermes.ImageRef()`
helper.

This trades ~30s of host-to-cluster tarball import (one-time per
cluster) for the difference between a stuck pull and a working pod.

* fix(model): unify LiteLLM model_name contract, remove double-strip (#389)

`obol model setup custom`, the LiteLLM `model_name` convention, and the
agent-side stripProviderPrefix helpers were tangled in a way that quietly
broke flow-14 with a 400 "no healthy deployments for this model" on every
chat-completion against a custom vLLM endpoint:

  1. AddCustomEndpoint wrote `model_name: custom/<name>/<model>`.
  2. hermes.configuredModels saw it, called rankModels which pre-stripped
     to `<name>/<model>` before delegating to model.Rank.
  3. model.Rank also strips internally for ranking heuristics — but
     returns the original string. With the pre-strip from (2) the
     "original" was already mutilated.
  4. configuredModels then ran stripProviderPrefix on the primary AGAIN
     before returning, leaving the agent calling LiteLLM with bare
     `<model>` while only `custom/<name>/<model>` was registered.

The band-aid in ca820c9 dropped the `custom/<name>/` prefix on writes,
which unblocked the flow but left the underlying double-strip surface
intact. This change picks the contract explicitly:

  LiteLLM `model_name` is the bare model identifier — the agent reads
  it straight back as the `model` field on chat-completion calls and
  must round-trip unchanged. Same convention every other code path
  already uses (Ollama, Anthropic, OpenAI explicit entries).

Implementation:
  - internal/model/model.go: extract buildCustomEndpointEntry, document
    the contract on AddCustomEndpoint, drop the leftover `_ = name`
    bookkeeping.
  - internal/model/rank.go: keep the unexported stripProviderPrefix for
    ranking heuristics, add a doc comment explicitly forbidding its use
    on round-trippable identifiers.
  - internal/hermes/hermes.go: delete stripProviderPrefix /
    stripProviderPrefixes; rankModels now passes through to model.Rank
    without pre-stripping; configuredModels returns the LiteLLM model
    list unchanged. The agent's `model.default` is now byte-identical
    to the LiteLLM ConfigMap entry.
  - cmd/obol/model.go: clarify --name flag help to "informational only"
    — it still surfaces in `obol model status` but does not participate
    in the route key.

Tests:
  - internal/model/rank_test.go: TestRank_PreservesProviderPrefixOnOutput
    pins the round-trip property at the Rank() boundary, including the
    legacy `custom/<name>/<model>` shape.
  - internal/model/model_test.go: TestBuildCustomEndpointEntry covers
    the bare-model_name + openai/-routing shape, the empty-key fallback,
    and that colon-tagged ids survive intact.
  - internal/hermes/rankmodels_test.go: rewritten to assert the contract
    (was asserting the now-removed strip). Adds the
    `custom/<name>/<model>` regression guard.
  - internal/hermes/hermes_test.go: TestGenerateConfig_PrimaryIsRoundTrippable
    covers the end-to-end shape — whatever LiteLLM publishes is what the
    agent sends back.

Refs ca820c9 (band-aid).

Co-authored-by: bussyjd <bussyjd@users.noreply.github.com>

* Remove raw private-key handling now that it's no longer needed with the obol hermes wallet import command

* Remove the under-functional probe for now in favour of a better one later

* Avoid adding too much code just to cover hacks

* feat(stack): reclaim leaked dev k3d networks on obol stack purge

* Fix up old references to qwen3 and a 0.6b model

* Delete a plan, push pr review notes

* fix(hermes): bookend wallet-import archival with k3d ownership flip (#397)

* fix(hermes): bookend wallet-import archival with k3d ownership flip

archiveReplacedHermesKeystore stats, mkdirs, and renames on the
host-path PVC directly, but provisionKeystoreToVolume's last step
(fixRuntimeVolumeOwnership) leaves the keystores dir as
mode 700 owned by the container's uid 10000. The host-side process
(uid 1000) then cannot traverse the dir, so os.Stat returns EACCES
and the wrapping caller surfaces "failed to archive replaced
keystore: stat …: permission denied".

Mirror the pattern provisionKeystoreToVolume already uses: call
ensureVolumeWritable up front (chowns to host uid via k3d node-exec),
defer fixRuntimeVolumeOwnership so all return paths restore container
ownership for the remote-signer pod.

The bug pre-dates the obol-wallet-import flow rewrite; flow-14 only
started exercising the path on Alice once the --private-key-file
escape hatch was removed.

* fix(hermes): honor ApplyCluster — helmfile-sync after wallet import

ImportPrivateKeyWalletOptions.ApplyCluster has been plumbed all the
way from cmd/obol/wallet.go since the OpenClaw → Hermes routing fix,
but ImportPrivateKeyWalletCmd never actually consumed it. Effect:
`obol wallet import` against a live cluster wrote the new keystore
to the host-path PVC and updated values-remote-signer.yaml on disk,
but the running remote-signer pod kept decrypting with the old
chart-bootstrap keystore-password Secret and signed with the chart's
throwaway address (e.g. 0xb6aF…). On a flow-14 register tx that
surfaced as "gas required exceeds allowance (0)" — chart key has
no funds.

Mirror OpenClaw's finalizeWalletProvision pattern: when the cluster
is reachable, run hermes.Sync to helmfile-sync the deployment.
helmfile reapplies the keystore-password Secret with the new value
and helm rolls the remote-signer deployment, so the pod restarts
against the freshly-imported keystore.

Failure to sync is best-effort — emits a warning and a recovery
hint instead of failing the import outright (cluster might come up
later).

* fix(hermes,flow-14): roll remote-signer after import + protect register from set -e

Two follow-ups to the helmfile-sync addition (a214050):

1. helm doesn't roll a Deployment when only a Secret's data changed —
   the Deployment template still references the same Secret name, so
   helm patches the Secret in-place and leaves the pod running with
   the stale env. After Sync, run an explicit `kubectl rollout restart
   deployment/remote-signer` and wait up to 120s for the new pod to be
   ready. Mirrors OpenClaw's restartRemoteSigner semantics.

2. flow-14 step 23 ran `register_out=$(timeout 300 obol sell register …)`
   under set -e from lib.sh. obol sell register correctly exits 1 on
   chain failure, but the assignment-with-command-substitution under
   errexit kills the script before the if-check can fire fail() and
   emit_metrics — the run looked like a silent death at "STEP: [23]"
   instead of a clean FAIL with metrics. Wrap in set +e/-e the same
   way step 22 (wallet import) already does.
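
The pitfall reproduces in isolation: a plain assignment takes the exit status of its command substitution, so under `set -e` a failing capture kills the script before any if-check runs. (`false` stands in for a failing `obol sell register` here.)

```shell
#!/usr/bin/env bash
# Without the bracket, errexit fires on the assignment itself.
bash -c 'set -e; register_out=$(false); echo "never reached"' \
  || echo "script died, rc=$?, FAIL handler never ran"

# The flow's fix: bracket the capture with set +e / set -e.
bash -c 'set -e
set +e
register_out=$(false); register_rc=$?
set -e
echo "captured rc=$register_rc; fail() and emit_metrics can now run"'
```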

Together with 6c5106a (archive bookend) and a214050 (Sync on
ApplyCluster), `obol wallet import` against a live Hermes cluster now
fully replaces the chart bootstrap key end-to-end without flow-level
workarounds.

* test(hermes): unit tests + flow-14 guard for wallet-import cluster wiring

Tests cover the regression classes surfaced in this PR:

- TestArchiveReplacedHermesKeystore_NilExisting / SameUUID — happy
  short-circuit paths must NOT call the k3d node-exec helpers.
- TestArchiveReplacedHermesKeystore_BookendOrder — guards 6c5106a:
  the (ensureVolumeWritable → fixRuntimeVolumeOwnership) bookend MUST
  run in order, and the deferred fix MUST fire on every return path
  including the os.Stat ENOENT early-return.
- TestArchiveReplacedHermesKeystore_RenamesToReplaced — happy-path
  archival writes the file under <dir>/replaced/<uuid>-<ts>.json and
  removes the original.
- TestImportPrivateKeyWalletCmd_ApplyClusterFalseSkipsCluster — guards
  the inverse of a214050: the pre-cluster bootstrap path must NOT
  helmfile-sync or rollout-restart.
- TestImportPrivateKeyWalletCmd_ApplyClusterTrueRollsPod — primary
  guard: ApplyCluster=true must invoke both Sync AND
  restartHermesRemoteSigner (helm doesn't roll on Secret-data changes,
  so the rollout-restart is non-optional).
- TestImportPrivateKeyWalletCmd_SyncFailureSkipsRestart — best-effort
  contract: Sync error → skip restart, do NOT fail the import as a
  whole; on-disk artifacts let a later `obol hermes sync` finish.

Tests use indirection seams (var syncFn, restartHermesRemoteSignerFn,
ensureVolumeWritableFn, fixRuntimeVolumeOwnershipFn) to spy/replace
without standing up a real k3d cluster.

Flow-level guard: a new step between 22 (wallet import) and 23
(register) asserts the remote-signer pod's startTime is within 120s
of now. If a regression drops the explicit kubectl rollout-restart,
the pod stays old → assertion fails fast with a clear "wallet import
did not roll the deployment" diagnostic, instead of falling through
to the 5-minute "gas required exceeds allowance (0)" symptom.
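
The guard's age math can be sketched like this (GNU date assumed; startTime is fabricated 30s in the past so the example is self-contained, whereas the flow would read it via `kubectl get pod … -o jsonpath='{.status.startTime}'`):

```shell
#!/usr/bin/env bash
# Assert the remote-signer pod restarted within the last 120s.
start_time=$(date -u -d '-30 seconds' +%Y-%m-%dT%H:%M:%SZ)
age=$(( $(date +%s) - $(date -d "$start_time" +%s) ))
if [ "$age" -le 120 ]; then
  echo "remote-signer pod age ${age}s: wallet import rolled the deployment"
else
  echo "FAIL: pod age ${age}s; wallet import did not roll the deployment"
fi
```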

* fix(hermes): bump remote-signer chart 0.3.0 → 0.3.1 + consistency test

Chart 0.3.1 was published 2026-04-23 with appVersion `v0.2.0`, which
accepts the canonical-string signer contract (chain_id, value, etc.
serialized as JSON strings) introduced by PR #359 / commit b9495b8.
Chart 0.3.0 ships `v0.1.0` which only accepts the legacy u64 contract
and rejects every signing call from current obol-stack with HTTP 422
"chain_id: invalid type: string \"84532\", expected u64".

OpenClaw was bumped to 0.3.1 in PR #374 but Hermes was missed — the
two charts are pinned in independent constants and Renovate only
updated one. flow-14 step 23 (Alice ERC-8004 register via remote-
signer) reproduced the failure on every run against current main.

TestRemoteSignerChartVersionConsistency reads both source files at
test time and asserts the two pins agree, so future chart bumps
either touch both files together or fail CI.

Pairs with: PR #357 (closed in favour of #359), task #46.

* refactor(charts): single source of truth for remote-signer chart pin

Both Hermes and OpenClaw deploy the same `remote-signer` Helm chart but
each held its own private constant + Renovate annotation. PR #374
bumped only OpenClaw to 0.3.1; Hermes stayed on 0.3.0 and shipped image
v0.1.0 which rejects the canonical-string signer contract — exactly
the drift class TestRemoteSignerChartVersionConsistency was added to
catch.

Promote the pin to a single exported constant in
`internal/agentruntime/charts.go` (the package both consumers already
import for Namespace/Hostname/KeystoreVolumePath, no new dep edge),
move the Renovate annotation to live alongside it, and delete the
consistency test — drift is now structurally impossible.

Mirrors the OPENCLAW_VERSION pattern (single source-of-truth file +
TestOpenClawVersionConsistency over its three consumers); future
shared chart pins follow the same shape under internal/agentruntime/.

---------

Co-authored-by: bussyjd <bussyjd@users.noreply.github.com>

---------

Co-authored-by: bussyjd <bussyjd@users.noreply.github.com>
Co-authored-by: Oisín Kyne <oisin@obol.tech>