Delete a plan, push pr review notes to explain the preceeding PRs#395
Closed
OisinKyne wants to merge 1 commit intooisin/377-5from
Closed
Delete a plan, push pr review notes to explain the preceeding PRs#395OisinKyne wants to merge 1 commit intooisin/377-5from
OisinKyne wants to merge 1 commit intooisin/377-5from
Conversation
Collaborator
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR #386 review and surgery — 2026-04-29
This file captures the review process for
integration/pr377-pr381(PR #386,"Integrate PR #377 OBOL Permit2 + PR #381 Hermes runtime, harden flows, add
flow-13 dual-stack OBOL"), what got cut, what stayed, and why. It is not
intended to live in the repo long-term; CLAUDE.md says plan/PR-review docs
shouldn't be committed. Keeping it under
plans/(gitignored) so it survivesthe work session and can be pasted into the PR body or archived.
Branch state at start of review
1804d43 fix(model): unify LiteLLM model_name contract, remove double-strip (#389)(this commit landed during the review; was 64bbe22 when the review started).
main.integration/pr339-375-376-obol-usdc) — OBOL Permit2 buy/sell support. Bottom of stack.feature/hermes-agent-runtime-refresh) — Hermes as default agent runtime.fix/flow11-injected-buyer-wallet) — flow-11 buyer-wallet reuse fix.fix/erc8004-setmetadata-stale-read) —WaitForAgentfor read-side staleness. Already merged into 386.feat/obol-x402-hardening-flow-14) — flow-14, model rank fix, network probe. Already merged into 386.fix/model-name-strip-complexity) — round-trip model_name contract. Already merged into 386.feature/agent-provider-smokes-model-prefer) — addsobol model prefer. Will need to rebase against this branch.Oisin's bottom-of-PR concerns (the starting point)
From the GitHub PR thread:
"is this really the best way to do this? would supplying a signature be
more sane?" + "is this key in any way privileged?"
lot of the focus of the defensive programming of this pr) is for an AI
fuck up in a clean down of an automated test... 1) chainID is an
insufficient probe for a status of an upstream. 2) erpc does the status
management already, can't we parse it?"
imagine our erpc returning stale reverts unless we wait a block/request
more exactly a fresh one"
outside in an env as an example, maybe i'll know when i look"
Plus three GitHub-Advanced-Security CodeQL hits (the int conversion at
probe.go:144was the only legitimate one; vanished with the file).Layered-on findings from my own audit (pre-edit)
loadRegistrationSigningKey()codepath (controller.go:1383pre-edit) was fully wired — 9 call sites for
SubmitRegister/SetMetadata/SetAgentURI— but never triggered: theERC8004_PRIVATE_KEY/_FILEenv vars are not set in the deploymenttemplate (
internal/embed/infrastructure/base/templates/x402.yaml), noflow sets them, no test sets them, no GitHub Actions sets them. Pure dead
code unless someone manually
kubectl set envs the controller.--private-key-fileflag was real — used byflow-11-dual-stack.shand
flow-14-live-obol-base-sepolia.shfor bootstrap. It was redundantwith
obol wallet import(which exists for Hermes since the Hermesdefault-agent landing).
model.Rankis not on a hot path. Called only fromHermes.SetupDefault,Onboard,writeDeploymentFiles,Sync(deploy/sync time). Not per chatrequest. Oisin's "called all the time?" worry was unfounded.
model.Rankvs theobol model preferbranch (Add provider smokes and model preference #379) — real conflict.Prefer reorders the LiteLLM ConfigMap so the user's choice sits at index 0;
Rank reorders again on next sync. These two branches will collide and need
reconciling when Add provider smokes and model preference #379 rebases. Deferred to that PR.
through Traefik on port 80 (8080 on macOS due to the privileged port
caveat already in CLAUDE.md). Pre-existing UX, not introduced here.
9ef8cda harden existing stack refreshwas a bundle of (a) frontendv0.1.17-rc.3 → rc.5 bump (already shipped via chore: bump frontend image to v0.1.17-rc.5 #383), (b) hardcoded-resource
Helm-ownership migration, (c) kubectl-patch field-manager migration. (b)
and (c) target a one-off transition seen on dev laptops.
4323dd3 Preserve LiteLLM config across defaults refreshis a cleanrefactor with real user value — the base helmfile template only contains
paid/*, so without preserve+restore everyobol stack upwould wipeuser-added cloud providers.
c08c873/34e62e5Docker-network reclaim only fixes the leak inside flowscripts.
obol stack purgecallsk3d cluster deletewhich leaks thenetwork when dev-mode pull-through registry-mirror containers are
attached, exhausting the predefined CIDR pool after ~16 cycles.
qwen3:8blaptop default and the scatteredqwen3:0.6bexamples werepre-dating the qwen3.5 / qwen3.6 generations. Cross-checked against
ollama.com/library at review time:
qwen3.6smallest is27b(17GB); no small variants exist.qwen3.5has4b/9b/27b/35b/122b. Validated test baseline isqwen3.5:9b(6.6GB).qwen3:0.6bdoes still exist (523MB) but is too small to be a sanedefault in 2026.
wrong; actual is 31GB.
Decisions and dialog, group by group
Group A — Controller signing key +
--private-key-fileflagThe most architecturally consequential cut. Tied directly to memory record:
"Never extract private keys from remote-signer; use its REST API for all
signing."
Two questions surfaced before any edits:
I presented two options for the flag:
obol wallet importonce at bootstrap, then runobol sell http/obol sell registerwith no flag. One signing surface.
the escape hatch.
Oisin chose A explicitly:
What got cut
internal/serviceoffercontroller/controller.go:registrationKey *ecdsa.PrivateKeyfieldregistrationOwnerAddress stringfieldloadRegistrationSigningKey()(read ofERC8004_PRIVATE_KEY/_FILE)syncRegistrationMetadata()and the metadata-sync block inreconcileRegistrationActivec.registrationKey != nilself-register branch and on-chaintombstone branch — controller now only publishes the registration
document and watches for the externally-driven register tx.
crypto/ecdsaandcryptoimportscmd/obol/sell.go:--private-key-fileflag fromobol sell httpandobol sell registerreadPrivateKeyMaterial()helperregisterDirectWithKey()helperPrivateKeyFile/PrivateKeyInputstruct fieldspaths. Now: remote-signer required; clear error if missing.
cmd/obol/sell_test.go: tests forreadPrivateKeyMaterialand the--private-key-fileflag presence.What stayed
registerSponsored,registerDirectViaSigner) — already correct, untouched.obol wallet import— already existed (cmd/obol/wallet.go:44,internal/hermes/wallet_import.go), untouched.MetadataSynced,RegistrationTxHash,RegistrationSearchFromBlockfields on the CRD — kept for compat; no longer written by the controller.
Flow rewrites
flow-11-dual-stack.sh: insertalice wallet import --instance obol-agent --private-key-file <file> --forcebeforealice sell http. Drop--private-key-filefrom thesell httpcall. Bob already had thispattern; Alice now matches.
flow-14-live-obol-base-sepolia.sh: same pattern beforeobol sell register. Drop--private-key-filefrom the call.flow-13-dual-stack-obol.sh: comment refreshed to stop referencing thedeleted flag.
.agents/skills/obol-stack-dev/SKILL.md: ERC-8004 prerequisites sectionrewritten to describe the remote-signer-only path. Setmetadata-revert
section credits fix(erc8004): wait for read-side consistency before setMetadata #387's
WaitForAgentas the actual fix and demotes the"stale Anvil fork" hypothesis from the original PR description to a
less-likely cause.
Group B — Drop
internal/network/probe.go+obol network statuschainId columnOisin:
Reasoning kept in commit:
eth_chainIdis a poor liveness probe — can't catch lagging upstreams.real chain — exactly the false-positive case the probe was built for.
parse those, not redo the work badly in-process.
ERC721NonexistentTokenon a freshly-minted agent ID) was eRPC read-side staleness, fixed by
fix(erc8004): wait for read-side consistency before setMetadata #387's
WaitForAgent.What got cut
internal/network/probe.go(deleted)internal/network/probe_test.go(deleted)cmd/obol/network.go:--no-probeand--probe-timeoutflags,renderUpstreamProbes(),uiPrinterinterface,timeimport.flow-02 and flow-05 only grep for
eRPC|Pod|Upstreamheaders, whichremain — no flow regressions.
Group C — Strip Helm-ownership migration code
Oisin:
kubectl-patchfield manager getting on the ConfigMap was a historicalcase (pre-
--field-manager=helmmodel.go). Current code uses--field-manager=helmso the trigger is unreachable.migrateBaseHelmOwnership) had noresilience case — pure one-off.
internal/embed/infrastructure/base/templates/llm.yaml:60–77showsthe base ConfigMap only contains
paid/*. Without preserve+restore,every
obol stack upwould wipe user-added cloud providers and customendpoints. Kept.
What got cut
migrateBaseHelmOwnership()+baseHelmResource+namespaceArgs()(whole block)
needsLiteLLMConfigHelmMigration()and the conditional delete-and-restorebranch in
preserveLiteLLMConfigForHelmTestNeedsLiteLLMConfigHelmMigrationWhat stayed (justified resilience)
preserveLiteLLMConfigForHelm— simplified to always snapshotrestoreLiteLLMConfig,mergeLiteLLMConfig,configMapFieldOwnershipManifestkubectl.ApplyServerSideForceConflicts(used by restore)--field-manager=helmpatches ininternal/model/model.go(one-linegood hygiene; matches helm's manager so future patches don't conflict)
TestMergeLiteLLMConfigPreservesChartDefaultsAndPreviousModelsi'm not worried about history neatness"
.agents/skills/obol-stack-dev/SKILL.md— Oisin: "the obol-stack-devskill is dev focused tbf so maybe okay"
Group D — Reclaim leaked dev k3d networks on
obol stack purgeOisin:
obol stack downcallsk3d cluster stopwhich preserves the networkfor
Upto resume; cleaning on Down would break the stop/resume workflow.Only Purge is wired up. The rare graceful-stop-fails-fall-through-to-delete
path inside Down is left to the next Purge to clean.
Implementation: lifted
cleanup_k3d_obol_networksfromflows/lib.shinto Go (
reclaimLeakedDevK3dNetworksininternal/stack/stack.go).Filters to
k3d-obol-stack-*networks, skips any with a live*-serverlbor
*-server-Nattachment, force-disconnects mirror containers, removes.Logs
"Reclaimed N leaked dev registry network(s)"only when a networkwas actually freed.
Added
TestHasLiveK3dClusterto pin the live-cluster heuristic.Group E — Qwen recommendation cleanup
Oisin:
Web-checked ollama.com/library:
Decision tree:
qwen3.6:27b17 GBqwen3.6:27b-coding-mxfp831 GB (was wrongly listed as ~13 GB in 026d30d's commit message)qwen3.5:9b6.6 GBqwen3.5:4b3.4 GB (replaces everyqwen3:0.6buser-facing reference)deepseek-r1:8b4.9 GBgemma3:4b3.3 GBinternal/model/rank.goandinternal/model/rank_test.godeliberatelyuntouched — the
qwen3:0.6breferences there are regression guards forusers who happen to have it pulled, not recommendations.
Group F — Delete
docs/plans/obol-x402-path-comparison.mdCLAUDE.md says plan/report docs don't belong in the repo. The 2026-04-11
benchmark write-up belongs in PR #386's body or a Linear thread, not
docs/plans/. Emptydocs/plans/directory removed too.Group G — This file
Per Oisin: "Add to your task list that you'll need to write to a file a
transcript of this review process, including stuff like this dialog and
why its being removed."
Lives under
plans/(untracked) so it doesn't violate the no-plan-docrule.
Things deliberately not changed
not on this branch and will need to rebase. Ownership of fixing it sits
with whoever rebases Add provider smokes and model preference #379. Note from Oisin: "okay, we'll fix this when
we tackle the obol model prefer branch, which isn't in this at all."
user lands on port 80 via Traefik. UX issue with
:8080on macOSpre-exists, covered already in CLAUDE.md.
the right url if we fail to bind."
frontend bump, i'm not worried about history neatness."
kept since the skill is dev-focused.
internal/model/rank.go/rank_test.go— left intact. Regressionguards for users with
qwen3:0.6bpulled, not recommendations.x402-verifier'sverifyOnly: trueinvariant — unchanged. Group Adid not touch the payment verification path.
review; Group A did not change.
Carry-overs / follow-up PRs
obol model prefer(Add provider smokes and model preference #379) withmodel.Rankwhen Add provider smokes and model preference #379rebases. Either Rank respects the user's explicit preference, or
Prefer writes a separate
preferred:field the agent reads first.go vetwarnings oninternal/enclave/enclave_darwin.goaboutunsafe.Pointerusage — pre-existing CGo Secure Enclave code,untouched here. Worth a separate cleanup pass.
obol stack downfallback path — the rarek3d cluster stopfailure → fallback tok3d cluster deletecouldleak a network in dev mode. Currently relies on the next Purge to
clean up. If this becomes a real problem, add a call there.
obol network statusreachability column — if we want one,parse eRPC's existing health metrics rather than redo the chainId
probe in-process.
Final status
go build ./... && go test ./...clean after every group. Pre-existinggo vetcomplaints on enclave Darwin code only.Files touched across all groups:
internal/serviceoffercontroller/controller.gocmd/obol/sell.go--private-key-fileflag + helperscmd/obol/sell_test.goflows/flow-11-dual-stack.shwallet importbeforesell httpflows/flow-13-dual-stack-obol.shflows/flow-14-live-obol-base-sepolia.shwallet importbeforesell register.agents/skills/obol-stack-dev/SKILL.mdinternal/network/probe.gointernal/network/probe_test.gocmd/obol/network.gointernal/stack/stack.gointernal/stack/stack_test.gointernal/stack/stack.goreclaimLeakedDevK3dNetworks(purge)internal/stack/stack_test.goTestHasLiveK3dClustercmd/obol/model.gointernal/openclaw/openclaw.gointernal/embed/skills/{sell,monetize-guide}/SKILL.mddocs/getting-started.mddocs/guides/monetize-inference.mdqwen3:0.6bupdatedCLAUDE.mdflows/lib.shflows/flow-03-inference.shinternal/openclaw/monetize_integration_test.godocs/plans/obol-x402-path-comparison.mddocs/plans/Six commits planned, one per group A–F. Group G (this transcript) is
intentionally not committed.