73 changes: 67 additions & 6 deletions .agents/skills/obol-stack-dev/SKILL.md
@@ -62,6 +62,67 @@ Tier 2: Per-Instance Tier 1: Cluster-Wide Gateway
| Integration testing | `references/integration-testing.md` |
| Troubleshooting | `references/troubleshooting.md` |

## Dev Registry Cache

When `OBOL_DEVELOPMENT=true`, `obol stack up` provisions pull-through k3d registry caches before creating a new cluster. Current mirrors:

- `docker.io` -> `k3d-obol-docker-io.localhost:54100`
- `ghcr.io` -> `k3d-obol-ghcr-io.localhost:54101`
- `quay.io` -> `k3d-obol-quay-io.localhost:54102`

The generated registry config lives at `$OBOL_CONFIG_DIR/registries.yaml`. Cached image layers are stored under `~/.local/state/obol/registry-cache/` by default, or under `OBOL_REGISTRY_CACHE_DIR` if set.
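As a rough sketch of what that file contains (the field names follow the k3s/containerd `registries.yaml` mirrors format; exact contents may differ by version):

```yaml
# Generated pull-through mirror config (illustrative, not verbatim).
mirrors:
  "docker.io":
    endpoint:
      - http://k3d-obol-docker-io.localhost:54100
  "ghcr.io":
    endpoint:
      - http://k3d-obol-ghcr-io.localhost:54101
  "quay.io":
    endpoint:
      - http://k3d-obol-quay-io.localhost:54102
```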

Use this mental model:

- Fresh dev cluster: new cluster creation gets `--registry-config` and `--registry-use` entries, so pulls benefit from the cache.
- Existing dev cluster: `obol stack up` only starts the cluster and does not re-run registry setup.
- This is an upstream pull cache, not a dedicated local-build publishing workflow.
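To sanity-check that the caches were actually provisioned on the dev host, a quick sketch (container names are the defaults listed above):

```shell
# List running k3d pull-through cache containers, if any.
if command -v docker >/dev/null 2>&1; then
  names=$(docker ps --filter "name=k3d-obol-" --format '{{.Names}}')
  if [ -n "$names" ]; then
    echo "$names"
  else
    echo "no k3d-obol registry caches running"
  fi
else
  echo "docker not found; run this on the dev host"
fi
```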

## Existing Dev Stack Refresh

When testing a new frontend or stack branch against an already-initialized `.workspace/config`, rebuild the local CLI before `stack up`. The current `obol stack up` refreshes `$OBOL_CONFIG_DIR/defaults` when the embedded infrastructure digest, backend, or stack ID changes, and preserves mutable LiteLLM model entries across Helm sync. If testing raw file edits without rebuilding the CLI, either patch the generated defaults copy manually or rebuild and re-run `obol stack up`.

```bash
# Prefer origin-only fetches in this checkout. Some Radicle remotes may be stale
# and can make `git fetch --all` fail after GitHub has already fetched.
git fetch origin --prune

# Rebuild the local CLI from the current branch.
go build -o .workspace/bin/obol ./cmd/obol

# For a released frontend image, verify and pull the exact tag first.
docker manifest inspect obolnetwork/obol-stack-front-end:v0.1.17-rc.5 >/dev/null
docker pull obolnetwork/obol-stack-front-end:v0.1.17-rc.5

# Confirm the embedded source has the intended image tag before rebuilding.
rg -n 'obol-stack-front-end|tag:' internal/embed/infrastructure/values/obol-frontend.yaml.gotmpl

OBOL_CONFIG_DIR="$PWD/.workspace/config" \
OBOL_BIN_DIR="$PWD/.workspace/bin" \
OBOL_DATA_DIR="$PWD/.workspace/data" \
.workspace/bin/obol stack up
```

Expected verification for frontend image refresh:

```bash
OBOL_CONFIG_DIR="$PWD/.workspace/config" OBOL_BIN_DIR="$PWD/.workspace/bin" OBOL_DATA_DIR="$PWD/.workspace/data" \
.workspace/bin/obol kubectl -n obol-frontend get deploy obol-frontend-obol-app \
-o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'

OBOL_CONFIG_DIR="$PWD/.workspace/config" OBOL_BIN_DIR="$PWD/.workspace/bin" OBOL_DATA_DIR="$PWD/.workspace/data" \
.workspace/bin/obol kubectl -n obol-frontend rollout status deploy/obol-frontend-obol-app --timeout=180s

curl -sS -I --max-time 10 http://obol.stack
curl -sS --max-time 15 http://obol.stack/api/agents/instances
```

Known existing-stack migration failures:

- `Namespace "hermes-obol-agent" ... exists and cannot be imported into the current release`: the namespace or monetize RBAC predated Helm ownership. Current `obol stack up` adopts known base-owned resources before Helm sync. If doing it manually, label and annotate the existing resource with `app.kubernetes.io/managed-by=Helm`, `meta.helm.sh/release-name=base`, and `meta.helm.sh/release-namespace=kube-system`.
- `conflict with "kubectl-patch" ... llm/litellm-config .data.config.yaml`: older writers used a non-Helm field manager for `data.config.yaml`, which conflicts with Helm server-side apply. Current writers use Helm's field manager. During `obol stack up`, the existing LiteLLM config is backed up and previous model entries are merged into the new chart config; if a non-Helm manager is detected, the ConfigMap is deleted before Helm sync so ownership is recreated cleanly.
- `/etc/hosts` updates require interactive sudo. Codex cannot satisfy the password prompt in non-interactive execution; if DNS fails in the browser, run `obol stack up` or `obol hermes sync obol-agent` from a normal terminal, or manually add `127.0.0.1 obol-agent.obol.stack` and flush local DNS.
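For the first failure above, the manual adoption commands look roughly like this (namespace and release names follow the error message; current `obol stack up` performs the equivalent automatically):

```shell
# Adopt a pre-existing namespace into the Helm "base" release.
NS=hermes-obol-agent
if command -v kubectl >/dev/null 2>&1; then
  kubectl label namespace "$NS" \
    app.kubernetes.io/managed-by=Helm --overwrite \
    || echo "label failed (is the dev cluster running?)"
  kubectl annotate namespace "$NS" \
    meta.helm.sh/release-name=base \
    meta.helm.sh/release-namespace=kube-system \
    --overwrite \
    || echo "annotate failed (is the dev cluster running?)"
else
  echo "kubectl not found; run against the dev cluster"
fi
```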

## 4 Inference Paths (All Through LiteLLM)

| Path | Model Name | LiteLLM model_list | Example |
@@ -80,7 +141,7 @@ All 4 paths use the same OpenClaw config pattern:
### Paid Routing Notes

- The paid path uses the **Obol LiteLLM fork** because paid-model lifecycle relies on the config-only model management API.
- `litellm-config` carries one static route: `paid/* -> openai/* -> http://127.0.0.1:8402`.
- `litellm-config` carries one static route: `paid/* -> openai/* -> http://127.0.0.1:8402/v1`.
- `x402-buyer` runs as a **sidecar in the LiteLLM pod**, not as a separate Service.
- `buy.py buy` signs auths locally and creates a `PurchaseRequest`; the controller writes per-upstream buyer files and keeps LiteLLM model entries in sync.
- The currently validated local OSS model is `qwen3.5:9b`. Prefer that exact model in live commerce tests.
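A minimal reachability probe for the gateway side of this route (assumes LiteLLM has been port-forwarded to local port 4000; `LITELLM_MASTER_KEY` and its `sk-local` fallback are placeholders):

```shell
# Probe LiteLLM's model list through the local port-forward, if present.
if command -v curl >/dev/null 2>&1; then
  curl -s --max-time 5 http://127.0.0.1:4000/v1/models \
    -H "Authorization: Bearer ${LITELLM_MASTER_KEY:-sk-local}" \
    || echo "gateway not reachable; port-forward the LiteLLM service to 4000 first"
else
  echo "curl not found"
fi
```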
@@ -125,9 +186,9 @@ go test -tags integration -v -timeout 10m ./internal/openclaw/ # Integration te
go test -tags integration -v -run TestIntegration_Tunnel_SellDiscoverBuySidecar_QuotaAndBalance -timeout 30m ./internal/openclaw/
```

## OpenClaw Skills System
## Agent Skills System

Skills are SKILL.md files (with optional scripts and references) that give the agent domain-specific capabilities. Delivered via host-path PVC injection to `/data/.openclaw/skills/` inside the pod.
Skills are SKILL.md files (with optional scripts and references) that give the agent domain-specific capabilities. Hermes receives embedded Obol skills through native `skills.external_dirs` at `/data/.hermes/obol-skills` with `OBOL_SKILLS_DIR` set. OpenClaw receives embedded skills through host-path PVC injection to `/data/.openclaw/skills/`.

### Default Embedded Skills

@@ -242,9 +303,9 @@ obol sell http qwen35 \
--upstream ollama --port 11434 --namespace llm --health-path /api/tags \
--per-request "0.001" --chain "base-sepolia" --wallet "0x<wallet>"

# Trigger reconciliation (or wait for heartbeat)
obol kubectl exec -n openclaw-obol-agent deploy/openclaw -c openclaw -- \
python3 /data/.openclaw/skills/monetize/scripts/monetize.py process qwen35 --namespace llm
# Trigger reconciliation from the default Hermes agent pod
obol kubectl exec -n hermes-obol-agent deploy/hermes -c hermes -- \
python3 /data/.hermes/obol-skills/monetize/scripts/monetize.py process qwen35 --namespace llm

# Verify 402
curl -X POST http://obol.stack:8080/services/qwen35/v1/chat/completions \
2 changes: 1 addition & 1 deletion .dockerignore
@@ -1,2 +1,2 @@
.workspace/
.workspace*
.worktrees/
32 changes: 26 additions & 6 deletions CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

Obol Stack: framework for AI agents to run decentralised infrastructure locally. k3d cluster with OpenClaw AI agent, blockchain networks, payment-gated inference (x402), Cloudflare tunnels. CLI: `github.com/urfave/cli/v3`.
Obol Stack: framework for AI agents to run decentralised infrastructure locally. k3d cluster with a Hermes default AI agent, optional OpenClaw instances, blockchain networks, payment-gated inference (x402), and Cloudflare tunnels. CLI: `github.com/urfave/cli/v3`.

## Conventions

@@ -60,6 +60,7 @@ obol
├── agent init (deploys obol-agent singleton)
├── network list, install, add, remove, status, sync, delete
├── sell inference, http, list, status, stop, delete, pricing, register
├── hermes onboard, setup, sync, list, delete, token, wallet, skills
├── openclaw onboard, setup, sync, list, delete, dashboard, cli, token, skills
├── model setup, status
├── app install, sync, list, delete
@@ -73,7 +74,7 @@

Deployed on `obol stack up` from `internal/embed/infrastructure/`. Key templates in `base/templates/`: `llm.yaml` (LiteLLM + Ollama), `x402.yaml` (verifier + serviceoffer-controller), `obol-agent.yaml` (singleton), `serviceoffer-crd.yaml`, `registrationrequest-crd.yaml`, `obol-agent-monetize-rbac.yaml`, `local-path.yaml`. Plus `cloudflared/` chart and `values/` for eRPC, monitoring, frontend.

Components: eRPC (`erpc` ns), Frontend (`obol-frontend` ns), Cloudflared (`traefik` ns), Monitoring/Prometheus (`monitoring` ns), LiteLLM (`llm` ns), x402-verifier (`x402` ns), serviceoffer-controller (`x402` ns), obol-agent (`openclaw-obol-agent` ns), ServiceOffer + RegistrationRequest CRDs.
Components: eRPC (`erpc` ns), Frontend (`obol-frontend` ns), Cloudflared (`traefik` ns), Monitoring/Prometheus (`monitoring` ns), LiteLLM (`llm` ns), x402-verifier (`x402` ns), serviceoffer-controller (`x402` ns), default obol-agent (`hermes-obol-agent` ns), ServiceOffer + RegistrationRequest CRDs.

## Monetize Subsystem

@@ -83,7 +84,7 @@ Payment-gated access to cluster services via x402 (HTTP 402 micropayments, USDC

**Buy-side flow**: `buy.py probe` sees 402 pricing → `buy.py buy` pre-signs ERC-3009 auths into a `PurchaseRequest` CR in the agent namespace → serviceoffer-controller writes buyer config/auth files into `llm` and publishes `paid/<remote-model>` → the in-pod `x402-buyer` sidecar spends one auth per paid request. Agent-managed refill runs through `buy.py process --all`, not the controller.

**buy.py** lives at `/data/.openclaw/skills/buy-inference/scripts/buy.py` inside the agent pod (skill name: `buy-inference`, not `buy`). Commands:
**buy.py** lives at `${OBOL_SKILLS_DIR:-/data/.openclaw/skills}/buy-inference/scripts/buy.py` inside the agent pod (skill name: `buy-inference`, not `buy`). Commands:
```
probe <endpoint-url> [--model <id>] Probe x402 pricing from a 402 endpoint
buy <name> --endpoint <url> --model <id> Pre-sign ERC-3009 auths + create PurchaseRequest
@@ -179,6 +180,21 @@ k3d: 1 server, ports 80:80 + 8080:80 + 443:443 + 8443:443, `rancher/k3s:v1.35.1-

**Local access**: On macOS, port 80 is privileged and may not bind without root. Always use `http://obol.stack:8080/` (not `http://obol.stack/`) for local browser and curl access. Port 8080 maps to the same Traefik load balancer as port 80.

### Dev Registry Cache

When `OBOL_DEVELOPMENT=true`, `obol stack up` creates pull-through k3d registry caches and wires new clusters to use them on image pulls:

- `docker.io` -> `k3d-obol-docker-io.localhost:54100`
- `ghcr.io` -> `k3d-obol-ghcr-io.localhost:54101`
- `quay.io` -> `k3d-obol-quay-io.localhost:54102`

The generated k3d registry config is written to `$OBOL_CONFIG_DIR/registries.yaml`. Cache data is stored under `~/.local/state/obol/registry-cache/` by default, or under `OBOL_REGISTRY_CACHE_DIR` when set.

Important caveats:

- This is a pull-through cache for upstream registries, not a first-class local build registry workflow.
- It is only set up during cluster creation. If `obol stack up` is just starting an existing k3d cluster, registry setup is skipped.
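One quick way to tell which case applies in a given checkout (paths are the documented defaults):

```shell
# Look for the generated registry config in this checkout's config dir.
cfg="${OBOL_CONFIG_DIR:-$PWD/.workspace/config}/registries.yaml"
if [ -f "$cfg" ]; then
  echo "found $cfg:"
  head -n 12 "$cfg"
else
  echo "no registries.yaml at $cfg (cluster may predate the dev cache)"
fi
```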

## LLM Routing

**Service access from the Mac host** — not every cluster service is reachable via `obol.stack:8080`. Only routes published through Traefik are externally accessible. Everything else is ClusterIP-only and requires `kubectl port-forward`:
@@ -194,7 +210,7 @@

**x402-buyer sidecar is distroless** — no `wget`, `curl`, or shell inside the container. Use port-forward from the host, not `kubectl exec`.

**LiteLLM gateway** (`llm` ns, port 4000): OpenAI-compatible proxy routing to Ollama/Anthropic/OpenAI. ConfigMap `litellm-config` (YAML config.yaml with model_list), Secret `litellm-secrets` (master key + API keys). Auto-configured with Ollama models during `obol stack up` (no manual `obol model setup` needed). `ConfigureLiteLLM()` patches config + Secret + restarts or hot-adds via the LiteLLM model API. Paid remote inference uses the Obol LiteLLM fork plus the `x402-buyer` sidecar, with a static `paid/* -> openai/* -> http://127.0.0.1:8402` route and explicit paid-model entries when needed. OpenClaw always routes through LiteLLM (openai provider slot), never native providers; `dangerouslyDisableDeviceAuth` is enabled for Traefik-proxied access.
**LiteLLM gateway** (`llm` ns, port 4000): OpenAI-compatible proxy routing to Ollama/Anthropic/OpenAI. ConfigMap `litellm-config` (YAML config.yaml with model_list), Secret `litellm-secrets` (master key + API keys). Auto-configured with Ollama models during `obol stack up` (no manual `obol model setup` needed). `ConfigureLiteLLM()` patches config + Secret + restarts or hot-adds via the LiteLLM model API. Paid remote inference uses the Obol LiteLLM fork plus the `x402-buyer` sidecar, with a static `paid/* -> openai/* -> http://127.0.0.1:8402` route and explicit paid-model entries when needed. Hermes uses a custom OpenAI-compatible provider pointed at LiteLLM; optional OpenClaw instances use the OpenAI provider slot. `dangerouslyDisableDeviceAuth` is enabled for Traefik-proxied access.

**Auto-configuration**: During `obol stack up`, `autoConfigureLLM()` detects host Ollama models and patches LiteLLM config so agent chat works immediately without manual `obol model setup`. During install, `obolup.sh` `check_agent_model_api_key()` reads `~/.openclaw/openclaw.json` agent model, resolves API key from environment (`ANTHROPIC_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN` for Anthropic; `OPENAI_API_KEY` for OpenAI), and exports it for downstream tools.

@@ -204,9 +220,13 @@ k3d: 1 server, ports 80:80 + 8080:80 + 443:443 + 8443:443, `rancher/k3s:v1.35.1-

`obol sell inference` — standalone OpenAI-compatible HTTP gateway with x402 payment gating, for bare metal / Secure Enclave. `--vm` flag runs Ollama in Apple Containerization Linux VM. Key code: `internal/inference/` (gateway, container, store) and `internal/enclave/` (Secure Enclave signing via CGo/Security.framework on Darwin, stub fallback elsewhere).

## OpenClaw & Skills
## Agent Runtimes & Skills

Hermes is the stack-managed default runtime. Default instance state lives under `applications/hermes/obol-agent`, namespace `hermes-obol-agent`, service/deployment `hermes`, ConfigMap `hermes-config`, and PVC path `$DATA_DIR/hermes-obol-agent/hermes-data/.hermes`.

OpenClaw remains an optional manual runtime. OpenClaw instances live under `applications/openclaw/<id>`, namespace `openclaw-<id>`, service/deployment `openclaw`, and ConfigMap `openclaw-config`.

Skills = SKILL.md + optional scripts/references, embedded in `obol` binary (`internal/embed/skills/`, 23 skills). Delivered via host-path PVC injection to `$DATA_DIR/openclaw-<id>/openclaw-data/.openclaw/skills/`. Categories: Infrastructure (ethereum-networks, ethereum-local-wallet, obol-stack, distributed-validators, monetize, discovery), Ethereum Dev (addresses, building-blocks, concepts, gas, indexing, l2s, orchestration, security, standards, ship, testing, tools, wallets), Frontend (frontend-playbook, frontend-ux, qa, why).
Obol skills = SKILL.md + optional scripts/references, embedded in `obol` binary (`internal/embed/skills/`). Hermes receives them via native `skills.external_dirs` at `/data/.hermes/obol-skills` with `OBOL_SKILLS_DIR` set to that path. OpenClaw receives them via PVC injection at `/data/.openclaw/skills`.
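Scripts that must run in either runtime can resolve the skills directory the same way `buy.py` does, with the OpenClaw path as the fallback; a minimal sketch:

```shell
# Hermes pods export OBOL_SKILLS_DIR; OpenClaw pods rely on the fallback.
unset OBOL_SKILLS_DIR                       # simulate an OpenClaw pod
echo "${OBOL_SKILLS_DIR:-/data/.openclaw/skills}"    # -> /data/.openclaw/skills
OBOL_SKILLS_DIR=/data/.hermes/obol-skills   # simulate the Hermes pod
echo "${OBOL_SKILLS_DIR:-/data/.openclaw/skills}"    # -> /data/.hermes/obol-skills
```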

**Monetize skill** (`internal/embed/skills/monetize/`): thin compatibility wrapper around ServiceOffer CRUD, controller waiting, and `/skill.md` publication.

48 changes: 23 additions & 25 deletions README.md
@@ -49,14 +49,15 @@ obol version
obol stack init
obol stack up

# Set up your AI agent (interactive — choose a model provider)
# Apply agent capabilities to the default stack-managed agent
obol agent init

# Open the agent dashboard
obol openclaw dashboard default
# Inspect the default Hermes agent
obol hermes list
obol hermes token obol-agent
```

The agent init flow will configure [OpenClaw](https://openclaw.ai) with your chosen model provider (Ollama, Anthropic, or OpenAI) and deploy it to the cluster.
`obol stack up` provisions the default [Hermes Agent](https://github.com/NousResearch/hermes-agent) runtime behind LiteLLM. `obol agent init` applies the controller-based agent capabilities used for monetization and reconciliation.

## Blockchain Networks

@@ -142,39 +143,36 @@ obol model setup --provider openai --api-key sk-proj-...
obol model status
```

`model setup` patches the LiteLLM config and Secret with your API key, adds the model to the gateway, restarts LiteLLM, and syncs all deployed OpenClaw instances.
`model setup` patches the LiteLLM config and Secret with your API key, adds the model to the gateway, restarts LiteLLM, and syncs the stack-managed Hermes default agent.

## OpenClaw AI Agent
## AI Agent Runtimes

[OpenClaw](https://openclaw.ai) is the AI agent deployed by the stack. Multiple instances can run side-by-side, each with its own model provider configuration.
Hermes is the default AI agent runtime deployed by the stack as `obol-agent`. OpenClaw remains available as an optional manual runtime. Multiple Hermes and OpenClaw instances can run side-by-side.

```bash
# Create and deploy an instance (interactive provider setup)
obol openclaw onboard
# Default stack-managed Hermes agent
obol hermes list
obol hermes token obol-agent
obol hermes skills list

# Create and deploy an additional Hermes instance
obol hermes onboard --id research

# Reconfigure model provider for an existing instance
obol openclaw setup
# Create and deploy an optional OpenClaw instance
obol openclaw onboard

# List instances
# List optional OpenClaw instances
obol openclaw list

# Open the web dashboard
# Open the OpenClaw web dashboard
obol openclaw dashboard

# Manage skills (add, remove, list)
obol openclaw skills list
obol openclaw skills add <package>
obol openclaw skills remove <name>

# Remove an instance
obol openclaw delete --force
```

When only one OpenClaw instance is installed, the instance ID is optional — it is auto-selected. With multiple instances, specify the name: `obol openclaw setup prod`.
When only one runtime-specific instance is installed, the instance ID is optional. With multiple instances, specify the name: `obol hermes sync research` or `obol openclaw setup prod`.

### Skills

OpenClaw ships with 21 embedded skills that are installed automatically on first deploy. Skills give the agent domain-specific capabilities — from querying blockchains to understanding Ethereum development patterns.
The stack ships with embedded Obol skills that are installed automatically for the default Hermes agent and for OpenClaw instances. Skills give the agent domain-specific capabilities — from querying blockchains to understanding Ethereum development patterns.

#### Infrastructure

@@ -295,8 +293,8 @@ obol sell list
obol sell status <offer-name> -n <namespace>

# 4) Buyer wallet and balances are available.
obol kubectl exec -n openclaw-obol-agent <openclaw-pod> -- \
python3 /data/.openclaw/skills/buy-inference/scripts/buy.py balance
obol kubectl exec -n hermes-obol-agent deploy/hermes -c hermes -- \
python3 /data/.hermes/obol-skills/buy-inference/scripts/buy.py balance
```

Run the paid tests only after all four checks pass.