RFC: First-class vMCP support in the local CLI experience#59
Conversation
jerm-dro
left a comment
LGTM! The fact that this already works for brood-box indicates this is very feasible.
- Add a `thv vmcp` subcommand with `serve`, `validate`, and `init` sub-commands that integrate vMCP into the main CLI
- Support a zero-config quickstart: `thv vmcp serve --group <name>` should work without a config file for simple aggregation
- Bring the optimizer to the local experience with managed embedding service lifecycle (`--optimizer` flag auto-manages a TEI container)
You could more simply state the goal is to deprecate and later remove the legacy optimizer.
Done, much cleaner framing. The Goals section now states: "Deprecate and eventually remove the legacy standalone mcp-optimizer Python project (StacklokLabs/mcp-optimizer) once the native vMCP optimizer is stable" (line 37). Thanks!
aponcedeleonch
left a comment
Hey! Really solid RFC, the whole local experience story makes a ton of sense. Left some inline comments — mostly questions and suggestions. Nothing heavy, just thinking out loud. 🤙
Add RFC proposing integration of the Virtual MCP Server (vMCP) into the main thv CLI as a `thv vmcp` subcommand, bringing local optimizer support with managed TEI container lifecycle, and formalizing the library embedding pattern used by brood-box. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add goal to deprecate legacy mcp-optimizer Python project
- Add --embedding-image flag for TEI container image configurability
- Change TEI container naming to model-hash based for shared reuse
- Switch from fixed port to random port via pkg/networking.FindAvailable()
- Change from silent FTS5 fallback to fail-fast on TEI failure
- Add pkg/labels/ integration for container lifecycle ownership
- Remove commented-out backends from init output (single source of truth)
- Expand platform considerations with Rosetta 2 emulation details
- Note Unix socket transport as future security consideration
- Add docs website update and legacy optimizer removal to Documentation
- Add K8s dependency isolation as optional cleanup item

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yrobla
left a comment
This is a well-structured RFC with a clear problem statement, a sensible three-tier optimizer design, and a good backward-compatibility story. The shared pkg/vmcp/cli/ package is the right architectural call — it prevents logic duplication between thv vmcp and cmd/vmcp/. The library embedding section is also a welcome addition, since the brood-box pattern has been undocumented for a while.
I have five concrete issues to raise, in rough priority order. None are blockers for Draft status but a few (especially the informationalCommands contradiction and the TEI sharing gap) should be resolved before implementation starts.
`"registry": true,`
`"mcp": true,`
`"skill": true,`
`"vmcp": true, // new`
Contradiction: vmcp marked informational but --optimizer-embedding requires the container runtime
The justification given for adding vmcp to informationalCommands is that "vMCP is a proxy that connects to already-running backends". That's true for thv vmcp validate and thv vmcp init, but thv vmcp serve --optimizer-embedding explicitly needs the container runtime to launch the managed TEI container — the very runtime the informational check is skipping.
This means a user with no container runtime who runs thv vmcp serve --optimizer-embedding will get a confusing late failure (can't start TEI container) instead of the clear early error the runtime check is supposed to provide.
Suggested fix: Don't add vmcp to informationalCommands as a blanket entry. Instead, handle the check at the serve subcommand level: skip the runtime check unless --optimizer-embedding is passed. validate and init can be registered individually as informational if that mechanism supports per-subcommand registration, or the check can be deferred to inside the serve handler.
Yeah, you're right. The blanket entry hides exactly the failure mode the runtime check was designed to surface. So I split it up: validate and init stay informational (no runtime calls), and serve does the check inside its handler, required when --optimizer-embedding is set, skipped otherwise. A user without Docker who asks for the managed TEI container now hits an early, explicit error instead of a confusing late one. Lines 269-289 have the new wording. Thanks!
- **Health check**: Poll `GET /health` with backoff until the TEI server reports ready. TEI must download and load the model on first start, which can take 30-60 seconds.
- **Port binding**: A random available port is allocated using ToolHive's existing `pkg/networking.FindAvailable()` pattern (the same mechanism `thv run` uses when no explicit port is given). The allocated port is reported in the logs. This avoids conflicts when multiple vMCP instances or other services are running.
- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or become healthy when `--optimizer-embedding` was explicitly requested, `thv vmcp serve` exits with a clear error message. The `--optimizer-embedding` flag is a clear signal that the user wants semantic search — silently degrading to FTS5-only would mask environment issues (Docker not running, port conflicts, etc.) and produce confusingly poor results. Users who want keyword-only search should use `--optimizer` explicitly.
- **Idempotent start**: If a TEI container with the matching name and ToolHive labels is already running (e.g., from a previous invocation or another vMCP instance with the same model), reuse it rather than creating a new one.
TEI container sharing has an unaddressed reference-counting gap
The idempotent-start behavior (reuse a running container if name + labels match) is a good idea for efficiency. But combined with the lifecycle-coupling behavior (stop TEI container on shutdown), this creates a race:
1. Instance A (`--group default`) starts a TEI container named `thv-embedding-bge-sm-a1b2c3`.
2. Instance B (`--group engineering`, same model) reuses that container.
3. Instance A shuts down gracefully → stops the TEI container.
4. Instance B's semantic search now silently fails or returns errors.
The section says "The TEI container is started before the vMCP server and stopped after it shuts down", but it doesn't say what happens when the container is shared.
Suggested options (pick one):
- Label-based ref counting: increment/decrement a `thv-embedding-refcount` label on the container. Only stop when the count reaches 0.
- No cleanup on shared containers: only stop the TEI container if the current process was the one that started it (track via an instance label). Shared containers are left running; document that users can `docker stop` them.
- No sharing: use a unique container name per vMCP instance (e.g., include a random suffix) — simpler lifecycle, slightly more resource usage.
The simplest option is probably no-sharing with a per-instance name. The "avoid duplication" benefit of sharing is marginal for local dev use.
Good catch, this is a real race and I didn't call it out properly. Went with the "owner-only cleanup" option: each vMCP instance tags the container it creates with a toolhive-embedding-owner=<instance-id> label, and only that instance stops it on exit. Instances that reused an existing container via idempotent-start leave it alone. The trade-off is we can leave containers running across sessions, but that's fine as long as we give users a way to clean them up, which we're now doing via thv vmcp cleanup (see my reply to your Open Question 6 comment). Updated lines 456-462. Thanks!
`-c, --config string   Path to vMCP configuration file (optional)`
`-g, --group string    ToolHive group to aggregate (default: "default")`
`--host string         Host to bind to (default: "127.0.0.1")`
`-p, --port int        Port to listen on (default: 4483)`
Port 4483 collides for multiple concurrent thv vmcp serve instances
The RFC correctly uses FindAvailable() for the TEI container port, but the vMCP server itself defaults to a fixed port 4483. If a user has two groups running simultaneously — e.g., thv vmcp serve --group default and thv vmcp serve --group engineering — both default to 4483 and the second will fail to bind.
This is worth addressing explicitly since the "zero-config quickstart" story implies users will run this casually, without thinking about port management.
Suggestion: Either (a) auto-assign a random port when 4483 is taken (log the chosen port clearly), or (b) document prominently that users must specify -p for concurrent instances. Option (a) is friendlier for the quickstart UX but means the client config URL changes on each invocation — which matters for IDE config files (Open Question 2). The RFC should take a position.
Good point, the quickstart story does imply casual usage and we should take a position. Landed on "fixed 4483, fail-fast with an actionable error" rather than auto-randomizing. Two reasons: most users run one vMCP per box and a predictable URL keeps the client-config story simple (ties into your Open Question 2 point too), and auto-randomizing would break every quickstart guide and every IDE config snippet we ask people to paste. When the port is already taken we exit with `port 4483 already in use, try 'thv vmcp serve -p 0' to pick a free port`, so the power-user case (two groups side by side) is still one flag away. New "Port binding policy" subsection around line 167. Thanks!
2. **Config mode**: When `--config` is provided, the command loads the full YAML configuration, supporting all advanced features (aggregation policies, composite tools, auth, telemetry, etc.). This is identical to the standalone `vmcp serve` behavior.
Note that `groupRef` is a **required field** in the vMCP config (see `pkg/vmcp/config/config.go`). In quick mode, `--group` populates this field in the auto-generated config. In config mode, the group comes from the config file's `groupRef` field. If both `--config` and `--group` are provided, the CLI flag overrides the config file's `groupRef`, which is useful for reusing a config template across multiple groups.
--group overriding groupRef in a provided config file is a footgun
Allowing --group to silently override groupRef when --config is also provided is surprising. A config file is typically authored with a specific group in mind — it may have auth configurations, composite tool definitions, or backend-specific aggregation policies that only make sense for that group. Substituting a different groupRef while keeping the rest of the config intact could produce subtly wrong behavior (e.g., using auth credentials designed for group A's backends when talking to group B's backends).
Suggestion: Emit a warning (or require an explicit --force-group flag) when both --config and --group are provided. Alternatively, restrict the quick-mode --group flag to only work when --config is absent, and document that group selection with a config file should be done by editing groupRef in the file. The "template reuse" use case can still be served by thv vmcp init, which generates per-group configs.
Agreed, the "useful for reusing a config template" framing was me rationalizing a footgun. Switched to "error if both are provided" rather than trying to warn. A config file often carries auth and aggregation policies that only make sense for a specific group, and swapping groupRef while leaving the rest alone is exactly the kind of subtle bug we want to avoid. The "template reuse" story is already covered by thv vmcp init, which generates a per-group config the user can then edit. Error message wording and justification are at lines 150-161. Thanks!
5. **Quick mode auth defaults**: Quick mode defaults to `anonymous` incoming auth and `unauthenticated` outgoing auth. Is this appropriate, or should it attempt to detect backend auth requirements from the group's workload configurations?
6. **TEI container lifecycle on crash**: If `thv vmcp serve` is killed ungracefully (SIGKILL, OOM), the TEI container will be left running. Should a cleanup mechanism be added (e.g., check for orphaned `thv-embedding-*` containers on startup)?
Open Question 6 (orphan TEI containers) should be resolved in this RFC, not deferred
This is framed as a question but the answer is clearly "yes, we need cleanup" — orphaned TEI containers from SIGKILL or OOM will accumulate silently, each consuming ~500MB-1GB of memory for the model weights.
Leaving this unresolved means Phase 4 implementers will have to design and decide this under time pressure. The RFC is the right place to settle it.
Proposed resolution to include in the RFC: On thv vmcp serve startup, scan for containers with the toolhive-auxiliary label and a thv-embedding-* name that are not currently referenced by a running vMCP instance. Report them to the user (thv: found orphaned TEI container from previous session: thv-embedding-bge-sm-a1b2c3. Run 'thv vmcp cleanup' to remove it.). Add thv vmcp cleanup as a new subcommand (Phase 4 scope) that removes orphaned embedding containers and optionally the model cache volume. This pairs naturally with the idempotent-start behavior.
Agreed, this is the right call. Promoted it out of Open Questions and into the body as a new thv vmcp cleanup subcommand with a supporting startup scan in thv vmcp serve. The mechanics: containers carry a toolhive-embedding-owner=<instance-id> label, cleanup enumerates thv-embedding-* containers and removes any whose owner is no longer alive, and the serve startup emits a one-line hint when it detects orphans (thv: found N orphaned embedding container(s)... run 'thv vmcp cleanup'). --model-cache and --dry-run flags are there too. This pairs with the owner-only cleanup story from your line-456 comment. See the new "Orphan cleanup: thv vmcp cleanup" subsection, and Open Question 6 is gone. Thanks!
yrobla
left a comment
Deeper Review: Motivation, Concept, and Architecture
Building on my previous review (which covered five implementation-level issues), this review looks at the broader picture: whether the motivation holds up, whether the concept is the right one, and where the design has gaps that will create friction.
Motivation: Mostly solid, with one overstatement
The core problem statement is accurate and well-articulated. vMCP is genuinely invisible to thv users today, the config-file requirement creates friction for simple use cases, and the local optimizer story is a real gap. These are worth solving.
The one overstatement: "All ToolHive users who manage more than one MCP server locally are affected." The optimizer and composite tool features are only relevant to a subset of users. Most users with two or three MCP servers probably don't need tool aggregation at all — they just pick tools from individual servers manually. The stronger case is for users with 5+ servers where tool discovery becomes unwieldy. Worth calibrating the scope of who this is actually for, since it affects prioritization of tiers and the zero-config story.
Concept: thv vmcp is the right namespace — but the command verbs need work
I agree with the RFC's rejection of Alternative 3 (thv group serve): vMCP is a distinct concept with its own lifecycle, and grouping it under group would hide the advanced features. The thv vmcp namespace is correct.
However, thv vmcp serve as the zero-config quickstart is ergonomically awkward. vmcp is an implementation term — users think in terms of "aggregate my tools" or "unify my MCP servers", not "start a virtual MCP server". The RFC's goal is to make this a first-class experience, but naming it after an internal protocol concept creates a mismatch between user intent and command vocabulary. This is worth a deliberate UX decision, even if the conclusion is "keep vmcp" for consistency with the existing binary.
Architecture: Three areas need rethinking
1. pkg/vmcp/cli/ is a poor home for the EmbeddingServiceManager
The shared serve logic being in pkg/vmcp/cli/ is justifiable (it's reused by two cmd/ packages). But the EmbeddingServiceManager — which manages a container lifecycle — does not belong in a cli package. It is infrastructure, not CLI logic. Downstream library consumers (the very audience the library embedding section addresses) might want to embed this capability without any CLI dependency. Putting it in pkg/vmcp/cli/ signals "this is CLI plumbing, don't import it", which is the wrong signal.
Better split: pkg/vmcp/embedding/ for the EmbeddingServiceManager (callable by anyone who wants a managed TEI lifecycle), pkg/vmcp/cli/ for the config-from-flags assembly logic that only CLI callers need.
2. The optimizer UX is underspecified for local use
See inline comments (lines 341 and 454) for the two specific gaps: session-scoped index rebuild latency and first-run image pull time. These will be the two most common UX complaints after launch, and neither is addressed. The RFC references THV-0022 for the optimizer architecture but doesn't carry the local-UX implications of session-scoped indexing forward.
3. Library embedding: the stability table is declared but not enforced
See inline comment (line 636). "Officially supported" needs a versioning policy to be meaningful. Without one, this section creates expectations the project can't reliably honor.
Alternatives worth reconsidering (or addressing more directly)
Should thv vmcp serve also auto-register with client configs?
Open Question 2 asks whether thv vmcp serve should write the endpoint into Claude Code / VS Code config files. This is actually the most important UX question in the RFC and it's buried as an open question. If the answer is yes, it dramatically changes the scope of this RFC (and the port-stability question from my previous review becomes critical — a fixed port 4483 becomes load-bearing infrastructure). If the answer is no (for now), say so explicitly and explain the tradeoff.
Should zero-config quickstart be thv group serve as a thin alias?
Alternative 3 is dismissed but a thin alias thv group serve <name> → thv vmcp serve --group <name> would dramatically improve discoverability for the simple case. Users discovering groups via thv group ls would naturally try thv group serve. The two approaches aren't mutually exclusive — thv vmcp can have the full feature set while thv group serve provides the most common entry point. This is worth a sentence in the RFC even if rejected.
Structural simplification opportunity
Phases 1-3 (extract logic, add subcommands, quick mode) are implementation work. Phase 4 (optimizer with managed TEI) is a substantially larger feature with its own UX, lifecycle, and operational surface area. The RFC would be stronger — and easier to implement incrementally — if Phase 4 had its own RFC (or at minimum a clear "Phase 4 is a stretch goal" signal). As written, the detailed TEI design is given equal weight to the core thv vmcp subcommand, which makes the RFC harder to approve because reviewers must evaluate two different scope/risk levels at once.
Summary of new inline comments below
- Line 190 — `thv vmcp init` behavior when no workloads are running is unspecified
- Line 284 — `pkg/vmcp/cli/` mixes CLI and infrastructure concerns; `EmbeddingServiceManager` should be in `pkg/vmcp/embedding/`
- Line 341 — Session-scoped FTS5 index rebuild latency is not addressed for local use
- Line 454 — First-run UX: image pull time is omitted; only model download time is mentioned
- Line 636 — Stability table needs an explicit versioning/release policy to be meaningful
This command:
1. Discovers all running workloads in the specified group via `groups.NewCLIManager()` and `workloads.NewManager()`
2. Generates a YAML config with backends pre-populated from the discovered workloads
thv vmcp init behavior when no workloads are running is unspecified
The command discovers "all running workloads in the specified group". But what if the user runs thv vmcp init --group engineering before they've started any MCP servers? This is a plausible scenario — scaffolding the config before starting servers, or doing so in a CI environment without running containers.
The RFC should specify: does init fail if no workloads are found? Does it generate an empty but valid config with a comment explaining that backends are discovered at runtime? Does it warn the user?
Suggested behavior: init should succeed even with zero running workloads, generating a valid config with an empty backends list and a comment: # No running workloads found in group 'engineering'. Start servers with 'thv run' and re-run 'thv vmcp init' to populate backends. This makes the command useful for config templating without requiring a live environment.
Good point, I had this case in my head but never wrote it down. Spec'd it explicitly: thv vmcp init --group <name> succeeds even when the group is empty or doesn't exist yet, emits a valid config with empty backends and a comment pointing at thv run --group ... as the next step, and prints a single stderr warning so the result isn't silent. That keeps init useful for config templating in CI without pretending everything's fine. Added right after the init bullets. Thanks!
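The empty-group output might look something like this. Only `groupRef` is confirmed by the RFC as a required field; the rest of the shape (the `backends` key, the exact comment wording) is illustrative:

```yaml
# No running workloads found in group 'engineering'.
# Start servers with 'thv run --group engineering' and re-run
# 'thv vmcp init' to populate backends.
groupRef: engineering
backends: []
```

The file is valid on day one, so it can be committed as a template and regenerated once workloads exist.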
**`pkg/vmcp/cli/`** (new package)
pkg/vmcp/cli/ mixes CLI-assembly logic with infrastructure; EmbeddingServiceManager belongs in pkg/vmcp/embedding/
The shared serve/validate/init logic in pkg/vmcp/cli/ makes sense: two cmd/ packages need to share it, and pkg/ is the idiomatic Go location for shared library code. But placing the EmbeddingServiceManager in the same cli package creates a layering problem.
The EmbeddingServiceManager is container infrastructure — it pulls an image, starts a container, polls health, manages a named volume. This capability is useful independently of any CLI: a test harness, a desktop app, or a library consumer might want to programmatically start a managed TEI container without going through the CLI layer.
Putting it in pkg/vmcp/cli/ signals "don't import this outside CLI code", which is wrong. The existing library embedding guidance tells consumers to only import pkg/vmcp/* packages. If they ever want a managed embedding lifecycle, they'd have to go through the cli package, which is surprising.
Suggested split:
- `pkg/vmcp/embedding/` — `EmbeddingServiceManager`, `EmbeddingServiceConfig`, container lifecycle management. Clean dependency: only `pkg/container/` and `pkg/labels/`.
- `pkg/vmcp/cli/` — flag-to-config assembly (`ServeConfig` from cobra flags), quick-mode config generation, `Serve()`, `Validate()`, `Init()`. Clean dependency: `pkg/vmcp/embedding/` and `pkg/vmcp/`.
This also prepares the ground for exposing EmbeddingServiceManager to library consumers (your stability table could add pkg/vmcp/embedding as Stable once proven).
Yeah, you're right. Putting infrastructure in a package named cli was a layering smell I wrote into the RFC without thinking. Split into two packages:
- `pkg/vmcp/embedding/` — container lifecycle (pull, start, health, owner-only cleanup, orphan enumeration). Depends only on `pkg/container/` and `pkg/labels/`. Usable by test harnesses, desktop apps, and future library consumers without dragging in cobra.
- `pkg/vmcp/cli/` — CLI-assembly only: flag translation, quick-mode config generation, `Serve()`/`Validate()`/`Init()`. Depends on `pkg/vmcp/embedding/` and `pkg/vmcp/`.
Also added both packages to the stability table (embedding = Beta until Phase 4 lands, cli = Internal) and updated the Phase 4 implementation plan accordingly. Lines 338 (cli scoping note), 528 onward (new embedding section), and the stability table. Thanks!
- **`call_tool`** — Dynamic invocation of any backend tool by name

The optimizer uses an in-memory SQLite FTS5 index for keyword search (BM25 ranking) and optionally an external embedding service for semantic vector search. The SQLite store runs inside the vMCP process — no additional infrastructure is needed for keyword-only mode. For hybrid search (keyword + semantic), an external HuggingFace Text Embeddings Inference (TEI) service provides vector embeddings.
Session-scoped FTS5 index rebuild latency is not addressed for local use
THV-0022 (referenced at the end of the RFC) describes the optimizer as having a session-scoped SQLite index: the FTS5 index is built per MCP client session, from scratch, by indexing all aggregated tool descriptions at session init time.
For a local developer with many backends (say 5 servers × 10-20 tools each = 50-100 tools), this index rebuild happens every time a new conversation starts in Claude Code or another client. For keyword-only FTS5 it's fast. For semantic search (--optimizer-embedding), it means embedding 50-100 tool descriptions via HTTP calls to the TEI service at the start of every session — potentially seconds of latency before the first tool call succeeds.
The RFC doesn't acknowledge this tradeoff or propose any mitigation (e.g., persistent index across sessions, warm-up on server start rather than on client connect, pre-indexing at vMCP startup). This is going to be one of the first UX complaints after launch for users with more than ~30 tools.
The RFC should address this: Either (a) confirm that session-scoped indexing is acceptable for the local use case and explain why, (b) propose that the local vMCP pre-warms the index at server startup so the first client connection doesn't pay the full indexing cost, or (c) add this as an explicit open question.
Great catch, this would 100% have been the first UX complaint after launch. The session-scoped indexing model in THV-0022 is fine for keyword-only (FTS5 over 100 tools is sub-second) but it's rough for semantic search, where every new Claude Code conversation would pay "embed 50-100 tool descriptions" before the first find_tool works.
Resolved in the RFC by pre-warming at server start: thv vmcp serve --optimizer-embedding builds the FTS5 index and populates an embedding cache keyed on (model, tool-description-hash) once, at boot, and gates /readyz on it. Sessions still get their own logical view, but the embeddings themselves are pure functions of the input so they're cached across sessions. Per-session cost drops from "embed N tools" to "look up N hashes." Added a new "Index warm-up for the local case" subsection with the trade-offs (slower startup, tiny memory overhead, cache invalidation on model swap). Thanks!
@yrobla thanks for the deeper pass, these five were sharper than the inline ones and worth addressing directly in the RFC. Working through them:

1. **Motivation overstatement.** Fair call, this was me writing the Problem Statement with too much confidence. Narrowed the last paragraph (line 30) to say severity scales with the number of servers: 2-3 you pick tools manually and barely notice, 5+ is where tool discovery and context-window cost start hurting. That lines up with the "50-100 tools from 5-10 backends" threshold the optimizer section already uses around line 398.
2.
3. **Open Question 2 (client config auto-registration) is buried.** Yeah, and looking at it again the answer is clearly yes.
4.
5. **Phase 4 scope signal.** Legitimate concern on paper, but Phase 4 is smaller than it reads. Most of it is wiring up code that already exists in the toolhive repo:

Will push the RFC changes in the next commit. Thanks again for the thorough review.
Summary

- Integrate vMCP into the main `thv` CLI as a `thv vmcp` subcommand (`serve`, `validate`, `init`)
- Local optimizer support (`--optimizer`), managed TEI container (`--optimizer-embedding`), and full config control
- Retain the standalone `vmcp` binary for K8s deployments and backwards compatibility

Key Design Decisions

- Zero-config quickstart: `thv vmcp serve --group default` works without a config file
- Shared `pkg/vmcp/cli/`: both `thv vmcp` and standalone `vmcp` delegate to the same package
- `--optimizer-embedding` auto-manages a TEI container with a named volume for model caching, health polling, and graceful fallback to FTS5-only

Related RFCs
🤖 Generated with Claude Code