Skip to content

Latest commit

 

History

History
193 lines (144 loc) · 9.51 KB

File metadata and controls

193 lines (144 loc) · 9.51 KB

Backend Overview

The LLM Interactive Proxy supports multiple backend providers, allowing you to route requests to different LLM services while maintaining a consistent front-end API. This flexibility enables you to choose the best provider for your use case, switch providers without changing client code, and implement failover strategies.

Supported Backends

Backend IDs are the type: values in YAML and the backend_type carried on requests. Core connectors live in this repository and are always import-registered. OAuth plugin connectors ship in the sibling package llm-interactive-proxy-oauth-connectors and register when you install the optional extra, for example pip install "llm-interactive-proxy[oauth]" (see pyproject.toml optional dependency oauth).

Core connectors (this repository)

Backend ID Provider Authentication Best For
openai OpenAI API Key Production applications, standard OpenAI models
openai-responses OpenAI API Key Same credentials as OpenAI; targets /v1/responses for structured outputs (see OpenAI backend)
openai-codex OpenAI (ChatGPT / Codex CLI) Local OAuth token ChatGPT login instead of an API key
anthropic Anthropic API Key Claude via the standard Anthropic API
gemini Google Gemini API Key Metered API usage, production apps
gemini-cli-acp Google Gemini (ACP via Gemini CLI) Local OAuth token Sub-agents and tooling via Gemini CLI
cursor-cli-acp Cursor (ACP via Cursor CLI agent acp) Local Cursor login (agent login) Cursor-hosted models through the official CLI; requires agent on PATH or CURSOR_AGENT_BIN
gemini-cli-cloud-project Google Gemini (GCP) OAuth + GCP project Enterprise / team billing on Vertex-style flows
openrouter OpenRouter API Key Many third-party hosted models behind one API
nvidia NVIDIA (NIM / OpenAI-compatible) API Key (NVIDIA_API_KEY) NVIDIA integrator or self-hosted NIM
zenmux ZenMux API Key OpenAI-compatible ZenMux router
zai ZAI API Key Zhipu / Z.ai
zai-coding-plan ZAI Coding Plan API Key Coding-plan SKU / workflows
kimi-code Kimi API Key Kimi For Coding (OpenAI-compatible)
opencode-go OpenCode Go API Key OpenCode Go with internal OpenAI/Anthropic-style routing
minimax Minimax API Key Minimax models
internlm InternLM API Key (rotation supported) InternLM with optional key rotation
ollama Ollama None (local) Local and remote models via Ollama
hybrid Virtual (two backends) Inherits from sub-backends Two-phase reasoning + execution

OAuth plugin connectors (llm-interactive-proxy-oauth-connectors)

These entry points are defined in the sibling repo’s pyproject.toml under [project.entry-points."llm_proxy_backends"]. They are not present unless the optional package is installed.

Backend ID Provider Authentication Best For
antigravity-oauth Google Gemini (Antigravity) Antigravity token Internal / debugging (Gemini-shaped traffic)
cline Cline Local OAuth token Internal development and compatibility testing
gemini-oauth-auto Google Gemini (CLI) Multi-account OAuth Automatic account rotation across Google logins
gemini-oauth-plan Google Gemini (CLI) OAuth Google One / paid CLI tier
gemini-oauth-free Google Gemini (CLI) OAuth Free-tier CLI usage
kiro-oauth-auto Amazon Kiro / Q Developer Self-managed OAuth Kiro streaming via local OAuth tokens
opencode-zen OpenCode Zen OAuth OpenCode Zen API (distinct from opencode-go)
qwen-oauth Alibaba Qwen (CLI) Local OAuth token Qwen CLI OAuth

Agent Client Protocol (ACP) backends

The gemini-cli-acp and cursor-cli-acp backends spawn a local agent subprocess for each pooled workspace/session key (see connector implementation for pooling). After each completed chat turn (assistant response finished), the proxy schedules termination of that subprocess if it stays idle for stale_acp_agent_kill_idle_seconds (default 3600 seconds = 60 minutes). When you send another message or reuse the same pooled agent, the pending timer is cancelled; after the next completed turn, a new idle timer is scheduled.

This idle cleanup is enabled by default. To disable it:

  • CLI: --disable-stale-acp-agent-kills
  • Environment: DISABLE_STALE_ACP_AGENT_KILLS=true
  • Configuration file: disable_stale_acp_agent_kills: true

To change the idle delay:

  • CLI: --stale-acp-agent-kill-idle-seconds <seconds>
  • Environment: STALE_ACP_AGENT_KILL_IDLE_SECONDS=<seconds>
  • Configuration file: stale_acp_agent_kill_idle_seconds: <seconds>

psutil is a required runtime dependency (declared in pyproject.toml). Before terminating a child, the proxy uses it to verify the OS process is still the same one it spawned (creation time and, when available, executable path), so an unrelated process that reused the PID is not killed. The code also has a defensive import fallback: if psutil cannot be imported at runtime, idle-kill falls back to the subprocess handle only (weaker).

Precedence: CLI overrides environment overrides configuration file. INFO-level logs describe when a kill is scheduled, cancelled, or executed.

Frontend APIs

The proxy exposes multiple frontend APIs where clients connect. Each frontend implements a different LLM provider's API specification.

For detailed frontend API documentation, see the Frontend Overview:

Choosing a Backend

When selecting a backend, consider:

  • Cost: API key-based backends typically charge per token, while OAuth-based backends may have subscription or free tier limits
  • Performance: Different providers have different latency and throughput characteristics
  • Model Availability: Each provider offers different models with varying capabilities
  • Authentication: Choose between API keys (simpler) or OAuth (may offer free tiers)
  • Use Case: Some backends are optimized for specific tasks (e.g., zai-coding-plan for coding)
  • Tooling Model: Some CLI-mediated backends are better suited for specialized sub-agents than for acting as the main general-purpose coding agent for the whole session

Configuration

Backends are configured through environment variables and the proxy configuration file:

Basic Setup

# Set API keys for the backends you want to use
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."
export OPENROUTER_API_KEY="sk-or-..."
export NVIDIA_API_KEY="..."
export ZENMUX_API_KEY="..."
export ZAI_API_KEY="..."
export KIMI_API_KEY="..."
export MINIMAX_API_KEY="..."
export INTERNAI_API_KEY="..."

# For GCP-based Gemini
export GOOGLE_CLOUD_PROJECT="your-project-id"

Starting the Proxy

# Start with a specific default backend
python -m src.core.cli --default-backend openai

# Or specify in config file
python -m src.core.cli --config config/config.yaml

Config File Example

# config.yaml
backends:
  openai:
    type: openai
  anthropic:
    type: anthropic
  gemini:
    type: gemini

default_backend: openai

Switching Backends

You can switch backends dynamically during a session using in-chat commands:

!/backend(anthropic)
!/model(claude-3-5-sonnet-20241022)

Or use one-off commands for a single request:

!/oneoff(openrouter:qwen/qwen3-coder)

Backend-Specific Documentation

For detailed configuration and usage information for each backend, see:

Core

OAuth plugin (llm-interactive-proxy-oauth-connectors)

Extensibility

Related Features