The LLM Interactive Proxy supports multiple backend providers, allowing you to route requests to different LLM services while maintaining a consistent front-end API. This flexibility enables you to choose the best provider for your use case, switch providers without changing client code, and implement failover strategies.
Backend IDs are the type: values in YAML and the backend_type carried on requests. Core connectors live in this repository and are always import-registered. OAuth plugin connectors ship in the sibling package llm-interactive-proxy-oauth-connectors and register when you install the optional extra, for example pip install "llm-interactive-proxy[oauth]" (see pyproject.toml optional dependency oauth).
| Backend ID | Provider | Authentication | Best For |
|---|---|---|---|
openai |
OpenAI | API Key | Production applications, standard OpenAI models |
openai-responses |
OpenAI | API Key | Same credentials as OpenAI; targets /v1/responses for structured outputs (see OpenAI backend) |
openai-codex |
OpenAI (ChatGPT / Codex CLI) | Local OAuth token | ChatGPT login instead of an API key |
anthropic |
Anthropic | API Key | Claude via the standard Anthropic API |
gemini |
Google Gemini | API Key | Metered API usage, production apps |
gemini-cli-acp |
Google Gemini (ACP via Gemini CLI) | Local OAuth token | Sub-agents and tooling via Gemini CLI |
cursor-cli-acp |
Cursor (ACP via Cursor CLI agent acp) |
Local Cursor login (agent login) |
Cursor-hosted models through the official CLI; requires agent on PATH or CURSOR_AGENT_BIN |
gemini-cli-cloud-project |
Google Gemini (GCP) | OAuth + GCP project | Enterprise / team billing on Vertex-style flows |
openrouter |
OpenRouter | API Key | Many third-party hosted models behind one API |
nvidia |
NVIDIA (NIM / OpenAI-compatible) | API Key (NVIDIA_API_KEY) |
NVIDIA integrator or self-hosted NIM |
zenmux |
ZenMux | API Key | OpenAI-compatible ZenMux router |
zai |
ZAI | API Key | Zhipu / Z.ai |
zai-coding-plan |
ZAI Coding Plan | API Key | Coding-plan SKU / workflows |
kimi-code |
Kimi | API Key | Kimi For Coding (OpenAI-compatible) |
opencode-go |
OpenCode Go | API Key | OpenCode Go with internal OpenAI/Anthropic-style routing |
minimax |
Minimax | API Key | Minimax models |
internlm |
InternLM | API Key (rotation supported) | InternLM with optional key rotation |
ollama |
Ollama | None (local) | Local and remote models via Ollama |
hybrid |
Virtual (two backends) | Inherits from sub-backends | Two-phase reasoning + execution |
These entry points are defined in the sibling repo’s pyproject.toml under [project.entry-points."llm_proxy_backends"]. They are not present unless the optional package is installed.
| Backend ID | Provider | Authentication | Best For |
|---|---|---|---|
antigravity-oauth |
Google Gemini (Antigravity) | Antigravity token | Internal / debugging (Gemini-shaped traffic) |
cline |
Cline | Local OAuth token | Internal development and compatibility testing |
gemini-oauth-auto |
Google Gemini (CLI) | Multi-account OAuth | Automatic account rotation across Google logins |
gemini-oauth-plan |
Google Gemini (CLI) | OAuth | Google One / paid CLI tier |
gemini-oauth-free |
Google Gemini (CLI) | OAuth | Free-tier CLI usage |
kiro-oauth-auto |
Amazon Kiro / Q Developer | Self-managed OAuth | Kiro streaming via local OAuth tokens |
opencode-zen |
OpenCode Zen | OAuth | OpenCode Zen API (distinct from opencode-go) |
qwen-oauth |
Alibaba Qwen (CLI) | Local OAuth token | Qwen CLI OAuth |
The gemini-cli-acp and cursor-cli-acp backends spawn a local agent subprocess for each pooled workspace/session key (see connector implementation for pooling). After each completed chat turn (assistant response finished), the proxy schedules termination of that subprocess if it stays idle for stale_acp_agent_kill_idle_seconds (default 3600 seconds = 60 minutes). When you send another message or reuse the same pooled agent, the pending timer is cancelled; after the next completed turn, a new idle timer is scheduled.
This idle cleanup is enabled by default. To disable it:
- CLI:
--disable-stale-acp-agent-kills - Environment:
DISABLE_STALE_ACP_AGENT_KILLS=true - Configuration file:
disable_stale_acp_agent_kills: true
To change the idle delay:
- CLI:
--stale-acp-agent-kill-idle-seconds <seconds> - Environment:
STALE_ACP_AGENT_KILL_IDLE_SECONDS=<seconds> - Configuration file:
stale_acp_agent_kill_idle_seconds: <seconds>
psutil is a required runtime dependency (declared in pyproject.toml). Before terminating a child, the proxy uses it to verify the OS process is still the same one it spawned (creation time and, when available, executable path), so an unrelated process that reused the PID is not killed. The code also has a defensive import fallback: if psutil cannot be imported at runtime, idle-kill falls back to the subprocess handle only (weaker).
Precedence: CLI overrides environment overrides configuration file. INFO-level logs describe when a kill is scheduled, cancelled, or executed.
The proxy exposes multiple frontend APIs where clients connect. Each frontend implements a different LLM provider's API specification.
For detailed frontend API documentation, see the Frontend Overview:
- OpenAI Chat Completions -
/v1/chat/completions - OpenAI Responses API -
/v1/responses - Anthropic Messages -
/anthropic/v1/messages - Google Gemini v1beta -
/v1beta/models
When selecting a backend, consider:
- Cost: API key-based backends typically charge per token, while OAuth-based backends may have subscription or free tier limits
- Performance: Different providers have different latency and throughput characteristics
- Model Availability: Each provider offers different models with varying capabilities
- Authentication: Choose between API keys (simpler) or OAuth (may offer free tiers)
- Use Case: Some backends are optimized for specific tasks (e.g.,
zai-coding-planfor coding) - Tooling Model: Some CLI-mediated backends are better suited for specialized sub-agents than for acting as the main general-purpose coding agent for the whole session
Backends are configured through environment variables and the proxy configuration file:
# Set API keys for the backends you want to use
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."
export OPENROUTER_API_KEY="sk-or-..."
export NVIDIA_API_KEY="..."
export ZENMUX_API_KEY="..."
export ZAI_API_KEY="..."
export KIMI_API_KEY="..."
export MINIMAX_API_KEY="..."
export INTERNAI_API_KEY="..."
# For GCP-based Gemini
export GOOGLE_CLOUD_PROJECT="your-project-id"# Start with a specific default backend
python -m src.core.cli --default-backend openai
# Or specify in config file
python -m src.core.cli --config config/config.yaml# config.yaml
backends:
openai:
type: openai
anthropic:
type: anthropic
gemini:
type: gemini
default_backend: openaiYou can switch backends dynamically during a session using in-chat commands:
!/backend(anthropic)
!/model(claude-3-5-sonnet-20241022)
Or use one-off commands for a single request:
!/oneoff(openrouter:qwen/qwen3-coder)
For detailed configuration and usage information for each backend, see:
Core
- OpenAI and OpenAI Responses (
openai,openai-responses) - OpenAI Codex (
openai-codex) - Anthropic
- Gemini (API keys, CLI OAuth variants,
gemini-cli-acp, andgemini-cli-cloud-project) - Cursor CLI ACP (
cursor-cli-acp): same idea as Gemini CLI ACP but via Cursor’sagent acpCLI; install and log in with Cursor’s agent tooling, ensureagentis onPATHor setCURSOR_AGENT_BIN. There is no separate backend guide page yet. - OpenRouter
- NVIDIA
- ZAI
- Kimi Code
- OpenCode Go
- Ollama
- InternLM
- MiniMax
- ZenMux
- Hybrid backend (
hybrid)
OAuth plugin (llm-interactive-proxy-oauth-connectors)
- Antigravity OAuth
- Cline
- Gemini OAuth Auto (
gemini-oauth-auto; overview also in Gemini backends) - Kiro OAuth Auto
- OpenCode Zen
- Qwen OAuth
- Gemini OAuth plan / free (
gemini-oauth-plan,gemini-oauth-free)
Extensibility
- Model Name Rewrites - Transform model names dynamically
- Hybrid Backend - Use two models in sequence
- URI Model Parameters - Specify parameters in model strings