Backend Overview

The LLM Interactive Proxy supports multiple backend providers, allowing you to route requests to different LLM services while maintaining a consistent front-end API. This flexibility enables you to choose the best provider for your use case, switch providers without changing client code, and implement failover strategies.

Supported Backends

Backend IDs are the type: values in YAML and the backend_type carried on requests. Core connectors live in this repository and are always import-registered. OAuth plugin connectors ship in the sibling package llm-interactive-proxy-oauth-connectors and register when you install the optional extra, for example pip install "llm-interactive-proxy[oauth]" (see pyproject.toml optional dependency oauth).

Core connectors (this repository)

Backend ID	Provider	Authentication	Best For
`openai`	OpenAI	API Key	Production applications, standard OpenAI models
`openai-responses`	OpenAI	API Key	Same credentials as OpenAI; targets `/v1/responses` for structured outputs (see OpenAI backend)
`openai-codex`	OpenAI (ChatGPT / Codex CLI)	Local OAuth token	ChatGPT login instead of an API key
`anthropic`	Anthropic	API Key	Claude via the standard Anthropic API
`gemini`	Google Gemini	API Key	Metered API usage, production apps
`gemini-cli-acp`	Google Gemini (ACP via Gemini CLI)	Local OAuth token	Sub-agents and tooling via Gemini CLI
`cursor-cli-acp`	Cursor (ACP via Cursor CLI `agent acp`)	Local Cursor login (`agent login`)	Cursor-hosted models through the official CLI; requires `agent` on PATH or `CURSOR_AGENT_BIN`
`gemini-cli-cloud-project`	Google Gemini (GCP)	OAuth + GCP project	Enterprise / team billing on Vertex-style flows
`openrouter`	OpenRouter	API Key	Many third-party hosted models behind one API
`nvidia`	NVIDIA (NIM / OpenAI-compatible)	API Key (`NVIDIA_API_KEY`)	NVIDIA integrator or self-hosted NIM
`zenmux`	ZenMux	API Key	OpenAI-compatible ZenMux router
`zai`	ZAI	API Key	Zhipu / Z.ai
`zai-coding-plan`	ZAI Coding Plan	API Key	Coding-plan SKU / workflows
`kimi-code`	Kimi	API Key	Kimi For Coding (OpenAI-compatible)
`opencode-go`	OpenCode Go	API Key	OpenCode Go with internal OpenAI/Anthropic-style routing
`minimax`	Minimax	API Key	Minimax models
`internlm`	InternLM	API Key (rotation supported)	InternLM with optional key rotation
`ollama`	Ollama	None (local)	Local and remote models via Ollama
`hybrid`	Virtual (two backends)	Inherits from sub-backends	Two-phase reasoning + execution

OAuth plugin connectors (`llm-interactive-proxy-oauth-connectors`)

These entry points are defined in the sibling repo’s pyproject.toml under [project.entry-points."llm_proxy_backends"]. They are not present unless the optional package is installed.

Backend ID	Provider	Authentication	Best For
`antigravity-oauth`	Google Gemini (Antigravity)	Antigravity token	Internal / debugging (Gemini-shaped traffic)
`cline`	Cline	Local OAuth token	Internal development and compatibility testing
`gemini-oauth-auto`	Google Gemini (CLI)	Multi-account OAuth	Automatic account rotation across Google logins
`gemini-oauth-plan`	Google Gemini (CLI)	OAuth	Google One / paid CLI tier
`gemini-oauth-free`	Google Gemini (CLI)	OAuth	Free-tier CLI usage
`kiro-oauth-auto`	Amazon Kiro / Q Developer	Self-managed OAuth	Kiro streaming via local OAuth tokens
`opencode-zen`	OpenCode Zen	OAuth	OpenCode Zen API (distinct from `opencode-go`)
`qwen-oauth`	Alibaba Qwen (CLI)	Local OAuth token	Qwen CLI OAuth

Agent Client Protocol (ACP) backends

The gemini-cli-acp and cursor-cli-acp backends spawn a local agent subprocess for each pooled workspace/session key (see connector implementation for pooling). After each completed chat turn (assistant response finished), the proxy schedules termination of that subprocess if it stays idle for stale_acp_agent_kill_idle_seconds (default 3600 seconds = 60 minutes). When you send another message or reuse the same pooled agent, the pending timer is cancelled; after the next completed turn, a new idle timer is scheduled.

This idle cleanup is enabled by default. To disable it:

CLI: --disable-stale-acp-agent-kills
Environment: DISABLE_STALE_ACP_AGENT_KILLS=true
Configuration file: disable_stale_acp_agent_kills: true

To change the idle delay:

CLI: --stale-acp-agent-kill-idle-seconds <seconds>
Environment: STALE_ACP_AGENT_KILL_IDLE_SECONDS=<seconds>
Configuration file: stale_acp_agent_kill_idle_seconds: <seconds>

psutil is a required runtime dependency (declared in pyproject.toml). Before terminating a child, the proxy uses it to verify the OS process is still the same one it spawned (creation time and, when available, executable path), so an unrelated process that reused the PID is not killed. The code also has a defensive import fallback: if psutil cannot be imported at runtime, idle-kill falls back to the subprocess handle only (weaker).

Precedence: CLI overrides environment overrides configuration file. INFO-level logs describe when a kill is scheduled, cancelled, or executed.

Frontend APIs

The proxy exposes multiple frontend APIs where clients connect. Each frontend implements a different LLM provider's API specification.

For detailed frontend API documentation, see the Frontend Overview:

OpenAI Chat Completions - /v1/chat/completions
OpenAI Responses API - /v1/responses
Anthropic Messages - /anthropic/v1/messages
Google Gemini v1beta - /v1beta/models

Choosing a Backend

When selecting a backend, consider:

Cost: API key-based backends typically charge per token, while OAuth-based backends may have subscription or free tier limits
Performance: Different providers have different latency and throughput characteristics
Model Availability: Each provider offers different models with varying capabilities
Authentication: Choose between API keys (simpler) or OAuth (may offer free tiers)
Use Case: Some backends are optimized for specific tasks (e.g., zai-coding-plan for coding)
Tooling Model: Some CLI-mediated backends are better suited for specialized sub-agents than for acting as the main general-purpose coding agent for the whole session

Configuration

Backends are configured through environment variables and the proxy configuration file:

Basic Setup

# Set API keys for the backends you want to use
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."
export OPENROUTER_API_KEY="sk-or-..."
export NVIDIA_API_KEY="..."
export ZENMUX_API_KEY="..."
export ZAI_API_KEY="..."
export KIMI_API_KEY="..."
export MINIMAX_API_KEY="..."
export INTERNAI_API_KEY="..."

# For GCP-based Gemini
export GOOGLE_CLOUD_PROJECT="your-project-id"

Starting the Proxy

# Start with a specific default backend
python -m src.core.cli --default-backend openai

# Or specify in config file
python -m src.core.cli --config config/config.yaml

Config File Example

# config.yaml
backends:
  openai:
    type: openai
  anthropic:
    type: anthropic
  gemini:
    type: gemini

default_backend: openai

Switching Backends

You can switch backends dynamically during a session using in-chat commands:

!/backend(anthropic)
!/model(claude-3-5-sonnet-20241022)

Or use one-off commands for a single request:

!/oneoff(openrouter:qwen/qwen3-coder)

Backend-Specific Documentation

For detailed configuration and usage information for each backend, see:

Core

OpenAI and OpenAI Responses (openai, openai-responses)
OpenAI Codex (openai-codex)
Anthropic
Gemini (API keys, CLI OAuth variants, gemini-cli-acp, and gemini-cli-cloud-project)
Cursor CLI ACP (cursor-cli-acp): same idea as Gemini CLI ACP but via Cursor’s agent acp CLI; install and log in with Cursor’s agent tooling, ensure agent is on PATH or set CURSOR_AGENT_BIN. There is no separate backend guide page yet.
OpenRouter
NVIDIA
ZAI
Kimi Code
OpenCode Go
Ollama
InternLM
MiniMax
ZenMux
Hybrid backend (hybrid)

OAuth plugin (llm-interactive-proxy-oauth-connectors)

Antigravity OAuth
Cline
Gemini OAuth Auto (gemini-oauth-auto; overview also in Gemini backends)
Kiro OAuth Auto
OpenCode Zen
Qwen OAuth
Gemini OAuth plan / free (gemini-oauth-plan, gemini-oauth-free)

Extensibility

Custom Backends

Related Features

Model Name Rewrites - Transform model names dynamically
Hybrid Backend - Use two models in sequence
URI Model Parameters - Specify parameters in model strings

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backend Overview

Supported Backends

Core connectors (this repository)

OAuth plugin connectors (`llm-interactive-proxy-oauth-connectors`)

Agent Client Protocol (ACP) backends

Frontend APIs

Choosing a Backend

Configuration

Basic Setup

Starting the Proxy

Config File Example

Switching Backends

Backend-Specific Documentation

Related Features

FilesExpand file tree

overview.md

Latest commit

History

overview.md

File metadata and controls

Backend Overview

Supported Backends

Core connectors (this repository)

OAuth plugin connectors (llm-interactive-proxy-oauth-connectors)

Agent Client Protocol (ACP) backends

Frontend APIs

Choosing a Backend

Configuration

Basic Setup

Starting the Proxy

Config File Example

Switching Backends

Backend-Specific Documentation

Related Features

OAuth plugin connectors (`llm-interactive-proxy-oauth-connectors`)