The LLM Interactive Proxy supports creating custom backend configurations for providers not included in the default set. This allows you to integrate with any OpenAI-compatible API or create specialized configurations for existing providers.
Custom backends enable you to:
- Connect to proprietary or internal LLM services
- Configure specialized endpoints for existing providers
- Create custom model configurations with specific limits
- Integrate with new LLM providers as they emerge
Custom backends are configured in YAML files under config/backends/:

    config/
    └── backends/
        └── my-custom-backend/
            └── backend.yaml

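The layout above implies one directory per backend, each holding a backend.yaml file. The following is a minimal Python sketch of how such directories could be enumerated, for example to check that a new backend sits where you expect. The helper name `find_backend_configs` is hypothetical and not part of the proxy; the proxy's own discovery logic may differ.

```python
from pathlib import Path


def find_backend_configs(root: str = "config/backends") -> dict[str, Path]:
    """Map each backend directory name to the path of its backend.yaml.

    Hypothetical helper for illustration only. Directories that do not
    contain a backend.yaml file are skipped.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    found: dict[str, Path] = {}
    for entry in sorted(root_path.iterdir()):
        config_file = entry / "backend.yaml"
        if entry.is_dir() and config_file.is_file():
            found[entry.name] = config_file
    return found


if __name__ == "__main__":
    # Prints one line per backend directory found under config/backends/.
    for name, path in find_backend_configs().items():
        print(f"{name}: {path}")
```

An empty result means either that config/backends/ does not exist relative to the current working directory or that no subdirectory contains a backend.yaml file.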
Create a backend.yaml file with the following structure:

    # config/backends/my-custom-backend/backend.yaml
    backend_type: "custom"
    api_base_url: "https://api.example.com/v1"
    api_key_env_var: "MY_CUSTOM_API_KEY"
    models:
      "my-model-name":
        limits:
          context_window: 128000
          max_input_tokens: 100000
          max_output_tokens: 28000
          requests_per_minute: 60
          tokens_per_minute: 1000000

- backend_type: Type of backend (use "custom" for custom backends)
- api_base_url: Base URL for the API endpoint
- api_key_env_var: Environment variable name containing the API key
- models: Dictionary of model configurations
- default_model: Default model to use if none specified
- timeout: Request timeout in seconds
- max_retries: Maximum number of retry attempts
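
The fields listed above can be checked before the proxy is started. The sketch below works on an already-parsed configuration (a plain Python dict) and reports obvious problems. It assumes that backend_type, api_base_url, api_key_env_var, and models are required and that the remaining fields are optional; that split, and the function name `validate_backend_config`, are assumptions made for illustration, not the proxy's documented behavior.

```python
# Assumption: these four fields are treated as required.
REQUIRED_FIELDS = ("backend_type", "api_base_url", "api_key_env_var", "models")


def validate_backend_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in config:
            problems.append(f"missing field: {field}")

    models = config.get("models")
    if "models" in config and (not isinstance(models, dict) or not models):
        problems.append("models must be a non-empty mapping")

    default_model = config.get("default_model")
    if (
        default_model is not None
        and isinstance(models, dict)
        and default_model not in models
    ):
        problems.append(f"default_model {default_model!r} is not listed under models")

    timeout = config.get("timeout")
    if timeout is not None and (not isinstance(timeout, (int, float)) or timeout <= 0):
        problems.append("timeout must be a positive number of seconds")

    max_retries = config.get("max_retries")
    if max_retries is not None and (not isinstance(max_retries, int) or max_retries < 0):
        problems.append("max_retries must be a non-negative integer")

    return problems


if __name__ == "__main__":
    example = {
        "backend_type": "custom",
        "api_base_url": "https://api.example.com/v1",
        "api_key_env_var": "MY_CUSTOM_API_KEY",
        "models": {"my-model-name": {"limits": {"context_window": 128000}}},
        "default_model": "my-model-name",
        "timeout": 60,
        "max_retries": 3,
    }
    print(validate_backend_config(example))
```

Returning a list of messages rather than raising on the first error lets you report every problem in a configuration at once.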
Each model can have the following configuration:

    models:
      "model-name":
        limits:
          context_window: 128000      # Total context window (tokens)
          max_input_tokens: 100000    # Maximum input tokens
          max_output_tokens: 28000    # Maximum output tokens
          requests_per_minute: 60     # Rate limit (requests)
          tokens_per_minute: 1000000  # Rate limit (tokens)
        parameters:
          temperature: 0.7            # Default temperature
          top_p: 0.9                  # Default top_p

    # config/backends/internal-llm/backend.yaml
    backend_type: "custom"
    api_base_url: "https://internal-llm.company.com/v1"
    api_key_env_var: "INTERNAL_LLM_API_KEY"
    models:
      "company-gpt-large":
        limits:
          context_window: 32000
          max_input_tokens: 24000
          max_output_tokens: 8000
          requests_per_minute: 100

Usage:

    export INTERNAL_LLM_API_KEY="your-key"
    python -m src.core.cli --default-backend internal-llm

    # config/backends/openai-specialized/backend.yaml
    backend_type: "custom"
    api_base_url: "https://api.openai.com/v1"
    api_key_env_var: "OPENAI_API_KEY"
    models:
      "gpt-4-specialized":
        limits:
          context_window: 8000        # Restricted context
          max_input_tokens: 6000
          max_output_tokens: 2000
          requests_per_minute: 30     # Conservative rate limits
        parameters:
          temperature: 0.3            # Lower temperature for precision

    # config/backends/multi-model/backend.yaml
    backend_type: "custom"
    api_base_url: "https://api.provider.com/v1"
    api_key_env_var: "PROVIDER_API_KEY"
    models:
      "fast-model":
        limits:
          context_window: 4096
          max_input_tokens: 3000
          requests_per_minute: 120
        parameters:
          temperature: 0.8
      "accurate-model":
        limits:
          context_window: 32000
          max_input_tokens: 24000
          requests_per_minute: 30
        parameters:
          temperature: 0.2
      "balanced-model":
        limits:
          context_window: 16000
          max_input_tokens: 12000
          requests_per_minute: 60
        parameters:
          temperature: 0.5

Add custom headers to all requests:

    backend_type: "custom"
    api_base_url: "https://api.example.com/v1"
    api_key_env_var: "EXAMPLE_API_KEY"
    headers:
      X-Custom-Header: "value"
      X-Organization-ID: "org-123"

Bearer token authentication (the default):

    api_key_env_var: "MY_API_KEY"
    auth_type: "bearer"  # Default

Custom header authentication:

    api_key_env_var: "MY_API_KEY"
    auth_type: "custom"
    auth_header: "X-API-Key"

Timeouts and retries:

    backend_type: "custom"
    api_base_url: "https://api.example.com/v1"
    api_key_env_var: "EXAMPLE_API_KEY"
    timeout: 60        # Request timeout in seconds
    max_retries: 3     # Maximum retry attempts
    retry_delay: 1.0   # Delay between retries (seconds)

Custom backends support context window enforcement to prevent excessive token usage:

    models:
      "large-context-model":
        limits:
          context_window: 262144      # 256K total context
          max_input_tokens: 200000    # 200K input limit
          max_output_tokens: 62144    # 62K output limit

When a request exceeds max_input_tokens, the proxy returns a 400 error:

    {
      "detail": {
        "code": "input_limit_exceeded",
        "message": "Input token limit exceeded",
        "details": {
          "model": "large-context-model",
          "limit": 200000,
          "measured": 225000
        }
      }
    }

Create custom backends for:
- Testing against local LLM instances
- Mocking LLM responses for integration tests
- Prototyping with experimental models
Use custom backends to:
- Connect to internal LLM services
- Enforce company-specific rate limits
- Add custom authentication and headers
- Integrate with proprietary models
Configure custom backends to:
- Set strict context window limits
- Enforce conservative rate limits
- Route to cost-effective alternatives
- Track usage per model variant
Create different backend configurations for:
- Free tier users (strict limits)
- Premium users (higher limits)
- Enterprise users (custom configurations)

- Verify the backend directory exists under config/backends/
- Check that backend.yaml is in the correct location
- Ensure the backend name matches the directory name

- Verify the environment variable is set correctly
- Check that the API key is valid
- Ensure the auth_type matches the provider's requirements

- Adjust requests_per_minute and tokens_per_minute in model limits
- Check the provider's actual rate limits
- Consider using API Key Rotation (via multiple backend instances) with load balancing

- Verify max_input_tokens is set correctly
- Ensure context_window is larger than max_input_tokens
- Check that token counting is accurate for your provider

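
The context window checks in the last group above can be run against a model's limits before deployment. The sketch below flags a max_input_tokens value that is not smaller than context_window, and builds an error body in the same shape as the 400 response shown earlier. The function names are hypothetical, and the additional check that input plus output limits fit inside the context window is an assumption based on the example configurations in this guide, not a documented rule of the proxy.

```python
from typing import Optional


def check_limits(limits: dict) -> list[str]:
    """Return consistency problems found in one model's `limits` mapping."""
    problems = []
    context_window = limits.get("context_window")
    max_input = limits.get("max_input_tokens")
    max_output = limits.get("max_output_tokens")

    if context_window is not None and max_input is not None:
        if max_input >= context_window:
            problems.append("max_input_tokens should be smaller than context_window")

    # Assumption: input and output limits together should fit in the window.
    if None not in (context_window, max_input, max_output):
        if max_input + max_output > context_window:
            problems.append("max_input_tokens + max_output_tokens exceeds context_window")

    return problems


def input_limit_error(model: str, limit: int, measured: int) -> Optional[dict]:
    """Build an error body for an oversized request, or None if it fits."""
    if measured <= limit:
        return None
    return {
        "detail": {
            "code": "input_limit_exceeded",
            "message": "Input token limit exceeded",
            "details": {"model": model, "limit": limit, "measured": measured},
        }
    }


if __name__ == "__main__":
    limits = {
        "context_window": 262144,
        "max_input_tokens": 200000,
        "max_output_tokens": 62144,
    }
    print(check_limits(limits))
    print(input_limit_error("large-context-model", 200000, 225000))
```

The `measured` value depends on how tokens are counted, so a tokenizer that differs from the provider's can trigger or miss this error near the limit.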
- Start with Defaults: Base your configuration on existing backends
- Test Thoroughly: Validate your configuration with test requests
- Document Limits: Clearly document rate limits and context windows
- Use Environment Variables: Never hardcode API keys in config files
- Monitor Usage: Track token usage and costs
- Version Control: Keep backend configurations in version control (without secrets)
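
As an illustration of the environment-variable practice above, the following sketch resolves the API key named by api_key_env_var at run time and turns it into request headers according to auth_type. The mapping shown here (an `Authorization: Bearer` header for "bearer", the header named by auth_header for "custom") is an assumption about how these settings are likely to be applied, and `build_auth_headers` is a hypothetical helper, not a function of the proxy.

```python
import os
from typing import Mapping


def build_auth_headers(config: dict, environ: Mapping[str, str] = os.environ) -> dict:
    """Combine configured custom headers with an auth header read from the environment."""
    env_var = config["api_key_env_var"]
    api_key = environ.get(env_var)
    if not api_key:
        # The key is never stored in the config file, only its variable name.
        raise RuntimeError(f"environment variable {env_var} is not set")

    headers = dict(config.get("headers", {}))
    auth_type = config.get("auth_type", "bearer")  # "bearer" is the default
    if auth_type == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    elif auth_type == "custom":
        headers[config["auth_header"]] = api_key
    else:
        raise ValueError(f"unsupported auth_type: {auth_type}")
    return headers


if __name__ == "__main__":
    config = {
        "api_key_env_var": "MY_API_KEY",
        "auth_type": "custom",
        "auth_header": "X-API-Key",
        "headers": {"X-Organization-ID": "org-123"},
    }
    # A fake environment is passed in so the example does not depend on your shell.
    print(build_auth_headers(config, environ={"MY_API_KEY": "your-key"}))
```

Because only the variable name appears in the configuration, the same backend.yaml can be committed to version control and reused across environments with different keys.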
- Context Window Enforcement - Enforce token limits
- Model Name Rewrites - Route to custom backends
- URI Model Parameters - Override model parameters