Custom Backends

The LLM Interactive Proxy lets you define custom backend configurations for providers outside the default set. You can use them to integrate with any OpenAI-compatible API or to create a specialized configuration for an existing provider.

Overview

Custom backends enable you to:

Connect to proprietary or internal LLM services
Configure specialized endpoints for existing providers
Create custom model configurations with specific limits
Integrate with new LLM providers as they emerge

Creating a Custom Backend

Directory Structure

Custom backends are configured in YAML files under config/backends/:

config/
└── backends/
    └── my-custom-backend/
        └── backend.yaml
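
The layout above implies that each subdirectory of config/backends/ holding a backend.yaml is one backend, named after its directory. A minimal sketch of that convention follows; `discover_backends` is a hypothetical helper written for illustration, not part of the proxy's API, and the proxy's real discovery logic may differ.

```python
from pathlib import Path


def discover_backends(config_root: Path) -> dict[str, Path]:
    """Map each backend name (its directory name) to its backend.yaml path.

    Hypothetical helper: mirrors the directory convention described above.
    """
    backends_dir = config_root / "backends"
    found: dict[str, Path] = {}
    if not backends_dir.is_dir():
        return found
    for entry in sorted(backends_dir.iterdir()):
        candidate = entry / "backend.yaml"
        # Only directories that actually contain backend.yaml count as backends.
        if entry.is_dir() and candidate.is_file():
            found[entry.name] = candidate
    return found
```

Calling `discover_backends(Path("config"))` on the tree shown above would return a single entry keyed by `my-custom-backend`.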

Basic Backend Configuration

Create a backend.yaml file with the following structure:

# config/backends/my-custom-backend/backend.yaml
backend_type: "custom"
api_base_url: "https://api.example.com/v1"
api_key_env_var: "MY_CUSTOM_API_KEY"

models:
  "my-model-name":
    limits:
      context_window: 128000
      max_input_tokens: 100000
      max_output_tokens: 28000
      requests_per_minute: 60
      tokens_per_minute: 1000000

Configuration Fields

Required Fields

backend_type: Type of backend (use "custom" for custom backends)
api_base_url: Base URL for the API endpoint
api_key_env_var: Environment variable name containing the API key

Optional Fields

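models: Dictionary of model configurations, keyed by model name
default_model: Default model to use if none is specified
timeout: Request timeout in seconds
max_retries: Maximum number of retry attempts
retry_delay: Delay between retries in seconds
headers: Custom headers added to all requests
auth_type: Authentication method ("bearer" by default, or "custom")
auth_header: Header name that carries the API key when auth_type is "custom"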
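
Before starting the proxy, it can help to check a configuration for the required fields listed above. The sketch below assumes the YAML file has already been parsed into a plain dictionary (for example with a YAML library of your choice); `missing_required_fields` is a hypothetical helper, not a function the proxy provides.

```python
# Required top-level fields, as listed in this document.
REQUIRED_FIELDS = ("backend_type", "api_base_url", "api_key_env_var")


def missing_required_fields(config: dict) -> list[str]:
    """Return the required fields that are absent or empty in a parsed backend.yaml."""
    return [name for name in REQUIRED_FIELDS if not config.get(name)]


config = {
    "backend_type": "custom",
    "api_base_url": "https://api.example.com/v1",
    # api_key_env_var deliberately left out
}
print(missing_required_fields(config))  # ['api_key_env_var']
```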

Model Configuration

Each model can have the following configuration:

models:
  "model-name":
    limits:
      context_window: 128000        # Total context window (tokens)
      max_input_tokens: 100000      # Maximum input tokens
      max_output_tokens: 28000      # Maximum output tokens
      requests_per_minute: 60       # Rate limit (requests)
      tokens_per_minute: 1000000    # Rate limit (tokens)
    parameters:
      temperature: 0.7              # Default temperature
      top_p: 0.9                    # Default top_p
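
In every example in this document that sets all three token limits, max_input_tokens plus max_output_tokens adds up to context_window. The sketch below checks a limits mapping against that budget. Treat the rule as an assumption drawn from the examples: the document does not state that the proxy enforces it, and `check_limits` is a hypothetical helper.

```python
def check_limits(limits: dict) -> list[str]:
    """Return human-readable problems found in a model's `limits` mapping."""
    problems = []
    window = limits.get("context_window")
    max_in = limits.get("max_input_tokens")
    max_out = limits.get("max_output_tokens")
    # The input limit alone must fit inside the context window.
    if window is not None and max_in is not None and max_in > window:
        problems.append("max_input_tokens exceeds context_window")
    # Assumed rule: input and output budgets together must fit as well.
    if None not in (window, max_in, max_out) and max_in + max_out > window:
        problems.append("max_input_tokens + max_output_tokens exceeds context_window")
    return problems
```

With the values from the model configuration above (128000, 100000, 28000), the function returns an empty list.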

Usage Examples

Example 1: Internal LLM Service

# config/backends/internal-llm/backend.yaml
backend_type: "custom"
api_base_url: "https://internal-llm.company.com/v1"
api_key_env_var: "INTERNAL_LLM_API_KEY"

models:
  "company-gpt-large":
    limits:
      context_window: 32000
      max_input_tokens: 24000
      max_output_tokens: 8000
      requests_per_minute: 100

Usage:

export INTERNAL_LLM_API_KEY="your-key"
python -m src.core.cli --default-backend internal-llm

Example 2: Specialized OpenAI Configuration

# config/backends/openai-specialized/backend.yaml
backend_type: "custom"
api_base_url: "https://api.openai.com/v1"
api_key_env_var: "OPENAI_API_KEY"

models:
  "gpt-4-specialized":
    limits:
      context_window: 8000          # Restricted context
      max_input_tokens: 6000
      max_output_tokens: 2000
      requests_per_minute: 30       # Conservative rate limits
    parameters:
      temperature: 0.3              # Lower temperature for precision

Example 3: Multiple Model Variants

# config/backends/multi-model/backend.yaml
backend_type: "custom"
api_base_url: "https://api.provider.com/v1"
api_key_env_var: "PROVIDER_API_KEY"

models:
  "fast-model":
    limits:
      context_window: 4096
      max_input_tokens: 3000
      requests_per_minute: 120
    parameters:
      temperature: 0.8
  
  "accurate-model":
    limits:
      context_window: 32000
      max_input_tokens: 24000
      requests_per_minute: 30
    parameters:
      temperature: 0.2
  
  "balanced-model":
    limits:
      context_window: 16000
      max_input_tokens: 12000
      requests_per_minute: 60
    parameters:
      temperature: 0.5

Advanced Configuration

Custom Headers

Add custom headers to all requests:

backend_type: "custom"
api_base_url: "https://api.example.com/v1"
api_key_env_var: "EXAMPLE_API_KEY"

headers:
  X-Custom-Header: "value"
  X-Organization-ID: "org-123"

Authentication Methods

Bearer Token (Default)

api_key_env_var: "MY_API_KEY"
auth_type: "bearer"  # Default

Custom Header

api_key_env_var: "MY_API_KEY"
auth_type: "custom"
auth_header: "X-API-Key"
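
To make the two authentication methods concrete, the sketch below shows how the settings could translate into request headers. It assumes that "bearer" means the standard `Authorization: Bearer <key>` header and that "custom" sends the raw key in the header named by auth_header. The proxy's actual behavior may differ, and `build_auth_headers` is a hypothetical helper.

```python
import os


def build_auth_headers(config: dict, env=os.environ) -> dict[str, str]:
    """Build auth headers from a parsed backend.yaml (hypothetical helper)."""
    api_key = env[config["api_key_env_var"]]  # KeyError if the variable is unset
    auth_type = config.get("auth_type", "bearer")  # "bearer" is the documented default
    if auth_type == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if auth_type == "custom":
        # Assumption: the raw key is sent as the value of the configured header.
        return {config["auth_header"]: api_key}
    raise ValueError(f"unsupported auth_type: {auth_type!r}")
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.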

Timeout and Retry Configuration

backend_type: "custom"
api_base_url: "https://api.example.com/v1"
api_key_env_var: "EXAMPLE_API_KEY"

timeout: 60              # Request timeout in seconds
max_retries: 3           # Maximum retry attempts
retry_delay: 1.0         # Delay between retries (seconds)
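
The interaction between max_retries and retry_delay can be sketched as a simple loop. This is an illustration under stated assumptions, not the proxy's implementation: it assumes max_retries counts retries after the first attempt, that the delay is fixed rather than increasing, and that only timeouts are retried. `call_with_retries` and `send` are hypothetical names.

```python
import time


def call_with_retries(send, max_retries=3, retry_delay=1.0, sleep=time.sleep):
    """Call send() once, then retry up to max_retries times on TimeoutError."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except TimeoutError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(retry_delay)  # fixed delay between attempts (assumption)
```

Under these assumptions, the settings above (max_retries: 3, retry_delay: 1.0) allow up to four attempts with one second between them.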

Context Window Enforcement

Custom backends support context window enforcement to prevent excessive token usage:

models:
  "large-context-model":
    limits:
      context_window: 262144      # 256K total context
      max_input_tokens: 200000    # 200K input limit
      max_output_tokens: 62144    # 62K output limit

When a request's input exceeds max_input_tokens, the proxy rejects it with an HTTP 400 error:

{
  "detail": {
    "code": "input_limit_exceeded",
    "message": "Input token limit exceeded",
    "details": {
      "model": "large-context-model",
      "limit": 200000,
      "measured": 225000
    }
  }
}
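
A client can turn this error body into an actionable message. The sketch below relies only on the fields shown in the example response above; `describe_limit_error` is a hypothetical helper, and other error payloads are simply ignored.

```python
import json
from typing import Optional


def describe_limit_error(body: str) -> Optional[str]:
    """Summarize an input_limit_exceeded error body, or return None for other payloads."""
    detail = json.loads(body).get("detail")
    if not isinstance(detail, dict) or detail.get("code") != "input_limit_exceeded":
        return None
    info = detail["details"]
    overage = info["measured"] - info["limit"]  # tokens to trim from the input
    return f"{info['model']}: input is {overage} tokens over the {info['limit']}-token limit"
```

For the example response above, the function reports that the input is 25000 tokens over the 200000-token limit.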

Use Cases

Development and Testing

Create custom backends for:

Testing against local LLM instances
Mocking LLM responses for integration tests
Prototyping with experimental models

Enterprise Integration

Use custom backends to:

Connect to internal LLM services
Enforce company-specific rate limits
Add custom authentication and headers
Integrate with proprietary models

Cost Control

Configure custom backends to:

Set strict context window limits
Enforce conservative rate limits
Route to cost-effective alternatives
Track usage per model variant

Multi-Tier Service

Create different backend configurations for:

Free tier users (strict limits)
Premium users (higher limits)
Enterprise users (custom configurations)

Troubleshooting

Backend Not Found

Verify the backend directory exists under config/backends/
Check that backend.yaml is in the correct location
Ensure the backend name you pass (for example, to --default-backend) matches the directory name

Authentication Errors

Verify the environment variable is set correctly
Check that the API key is valid
Ensure the auth_type matches the provider's requirements

Rate Limiting Issues

Adjust requests_per_minute and tokens_per_minute in model limits
Check provider's actual rate limits
Consider using API Key Rotation (via multiple backend instances) with load balancing
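
When tuning requests_per_minute, it helps to have a concrete model of what the limit means. The sketch below is one common interpretation, a sliding 60-second window. The document does not specify which algorithm the proxy uses, so treat this as an illustration only; `RequestsPerMinuteLimiter` is a hypothetical class.

```python
import time
from collections import deque


class RequestsPerMinuteLimiter:
    """Sliding-window limiter: at most `limit` requests in any 60-second window."""

    def __init__(self, limit: int, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if the window is full."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= 60.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True
```

The injectable `clock` makes it possible to simulate bursts in a test without waiting a real minute.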

Context Window Errors

Verify max_input_tokens is set correctly
Ensure context_window is larger than max_input_tokens
Check that token counting is accurate for your provider

Best Practices

Start with Defaults: Base your configuration on existing backends
Test Thoroughly: Validate your configuration with test requests
Document Limits: Clearly document rate limits and context windows
Use Environment Variables: Never hardcode API keys in config files
Monitor Usage: Track token usage and costs
Version Control: Keep backend configurations in version control (without secrets)

Related Features

Context Window Enforcement - Enforce token limits
Model Name Rewrites - Route to custom backends
URI Model Parameters - Override model parameters
