The InternLM backend provides access to InternLM AI models through an OpenAI-compatible API. InternLM models are known for their strong reasoning capabilities and coding performance.
InternLM is a series of large language models developed by the Shanghai AI Laboratory. The proxy supports the internlm backend, which connects to InternLM's API using API key authentication.
- OpenAI-compatible API
- Multiple API key rotation for load distribution
- Vendor prefix support (`internlm/`)
- Automatic non-streaming backend requests with SSE stream synthesis
- Deep thinking mode support
You need an InternLM API key to use this backend:
- Visit https://internlm.intern-ai.org.cn
- Navigate to API → API Tokens
- Create a new API token
Set your API key using environment variables:
# Single API key
export INTERNAI_API_KEY="your-api-key-here"
# Multiple API keys for rotation (optional)
export INTERNAI_API_KEY="your-primary-key"
export INTERNAI_API_KEY_1="your-second-key"
export INTERNAI_API_KEY_2="your-third-key"
# Start proxy with InternLM as default backend
python -m src.core.cli --default-backend internlm
# With specific model
python -m src.core.cli --default-backend internlm --force-model internlm2.5-latest
backends:
  internlm:
    type: internlm
    enabled: true
default_backend: internlm
The InternLM backend supports the following models:
| Model | Description |
|---|---|
| `internlm2.5-latest` | Latest InternLM 2.5 model (recommended) |
| `internlm2.5-20b` | InternLM 2.5 20B parameter model |
| `internlm2.5-7b` | InternLM 2.5 7B parameter model |
| `internlm2-latest` | Latest InternLM 2 model |
| `internlm2-20b` | InternLM 2 20B parameter model |
| `internlm2-7b` | InternLM 2 7B parameter model |
Use with vendor prefix: internlm/internlm2.5-latest
The InternLM backend supports multiple API keys for automatic rotation:
- Set multiple keys using numbered environment variables:
  export INTERNAI_API_KEY="key1"
  export INTERNAI_API_KEY_1="key2"
  export INTERNAI_API_KEY_2="key3"
- The backend automatically rotates through keys in round-robin fashion.
- This helps distribute load and provides a fallback if one key hits rate limits.
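The rotation described above can be sketched in a few lines of Python. This is an illustration only, not the proxy's actual implementation: the helper names `collect_api_keys` and `key_rotation` are hypothetical, and the ordering rule (unnumbered key first, then numbered keys in ascending order) is an assumption based on the variable names shown here.

```python
import itertools
import re


def collect_api_keys(env, base="INTERNAI_API_KEY"):
    """Return the base key followed by BASE_1, BASE_2, ... in numeric order.

    `env` is any mapping of variable names to values, e.g. os.environ.
    """
    keys = []
    if env.get(base):
        keys.append(env[base])

    # Match only names of the form BASE_<digits>; ignore unrelated variables.
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)$")
    numbered = []
    for name, value in env.items():
        match = pattern.match(name)
        if match and value:
            numbered.append((int(match.group(1)), value))

    keys.extend(value for _, value in sorted(numbered))
    return keys


def key_rotation(keys):
    """Return an endless round-robin iterator over the configured keys."""
    if not keys:
        raise ValueError("no InternLM API keys configured")
    return itertools.cycle(keys)
```

Calling `next()` on the iterator returned by `key_rotation(collect_api_keys(os.environ))` would then yield the key for each outgoing request in turn. A real connector would likely also skip a key temporarily after a rate-limit response, which this sketch does not attempt.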
Note: The InternLM API does not reliably support Server-Sent Events (SSE) streaming. The connector handles this transparently by:
- Sending non-streaming requests to the InternLM API
- Converting the complete response into an OpenAI-compatible SSE stream
- Providing the stream to clients as if it were native streaming
This ensures compatibility with all OpenAI-compatible clients while working within InternLM's API limitations.
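To make the stream synthesis concrete, here is a minimal sketch of how a complete chat completion could be re-emitted as OpenAI-style SSE chunks. It is not the connector's actual code: the function name `synthesize_sse`, the fixed `chunk_size` slicing, and the fallback values for missing fields are all assumptions made for illustration.

```python
import json


def synthesize_sse(completion, chunk_size=20):
    """Yield SSE lines for an already-complete chat completion dict.

    The input is assumed to follow the OpenAI chat completion shape
    (choices[0].message.content); the output mimics chat.completion.chunk events.
    """
    choice = completion["choices"][0]
    content = choice["message"].get("content") or ""
    base = {
        "id": completion.get("id", "chatcmpl-synth"),  # placeholder id (assumption)
        "object": "chat.completion.chunk",
        "created": completion.get("created", 0),
        "model": completion.get("model", ""),
    }

    def event(delta, finish_reason=None):
        chunk = dict(
            base,
            choices=[{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        )
        return f"data: {json.dumps(chunk)}\n\n"

    # First chunk announces the role, then the text follows in fixed-size slices.
    yield event({"role": "assistant"})
    for start in range(0, len(content), chunk_size):
        yield event({"content": content[start:start + chunk_size]})

    # Final chunk carries the finish reason, followed by the stream terminator.
    yield event({}, finish_reason=choice.get("finish_reason", "stop"))
    yield "data: [DONE]\n\n"
```

Because the full response already exists before the first chunk is sent, clients see a well-formed stream but no reduction in time to first token.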
import openai
client = openai.OpenAI(
base_url="http://localhost:8000/v1",
api_key="dummy-key"
)
response = client.chat.completions.create(
model="internlm/internlm2.5-latest",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
# Force all requests to use InternLM
python -m src.core.cli --default-backend internlm --force-model internlm2.5-latest
If you see authentication errors:
- Verify your `INTERNAI_API_KEY` is set correctly
- Check that the key is active in your InternLM account
- For multiple keys, ensure all keys are valid
The InternLM backend automatically handles streaming via non-streaming API calls. If you experience issues:
- Check proxy logs for InternLM-specific error messages
- Verify the backend is healthy: `curl http://localhost:8000/health`
- Ensure the InternLM API endpoint is reachable
InternLM models should be prefixed with internlm/ when using unified routing:
- ✅ `internlm/internlm2.5-latest`
- ❌ `internlm2.5-latest` (may not route correctly in multi-backend setups)
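As a rough illustration of why the prefix matters, the sketch below splits a model string into a backend name and a model name. The function `split_vendor_prefix` is hypothetical and is not taken from the proxy's routing code; it only shows the general idea that an unprefixed name falls back to whatever default backend is configured.

```python
def split_vendor_prefix(model, default_backend=None):
    """Split "backend/model" into (backend, model).

    If there is no usable prefix, return (default_backend, model) so the
    caller falls back to the configured default backend.
    """
    backend, sep, name = model.partition("/")
    if sep and backend and name:
        return backend, name
    return default_backend, model
```

With this logic, `internlm/internlm2.5-latest` always resolves to the `internlm` backend, while `internlm2.5-latest` resolves to whichever default is active, which may be a different backend in a multi-backend setup.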