Refactor proxy server by lvhan028 · Pull Request #4596 · InternLM/lmdeploy

lvhan028 · 2026-05-18T12:09:25Z

No description provided.

Copilot

Pull request overview

This PR refactors the proxy server implementation (lmdeploy/serve/proxy/) from a large monolithic module into smaller components (config, node registry, routing strategies, forwarding, app factory, and DistServe router) and adds a new min_cache_usage routing strategy that polls /metrics.

Changes:

Split proxy functionality into focused modules and rewired the CLI entrypoint to assemble ProxyConfig + NodeRegistry + routing strategy + FastAPI app.
Added min_cache_usage routing strategy with Prometheus text parsing + background polling and fallback to min_expected_latency.
Added a new proxy test suite (config/node/routing/forwarding) and updated docs OpenAPI generation.
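As a rough illustration of the Prometheus text parsing and fallback mentioned above, the sketch below reads one gauge from a `/metrics` payload and picks the node with the lowest value. The metric name `kv_cache_usage`, the helper names, and the rule "fall back when no node has a usable reading" are assumptions made for this example; the PR's actual metric name and fallback condition are not shown on this page.

```python
from typing import Callable, Dict, Optional


def parse_gauge(metrics_text: str, metric_name: str) -> Optional[float]:
    """Return the first sample value of `metric_name` in Prometheus text, or None."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith('#'):
            continue
        if '{' in line:
            # Sample with labels: name{label="value"} 0.42
            name, _, tail = line.partition('{')
            _, _, tail = tail.rpartition('}')
        else:
            # Sample without labels: name 0.42 [timestamp]
            name, _, tail = line.partition(' ')
        if name.strip() != metric_name:
            continue
        fields = tail.split()
        if not fields:
            continue
        try:
            return float(fields[0])
        except ValueError:
            continue
    return None


def pick_node(cache_usage: Dict[str, Optional[float]],
              fallback: Callable[[], str]) -> str:
    """Pick the URL with the lowest known usage; defer to `fallback` if none is known."""
    known = {url: usage for url, usage in cache_usage.items() if usage is not None}
    if not known:
        # Assumption: with no usable readings, another strategy decides.
        return fallback()
    return min(known, key=known.get)
```

In the PR the fallback is described as `min_expected_latency`; here it is just a callable so the sketch stays self-contained.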

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
tests/test_lmdeploy/serve/proxy/test_routing.py	Adds unit tests for routing strategies (random/min-expected/min-observed/min-cache) and factory selection.
tests/test_lmdeploy/serve/proxy/test_node.py	Adds unit tests for `Node` defaults and `NodeRegistry` CRUD/persist/load.
tests/test_lmdeploy/serve/proxy/test_forwarding.py	Adds unit tests for forwarding header preparation.
tests/test_lmdeploy/serve/proxy/test_config.py	Adds unit tests for proxy config defaults, env override, enums, and exception.
lmdeploy/serve/proxy/utils.py	Removes old shared constants/enums/exceptions (moved into `config.py`).
lmdeploy/serve/proxy/streaming.py	Updates import location for `APIServerException` and minor formatting.
lmdeploy/serve/proxy/routing/random.py	Implements weighted-random routing by node speed.
lmdeploy/serve/proxy/routing/min_observed.py	Implements routing based on mean observed latency history.
lmdeploy/serve/proxy/routing/min_expected.py	Implements routing based on expected latency (`unfinished/speed`).
lmdeploy/serve/proxy/routing/min_cache.py	Implements `min_cache_usage` routing + `/metrics` polling + Prometheus parsing + fallback.
lmdeploy/serve/proxy/routing/base.py	Adds `BaseStrategy` with hooks for request start/end and lifecycle start/stop.
lmdeploy/serve/proxy/routing/__init__.py	Adds `get_strategy()` factory for routing strategies.
lmdeploy/serve/proxy/proxy.py	Replaces monolith with a slim entrypoint wiring config/registry/strategy/app and running Uvicorn.
lmdeploy/serve/proxy/node.py	Adds `Node` and `NodeRegistry` with persistence and cache-usage metric updates.
lmdeploy/serve/proxy/forwarding.py	Adds raw request forwarding helpers (streaming + non-streaming) and header handling.
lmdeploy/serve/proxy/distserve.py	Extracts DistServe (prefill/decode) routing into `DistServeRouter`.
lmdeploy/serve/proxy/config.py	Adds `ProxyConfig`, enums, constants, error codes/messages, and `APIServerException`.
lmdeploy/serve/proxy/app.py	Adds `create_app()` FastAPI factory with endpoints, lifespan, and integration with routing strategy/registry.
lmdeploy/serve/proxy/__init__.py	Re-exports proxy-related public API.
lmdeploy/cli/serve.py	Adds `min_cache_usage` to CLI `--routing-strategy` choices.
docs/zh_cn/conf.py	Updates OpenAPI spec generation to build a proxy app via `create_app()`.
docs/en/conf.py	Updates OpenAPI spec generation to build a proxy app via `create_app()`.
docs/superpowers/specs/2026-05-18-proxy-refactor-design.md	Adds design spec documenting the refactor and new routing strategy.
docs/superpowers/plans/2026-05-18-proxy-refactor.md	Adds detailed implementation plan for the refactor.
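The three pre-existing selection rules summarized in the table (weighted random by speed, expected latency as `unfinished/speed`, and mean observed latency) can be sketched as follows. `NodeStats` and the function names are hypothetical stand-ins rather than the PR's `Node` or strategy classes, `speed` is assumed to be positive, and treating a node with no latency history as zero latency is an assumption of this example.

```python
import random
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class NodeStats:
    """Hypothetical stand-in for the per-node state a strategy reads."""
    url: str
    speed: float                 # relative throughput weight, assumed > 0
    unfinished: int = 0          # requests currently in flight
    latency: List[float] = field(default_factory=list)  # observed latencies, seconds


def pick_random(nodes: List[NodeStats], rng=random) -> NodeStats:
    # Weighted random: a node with twice the speed is drawn about twice as often.
    return rng.choices(nodes, weights=[n.speed for n in nodes], k=1)[0]


def pick_min_expected(nodes: List[NodeStats]) -> NodeStats:
    # Expected latency is approximated as in-flight work divided by speed.
    return min(nodes, key=lambda n: n.unfinished / n.speed)


def pick_min_observed(nodes: List[NodeStats]) -> NodeStats:
    # Assumption: a node without history scores 0.0, so it receives traffic first.
    return min(nodes, key=lambda n: mean(n.latency) if n.latency else 0.0)
```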

Comments suppressed due to low confidence (1)

lmdeploy/serve/proxy/app.py:274

Same issue for /v1/completions streaming: computing time.time() - start when scheduling the background task records near-zero latency. Compute elapsed time inside the background task at completion.

                response = forward_request_stream(client, node.url, raw_request, '/v1/completions')
                background_task = BackgroundTasks()
                background_task.add_task(strategy.on_request_end, node, time.time() - start)
                return ProxyStreamingResponse(response, background=background_task, media_type='text/event-stream')
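
One way to follow the suggestion above is to pass the start time into the task and subtract when the task runs, not when it is scheduled. The helper below is a minimal sketch: `make_end_hook` and its `clock` parameter are hypothetical names, and it assumes `strategy.on_request_end` is a plain (non-async) callable taking `(node, latency)`.

```python
import time
from typing import Callable


def make_end_hook(on_request_end: Callable[[object, float], None],
                  node: object,
                  start: float,
                  clock: Callable[[], float] = time.time) -> Callable[[], None]:
    """Return a zero-argument callable that measures latency when it is called."""
    def _hook() -> None:
        # Elapsed time is computed here, i.e. when the background task executes.
        on_request_end(node, clock() - start)
    return _hook
```

With this helper the scheduling line would become `background_task.add_task(make_end_hook(strategy.on_request_end, node, start))`, so the subtraction happens after the stream has finished.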


+    @app.post('/nodes/add', dependencies=[Depends(validate_json_request)])
+    async def add_node(node: Node, raw_request: Request = None):
+        try:


+    try:
+        loop = asyncio.get_event_loop()
+    except RuntimeError:
+        return
+    if loop.is_closed():


+                response = forward_request_stream(client, node.url, raw_request, '/v1/chat/completions')
+                background_task = BackgroundTasks()
+                background_task.add_task(strategy.on_request_end, node, time.time() - start)
+                return ProxyStreamingResponse(response, background=background_task, media_type='text/event-stream')


+    async def add(self, url: str, role: EngineRole = EngineRole.Hybrid,
+                  models: list[str] | None = None,
+                  status: Node | None = None) -> None:
+        async with self._lock:
+            if status is not None:
+                if status.models:
+                    self._nodes.pop(url, None)
+                    self._nodes[url] = status
+                    await self._persist_unlocked()
+                    return
+                node = status
+            else:
+                node = self._nodes.get(url, Node(url=url, role=role))
+
+            if models is not None:
+                node.models = models
+            elif not node.models:
+                try:
+                    import requests
+
+                    from lmdeploy.serve.openai.api_client import APIClient
+                    client = APIClient(api_server_url=url)
+                    node.models = client.available_models
+                except requests.exceptions.RequestException as e:
+                    logger.error(f"Exception when adding node {url}: {e}")
+                    return
+


+            p_nodes = await self.registry.get(model_name, role=EngineRole.Prefill)
+            if not p_nodes:
+                return self._handle_unavailable_model(model_name)
+            p_url = p_nodes[0].url
+            logger.info(f"A Prefill request is dispatched to {p_url}")


+        d_nodes = await self.registry.get(model_name, role=EngineRole.Decode)
+        if not d_nodes:
+            return self._handle_unavailable_model(model_name)
+        d_url = d_nodes[0].url


+            if stream:
+                response = stream_generate(client, request_dict, d_url, endpoint)
+                background_task = BackgroundTasks()
+                resp = StreamingResponse(response, background=background_task, media_type='text/event-stream')
+            else:


Previously, a new aiohttp.ClientSession was created per request and per metrics poll cycle. This leaked TCP connections and discarded connection pooling. Now one shared session is created in the app lifespan, stored on strategy.client and app.state.client, and reused by both request handlers and MinCacheUsageStrategy polling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
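
The shared-session pattern described in this commit message can be illustrated without aiohttp installed. In the sketch below `FakeSession` is a stand-in for `aiohttp.ClientSession`, and the `state` dict stands in for `app.state`; both are assumptions made so the example runs on the standard library alone.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Stand-in for aiohttp.ClientSession; counts how many sessions get created."""
    created = 0

    def __init__(self) -> None:
        FakeSession.created += 1
        self.closed = False

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def lifespan(state: dict):
    session = FakeSession()      # one session for the whole application lifetime
    state['client'] = session
    try:
        yield
    finally:
        await session.close()    # released once, at shutdown


async def demo():
    state = {}
    async with lifespan(state):
        handler_client = state['client']   # what a request handler would use
        poller_client = state['client']    # what the metrics poller would use
        same = handler_client is poller_client
    return same, FakeSession.created, state['client'].closed
```

Running `asyncio.run(demo())` shows that both consumers see the same object and that only one session is created and closed.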

Add conn_limit and conn_limit_per_host to ProxyConfig, configurable via LMDEPLOY_PROXY_CONN_LIMIT (default 100) and LMDEPLOY_PROXY_CONN_LIMIT_PER_HOST (default 0=unlimited) env vars. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
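
A minimal sketch of how these two settings could be read from the environment with the defaults stated above. The `ConnLimits` dataclass and `_env_int` helper are hypothetical; in the PR the fields live on `ProxyConfig`, whose exact definition is not shown here.

```python
import os
from dataclasses import dataclass, field


def _env_int(name: str, default: int) -> int:
    # Read an integer from the environment, falling back to the default.
    return int(os.environ.get(name, default))


@dataclass
class ConnLimits:
    """Hypothetical holder for the two connection-limit settings."""
    conn_limit: int = field(
        default_factory=lambda: _env_int('LMDEPLOY_PROXY_CONN_LIMIT', 100))
    # 0 means unlimited, per the commit message.
    conn_limit_per_host: int = field(
        default_factory=lambda: _env_int('LMDEPLOY_PROXY_CONN_LIMIT_PER_HOST', 0))
```

Values like these would plausibly be handed to `aiohttp.TCPConnector(limit=..., limit_per_host=...)`, though the commit message does not say how they are consumed.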

0.0.0.0 is a bind address, not a connect address. Python's HTTP clients cannot connect to it, causing "Connection refused" errors. Replace 0.0.0.0 with 127.0.0.1 when adding nodes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
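
The replacement described here could look roughly like the following. `to_connectable` is a hypothetical helper name, and the sketch ignores any userinfo in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def to_connectable(url: str) -> str:
    """Rewrite a 0.0.0.0 host to 127.0.0.1; leave every other URL unchanged."""
    parts = urlsplit(url)
    if parts.hostname != '0.0.0.0':
        return url
    # Keep the port if one was given.
    netloc = '127.0.0.1' if parts.port is None else f'127.0.0.1:{parts.port}'
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```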

…odel fetch failures The api_server sends {'url': '...', 'status': {'models': [...], 'role': N}} when self-registering, but the new Node model doesn't have a nested status field. Extract models and role from the nested status dict in the raw request body. Also, if model fetching fails, register the node with an empty model list instead of silently dropping it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
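
Given the payload shape quoted in this commit message, the extraction step might be sketched as below. `extract_registration` is a hypothetical name, and falling back to top-level `models` and `role` keys when `status` is absent is an assumption of this example.

```python
from typing import Any, Dict, List, Optional, Tuple


def extract_registration(
        body: Dict[str, Any]) -> Tuple[str, Optional[List[str]], Optional[int]]:
    """Pull url, models and role out of a raw /nodes/add request body."""
    # The api_server nests models and role under 'status' when self-registering.
    status = body.get('status') or {}
    # Assumption: a flat body with top-level keys is accepted as well.
    models = status.get('models', body.get('models'))
    role = status.get('role', body.get('role'))
    return body['url'], models, role
```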

lvhan028 and others added 15 commits May 18, 2026 06:57

Add proxy server refactor design spec

78e0b08

Design for refactoring the proxy server with modular architecture, strategy pattern for routing, and new min_cache_usage strategy that polls backend /metrics for KV cache occupation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Rename env var to LMDEPLOY_PROXY_POLL_METRICS_INTERVAL

8f56889

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add proxy server refactor implementation plan

ba15579

13-task plan covering config, node registry, routing strategies, forwarding, streaming, distserve, app factory, and CLI updates. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add ProxyConfig, RoutingStrategy, ServingStrategy, ErrorCodes

69279dc

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add Node model and NodeRegistry

6c4a647

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add routing strategy pattern with base, random, min_expected, min_observed

9c3131f

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add MinCacheUsageStrategy with background metrics polling

8d8deb2

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add forwarding module with raw request forwarding

9fa03a8

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add streaming module with ProxyStreamingResponse

489c618

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add DistServeRouter with isolated distserve logic

6fd99fe

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add app factory with all endpoint handlers

d7d0747

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): rewrite proxy.py as slim entry point

84ac91d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): update __init__.py, remove old utils.py and streaming_response.py

532ca79

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(proxy): add min_cache_usage to CLI routing strategy choices

a10ad56

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix(docs): update proxy app import to use create_app for OpenAPI spec

7350e5b

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 18, 2026 12:09

lvhan028 marked this pull request as draft May 18, 2026 12:09

lvhan028 added the improvement label May 18, 2026

Copilot started reviewing on behalf of lvhan028 May 18, 2026 12:10

Copilot AI reviewed May 18, 2026

lvhan028 and others added 4 commits May 18, 2026 12:27

lvhan028 wants to merge 19 commits into InternLM:main from lvhan028:refactor/proxy-server
