fix(vl): reduce multimodal feature memory use by CUHKSZzxy · Pull Request #4603 · InternLM/lmdeploy

CUHKSZzxy · 2026-05-20T08:53:41Z

Summary

Cast floating multimodal processor feature tensors to the resolved PyTorch model dtype before expansion.
Drop large multimodal references after scheduler/RPC handoff in the PyTorch serving path.
Expose resolved model config through the MP engine wrapper so VL dtype selection also works with MP engine.
Log per-request multimodal preprocess execution time from the VL preprocess executor for debugging.

Validation

Added focused coverage for multimodal feature dtype handling.
Ran syntax checks for the touched VL serving and MP-engine modules.
Checked the branch diff for whitespace errors.

Assistance

Assisted with Codex + GPT-5.5 xHigh

Copilot

Pull request overview

This PR reduces memory pressure in the VL (multimodal) serving path by aligning multimodal feature tensor dtypes with the resolved PyTorch model dtype, and by dropping large multimodal references earlier after handoff through the scheduler/RPC layers.

Changes:

Cast floating multimodal processor outputs (e.g., pixel_values) to the resolved model dtype during VL preprocessing.
Drop large multimodal/RPC payload references earlier in async serving and MP-engine RPC to lower peak memory.
Expose MP-engine model_config to enable VL dtype selection, and add timing logs + focused tests for dtype handling.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/test_lmdeploy/test_vl/test_mm_feature_dtype.py	Adds tests for casting only floating MM tensors + MP engine model_config exposure.
lmdeploy/vl/model/base.py	Introduces MM feature dtype normalization/casting during preprocessing.
lmdeploy/vl/engine.py	Adds `mm_feature_dtype` plumbing into `ImageEncoder` and logs preprocess duration.
lmdeploy/serve/processors/multimodal.py	Threads `request_id` into VL preprocessing calls (but has a positional-arg bug).
lmdeploy/serve/core/vl_async_engine.py	Picks resolved model dtype from engine `model_config` and passes to `ImageEncoder`.
lmdeploy/serve/core/async_engine.py	Drops `multimodal` from kwargs after generator creation; passes `request_id` into prompt processing.
lmdeploy/pytorch/engine/mp_engine/zmq_rpc.py	Clears large RPC payload references (e.g., `multimodal`, pickled blobs) after handoff.
lmdeploy/pytorch/engine/mp_engine/base.py	Exposes `model_config` and drops `multimodal` from streaming kwargs.
lmdeploy/pytorch/engine/mp_engine/base_worker.py	Adds worker RPC method to return resolved `model_config`.
lmdeploy/pytorch/engine/engine_instance.py	Clears local references to `msg`/`multimodal` after enqueueing request.

Comments suppressed due to low confidence (1)

lmdeploy/serve/processors/multimodal.py:406

Same positional-argument issue as above: vl_encoder.preprocess(messages, mm_processor_kwargs, ...) binds mm_processor_kwargs to input_prompt. This will fail for models that use the new preprocess API. Use keyword arguments (mm_processor_kwargs=...) or explicitly pass input_prompt=None and keep mm_processor_kwargs as the third arg.

            else:
                results = await self.vl_encoder.preprocess(messages, mm_processor_kwargs, request_id=request_id)
                results = await self.vl_encoder.wrap_for_pytorch(messages=results,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

CUHKSZzxy added 2 commits May 18, 2026 12:01

fix(vl): reduce multimodal feature memory use

b83addf

debug: log vl preprocess duration

23ee3b7

CUHKSZzxy marked this pull request as ready for review May 21, 2026 11:04

Copilot AI review requested due to automatic review settings May 21, 2026 11:04

Copilot started reviewing on behalf of CUHKSZzxy May 21, 2026 11:04 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread lmdeploy/serve/processors/multimodal.py

Comment thread lmdeploy/vl/engine.py

CUHKSZzxy added 2 commits May 21, 2026 20:15

fix: address multimodal preprocess review comments

2cde2e5

test: remove multimodal preprocess regression test

5258e2f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(vl): reduce multimodal feature memory use#4603

fix(vl): reduce multimodal feature memory use#4603
CUHKSZzxy wants to merge 4 commits into
InternLM:mainfrom
CUHKSZzxy:fix/vlm-mm-feature-dtype-release

CUHKSZzxy commented May 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

CUHKSZzxy commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Assistance

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

CUHKSZzxy commented May 20, 2026 •

edited

Loading