Skip to content

fix(vl): reduce multimodal feature memory use#4603

Open
CUHKSZzxy wants to merge 4 commits into
InternLM:mainfrom
CUHKSZzxy:fix/vlm-mm-feature-dtype-release
Open

fix(vl): reduce multimodal feature memory use#4603
CUHKSZzxy wants to merge 4 commits into
InternLM:mainfrom
CUHKSZzxy:fix/vlm-mm-feature-dtype-release

Conversation

@CUHKSZzxy
Copy link
Copy Markdown
Collaborator

@CUHKSZzxy CUHKSZzxy commented May 20, 2026

Summary

  • Cast floating multimodal processor feature tensors to the resolved PyTorch model dtype before expansion.
  • Drop large multimodal references after scheduler/RPC handoff in the PyTorch serving path.
  • Expose resolved model config through the MP engine wrapper so VL dtype selection also works with MP engine.
  • Log per-request multimodal preprocess execution time from the VL preprocess executor for debugging.

Validation

  • Added focused coverage for multimodal feature dtype handling.
  • Ran syntax checks for the touched VL serving and MP-engine modules.
  • Checked the branch diff for whitespace errors.

Assistance

Assisted with Codex + GPT-5.5 xHigh

@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review May 21, 2026 11:04
Copilot AI review requested due to automatic review settings May 21, 2026 11:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces memory pressure in the VL (multimodal) serving path by aligning multimodal feature tensor dtypes with the resolved PyTorch model dtype, and by dropping large multimodal references earlier after handoff through the scheduler/RPC layers.

Changes:

  • Cast floating multimodal processor outputs (e.g., pixel_values) to the resolved model dtype during VL preprocessing.
  • Drop large multimodal/RPC payload references earlier in async serving and MP-engine RPC to lower peak memory.
  • Expose MP-engine model_config to enable VL dtype selection, and add timing logs + focused tests for dtype handling.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/test_lmdeploy/test_vl/test_mm_feature_dtype.py Adds tests for casting only floating MM tensors + MP engine model_config exposure.
lmdeploy/vl/model/base.py Introduces MM feature dtype normalization/casting during preprocessing.
lmdeploy/vl/engine.py Adds mm_feature_dtype plumbing into ImageEncoder and logs preprocess duration.
lmdeploy/serve/processors/multimodal.py Threads request_id into VL preprocessing calls (but has a positional-arg bug).
lmdeploy/serve/core/vl_async_engine.py Picks resolved model dtype from engine model_config and passes to ImageEncoder.
lmdeploy/serve/core/async_engine.py Drops multimodal from kwargs after generator creation; passes request_id into prompt processing.
lmdeploy/pytorch/engine/mp_engine/zmq_rpc.py Clears large RPC payload references (e.g., multimodal, pickled blobs) after handoff.
lmdeploy/pytorch/engine/mp_engine/base.py Exposes model_config and drops multimodal from streaming kwargs.
lmdeploy/pytorch/engine/mp_engine/base_worker.py Adds worker RPC method to return resolved model_config.
lmdeploy/pytorch/engine/engine_instance.py Clears local references to msg/multimodal after enqueueing request.
Comments suppressed due to low confidence (1)

lmdeploy/serve/processors/multimodal.py:406

  • Same positional-argument issue as above: vl_encoder.preprocess(messages, mm_processor_kwargs, ...) binds mm_processor_kwargs to input_prompt. This will fail for models that use the new preprocess API. Use keyword arguments (mm_processor_kwargs=...) or explicitly pass input_prompt=None and keep mm_processor_kwargs as the third arg.
            else:
                results = await self.vl_encoder.preprocess(messages, mm_processor_kwargs, request_id=request_id)
                results = await self.vl_encoder.wrap_for_pytorch(messages=results,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lmdeploy/serve/processors/multimodal.py
Comment thread lmdeploy/vl/engine.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants