Remove state init #4604

Open
grimoire wants to merge 3 commits into InternLM:main from grimoire:remove-state-init
@grimoire
Collaborator

  • fill conv state in the model during forward
  • update gate to ignore the init state of gdr
  • remove init cache; it takes too much time
  • l2 norm before repeat interleave
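
A minimal sketch of why the last item is safe, assuming the L2 norm is taken per head over the feature dimension. It uses plain Python lists as a dependency-free stand-in for tensors; `l2norm`, `repeat_interleave`, `kv_ratio`, and the sample values are illustrative names and data, not code from this PR.

```python
import math

def l2norm(vec, eps=1e-6):
    """Scale one head vector to (approximately) unit L2 norm."""
    scale = 1.0 / math.sqrt(sum(v * v for v in vec) + eps)
    return [v * scale for v in vec]

def repeat_interleave(heads, repeats):
    """Repeat each head `repeats` times consecutively: [a, b] -> [a, a, b, b]."""
    return [list(h) for h in heads for _ in range(repeats)]

kv_ratio = 3  # hypothetical replication factor
k_heads = [[3.0, 4.0], [1.0, 0.0]]  # two heads, feature dim 2

# Normalize first: l2norm runs once per original head (2 calls).
norm_first = repeat_interleave([l2norm(h) for h in k_heads], kv_ratio)

# Normalize last: l2norm runs once per replicated head (6 calls).
norm_last = [l2norm(h) for h in repeat_interleave(k_heads, kv_ratio)]

same = all(
    math.isclose(a, b)
    for row_a, row_b in zip(norm_first, norm_last)
    for a, b in zip(row_a, row_b)
)
```

Because each replicated head is an exact copy of its source head, both orders give the same values; normalizing first simply does `kv_ratio` times less normalization work.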

Copilot AI review requested due to automatic review settings May 20, 2026 10:53
Contributor

Copilot AI left a comment

Pull request overview

This PR removes explicit state-cache initialization for the GatedDelta/SSM path by making the model handle “init state” behavior during forward, and optimizes the kv head replication path by applying Q/K L2-normalization before repeat_interleave to reduce overhead.

Changes:

  • Add init-state metadata (is_init, is_init_token) to GatedDeltaMeta, zero conv initial states on init, and mask GDR gate for init tokens.
  • Move kv_ratio replication logic into GatedDelta (and add a helper that normalizes before replication).
  • Remove StateCacheEngine.init_caches and its call site during model forward.
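
The idea behind masking the gate for init tokens can be sketched with a scalar recurrence. This is a simplified illustration, not the PR's implementation: it assumes the gate is a log-decay applied as `exp(gate)` to the previous state, and that masking means forcing that decay to zero on the first token of a fresh sequence. Only `is_init_token` comes from the PR; `scan`, `mask_init_gate`, and the sample values are hypothetical.

```python
import math

def scan(decays, inputs, state):
    """Run s_t = decay_t * s_{t-1} + x_t and return the final state."""
    for decay, x in zip(decays, inputs):
        state = decay * state + x
    return state

def mask_init_gate(log_gates, is_init_token):
    """Set the log-gate of init tokens to -inf so that exp(gate) == 0."""
    return [-math.inf if init else g for g, init in zip(log_gates, is_init_token)]

log_gates = [-0.1, -0.2, -0.3]
inputs = [1.0, 2.0, 3.0]
is_init_token = [True, False, False]  # first token starts a fresh sequence

decays = [math.exp(g) for g in mask_init_gate(log_gates, is_init_token)]

# With the first decay forced to zero, the prior state no longer matters.
from_zeroed = scan(decays, inputs, state=0.0)
from_stale = scan(decays, inputs, state=123.0)  # leftover from an earlier request

# Without masking, the stale state leaks into the result.
unmasked = scan([math.exp(g) for g in log_gates], inputs, state=123.0)
```

Under these assumptions a cache slot that still holds finite values from a previous request yields the same output as a freshly zeroed slot, which is what would make a separate cache-initialization pass unnecessary.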

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lmdeploy/pytorch/nn/gated_delta.py | Adds init-token handling and moves kv replication + (optional) Q/K L2-norm before replication into the GatedDelta wrapper. |
| lmdeploy/pytorch/models/qwen3_5.py | Stops repeating Q/K in the model and passes kv_ratio into GatedDelta. |
| lmdeploy/pytorch/engine/model_agent/agent.py | Removes the state cache initialization call during forward. |
| lmdeploy/pytorch/engine/cache_engine.py | Removes StateCacheEngine.init_caches implementation. |

Comment thread lmdeploy/pytorch/nn/gated_delta.py Outdated
Comment thread lmdeploy/pytorch/models/qwen3_5.py Outdated

self.is_init = None
self.is_init_token = None
if not self.is_decoding:
Collaborator

Will this work for dp>1?

4 participants