docs(advance): add Add a New Speculative Decoding Method guide #4589
SuperMarioYL wants to merge 1 commit into
Document the BaseSpecProposer + SPEC_PROPOSERS extension contract so that third parties can add a draft-token proposer without reverse engineering the engine. The existing spec_decoding.md teaches usage for the four shipped methods (eagle, eagle3, deepseek_mtp, qwen3_5_mtp) but does not explain the plug-in surface; users have asked for this in InternLM#1738 and InternLM#4530. Contents follow the same shape as docs/en/advance/pytorch_new_model.md: the registry / base-class / method-string triad, what BaseSpecProposer already implements, a minimal new proposer, the get_outputs contract, when to override build_model (with the in-tree Qwen3_5MTP and Eagle3 examples), and a 5-item shipping checklist. Add the page to docs/en/index.rst under the Advance section right next to spec_decoding.md.
Pull request overview
Adds a new documentation page that explains how to extend the PyTorch engine's speculative decoding pipeline with a new proposer, and wires it into the docs toctree. This addresses the docs gap referenced by issues #1738 and #4530.
Changes:
- Adds `docs/en/advance/spec_decoding_new_method.md` walking through the `SPEC_PROPOSERS` registry, `BaseSpecProposer` contract, `get_outputs` return tuple, when to override `build_model`, and a contributor checklist.
- Registers the new page in `docs/en/index.rst` next to `spec_decoding.md`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| docs/en/advance/spec_decoding_new_method.md | New guide describing the proposer plug-in contract, with examples mirrored from the in-tree deepseek_mtp, eagle3, and qwen3_5_mtp proposers. |
| docs/en/index.rst | Adds the new doc to the advance toctree. |
Verified against lmdeploy/pytorch/spec_decode/proposers/{base,deepseek_mtp,eagle3,qwen3_5_mtp}.py: registry name, build_specdecode_proposer signature, BaseSpecProposer API surface, the get_outputs 3-tuple, and the Eagle3/Qwen3_5MTP build_model overrides quoted in the doc all match the current code.
    @SPEC_PROPOSERS.register_module(name='qwen3_5_mtp')
    class Qwen3_5MTP(DeepseekMTP):

        def build_model(self, empty_init, target_model=None, build_model_ctx=None):
One may also need to make changes in `lmdeploy/pytorch/configurations` and add a model definition in `lmdeploy/pytorch/models`.
Motivation
The PyTorch engine has a clean plug-in surface for speculative decoding
(`BaseSpecProposer` + `SPEC_PROPOSERS` registry in
`lmdeploy/pytorch/spec_decode/proposers/base.py`), and four shipped
methods register against it: `eagle`, `eagle3`, `deepseek_mtp`,
`qwen3_5_mtp`. The user-facing `docs/en/advance/spec_decoding.md`
teaches usage of those four names but never explains how to add a
fifth, so users have asked the question externally in #1738 and #4530.
Both are open. A short extension-contract page closes the gap without
locking the engine into anything new.
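For readers unfamiliar with the registry pattern mentioned above, the following is a minimal, self-contained sketch of how a name-to-class registry with a `register_module(name=...)` decorator typically behaves. It is a toy stand-in written for illustration: `Registry`, `TOY_PROPOSERS`, and the empty `MyMethod` class are hypothetical and do not reproduce lmdeploy's actual `SPEC_PROPOSERS` implementation.

```python
from typing import Callable, Dict, Type


class Registry:
    """Toy name-to-class registry (an illustrative stand-in, not lmdeploy code)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Type] = {}

    def register_module(self, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that stores the class under `name`."""

        def decorator(cls: Type) -> Type:
            if name in self._modules:
                raise KeyError(f'{name!r} is already registered')
            self._modules[name] = cls
            return cls

        return decorator

    def get(self, name: str) -> Type:
        """Look up a registered class by its method string."""
        if name not in self._modules:
            raise KeyError(f'unknown method {name!r}; known: {sorted(self._modules)}')
        return self._modules[name]


# Hypothetical registry instance, playing the role of a proposer registry.
TOY_PROPOSERS = Registry()


@TOY_PROPOSERS.register_module(name='my_method')
class MyMethod:
    """Placeholder class; a real proposer would subclass the base proposer."""


# The decorator only runs when the defining module is imported.
assert TOY_PROPOSERS.get('my_method') is MyMethod
```

Because registration in this pattern happens as a side effect of executing the class definition, a class whose module is never imported never appears in the registry. That is the general reason a package `__init__.py` usually has to import each registered class.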
Modification
Add `docs/en/advance/spec_decoding_new_method.md` and a toctree entry
for it in `docs/en/index.rst`, right next to `spec_decoding.md`.
The page mirrors the shape of the existing
`docs/en/advance/pytorch_new_model.md` (which documents the model-patch
extension contract):
- The registry / base-class / `method` string triad.
- The `build_specdecode_proposer` entry point and why `proposers/__init__.py` must import the new class.
- What `BaseSpecProposer` already provides so contributors don't re-implement weight loading, draft forward, decoding-input update, or fallbacks.
- A minimal `MyMethod(BaseSpecProposer)` skeleton with `@SPEC_PROPOSERS.register_module(name='my_method')`.
- The 3-tuple returned by `get_outputs` (draft token ids, `model_metas`, `target_hidden_states`).
- When to override `build_model`, illustrated with the two in-tree precedents (`Qwen3_5MTP` shares the target embeddings; `Eagle3` swaps embeddings conditionally and widens `get_target_hidden_size`).
No code changes. All snippets and references point to symbols that
exist in `lmdeploy/pytorch/spec_decode/proposers/`.
BC-breaking
None — docs only.
Use cases
Anyone wanting to add a new draft-token proposer (e.g. the DFlash
method requested in #4530) can now read one page and know which class
to subclass, which method to implement, what to return, and where to
register.
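As a rough illustration of the "subclass, implement, return" workflow described here, the sketch below uses a toy base class and plain Python lists. Everything in it is an assumption made for illustration: `ToyBaseProposer`, `ToyGreedyProposer`, and the `get_outputs(logits, hidden_states)` argument list are hypothetical and are not the real `BaseSpecProposer` API. Only the shape of the return value, a 3-tuple of draft token ids, `model_metas`, and `target_hidden_states`, follows the contract named in this PR.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Tuple

# Hypothetical alias: (draft token ids, model_metas, target_hidden_states).
ProposerOutputs = Tuple[List[int], Optional[List[Any]], List[List[float]]]


class ToyBaseProposer(ABC):
    """Stand-in base class; the real base class does far more than this."""

    @abstractmethod
    def get_outputs(self, logits: List[List[float]],
                    hidden_states: List[List[float]]) -> ProposerOutputs:
        """Turn draft-model outputs into the 3-tuple the caller expects."""


class ToyGreedyProposer(ToyBaseProposer):
    """Picks the highest-scoring token per row (greedy drafting)."""

    def get_outputs(self, logits, hidden_states):
        # Argmax over each row of logits gives one draft token id per sequence.
        draft_token_ids = [max(range(len(row)), key=row.__getitem__) for row in logits]
        # This toy model carries no per-sequence metadata.
        model_metas = None
        # Pass the hidden states through unchanged for the next draft step.
        target_hidden_states = hidden_states
        return draft_token_ids, model_metas, target_hidden_states


logits = [[0.1, 2.0, -1.0], [3.0, 0.5, 0.2]]
hidden = [[0.0, 1.0], [1.0, 0.0]]
ids, metas, hs = ToyGreedyProposer().get_outputs(logits, hidden)
print(ids)  # [1, 0]
```

The real argument list and tensor types should be taken from the new guide and from `lmdeploy/pytorch/spec_decode/proposers/`, not from this sketch.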
Checklist
- `pre-commit run --files docs/en/advance/spec_decoding_new_method.md docs/en/index.rst` passes (mdformat, codespell, trailing whitespace, end-of-file, copyright check).
- The new page references `spec_decoding.md` and explicitly names the four shipped methods so the new page does not drift from them.
Closes (partially) the docs side of #1738 and #4530.