
Support Qwen3.5 MoE padding_free #226

Open
meichangsu1 wants to merge 3 commits into modelscope:main from meichangsu1:qwe35_moe

Conversation

@meichangsu1

Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Support padding_free training for Qwen3.5 MoE models.
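
For context: padding-free training usually concatenates the samples of a batch into a single row and records where each sequence starts and ends, instead of padding every sample to the same length. The sketch below is a generic, stdlib-only illustration under that assumption. `pack_sequences`, `split_packed` and `segmented_running_sum` are hypothetical helpers, not twinkle's API; the `cu_seqlens` / per-sequence `position_ids` layout follows the convention used by FlashAttention-style varlen kernels. The running sum stands in for any layer that carries recurrent state across tokens (the patched file name `gdn_padding_free.py` suggests gated DeltaNet-style linear attention), which must reset that state at every boundary.

```python
from typing import List, Tuple


def pack_sequences(batch: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: pack variable-length token lists into one row.

    Returns:
        input_ids:    all tokens concatenated, no pad tokens.
        position_ids: positions restart at 0 for every sequence.
        cu_seqlens:   cumulative sequence boundaries, starting with 0.
    """
    input_ids: List[int] = []
    position_ids: List[int] = []
    cu_seqlens: List[int] = [0]
    for seq in batch:
        input_ids.extend(seq)
        # Positions reset so each packed sequence sees its own 0..len-1 range.
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return input_ids, position_ids, cu_seqlens


def split_packed(values: List[int], cu_seqlens: List[int]) -> List[List[int]]:
    """Recover per-sequence slices from a packed row using the boundaries."""
    return [values[start:end] for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


def segmented_running_sum(values: List[int], cu_seqlens: List[int]) -> List[int]:
    """Toy recurrence: a running sum whose state is reset at each boundary."""
    out: List[int] = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        state = 0  # reset recurrent state so sequences do not leak into each other
        for v in values[start:end]:
            state += v
            out.append(state)
    return out


if __name__ == "__main__":
    batch = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
    ids, pos, cu = pack_sequences(batch)
    print(ids)  # [11, 12, 13, 21, 22, 31, 32, 33, 34]
    print(pos)  # [0, 1, 2, 0, 1, 0, 1, 2, 3]
    print(cu)   # [0, 3, 5, 9]
    print(segmented_running_sum([1] * len(ids), cu))  # [1, 2, 3, 1, 2, 1, 2, 3, 4]
```

Without the reset, the running sum would simply continue from 3 into the second sequence, which is the kind of cross-sample leakage a padding-free patch has to prevent.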

Experiment results

[2026-06-16 03:39:14][INFO:twinkle] Current is step 0 of 63, metric: {'loss': '6.5880', 'grad_norm': '6.843750', 'learning rate(param group 1)': '2.000000e-05', 'learning rate(param group 2)': '2.000000e-05', 'iters': 0, 'total time elapse': '3.6 minutes', 'speed': '0.00 iters/s'}
[2026-06-16 03:39:58][INFO:twinkle] Current is step 20 of 63, metric: {'loss': '5.0045', 'grad_norm': '3.125000', 'learning rate(param group 1)': '8.236931e-05', 'learning rate(param group 2)': '8.236931e-05', 'iters': 20, 'total time elapse': '257 seconds', 'speed': '0.46 iters/s'}
[2026-06-16 03:40:26][INFO:twinkle] Current is step 40 of 63, metric: {'loss': '2.9167', 'grad_norm': '3.218750', 'learning rate(param group 1)': '3.149309e-05', 'learning rate(param group 2)': '3.149309e-05', 'iters': 40, 'total time elapse': '285 seconds', 'speed': '0.71 iters/s'}
[2026-06-16 03:40:55][INFO:twinkle] Current is step 60 of 63, metric: {'loss': '2.4172', 'grad_norm': '5.468750', 'learning rate(param group 1)': '2.931021e-07', 'learning rate(param group 2)': '2.931021e-07', 'iters': 60, 'total time elapse': '314 seconds', 'speed': '0.70 iters/s'}

meichangsu1 marked this pull request as draft on June 16, 2026, 09:06

gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request extends support for Qwen3.5-MoE models by generalizing the padding-free patching mechanism and the sequence-parallel strategies so that they handle both the dense and the MoE variants of Qwen3.5. Key changes include dynamically importing modeling functions and classes, iterating over multiple (dense, MoE) class pairs during patching, and adding comprehensive alignment tests for Qwen3.5-MoE. The review highlights two important issues:

  • The patched DecoderLayer.forward should return a tuple (including the router logits for MoE) to match the return format Hugging Face expects and to prevent runtime crashes.
  • A fallback import is needed for apply_mask_to_padding_states, since the function may not be present in the MoE modeling module.
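
The patterns behind the review points can be sketched without the real modeling code. This is an illustration under stated assumptions, not the fix that landed in the PR: `import_first`, `pack_layer_outputs` and `patch_forwards` are hypothetical helpers, and the tuple layout `(hidden_states, [attn_weights], [router_logits])` assumes the older Hugging Face convention in which decoder layers return a tuple whose optional entries are controlled by `output_attentions` / `output_router_logits`. Newer transformers releases may return a bare tensor, so check the installed version. In the patch, `module_names` would list the MoE modeling module before the dense one; the exact module paths are not given in this thread.

```python
import importlib
from typing import Any, Callable, Iterable, List, Optional, Sequence, Tuple

_MISSING = object()  # sentinel so that `None` can be passed as an explicit default


def import_first(module_names: Sequence[str], attr: str, default: Any = _MISSING) -> Any:
    """Hypothetical helper: return `attr` from the first module that defines it.

    Falls back to `default` if no candidate provides the attribute and raises
    ImportError when no default was given.
    """
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    if default is not _MISSING:
        return default
    raise ImportError(f"{attr!r} not found in any of {list(module_names)}")


def pack_layer_outputs(
    hidden_states: Any,
    attn_weights: Any = None,
    router_logits: Any = None,
    output_attentions: bool = False,
    output_router_logits: bool = False,
) -> Tuple[Any, ...]:
    """Build a decoder-layer return value as a tuple (assumed older HF-style layout)."""
    outputs: Tuple[Any, ...] = (hidden_states,)
    if output_attentions:
        outputs += (attn_weights,)
    if output_router_logits:
        # MoE models typically collect router logits for the load-balancing loss.
        outputs += (router_logits,)
    return outputs


def patch_forwards(classes: Iterable[Optional[type]],
                   make_forward: Callable[[Callable], Callable]) -> List[type]:
    """Wrap `forward` on each available class; `None` entries (variant missing) are skipped."""
    patched: List[type] = []
    for cls in classes:
        if cls is None:
            continue
        original = cls.forward
        cls.forward = make_forward(original)
        patched.append(cls)
    return patched


if __name__ == "__main__":
    # Stdlib stand-ins: the first module does not exist, so `math` is used.
    sqrt = import_first(["no_such_module_for_demo", "math"], "sqrt")
    print(sqrt(9.0))  # 3.0

    class DenseLayer:  # toy stand-in for a decoder layer class
        def forward(self, x):
            return x + 1

    def make_forward(original):
        def forward(self, x):
            return pack_layer_outputs(original(self, x))
        return forward

    moe_layer_cls = import_first(["no_such_module_for_demo"], "MoeLayer", default=None)
    patch_forwards([DenseLayer, moe_layer_cls], make_forward)
    print(DenseLayer().forward(1))  # (2,)
```

Returning a tuple even when only the hidden states are present keeps callers that index `layer_outputs[0]` correct: if the patched forward returned a bare tensor, that index would select the first batch row instead of the hidden states.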

Comment thread src/twinkle/patch/gdn_padding_free.py
meichangsu1 marked this pull request as ready for review on June 16, 2026, 09:19

Labels

None yet

Projects

None yet

1 participant