
[New Model Bringup] Initial Commit to enable Text-only architecture for Qwen3.5 #3712

Open
Rohan-Bierneni wants to merge 1 commit into main from rbierneni-qwen35-text



Rohan-Bierneni (Collaborator) commented Apr 21, 2026

Description

This PR enables MaxText to run training workloads on the text-only architecture of Qwen3.5, which is identical to that of Qwen3-Next. It adds the model config for the largest MoE model in the family, qwen3.5-397b-a17b. The other models in the family will be added later.

Tests

I will run a training workload to verify that the loss decreases as expected, and I will add a train_compile test for the new model.

Update:

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov (bot) commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 90.09009% with 11 lines in your changes missing coverage. Please review.

| File | Patch % | Lines |
|---|---|---|
| src/maxtext/layers/decoders.py | 28.57% | 1 missing and 4 partials ⚠️ |
| src/maxtext/models/qwen3_5.py | 95.60% | 2 missing and 2 partials ⚠️ |
| src/maxtext/layers/nnx_decoders.py | 0.00% | 1 missing and 1 partial ⚠️ |


Rohan-Bierneni force-pushed the rbierneni-qwen35-text branch from 5f80704 to 7606921 on April 21, 2026 18:52
Review comment thread on src/maxtext/configs/models/qwen3.5-397b-a17b.yml (outdated)
Review comment thread on src/maxtext/configs/models/qwen3.5-397b-a17b.yml (outdated)
Review comment thread on src/maxtext/configs/models/qwen3.5-397b-a17b.yml
Rohan-Bierneni force-pushed the rbierneni-qwen35-text branch from 7606921 to 2db8a8f on April 29, 2026 18:07
Add config file for 397B model

update attentions.py with new decoder block type

Update other files with new model to ensure model initialization is correct

Update decoder block type

Train Compile test is passing

resolve nits in config file formatting

resolve formatting errors

Fix conflict in maxtext_utils

Fix linter errors

Fix linter errors

Fix linter errors

Ran pyink locally for formatting
Rohan-Bierneni force-pushed the rbierneni-qwen35-text branch from 7e8941f to 056b165 on April 29, 2026 18:47
num_experts_per_tok: 10
norm_topk_prob: True

# Qwen3-Next Specific Parameters for Linear Attention (Gated Delta Net)
A collaborator commented on this line:

Nit: we could just say these are Gated Delta Net parameters and drop the Qwen3-Next mention (which might be confusing)?
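The config excerpt above sets `num_experts_per_tok: 10` and `norm_topk_prob: True`. The following is a minimal pure-Python sketch of what these two settings typically mean for MoE routing. It assumes Hugging Face Qwen-MoE-style semantics: a softmax over all router logits, followed by keeping the top k experts and, when `norm_topk_prob` is true, renormalizing their weights so they sum to 1. `route_token` is a hypothetical helper for illustration only. It is not MaxText's routing code.

```python
import math


def route_token(router_logits, num_experts_per_tok, norm_topk_prob):
    """Hypothetical helper: pick top-k experts for one token.

    Assumes softmax-then-top-k routing; returns (expert_ids, weights),
    with expert_ids ordered from highest to lowest probability.
    """
    # Numerically stable softmax over all experts.
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k largest probabilities, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = order[:num_experts_per_tok]
    weights = [probs[i] for i in top]

    if norm_topk_prob:
        # Renormalize so the selected experts' weights sum to 1.
        s = sum(weights)
        weights = [w / s for w in weights]
    return top, weights


# Usage with 4 experts and k=2 (the 397B config uses k=10).
ids, weights = route_token([2.0, 1.0, 0.5, -1.0], 2, True)
print(ids, weights)
```

Without renormalization, the top-k weights sum to less than 1 because some probability mass sits on the unselected experts. Renormalization only rescales the selected weights, so their relative proportions stay the same.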



3 participants