[New Model Bringup] Initial Commit to enable Text-only architecture for Qwen3.5 by Rohan-Bierneni · Pull Request #3712 · AI-Hypercomputer/maxtext

Rohan-Bierneni · 2026-04-21T18:04:17Z

Description

This PR enables MaxText to run training workloads on the text-only architecture of Qwen3.5, which is identical to that of Qwen3-Next. It adds the model config for the largest MoE model in the family, qwen3.5-397b-a17b. Other models in the family will be added later.

Tests

Will run a training workload to verify that the loss decreases, and will add a train_compile test for this new model.

Update:

Ran a training workload on a smaller config, the 122B MoE model. The configs differ only as follows:
- base_emb_dim: 4096 -> 3072
- base_num_decoder_layers: 60 -> 48
- num_experts: 512 -> 256
- num_experts_per_tok: 10 -> 8

The loss decreases slowly and training is stable:
- Command: https://paste.googleplex.com/5762710475243520#l=15
- Logs: https://paste.googleplex.com/5093482696933376

The train_compile test is passing.
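
The config differences listed above can be sketched as a set of overrides applied to the larger config. This is a minimal illustration, not code from the PR: the four keys and their values come from the description, while `apply_overrides`, `diff_configs`, and `as_key_value_args` are hypothetical helpers, and whether the launcher accepts `key=value` overrides in this form is an assumption.

```python
# Sketch only: the four keys and values are taken from the PR description.
# The helper functions below are hypothetical and not part of MaxText.

# Values stated for the larger config (only the keys that differ).
FULL_CONFIG = {
    "base_emb_dim": 4096,
    "base_num_decoder_layers": 60,
    "num_experts": 512,
    "num_experts_per_tok": 10,
}

# Values stated for the smaller 122B test config.
MINI_OVERRIDES = {
    "base_emb_dim": 3072,
    "base_num_decoder_layers": 48,
    "num_experts": 256,
    "num_experts_per_tok": 8,
}


def apply_overrides(base, overrides):
    """Return a copy of `base` with `overrides` applied; reject unknown keys."""
    unknown = sorted(set(overrides) - set(base))
    if unknown:
        raise KeyError(f"unknown config keys: {unknown}")
    return {**base, **overrides}


def diff_configs(old, new):
    """Map each changed key to its (old, new) pair."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}


def as_key_value_args(overrides):
    """Render overrides as key=value strings (assumed command-line form)."""
    return [f"{k}={v}" for k, v in overrides.items()]


mini_config = apply_overrides(FULL_CONFIG, MINI_OVERRIDES)
changed = diff_configs(FULL_CONFIG, mini_config)

for key, (old, new) in changed.items():
    print(f"{key}: {old} -> {new}")
```

Rejecting unknown keys catches a misspelled override before a long training run starts with the wrong model size.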

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-04-21T18:31:02Z

Codecov Report

❌ Patch coverage is 90.09009% with 11 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/layers/decoders.py	28.57%	1 Missing and 4 partials ⚠️
src/maxtext/models/qwen3_5.py	95.60%	2 Missing and 2 partials ⚠️
src/maxtext/layers/nnx_decoders.py	0.00%	1 Missing and 1 partial ⚠️
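
The figures in the report can be cross-checked with a little arithmetic. The sketch below assumes that partially covered lines count as not covered and that coverage is (total - uncovered) / total; the per-file counts and percentages are copied from the table, while the implied line totals are inferences rather than numbers the report states.

```python
# Sketch only: rows are (file, reported patch %, missing, partials) from the table.
# Assumption: partial lines count as not covered.
REPORT = [
    ("src/maxtext/layers/decoders.py", 28.57, 1, 4),
    ("src/maxtext/models/qwen3_5.py", 95.60, 2, 2),
    ("src/maxtext/layers/nnx_decoders.py", 0.00, 1, 1),
]
OVERALL_PATCH_PCT = 90.09009


def implied_total_lines(uncovered, pct):
    """Solve pct/100 = (total - uncovered) / total for total, rounded."""
    return round(uncovered / (1 - pct / 100))


# Missing plus partial lines across the listed files.
total_uncovered = sum(missing + partial for _, _, missing, partial in REPORT)

# Changed-line count implied by the overall patch percentage (an inference).
patch_total = implied_total_lines(total_uncovered, OVERALL_PATCH_PCT)

# Recompute the overall percentage from the implied totals.
recomputed_pct = 100 * (patch_total - total_uncovered) / patch_total

print(total_uncovered, patch_total, round(recomputed_pct, 5))
```

Under these assumptions the three files account for all of the uncovered lines in the headline figure, so any remaining changed lines in the patch would be fully covered.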

- Add config file for 397B model
- Update attentions.py with new decoder block type
- Update other files with new model to ensure model initialization is correct
- Update decoder block type
- Train compile test is passing
- Resolve nits in config file formatting
- Resolve formatting errors
- Fix conflict in maxtext_utils
- Fix linter errors
- Fix linter errors
- Fix linter errors
- Ran pyink locally for formatting

parambole · 2026-04-29T23:16:09Z

+num_experts_per_tok: 10
+norm_topk_prob: True
+
+# Qwen3-Next Specific Parameters for Linear Attention (Gated Delta Net)


Nit: could we just say these are Gated Delta Net parameters and drop "qwen3-next" (which might be confusing)?

Rohan-Bierneni force-pushed the rbierneni-qwen35-text branch from 5f80704 to 7606921 on April 21, 2026, 18:52

entrpn approved these changes Apr 23, 2026

parambole reviewed Apr 23, 2026

Comment thread src/maxtext/configs/models/qwen3.5-397b-a17b.yml Outdated

parambole reviewed Apr 23, 2026

Comment thread src/maxtext/configs/models/qwen3.5-397b-a17b.yml Outdated

parambole reviewed Apr 23, 2026

Comment thread src/maxtext/configs/models/qwen3.5-397b-a17b.yml

Rohan-Bierneni force-pushed the rbierneni-qwen35-text branch from 7606921 to 2db8a8f on April 29, 2026, 18:07

Rohan-Bierneni force-pushed the rbierneni-qwen35-text branch from 7e8941f to 056b165 on April 29, 2026, 18:47

parambole reviewed Apr 29, 2026

Rohan-Bierneni wants to merge 1 commit into main from rbierneni-qwen35-text

3 participants
