Extend LoRA for Gemma4#3969
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
2bc8632 to
ab61640
Compare
61626bd to
ef50ff7
Compare
ef50ff7 to
5fd616b
Compare
5fd616b to
6a64bd0
Compare
|
🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
There was a problem hiding this comment.
This Pull Request successfully extends LoRA support for Gemma 4 architectures and addresses a critical issue with NNX graph caching in MoE models. However, there is a significant discrepancy between the PR description and the actual changes, as several mentioned files and unit tests are missing from the diff.
🔍 General Feedback
- Missing Implementation: The PR description mentions a "thought channel bypass" in
input_pipeline_utils.pyand new unit tests intests/post_training/unit/sft_data_processing_test.py, but these files are not included in the PR. Please ensure all intended changes are staged and pushed. - Consistency across Trainers: The dynamic disabling of NNX graph caching is a great addition for MoE stability; consider applying this same logic to DPO, RL, and Distillation trainers to ensure consistent behavior across the post-training suite.
- LoRA Targeting: The regex for Gemma 4 LoRA targeting is comprehensive but should be monitored to ensure it doesn't become overly broad as the architecture evolves.
There was a problem hiding this comment.
Are there any tests for Gemma4 Lora?
There was a problem hiding this comment.
We ran the end-to-end LoRA training loop for Gemma 4 successfully without any issues.4 successfully without any issues. log
b52137c to
bc29e4e
Compare
bc29e4e to
090040c
Compare
gagika
left a comment
There was a problem hiding this comment.
please see my comment, it can be also addressed as a follow up PR.
| deepseek2: "decoder/(dense_layers|moe_stack)/self_attention/(query|out|wkv_a|wkv_b)|decoder/(dense_layers|moe_stack)/(mlp|shared_experts)/(wi_0|wi_1|wo)" | ||
| gemma2: "decoder/layers/(self_attention_local|self_attention_global)/(query|key|value|out)|decoder/layers/(mlp_local|mlp_global)/(wi_0|wi_1|wo)" | ||
| gemma3: "decoder/layers/.*(self_attention/(query|key|value|out)|mlp/(wi_0|wi_1|wo|gate|up|down))" | ||
| gemma4: "decoder/(scanned_blocks|layers_remainder)/layers.*/.*(self_attention/(query|key|value|out)|mlp/.*(wi_0|wi_1|wo|shared_experts/(wi_0|wi_1|wo)))" |
There was a problem hiding this comment.
is the scope of the project both scan_layers=true and false, Gemma4 dense and moe?
Suggest adding a CPU unit test that asserts the regex matches the expected module paths for both scan_layers values and dense and moe.
gagika
left a comment
There was a problem hiding this comment.
please make sure follow up items are done, but feel free to merge.
Description
This PR extends the recent LoRA support to accurately target and process Gemma 4 architectures (including MoE).
Gemma 4 introduces complex nested structures (like scanned_blocks and layers_remainder) and unique chat template behaviors (such as the <|channel>thought block) that are incompatible with standard LoRA targeting and data
processing. Furthermore, MoE models require dynamic metadata synchronization during forward passes which is broken by aggressive NNX graph caching.
This PR addresses these challenges by:
Tests
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-reviewlabel.