
fix: model dtype is not same as lora dtype in FSDP train#183

Open
0hujun wants to merge 2 commits into modelscope:main from 0hujun:main

Conversation


@0hujun 0hujun commented Apr 23, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

As described in #182: on Ascend NPU, the model parameters are in bf16, but LoRA parameters are created in fp32 by default, which raises `AssertionError: FSDP expects uniform original parameter dtype but got {torch.bfloat16, torch.float32}`.
So, once the LoRA parameters have been created, convert all LoRA parameters to the base model's dtype.

Experiment results

Training runs fine as usual.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces the _ensure_lora_dtype method to align LoRA parameter data types with the base model, ensuring compatibility with FSDP2. Review feedback suggests several improvements, including more robust detection of the base data type to handle mixed precision, narrowing exception handling, and wrapping parameter updates in torch.no_grad() for safety.

Comment thread on src/twinkle/model/transformers/transformers.py
