fix: CPU OOM issue during LoRA training #41
Open
SongwuJob wants to merge 1 commit into OpenMOSS:main from
Conversation
When fine-tuning with the provided LoRA training script, CPU memory usage increases continuously over time until the process is eventually killed by the system due to out-of-memory (OOM).
The issue is caused by enabling torch.cuda.memory._record_memory_history(enabled="all"), which records CUDA memory events and stores the history on the CPU. As training progresses, the accumulated memory history leads to excessive CPU memory consumption, resulting in CPU OOM.
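A minimal sketch of the kind of change this implies, assuming the training script currently calls `_record_memory_history` unconditionally at startup. The wrapper name `maybe_record_memory_history` and the `max_entries` default are hypothetical, not from the PR; the underlying `torch.cuda.memory._record_memory_history` API is real, and passing `enabled=None` disables recording:

```python
import torch

def maybe_record_memory_history(enabled: bool = False, max_entries: int = 100_000):
    """Enable CUDA memory-history recording only when explicitly requested.

    By default this is a no-op, so training no longer accumulates
    allocation snapshots in CPU memory. When profiling is needed,
    `max_entries` bounds the history buffer instead of letting it
    grow for the entire run.
    """
    if not torch.cuda.is_available():
        return
    if enabled:
        # Bounded recording: PyTorch keeps at most `max_entries`
        # alloc/free events instead of the full training history.
        torch.cuda.memory._record_memory_history(enabled="all", max_entries=max_entries)
    else:
        # enabled=None turns recording off entirely (the default here).
        torch.cuda.memory._record_memory_history(enabled=None)
```

With this pattern, memory profiling becomes opt-in, and even when enabled the recorded history stays bounded; a snapshot can still be dumped with `torch.cuda.memory._dump_snapshot(...)` during a profiling run.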