fix(pt_expt): respect model default_fparam in data requirement (#5417)
pt_expt's get_additional_data_requirement hard-coded default=0.0 for fparam, ignoring the model's default_fparam. For datasets without fparam.npy this fed zeros to the fitting net while pt fed the model default — after fparam normalisation a constant DC offset (e.g. [0, -100]) was injected every step, slowing multi-task training convergence. Mirror pt's logic so must/default come from the model.
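A quick numeric sketch of the failure mode described above, assuming the usual normalisation form `(fparam - avg) * inv_std` with the stats quoted in the PR description:

```python
import numpy as np

# Stats quoted in the PR description; the normalisation form is assumed.
avg = np.array([0.0, 1.0])
inv_std = np.array([100.0, 100.0])

zeros_fed = np.zeros(2)               # dataset lacks fparam.npy, default=0.0
model_default = np.array([0.0, 1.0])  # hypothetical model default_fparam

offset_bug = (zeros_fed - avg) * inv_std      # constant DC offset [0, -100]
offset_fix = (model_default - avg) * inv_std  # [0, 0], no spurious input
```

Feeding zeros injects `[0, -100]` into the fitting net on every step, while the model default normalises to zero.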
Actionable comments posted: 1
🧹 Nitpick comments (1)
source/tests/pt_expt/test_training.py (1)
564-587: Use a tensor default to cover the real model API.

The fake currently returns a Python list, but real models expose `get_default_fparam()` as `torch.Tensor | None`. Returning a tensor here makes this regression test cover the conversion path used in production.

🧪 Proposed test adjustment

```diff
-    def get_default_fparam(self) -> list[float]:
-        return [0.0, 1.0]
+    def get_default_fparam(self) -> torch.Tensor:
+        device = "cuda" if torch.cuda.is_available() else "cpu"
+        return torch.tensor([0.0, 1.0], device=device)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/pt_expt/test_training.py` around lines 564 - 587, The test's fake model class _M should return a torch.Tensor from get_default_fparam to match the real model API and exercise the conversion path in get_additional_data_requirement; update _M.get_default_fparam to return torch.tensor([0.0, 1.0]) (and import torch in the test) so fparam_req.default is produced from a tensor and the existing np.testing.assert_array_equal still verifies the values for fparam_req in the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Around line 122-132: The default fparam may be a CUDA torch.Tensor causing
np.asarray(cuda_tensor) to throw; update the code around
has_default_fparam/_model.get_default_fparam() and the fparam_default assignment
so that if get_default_fparam() returns a torch.Tensor you first detach it and
move it to CPU then convert to numpy (e.g. value.detach().cpu().numpy()), and if
it returns None keep 0.0; ensure this converted numpy array is what you pass as
default into DataRequirementItem("fparam", _model.get_dim_fparam(), ...).
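A minimal sketch of the conversion the comment asks for. `fparam_default` is a hypothetical helper name, and duck typing on `detach` stands in for an `isinstance(value, torch.Tensor)` check so the snippet runs without torch installed:

```python
import numpy as np

def fparam_default(value):
    # Hypothetical helper: normalise the result of get_default_fparam()
    # into a value usable as DataRequirementItem's default.
    if value is None:
        return 0.0  # no model default: keep the 0.0 fallback
    if hasattr(value, "detach"):
        # torch.Tensor, possibly on CUDA: np.asarray(cuda_tensor) throws,
        # so detach, move to CPU, then convert to numpy.
        return value.detach().cpu().numpy()
    return np.asarray(value)
```

With a CUDA tensor this takes the `detach().cpu().numpy()` path; `None` preserves the old scalar fallback.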
---
Nitpick comments:
In `@source/tests/pt_expt/test_training.py`:
- Around line 564-587: The test's fake model class _M should return a
torch.Tensor from get_default_fparam to match the real model API and exercise
the conversion path in get_additional_data_requirement; update
_M.get_default_fparam to return torch.tensor([0.0, 1.0]) (and import torch in
the test) so fparam_req.default is produced from a tensor and the existing
np.testing.assert_array_equal still verifies the values for fparam_req in the
test.
📒 Files selected for processing (2)
- `deepmd/pt_expt/train/training.py`
- `source/tests/pt_expt/test_training.py`
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@ Coverage Diff @@
## master #5417 +/- ##
=======================================
Coverage 80.46% 80.47%
=======================================
Files 821 821
Lines 86075 86076 +1
Branches 4140 4139 -1
=======================================
+ Hits 69263 69266 +3
+ Misses 15536 15534 -2
Partials 1276 1276

☔ View full report in Codecov by Sentry.
aa44999
Summary
pt_expt's `get_additional_data_requirement` hard-coded `default=0.0` for the `fparam` data item, ignoring the model's `default_fparam`. For datasets without `fparam.npy`, the data system fed zeros to the fitting net while `pt` correctly fed the model's default — after fparam normalisation a constant DC offset (e.g. `[0, -100]` when stats are `avg=[0,1]`, `inv_std=[100,100]`) was injected on every step, slowing multi-task training convergence vs the `pt` baseline.

The fix mirrors `pt`'s logic (`deepmd/pt/train/training.py:1762-1779`): pull `default_fparam` from the model and set `must=not has_default_fparam`. The fparam-present path is unchanged (`find_fparam=1.0` → the default value is unused). Both branches (`has_default_fparam` true / false) are covered per the project's UT-coverage rule.
- `pytest source/tests/pt_expt/test_training.py::TestAdditionalDataRequirement -v` — both new branches pass.
- `pytest source/tests/pt_expt/test_training.py::TestTraining -v` — 4 existing training-loop tests still pass.
- `pytest source/tests/pt_expt/test_training.py::TestCompiledVaryingNframesWithParams -v` — fparam-present compile path with `numb_fparam=2` still passes.

Summary by CodeRabbit
Release Notes

Bug Fixes
- For datasets without `fparam.npy`, training now feeds the model's `default_fparam` instead of zeros.

Tests
- Added coverage for the fparam data requirement with and without a model default.