Fix: Square atom_norm in non-Huber energy and virial loss calculations. #5332
anyangml wants to merge 21 commits into deepmodeling:master from
Conversation
for more information, see https://pre-commit.ci
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings and use the review commands to resume.
📝 Walkthrough
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Pull request overview
Fixes normalization in energy/virial MSE loss terms so that per-atom scaling is consistent with the Huber-loss code paths and the reported per-atom RMSE metrics.
Changes:
- Square `atom_norm`/`atom_norm_ener` in non-Huber (MSE) energy loss accumulation.
- Square `atom_norm` in non-Huber (MSE) virial loss accumulation.
- Apply the same normalization fix across TF, PyTorch, Paddle, and `dpmodel` backends for the updated loss implementations.
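The effect of squaring the per-atom factor can be sketched in a few lines. This is an illustrative stand-in with simplified names and signatures, not the actual deepmd-kit loss code:

```python
import numpy as np

def energy_mse_term(energy_pred, energy_ref, natoms, pref_e, intensive=True):
    # atom_norm mirrors the 1/N per-atom factor used in the loss classes.
    atom_norm = 1.0 / natoms
    l2_ener_loss = float(np.mean((energy_pred - energy_ref) ** 2))
    # The fix: square the factor so the MSE term matches per-atom RMSE
    # reporting (1/N^2 instead of the legacy 1/N).
    norm = atom_norm**2 if intensive else atom_norm
    return norm * pref_e * l2_ener_loss
```

For a 4-atom frame with a total-energy error of 2.0, the legacy term evaluates to 1.0, while the squared form gives 0.25, which is exactly the squared per-atom error (0.5 squared).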
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `deepmd/tf/loss/ener.py` | Squares atom normalization in non-Huber MSE energy/virial loss accumulation (`EnerStdLoss` path). |
| `deepmd/pt/loss/ener.py` | Squares `atom_norm` in non-Huber MSE energy and virial loss accumulation. |
| `deepmd/pd/loss/ener.py` | Squares `atom_norm` in non-Huber MSE energy and virial loss accumulation. |
| `deepmd/dpmodel/loss/ener.py` | Squares atom normalization in non-Huber MSE energy and virial loss accumulation in the array-API backend. |
Actionable comments posted: 1
🧹 Nitpick comments (1)
deepmd/dpmodel/loss/ener.py (1)
Line 248: Add a regression test for the new normalization rule. Because the displayed RMSE stays unchanged, this `1/N` vs `1/N^2` mistake is easy to reintroduce. A tiny `EnergyLoss.call()` case with `natoms > 1` and fixed energy/virial deltas would lock it down. Also applies to: line 315.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/dpmodel/loss/ener.py` at line 248, The energy normalization is currently prone to a 1/N vs 1/N^2 regression; add a unit/regression test targeting EnergyLoss.call() in deepmd/dpmodel/loss/ener.py that constructs a small batch with natoms > 1, fixed per-atom energy deltas and virials, and asserts the computed energy loss scales with 1/N (not 1/N^2) by comparing expected loss values for different natom counts; reference EnergyLoss.call() and the atom_norm_ener accumulation (the expression involving atom_norm_ener**2 * (pref_e * l2_ener_loss)) to ensure the test fails if the normalization is squared incorrectly and include an explicit check for both energy-only and energy+virial modes.
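A regression test along the lines the reviewer suggests could look like the following sketch. It uses a toy stand-in for `EnergyLoss.call()` (the real class consumes batched dicts), so the function names and shapes here are assumptions for illustration:

```python
import numpy as np

def toy_energy_loss(delta_e_per_atom, natoms, intensive=True):
    # Stand-in for the energy MSE term: for a fixed per-atom error the
    # total-energy residual grows like N, so the intensive (1/N^2) term
    # stays independent of system size.
    diff_e = natoms * delta_e_per_atom  # total-energy residual
    l2 = diff_e**2
    atom_norm = 1.0 / natoms
    return (atom_norm**2 if intensive else atom_norm) * l2

def test_intensive_scaling():
    # Intensive normalization: loss is independent of natoms.
    small = toy_energy_loss(0.1, natoms=2)
    big = toy_energy_loss(0.1, natoms=20)
    assert np.isclose(small, big)
    # Legacy 1/N factor: loss still scales linearly with natoms,
    # which is exactly the regression this test would catch.
    legacy_small = toy_energy_loss(0.1, natoms=2, intensive=False)
    legacy_big = toy_energy_loss(0.1, natoms=20, intensive=False)
    assert np.isclose(legacy_big, 10 * legacy_small)
```

A real test would construct the `EnergyLoss` input batch twice with different `natoms` and compare the returned energy term the same way.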
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/tf/loss/ener.py`:
- Line 349: EnerSpinLoss.build and EnerDipoleLoss.build still use the old
single-factor 1/N normalization; update both to use the same squared per-atom
normalization pattern used in EnerStdLoss (i.e., multiply the loss term by
atom_norm_ener**2 * (pref_e * l2_ener_loss) instead of a single atom_norm_ener
factor) so all energy loss variants apply the corrected normalization
consistently; locate the energy-loss accumulation in EnerSpinLoss.build and
EnerDipoleLoss.build and replace the old single-factor scaling with the squared
atom_norm_ener**2 * (pref_e * l2_ener_loss) pattern used in EnerStdLoss.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a73a40cd-4fda-4353-8c33-d0685030359b
📒 Files selected for processing (4)
- deepmd/dpmodel/loss/ener.py
- deepmd/pd/loss/ener.py
- deepmd/pt/loss/ener.py
- deepmd/tf/loss/ener.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## master #5332 +/- ##
==========================================
- Coverage 80.46% 80.46% -0.01%
==========================================
Files 820 820
Lines 86005 86050 +45
Branches 4139 4140 +1
==========================================
+ Hits 69207 69239 +32
- Misses 15521 15535 +14
+ Partials 1277 1276 -1
☔ View full report in Codecov by Sentry.
…alization Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/98719546-6f9c-433e-a3de-dc65d98c68fb Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/8cff4193-b092-4923-afce-df5de29d63d3 Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/0be1bd64-1461-465b-8cc2-8bc019babc04 Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/6f9609a4-4571-42ca-b057-d44f7614df9c Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
…tion-fix feat(loss): add intensive parameter for backward-compatible RMSE normalization
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
Some AI suggestions need to be resolved. Others LGTM.
…rage Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/5ebcf063-763f-49b3-a5cb-d4832afba064 Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
…ew-comments fix(loss): clarify intensive normalization docs and add test coverage
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/c77bb90b-6600-4466-9c55-7d17ff7bf76a Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
…d-5332 fix: complete EnerSpinLoss serialization and improve intensive normalization tests
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Anyang Peng <137014849+anyangml@users.noreply.github.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepmd/pd/loss/ener.py (1)
Lines 420-430: ⚠️ Potential issue | 🟠 Major: Keep `intensive` scoped to MSE virial loss.
Line 430 applies `norm_exp` even when `loss_func="mae"` because the Paddle virial block is outside the `loss_func` switch. With virial prefactors enabled, `intensive=True` changes MAE-configured training from legacy `1/N` to `1/N^2`, contradicting the MSE-only behavior described by the new option.
🐛 Proposed minimal fix:

    - if not self.use_huber:
    -     loss += atom_norm**norm_exp * (pref_v * l2_virial_loss)
    + if not self.use_huber:
    +     virial_norm_exp = norm_exp if self.loss_func == "mse" else 1
    +     loss += atom_norm**virial_norm_exp * (pref_v * l2_virial_loss)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pd/loss/ener.py` around lines 420 - 430, The virial term currently applies intensity scaling atom_norm**norm_exp regardless of configured loss type, causing MAE runs to get 1/N² scaling; restrict the intensive scaling to MSE-only: inside the virial block (where find_virial, pref_v, diff_v, l2_virial_loss are computed) keep computing l2_virial_loss and adding more_loss as-is, but only multiply by atom_norm**norm_exp when adding to loss if the configured loss is MSE (e.g. check self.loss_func == "mse" or equivalent) — otherwise add the virial contribution using pref_v * l2_virial_loss without the atom_norm**norm_exp factor; keep the existing self.use_huber and self.inference checks intact.
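The scoping rule in the proposed fix can be captured in a few lines. This is a simplified sketch of the intended behavior, not the actual Paddle implementation:

```python
def virial_term(l2_virial_loss, natoms, pref_v, loss_func="mse", intensive=True):
    # The squared 1/N factor is meant for the MSE path only; the MAE path
    # keeps the legacy single 1/N factor regardless of the intensive flag.
    atom_norm = 1.0 / natoms
    norm_exp = 2 if (intensive and loss_func == "mse") else 1
    return atom_norm**norm_exp * pref_v * l2_virial_loss
```

With `natoms=4` and `l2_virial_loss=16.0`, the MSE path yields 1.0 (1/16 scaling) while the MAE path yields 4.0 (1/4 scaling), which is exactly the distinction the review asks to preserve.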
♻️ Duplicate comments (1)
deepmd/tf/loss/ener.py (1)
Lines 1135-1138: ⚠️ Potential issue | 🟡 Minor: `EnerDipoleLoss.build()` still uses single-factor `1/N` normalization.
`EnerStdLoss` and `EnerSpinLoss` were updated to apply `atom_norm**norm_exp`, but `EnerDipoleLoss.build()` at line 1137 still does `atom_norm_ener * (pref_e * l2_ener_loss)`. If the intent is to normalize all L2 loss consistently (per the PR description), this branch should be updated too: either by threading an `intensive` flag through `EnerDipoleLoss.__init__` or by squaring `atom_norm_ener` unconditionally if the dipole loss is meant to always be intensive.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/tf/loss/ener.py` around lines 1135 - 1138, EnerDipoleLoss.build() currently multiplies L2 terms by atom_norm_ener (single-factor 1/N) while EnerStdLoss/EnerSpinLoss apply atom_norm**norm_exp; update EnerDipoleLoss to match: add/propagate the same normalization parameter (e.g., norm_exp or an intensive flag) in EnerDipoleLoss.__init__ and replace the line using atom_norm_ener * (pref_e * l2_ener_loss) with atom_norm_ener**norm_exp * (pref_e * l2_ener_loss) (and similarly adjust any other uses like pref_ed * l2_ener_dipole_loss) so dipole L2 losses are normalized consistently with the other energy loss classes.
🧹 Nitpick comments (1)
deepmd/tf/loss/ener.py (1)
Lines 1047-1054: `EnerSpinLoss.deserialize` accepts `@version=1` that was never emitted.
`serialize` is new and always writes `@version=2`; there is no on-disk v1 payload for this class. `check_version_compatibility(version, 2, 1)` plus the `if version < 2` backfill is therefore dead code. Consider tightening to `check_version_compatibility(version, 2, 2)` and removing the backfill, unless you want to stay lenient for hand-authored configs (in which case, fine to keep as-is).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/tf/loss/ener.py` around lines 1047 - 1054, EnerSpinLoss.deserialize currently accepts a never-emitted v1 payload and contains dead backfill; update the compatibility check to require exactly v2 and remove the obsolete backfill: change the call check_version_compatibility(version, 2, 1) to check_version_compatibility(version, 2, 2) in EnerSpinLoss.deserialize and delete the conditional block that backfills "intensive" when version < 2 (the versioned serialize always writes `@version`=2 so the backfill is unnecessary).
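The two options in this nitpick (strict v2-only versus lenient with backfill) can be contrasted with a small sketch. The function below mimics the versioned-deserialize pattern in plain Python; `check_version_compatibility` is imitated inline rather than imported, and the structure is an assumption, not the real deepmd-kit API:

```python
def deserialize_loss_params(data: dict, strict: bool = False) -> dict:
    # Sketch of the versioned-deserialize pattern under discussion.
    data = dict(data)
    version = data.pop("@version", 1)
    minimal = 2 if strict else 1  # strict mode drops support for v1 payloads
    if not (minimal <= version <= 2):
        raise ValueError(f"unsupported @version {version}")
    if version < 2:
        # Backfill the field introduced in v2 so lenient loading still works.
        data.setdefault("intensive", False)
    return data
```

In strict mode the `version < 2` branch becomes unreachable, which is why the reviewer calls it dead code once only v2 payloads exist.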
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 977b2962-fd77-4fe5-bea8-4beb3c792f8a
📒 Files selected for processing (9)
- deepmd/dpmodel/loss/ener.py
- deepmd/dpmodel/loss/ener_spin.py
- deepmd/pd/loss/ener.py
- deepmd/pt/loss/ener.py
- deepmd/pt/loss/ener_spin.py
- deepmd/tf/loss/ener.py
- deepmd/utils/argcheck.py
- source/tests/consistent/loss/test_ener.py
- source/tests/consistent/loss/test_ener_spin.py
🚧 Files skipped from review as they are similar to previous changes (2)
- deepmd/pt/loss/ener.py
- deepmd/dpmodel/loss/ener.py
for more information, see https://pre-commit.ci
🧹 Nitpick comments (2)
doc/model/train-energy.md (2)
Lines 149-150: Add a short migration note about prefactor retuning.
When users switch `intensive` from `false` to `true`, the effective scale of energy/virial loss terms changes; documenting that `start_pref_e/v` and `limit_pref_e/v` may need retuning would make adoption safer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/model/train-energy.md` around lines 149 - 150, Add a short migration note to the intensive option documentation explaining that toggling intensive from false to true changes the effective scale of energy and virial loss terms (when loss_func="mse"), and that users should consider retuning the prefactors start_pref_e, start_pref_v, limit_pref_e, and limit_pref_v accordingly; update the paragraph describing intensive to include a sentence recommending checking and possibly adjusting those prefactor settings when migrating to intensive=true.
Lines 134-137: Consider clarifying why the example keeps `intensive: false`.
Since line 149 describes intensive mode as recommended for multitask use, a short note here (e.g., "`false` shown for backward-compatible reproduction of legacy behavior") would reduce ambiguity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/model/train-energy.md` around lines 134 - 137, Update the example JSON block that sets "intensive": false to include a brief clarifying note explaining why false is used (e.g., for backward-compatible reproduction of legacy behavior or to show default non-intensive mode), so readers aren't confused by the later recommendation of intensive mode for multitask use; add the note immediately adjacent to the example or as an inline comment/short sentence in the surrounding text referencing the "intensive" field to make the rationale explicit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 041f290d-4c28-47d4-8f1c-3c358c084621
📒 Files selected for processing (2)
- doc/model/train-energy-spin.md
- doc/model/train-energy.md
🧹 Nitpick comments (1)
deepmd/pt/loss/ener.py (1)
Lines 653-660: Minor: the `setdefault` is redundant with the `__init__` default.
Since `EnergyStdLoss.__init__` already defaults `intensive=False`, the explicit `setdefault("intensive", False)` for `version < 3` doesn't change behavior. It's harmless as a self-documenting guard, but can be simplified if you prefer. Not blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt/loss/ener.py` around lines 653 - 660, Remove the redundant backward-compatibility setdefault call: delete the line data.setdefault("intensive", False) in the classmethod that builds the instance (the block ending with return cls(**data)) because EnergyStdLoss.__init__ already defaults intensive=False; keep the version check if you want to retain the semantic note but avoid mutating data unnecessarily.
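Why the `setdefault` is redundant can be seen with a minimal stand-in, using a dataclass in place of the real `EnergyStdLoss` (names here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ToyLoss:
    starter_learning_rate: float = 1.0
    intensive: bool = False  # constructor default already covers old payloads

def toy_deserialize(data: dict) -> ToyLoss:
    data = dict(data)
    data.pop("@version", None)
    # No setdefault("intensive", False) needed: ToyLoss(**data) falls back
    # to the field default whenever the key is absent from the payload.
    return ToyLoss(**data)
```

An old payload without the `intensive` key deserializes to `intensive=False` either way, which is the reviewer's point: the guard only documents intent.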
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dcbf6d99-6149-4952-9fc1-feee972ecd2c
📒 Files selected for processing (6)
- deepmd/dpmodel/loss/ener_spin.py
- deepmd/pd/loss/ener.py
- deepmd/pt/loss/ener.py
- deepmd/pt/loss/ener_spin.py
- deepmd/tf/loss/ener.py
- deepmd/utils/argcheck.py
🚧 Files skipped from review as they are similar to previous changes (3)
- deepmd/utils/argcheck.py
- deepmd/pd/loss/ener.py
- deepmd/pt/loss/ener_spin.py
@anyangml The code I commented on was not changed?
Signed-off-by: Anyang Peng <137014849+anyangml@users.noreply.github.com>
    intensive : bool
        If true, the MSE energy and virial terms use intensive normalization,
        i.e. an additional normalization by the square of the number of atoms
        (1/N^2) instead of the legacy (1/N) behavior. This keeps those MSE loss
        terms consistent with per-atom RMSE reporting and less dependent on
        system size. This option does not change the MAE formulation, which is
        handled separately. The default is false for backward compatibility with
        models trained using deepmd-kit <= 3.1.3.
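The docstring's claim that the 1/N^2 term stays consistent with per-atom RMSE reporting can be checked numerically. The values below are illustrative (units assumed to be those of the training data):

```python
natoms = 8
diff_e = 0.4  # total-energy error for one frame (assumed units)

# Reported per-atom energy RMSE for a single frame:
rmse_per_atom = abs(diff_e) / natoms

# Intensive MSE term: (1/N^2) * (E_pred - E_ref)^2 equals the squared
# per-atom RMSE, so the optimized quantity matches what is printed.
mse_intensive = (1.0 / natoms**2) * diff_e**2

# Legacy term: (1/N) * (E_pred - E_ref)^2 is still N-dependent.
mse_legacy = (1.0 / natoms) * diff_e**2

assert abs(mse_intensive - rmse_per_atom**2) < 1e-12
```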
`intensive` -> `intensive_ener_virial`, to be more self-explanatory.

This is to normalize all L2 loss so that in multitask training, the scale of gradients becomes independent of system size. Do we want to update pref in examples? @iProzd @njzjz
Summary by CodeRabbit
- New Features
- Bug Fixes
- Documentation: `intensive` option.
- Tests