Skip to content

fix(pt): clone inference coords before enabling grad#5476

Open
njzjz-bot wants to merge 8 commits into
deepmodeling:masterfrom
njzjz-bothub:fix-5165-torch-lammps-no-grad
Open

fix(pt): clone inference coords before enabling grad#5476
njzjz-bot wants to merge 8 commits into
deepmodeling:masterfrom
njzjz-bothub:fix-5165-torch-lammps-no-grad

Conversation

@njzjz-bot
Copy link
Copy Markdown
Contributor

@njzjz-bot njzjz-bot commented May 29, 2026

Summary

  • clone PyTorch atomic-model coordinates before enabling gradients, so LAMMPS-provided view tensors are not modified in-place
  • pass the cloned force-coordinate tensor through private atomic-model metadata for derivative generation
  • add regression tests for leaf-view coordinate inputs in atomic and lower model paths

Tests

  • uvx pre-commit run --files deepmd/pt/model/atomic_model/base_atomic_model.py deepmd/pt/model/atomic_model/dp_atomic_model.py deepmd/pt/model/atomic_model/linear_atomic_model.py deepmd/pt/model/atomic_model/pairtab_atomic_model.py deepmd/pt/model/model/make_model.py source/tests/pt/model/test_dp_atomic_model.py source/tests/pt/model/test_dp_model.py
  • PYTHONPATH=$PWD uv run --index-strategy unsafe-best-match --with pytest --with numpy --with scipy --with pyyaml --with dargs --with typing_extensions --with h5py --with wcmatch --with packaging --with ml_dtypes --with mendeleev --with array-api-compat --with lmdb --with msgpack --with torch==2.5.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu --no-project pytest source/tests/pt/model/test_dp_model.py::TestDPModel::test_forward_lower_accepts_leaf_view_input source/tests/pt/model/test_dp_atomic_model.py::TestDPAtomicModel::test_forward_common_atomic_accepts_leaf_view_input -q

Closes #5165

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)

Summary by CodeRabbit

  • Refactor

    • Improved gradient handling in atomic models to enable gradients only when needed and preserve existing tensor state, and updated coordinate passing to ensure correct gradient flow during forward passes.
  • Tests

    • Added unit tests to confirm forward paths accept tensor view inputs and produce expected outputs (energy, extended force, virial) while preserving gradient behavior.

Review Change Stack

Avoid enabling gradients in-place on LAMMPS-provided coordinate views.
Clone the extended coordinates before building force graphs and pass that
cloned tensor to derivative generation while keeping auxiliary metadata
private to model output conversion.

Closes deepmodeling#5165

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
@dosubot dosubot Bot added the bug label May 29, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Three atomic models avoid in-place requires_grad_ on leaf-view tensors by creating gradient-enabled clones when needed. forward_common_lower introduces force_coord and passes it into atomic forward and output fitting. Two tests verify acceptance of leaf-view coordinate inputs.

Changes

Coordinate Gradient Safety and force_coord Propagation

Layer / File(s) Summary
Atomic model coordinate gradient safety
deepmd/pt/model/atomic_model/dp_atomic_model.py, deepmd/pt/model/atomic_model/linear_atomic_model.py, deepmd/pt/model/atomic_model/pairtab_atomic_model.py
DP, Linear, and PairTab atomic models update gradient-enabling logic to avoid in-place mutations on leaf-view tensors: DP checks requires_grad before enabling; Linear and PairTab use detach().clone().requires_grad_(True) when gradients are requested and not already enabled.
Model output assembly with force_coord
deepmd/pt/model/model/make_model.py
forward_common_lower creates force_coord from cc_ext, conditionally clones/sets requires_grad_(True) when atomic gradients are needed, passes force_coord into the atomic forward call, and supplies it to fit_output_to_model_output.
Tests for leaf-view coordinate inputs
source/tests/pt/model/test_dp_atomic_model.py, source/tests/pt/model/test_dp_model.py
Added tests test_forward_common_atomic_accepts_leaf_view_input and test_forward_lower_accepts_leaf_view_input to validate that forward paths accept coordinates reshaped with .view (leaf-view tensors) and produce expected outputs.

Sequence Diagram(s)

sequenceDiagram
  participant coord_view as coord_view (leaf view)
  participant force_coord as force_coord (clone if needed)
  participant atomic as AtomicModel.forward_common_atomic
  participant fitter as fit_output_to_model_output
  coord_view->>force_coord: if not requires_grad -> detach().clone().requires_grad_(True)
  force_coord->>atomic: pass as extended_coord / positional coord
  atomic->>fitter: atomic outputs + force_coord -> assemble final model outputs
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: cloning inference coordinates before enabling gradients to fix the requires_grad_ in-place operation issue.
Linked Issues check ✅ Passed The PR successfully addresses the core requirements from #5165: prevents in-place requires_grad_ operations on LAMMPS view tensors by cloning them first, maintains compatibility with PyTorch 2.5.1+, and includes regression tests.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the gradient-enabling issue in the atomic models and adding tests to verify the fix; no unrelated modifications were detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Only clone extended coordinates when gradients are not already enabled.
This keeps existing autodiff and Hessian graphs intact while still avoiding
in-place requires_grad_ on non-grad coordinate views.

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
@codecov
Copy link
Copy Markdown

codecov Bot commented May 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.36%. Comparing base (93f5580) to head (21cd65a).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5476      +/-   ##
==========================================
- Coverage   82.25%   81.36%   -0.90%     
==========================================
  Files         833      868      +35     
  Lines       89094    96439    +7345     
  Branches     4227     4235       +8     
==========================================
+ Hits        73287    78467    +5180     
- Misses      14515    16672    +2157     
- Partials     1292     1300       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes PyTorch inference gradient handling so view tensors passed from lower-level callers such as LAMMPS are not modified in-place when force/virial derivatives are enabled.

Changes:

  • Clones coordinate tensors before enabling gradients in PT atomic-model paths.
  • Carries the derivative coordinate tensor through _force_coord metadata to model output transformation.
  • Adds regression tests for leaf-view coordinate inputs in atomic and lower model inference paths.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
deepmd/pt/model/atomic_model/base_atomic_model.py Preserves private metadata keys while applying atom masks.
deepmd/pt/model/atomic_model/dp_atomic_model.py Clones grad-enabled coordinates and returns _force_coord.
deepmd/pt/model/atomic_model/linear_atomic_model.py Applies the same coordinate cloning and _force_coord propagation for linear models.
deepmd/pt/model/atomic_model/pairtab_atomic_model.py Applies cloning and keeps pairtab energy connected to _force_coord.
deepmd/pt/model/model/make_model.py Uses _force_coord as the autograd derivative source when available.
source/tests/pt/model/test_dp_atomic_model.py Adds atomic-model regression coverage for leaf-view coordinate inputs.
source/tests/pt/model/test_dp_model.py Adds lower-model regression coverage for leaf-view coordinate inputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +92 to +93
coord = to_torch_tensor(self.coord_ext).requires_grad_(True)
coord_view = coord.view(self.nf, self.nall, 3)
Comment thread source/tests/pt/model/test_dp_model.py Outdated
Comment on lines +145 to +149
coord_view = coord_ext.requires_grad_(True).view(self.nf, -1, 3)

ret = md0.forward_lower(coord_view, atype_ext, nlist, do_atomic_virial=True)

self.assertIn("extended_force", ret)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
source/tests/pt/model/test_dp_atomic_model.py (1)

77-103: ⚡ Quick win

Add documentation and explicit verification for the leaf-view regression test.

This test protects against issue #5165 (RuntimeError when enabling gradients on leaf-view tensors), but that context isn't documented. Future maintainers won't understand why this specific test setup matters.

Consider these improvements:

  1. Add a docstring or comment explaining this is a regression test for issue #5165 and why the leaf-view scenario is critical (LAMMPS provides view tensors)
  2. Add explicit assertions to verify the test preconditions:
    self.assertTrue(coord.is_leaf, "coord must be a leaf tensor")
    self.assertTrue(coord_view._is_view(), "coord_view must be a view")
  3. Consider validating output correctness beyond just key existence—for example, check that energy values are reasonable or match a known reference

The current test will catch crashes (good), but stronger documentation and precondition checks would make the test's purpose clearer and more robust.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt/model/test_dp_atomic_model.py` around lines 77 - 103, The
test test_forward_common_atomic_accepts_leaf_view_input lacks documentation and
explicit precondition checks for the leaf-view regression it protects; add a
brief docstring or inline comment referencing issue `#5165` and why a leaf tensor
with a view matters, then add explicit assertions before calling md0.forward_*
to ensure coord.is_leaf is True and coord_view._is_view() (or
coord_view.is_view() depending on API) is True; finally add at least one
lightweight correctness check on outputs (e.g., energy tensor shape and finite
values from ret["energy"] or a reasonable range) to validate results beyond key
existence while keeping the original crash-protection behavior in
test_forward_common_atomic_accepts_leaf_view_input.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@source/tests/pt/model/test_dp_atomic_model.py`:
- Around line 77-103: The test
test_forward_common_atomic_accepts_leaf_view_input lacks documentation and
explicit precondition checks for the leaf-view regression it protects; add a
brief docstring or inline comment referencing issue `#5165` and why a leaf tensor
with a view matters, then add explicit assertions before calling md0.forward_*
to ensure coord.is_leaf is True and coord_view._is_view() (or
coord_view.is_view() depending on API) is True; finally add at least one
lightweight correctness check on outputs (e.g., energy tensor shape and finite
values from ret["energy"] or a reasonable range) to validate results beyond key
existence while keeping the original crash-protection behavior in
test_forward_common_atomic_accepts_leaf_view_input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 35aef69e-ecc3-4b04-8a62-ab185f5ee8c0

📥 Commits

Reviewing files that changed from the base of the PR and between a101808 and d527099.

📒 Files selected for processing (2)
  • source/tests/pt/model/test_dp_atomic_model.py
  • source/tests/pt/model/test_dp_model.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt/model/test_dp_model.py

@njzjz-bot
Copy link
Copy Markdown
Contributor Author

Updated the implementation to avoid passing the cloned coordinate through a private _force_coord output key. The model layer now prepares the coordinate used for force/virial autograd before calling the atomic model, so the same tensor is used both for the forward computation and derivative extraction.

This keeps atomic model outputs clean while still avoiding in-place requires_grad_ on caller-provided view tensors.

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/pt/model/atomic_model/dp_atomic_model.py`:
- Around line 274-275: In DPAtomicModel.forward_atomic (and any call paths into
forward_common_atomic), avoid calling extended_coord.requires_grad_(True)
in-place on a potential leaf/view tensor; instead create a non-leaf clone of
extended_coord before enabling gradients so autograd/view semantics are
preserved (e.g., clone extended_coord and then set requires_grad on the clone),
and use that cloned tensor for the rest of the forward pass.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d5a0a4a7-fbb5-43e8-af5d-2b26080a3e6c

📥 Commits

Reviewing files that changed from the base of the PR and between d527099 and 889eff4.

📒 Files selected for processing (5)
  • deepmd/pt/model/atomic_model/dp_atomic_model.py
  • deepmd/pt/model/atomic_model/linear_atomic_model.py
  • deepmd/pt/model/atomic_model/pairtab_atomic_model.py
  • deepmd/pt/model/model/make_model.py
  • source/tests/pt/model/test_dp_atomic_model.py
💤 Files with no reviewable changes (1)
  • source/tests/pt/model/test_dp_atomic_model.py

Comment thread deepmd/pt/model/atomic_model/dp_atomic_model.py Outdated
@njzjz njzjz requested review from iProzd and wanghan-iapcm May 30, 2026 12:17
nframes, nloc, nnei = nlist.shape
extended_coord = extended_coord.view(nframes, -1, 3)
if self.do_grad_r() or self.do_grad_c():
if (self.do_grad_r() or self.do_grad_c()) and not extended_coord.requires_grad:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 277 reshapes extended_coord = extended_coord.view(...), making the local variable a non-leaf view. The new guard at line 278 still lets line 279 call extended_coord.requires_grad_(True) on that non-leaf view, which raises RuntimeError: you can only change requires_grad flags of leaf variables. Compare with linear_atomic_model.py:261-263 where the requires_grad_ call precedes the .view(), and with dp_atomic_model.py:274-275 which clones to obtain a leaf. The standard make_model.forward_common_lower path masks this because it clones upstream, but a direct caller of PairTabAtomicModel.forward_common_atomic with a leaf view — the exact scenario the new test_dp_atomic_model.py test is designed to cover — would crash. Reorder (requires_grad_ before .view()) or apply the same .clone() treatment as dp_atomic_model.py.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 912ea401: LinearEnergyAtomicModel now also clones before requires_grad_(True), matching the DP/pair-tab treatment, so a leaf-view input is not mutated in-place before reshaping. PairTabAtomicModel was already using the clone path on the current branch.

— OpenClaw 2026.5.28 (model: custom-chat-jinzhezeng-group/gpt-5.5)

]
ret = md0.forward_common_atomic(*args)

self.assertIn("energy", ret)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_forward_common_atomic_accepts_leaf_view_input only asserts assertIn("energy", ret) — no check that coord_view is unmutated (e.g. self.assertFalse(coord_view.requires_grad) and/or value equality against a pre-call clone). The test would pass even if dp_atomic_model.py reverted to the in-place requires_grad_(True) pattern — i.e. it does not exercise the regression the PR claims to fix. The corresponding test_dp_model.py::test_forward_lower_accepts_leaf_view_input does include the requires_grad assertion; mirror that here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, agreed. The existing DP atomic test now checks both that the caller's view remains requires_grad=False and that values are unchanged after the call. I also added the same leaf-view regression coverage for linear and pair-tab atomic models in 912ea401.

— OpenClaw 2026.5.28 (model: custom-chat-jinzhezeng-group/gpt-5.5)

atype = extended_atype[:, :nloc]
if self.do_grad_r() or self.do_grad_c():
extended_coord.requires_grad_(True)
if (self.do_grad_r() or self.do_grad_c()) and not extended_coord.requires_grad:
Copy link
Copy Markdown
Collaborator

@wanghan-iapcm wanghan-iapcm Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new compound guard and not extended_coord.requires_grad introduces a second boolean branch, but the new tests cover only the requires_grad=False branch (where the clone happens). The requires_grad=True branch — the motivating case for the guard, where existing autograd state should be preserved — is untested. The same guard added at linear_atomic_model.py:261 and pairtab_atomic_model.py:278 has no new tests at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added/kept coverage for both branches:

  • requires_grad=False: verifies leaf-view inputs are accepted without mutating the caller tensor.
  • requires_grad=True: verifies existing autograd state is preserved and gradients flow.

This now covers DP atomic, linear atomic, and pair-tab atomic paths. Local uvx pre-commit run --files ... passes. I could not run the PyTorch tests locally because this environment does not have torch; trying to fetch it transiently timed out due the large package download.

— OpenClaw 2026.5.28 (model: custom-chat-jinzhezeng-group/gpt-5.5)

njzjz-bot added 3 commits June 1, 2026 12:05
Address review feedback for the coordinate gradient regression by cloning PairTab coordinates before reshaping, asserting direct atomic callers do not mutate view inputs, and covering already grad-enabled inputs for DP, model lower, linear, and pairtab paths.

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
Apply the same clone-before-requires-grad treatment to the linear atomic wrapper so leaf views are not mutated in-place before reshaping.

Add regression coverage for leaf-view inputs in linear and pair-tab atomic models.

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
Apply ruff-format's line wrapping to the DP model regression test so pre-commit.ci does not need to autofix the PR branch.

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
@njzjz njzjz requested a review from wanghan-iapcm June 2, 2026 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

3 participants