feat(pt_expt): pluggable NeighborList strategy + O(N) vesin neighbor list for Python/ASE inference#5491
Conversation
…uilder Make neighbor-list construction pluggable via an optional `NeighborList` strategy injected at `forward_common`/`call_common` (the layer where the system is extended), keeping the exported `forward_common_lower` untouched. - dpmodel (torch-free core): `NeighborList` base + `DefaultNeighborList` (the historical dense extend+build). `neighbor_list=None` reproduces the current behavior byte-identically. - pt_expt: `VesinNeighborList`, a device-aware `vesin.torch` O(N) cell list (on the input device for torch; CPU-bridged for numpy/dpmodel). It emits the same extended quartet, so force/global-virial/atomic-virial all come out of the existing autograd routines unchanged. - inference: `nlist_backend="auto"|"vesin"|"native"` on the pt_expt DeepEval and the ASE calculator; `auto` falls back to native when vesin is unavailable/unsupported, `vesin` is strict. - pyproject: depend on `vesin[torch]`. Tested: builder equivalence (numpy+torch, PBC/noPBC, device) and full model equivalence across 8 descriptor families (energy/force/virial/atomic virial) vs the dense builder, plus nlist_backend dispatch and vesin-vs-native equality through the compiled .pte.
…nce tests Replace the dynamic setattr metaprogramming over (descriptor family, pbc) with pytest.mark.parametrize, per project test conventions: one parametrized test per (family, periodic) so cases can be selected individually with -k and failures pinpoint the family. No coverage change (44 cases).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6fd4ee5799
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a pluggable NeighborList interface with DefaultNeighborList and VesinNeighborList, threads an nlist_backend selector through dpmodel/pt_expt/pt and DP, adds vesin[torch] as a dependency, and provides dispatch and numerical-equivalence tests covering single- and multi-frame, periodic and non-periodic cases. ChangesPluggable neighbor-list strategy with Vesin O(N) acceleration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 885-889: The vesin path returns the raw candidate neighbor list
from self._nlist_builder.build(...) without applying the same type-splitting
used by the native path, so modify the branch that handles getattr(self,
"_nlist_builder", None) to post-process the returned candidate nlist with the
same type-distinguishing logic used by build_neighbor_list(...,
distinguish_types=not mixed_types) before returning or handing it to
forward_common_lower; specifically, call the type-split routine (the logic that
enforces distinguish_types / splits by atom type) on the result of
self._nlist_builder.build(coords, atom_types, cells, rcut, sel) (or convert it
into the same structure expected by forward_common_lower) so non-mixed-type
models receive an identical nlist contract as the native forward_common_lower
path.
In `@deepmd/pt_expt/utils/vesin_neighbor_list.py`:
- Around line 192-211: _build_single() currently truncates candidates to the
global nsel nearest neighbors (variables dense_idx, sorted_idx, nlist, keep =
min(nsel, max_nn)), which can drop needed neighbors of other types before
per-type selection (nlist_distinguish_types). Change this so you do not limit to
nsel here: keep all in-cutoff candidates (i.e., all entries with valid &= dists
<= rcut) up to max_nn and populate nlist with that superset (use max_nn or the
count of valid candidates instead of nsel when filling nlist), then defer the
final per-type truncation to nlist_distinguish_types; ensure you still pad with
-1 for missing entries and preserve the existing sorting by distance
(sorted_idx, sorted_valid).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: aea2c30e-6a44-475f-8059-64e03b3a5b0f
📒 Files selected for processing (12)
deepmd/calculator.pydeepmd/dpmodel/model/ener_model.pydeepmd/dpmodel/model/make_model.pydeepmd/dpmodel/utils/__init__.pydeepmd/dpmodel/utils/default_neighbor_list.pydeepmd/dpmodel/utils/neighbor_list.pydeepmd/pt_expt/infer/deep_eval.pydeepmd/pt_expt/model/ener_model.pydeepmd/pt_expt/utils/vesin_neighbor_list.pypyproject.tomlsource/tests/pt_expt/infer/test_deep_eval.pysource/tests/pt_expt/utils/test_neighbor_list.py
Extend the pluggable neighbor list to the pt (torch.jit) backend. The pt model is reconstructed eagerly in DeepEval, so when nlist_backend selects vesin we build the (i,j,S) extended representation with VesinNeighborList and run the model's forward_common_lower + communicate_extended_output directly, leaving the TorchScript graph untouched. vesin is gated off for spin/hessian models and ASE neighbor_list conflicts; auto falls back to native, vesin is strict. Also: - VesinNeighborList: avoid torch.as_tensor without an explicit device (under a non-CPU ambient default device it triggers CUDA init even for CPU tensors); bridge numpy via from_numpy and use torch tensors directly. Makes the builder device-robust. - drop the getattr(self, "_use_vesin"/"_nlist_builder", ...) defensive defaults; both attributes are always initialized in __init__ via _setup_nlist_backend. Tested: source/tests/pt/model/test_nlist_backend.py (dispatch + vesin-vs-native energy/force/virial/atomic-virial equivalence for se_e2_a and dpa1, PBC/non-PBC).
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
deepmd/pt_expt/utils/vesin_neighbor_list.py (1)
205-224:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep all in-cutoff candidates until the lower layer does the type split.
This still truncates to the global
sum(sel)nearest neighbors before per-type formatting, so type-separated models can lose required neighbors and diverge from the native builder. A simplesel=[1, 1]case with candidatesA@1.0,A@1.1,B@1.2becomes[A, A]here, leaving no way for the downstream type split to recover the requiredB. Keep all in-cutoff candidates here (or at least a per-type-safe superset) and defer the final truncation until after type separation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/pt_expt/utils/vesin_neighbor_list.py` around lines 205 - 224, The current code truncates neighbors early by using keep = min(nsel, max_nn) which drops in-cutoff candidates needed for per-type splitting; change the logic to keep all in-cutoff candidates up to max_nn and defer final truncation until after the type split: allocate a neighbor buffer wide enough for max_nn (e.g., nlist_all = torch.full((nloc, max_nn), -1, ...)) or adjust nlist to width = max(nsel, max_nn), replace keep = min(nsel, max_nn) with keep = max_nn (or compute keep = sorted_valid.sum(dim=-1).clamp_max(max_nn) if you want per-row limits), and fill nlist_all[:, :keep] using sorted_idx and sorted_valid just like the existing torch.where; leave any slicing to nsel to the downstream type-splitting code so per-type selection can recover required neighbors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/tests/pt/model/test_nlist_backend.py`:
- Around line 123-128: The test is currently iterating dict keys because
dp_native.eval and dp_vesin.eval return dicts; change the comparison to pull the
actual arrays by key (e.g. for label in ["e","f","v","ae","av"] and then compare
ref[label] vs out[label]) instead of zipping ref and out directly. Update the
loop that uses variables ref and out to index into those dicts (ref[label],
out[label]) and then call np.testing.assert_allclose with those values and the
existing rtol/atol and err_msg=f"{name} {label}" so the numeric tensors are
actually compared.
---
Duplicate comments:
In `@deepmd/pt_expt/utils/vesin_neighbor_list.py`:
- Around line 205-224: The current code truncates neighbors early by using keep
= min(nsel, max_nn) which drops in-cutoff candidates needed for per-type
splitting; change the logic to keep all in-cutoff candidates up to max_nn and
defer final truncation until after the type split: allocate a neighbor buffer
wide enough for max_nn (e.g., nlist_all = torch.full((nloc, max_nn), -1, ...))
or adjust nlist to width = max(nsel, max_nn), replace keep = min(nsel, max_nn)
with keep = max_nn (or compute keep = sorted_valid.sum(dim=-1).clamp_max(max_nn)
if you want per-row limits), and fill nlist_all[:, :keep] using sorted_idx and
sorted_valid just like the existing torch.where; leave any slicing to nsel to
the downstream type-splitting code so per-type selection can recover required
neighbors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9526f665-8b61-4e12-90c1-dd691dfba614
📒 Files selected for processing (4)
deepmd/pt/infer/deep_eval.pydeepmd/pt_expt/infer/deep_eval.pydeepmd/pt_expt/utils/vesin_neighbor_list.pysource/tests/pt/model/test_nlist_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt_expt/infer/deep_eval.py
Add multi-frame tests (frames with different box sizes -> different per-frame ghost counts) that exercise the VesinNeighborList per-frame loop + pad-to-common-nall + stack path, which was previously only covered with nframes=1: - pt_expt: builder-level multi-frame neighbor-multiset equivalence (numpy + torch) and end-to-end model multi-frame equivalence (se_e2_a, dpa1). - pt: multi-frame DeepPot.eval vesin-vs-native equivalence (se_e2_a, dpa1). The pt suite now runs green on CPU under the cuda:9999999 sentinel default device (the builder was made device-clean in the previous commit), so it no longer depends on a GPU runner.
Replace the from_numpy/ascontiguousarray branch with a clean `torch.as_tensor` inside a `with torch.device(device)` block (device = CPU for numpy/dpmodel inputs, the input tensor's device for torch), matching the existing `with torch.device(...)` guard around `nl.compute` and the project convention of pinning the device on tensor creation. Fixes the CUDA-init-under-placeholder- default-device issue without the clunky numpy bridge.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
source/tests/pt_expt/utils/test_neighbor_list.py (1)
423-448: ⚡ Quick winConsider parametrizing
periodicfor the multi-frame pt_expt test.The single-frame
test_pt_expt_equivalencecovers both PBC and non-PBC, but this multi-frame variant only exercises the periodic path (box is always passed). Since the PR notes multi-frame vesin paths are not fully tested, adding the non-periodic case closes a real coverage gap at minimal cost.♻️ Proposed change
-@pytest.mark.parametrize("name", ["se_e2_a", "dpa1"]) # non-mixed + attention -def test_pt_expt_multiframe_equivalence(name: str) -> None: +@pytest.mark.parametrize("name", ["se_e2_a", "dpa1"]) # non-mixed + attention +@pytest.mark.parametrize("periodic", [False, True]) # non-PBC vs PBC +def test_pt_expt_multiframe_equivalence(name: str, periodic: bool) -> None: """Multi-frame (frames with differing ghost counts) pt_expt outputs are invariant to the nlist strategy -- exercises the builder's per-frame pad + stack feeding the batched model forward. """ coord_np, atype_np, box_np = _multiframe_system() + box_np = box_np if periodic else None model_dict = ALL_MODELS[name] md = get_model(copy.deepcopy(model_dict)) md.eval() atype_t = torch.tensor(atype_np, dtype=torch.int64) - box_t = torch.tensor(box_np, dtype=torch.float64) + box_t = None if box_np is None else torch.tensor(box_np, dtype=torch.float64)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/tests/pt_expt/utils/test_neighbor_list.py` around lines 423 - 448, Parametrize test_pt_expt_multiframe_equivalence with a periodic flag (e.g., add pytest.mark.parametrize("periodic", [True, False])) and run the same multi-frame checks for both periodic and non-periodic cases: when periodic is True build box_t from box_np and pass box=box_t to md.forward, when False set box_t=None and call md.forward without the box argument (or pass box=None) so the non-PBC path is exercised; keep the same neighbor_list variants (DefaultNeighborList, VesinNeighborList) and assertion checks for "energy", "force", "virial", "atom_virial".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@source/tests/pt_expt/utils/test_neighbor_list.py`:
- Around line 423-448: Parametrize test_pt_expt_multiframe_equivalence with a
periodic flag (e.g., add pytest.mark.parametrize("periodic", [True, False])) and
run the same multi-frame checks for both periodic and non-periodic cases: when
periodic is True build box_t from box_np and pass box=box_t to md.forward, when
False set box_t=None and call md.forward without the box argument (or pass
box=None) so the non-PBC path is exercised; keep the same neighbor_list variants
(DefaultNeighborList, VesinNeighborList) and assertion checks for "energy",
"force", "virial", "atom_virial".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1f41cd3f-c5a1-4ce8-896f-a8d2d169cec2
📒 Files selected for processing (2)
source/tests/pt/model/test_nlist_backend.pysource/tests/pt_expt/utils/test_neighbor_list.py
🚧 Files skipped from review as they are similar to previous changes (1)
- source/tests/pt/model/test_nlist_backend.py
vesin's nl.compute rejects an empty `points` array ("`points` can not be a NULL
pointer"), so an all-empty system (e.g. test_zero_input, coords shape [nf,0,3])
crashed once vesin became the default builder. Return an empty extended
representation directly for a zero-atom frame, matching the native builder.
Fixes the CI failures in test_models.py::TestDeepPot_fparam_aparam_*::test_zero_input
(.pth and .pte). Adds a builder-level empty-system regression test.
| sel = self._sel | ||
| mixed_types = self._mixed_types | ||
|
|
||
| if self._nlist_builder is not None: |
There was a problem hiding this comment.
This currently bypasses an explicitly supplied ASE neighbor list whenever nlist_backend="auto" (the new default). In DeepEval.__init__, _setup_nlist_backend() runs before self.neighbor_list is assigned, so ase_provided is always false; if vesin is installed, _nlist_builder is enabled and this early return is taken before the native/ASE branch below.
That changes existing behavior for callers/tests using DeepPot(..., neighbor_list=ase.neighborlist.NewPrimitiveNeighborList(...)). It also matches the current CI failure in TestEvalDescriptorASE.test_eval_descriptor_ase_vs_native, where the ASE result differs because the ASE path is not actually used.
Please set self.neighbor_list = neighbor_list before calling _setup_nlist_backend() (as done in the pt backend), or otherwise make auto preserve explicit ASE neighbor lists.
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
…r path eval_descriptor calls the descriptor directly, bypassing forward_common_lower's format_nlist. The native _build_nlist_native builds with distinguish_types=not mixed_types, so for a non-mixed-type model it hands the descriptor a type-blocked nlist; the vesin branch returned a non-distinguished list, giving a wrong descriptor on this path (CI: TestEvalDescriptorASE). Apply nlist_distinguish_types to the vesin output when not mixed_types, matching the native builder. The main eval path is unaffected (its format_nlist re-formats; energy/force/virial already matched native to ~1e-19). Adds a direct eval_descriptor vesin-vs-native regression test.
Motivation
When deepmd consumes a neighbor list,
forward_common/call_commonextends the local region into ~26 periodic-image buffer regions —extend_coord_with_ghoststhen a densebuild_neighbor_list— materializing ≈27×N ghost atoms and an O(N²)[N, 27N]distance matrix. This is the Python/ASE front-end bottleneck at large N (DPA4 manuscript §2.4).What this PR does
Makes neighbor-list construction pluggable via an optional
NeighborListstrategy injected atforward_common/call_common(the layer where the system is extended). The exportedforward_common_lower(the.pt2/AOTI/C++ entry) is left untouched, so there is zero export risk.NeighborListbase +DefaultNeighborList(the historical dense extend+build).neighbor_list=Nonereproduces today's behavior byte-identically.VesinNeighborList, a device-awarevesin.torchO(N) cell list: it runs on the input tensor's device (CPU or CUDA) for torch, and is CPU-bridged for numpy/dpmodel. It builds an(i, j, S)edge list, materializes only the real-neighbor ghostscoord[j] + S@box, and emits the same extended quartet(extended_coord, extended_atype, nlist, mapping). Because the representation is identical, force / global-virial / atomic-virial all come out of the existing autograd +communicate_extended_outputroutines unchanged.nlist_backend="auto" | "vesin" | "native"on the pt_exptDeepEvaland the ASEDPcalculator.autouses vesin when available/applicable and silently falls back to native otherwise;vesinis strict (raises if unavailable, or for spin / ASE-neighbor_listconflicts);nativeforces the dense builder.vesin[torch].Verification
native vs vesin agree to fp round-off (energy
0.0, force/virial/atomic-virial ≤ ~1e-18, the only difference being ghost-enumeration order):source/tests/pt_expt/utils/test_neighbor_list.py— builder equivalence (numpy + torch namespaces, PBC/non-PBC, input-device placement) and full model equivalence across 8 descriptor families (se_e2_a, se_r, se_e3, dpa1, se_atten_v2, dpa2, dpa3, hybrid) for dpmodel (energy/atomic-energy) and pt_expt (energy/force/virial/atomic-virial), plus theneighbor_list=Nonebyte-identical fallback.source/tests/pt_expt/infer/test_deep_eval.py::TestDeepEvalNlistBackend—nlist_backenddispatch validation and vesin-vs-native equality through the compiled.pte.Known limitations
forward_commonpath only. This is the path where deepmd builds the nlist (DeepPot / ASE). C/C++/LAMMPS enter at the exportedforward_lowerwith an externally-supplied list — accepting(i,j,S)there is a planned follow-up.S≠0edge — correctness preserved viamappingsummation).sel-exceeded) path are not yet directly tested.Summary by CodeRabbit
New Features
Tests
Chores