feat(pt_expt): make model.json optional in .pt2/.pte loading #5416

OutisLi wants to merge 5 commits into deepmodeling:master
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

DeepEval gains a metadata-only loading path.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Loader
    participant DeepEval
    participant Metadata
    participant DPModel
    User->>Loader: Load .pte/.pt2 archive
    Loader->>Metadata: Check model/extra/metadata.json
    alt metadata.json exists
        Loader->>Metadata: Extract metadata
        alt model.json exists in model/extra/
            Loader->>DPModel: Load binary model
            Loader->>DeepEval: Initialize with DPModel + metadata
        else model.json absent
            Loader->>DeepEval: Call _init_from_metadata (hoist fields)
            Note over DeepEval: _dpmodel = None
        end
    else metadata.json missing
        Loader->>User: Raise ValueError
    end
    User->>DeepEval: Call get_dim_fparam()
    alt _dpmodel exists
        DeepEval->>DPModel: Query model for dim_fparam
        DPModel->>DeepEval: Return value
    else metadata-only
        DeepEval->>Metadata: Read hoisted metadata fields
        Metadata->>DeepEval: Return value
    end
    DeepEval->>User: Return result
    User->>DeepEval: Call eval_descriptor()
    alt _dpmodel exists
        DeepEval->>DPModel: Evaluate descriptor
        DPModel->>DeepEval: Return result
        DeepEval->>User: Return result
    else metadata-only
        DeepEval->>User: Raise NotImplementedError
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Pull request overview

This PR makes pt_expt .pte/.pt2 inference loading accept archives that omit extra/model.json, using extra/metadata.json as the minimum contract (matching the C++ DeepPotPTExpt reader).

Changes:
- Add a metadata-only init path in `DeepEval` and make `.pte`/`.pt2` loaders treat `model.json` as optional while requiring `metadata.json`.
- Extend exported metadata to fully support dpmodel-free metadata accessors (`sel_type`, deterministic `category` serialization).
- Add a new end-to-end test suite that strips `extra/model.json` and asserts parity vs full archives, plus guard tests for missing metadata.
Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| source/tests/pt_expt/infer/test_deep_eval_metadata_only.py | New tests validating metadata-only archive loading parity and dpmodel-only feature guards. |
| deepmd/pt_expt/utils/serialization.py | Metadata export augmented for metadata-only inference (cast category to int; include sel_type). |
| deepmd/pt_expt/infer/deep_eval.py | Loaders now require metadata.json but allow missing model.json; new _init_from_metadata; dpmodel-only hooks guarded. |
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #5416      +/-   ##
===========================================
- Coverage   82.36%   68.60%   -13.76%
===========================================
  Files         824      801       -23
  Lines       87130    82083     -5047
  Branches     4197     3535      -662
===========================================
- Hits        71761    56311    -15450
- Misses      14093    25087    +10994
+ Partials     1276      685      -591
```

☔ View full report in Codecov by Sentry.
🧹 Nitpick comments (1)

deepmd/pt_expt/utils/serialization.py (1)

345-350: Optional: make the missing-entry error hint at the layout migration.

Pre-PR archives stored metadata under `extra/` (no `model/` prefix). After this PR, the reader strictly requires `model/extra/model.json`, so pre-existing `.pt2` files on disk will fail here with a message that doesn't suggest what changed. Since `.pt2` has no versioning field, a small nudge in the error text would save users a debugging round-trip.

Suggested tweak:

```diff
-        raise ValueError(
-            f"Invalid .pt2 file '{model_file}': missing '{model_json_entry}'"
-        )
+        raise ValueError(
+            f"Invalid .pt2 file '{model_file}': missing '{model_json_entry}'. "
+            "Archives produced before the PyTorch 2.11 layout migration stored "
+            "metadata under 'extra/' and must be re-exported."
+        )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@deepmd/pt_expt/utils/serialization.py` around lines 345-350: the ValueError raised when model_json_entry is missing (inside the `with zipfile.ZipFile(model_file, "r") as zf` block) should include a hint about the pre-PR layout: if model.json used to live under "extra/" instead of "model/extra/", add a suggested cause and recovery (e.g., "this archive may be using the older layout where metadata was stored under 'extra/' — re-run the migration or recreate the .pt2") to the error message so users know why model_file failed to load and what to try next.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 167dc0c0-b82c-4db6-8079-f1afcfcfa385
📒 Files selected for processing (3)

- deepmd/pt_expt/infer/deep_eval.py
- deepmd/pt_expt/utils/serialization.py
- source/tests/pt_expt/infer/test_deep_eval.py
✅ Files skipped from review due to trivial changes (1)
- source/tests/pt_expt/infer/test_deep_eval.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt_expt/infer/deep_eval.py
wanghan-iapcm left a comment
Test coverage gap: .pt2 metadata-only branch is untested
The new _init_from_metadata fallback in _load_pt2 (the model_json_str == "" branch) is not exercised end-to-end. test_deep_eval_metadata_only.py only does the zip-strip surgery on .pte.
Since test_deep_eval.py::setUpClass already builds a shared AOTI-compiled .pt2 (paying the ~82 s compile cost once per class), adding a metadata-only .pt2 case is essentially free: strip model/extra/model.json from the shared archive, reload via DeepPot, and assert numeric parity against the full archive — the same pattern already used for .pte.
Add a spin metadata-only parity test
The new _init_from_metadata reconstructs ModelOutputDef from the serialized fitting_output_defs, but _init_from_model_json builds the spin output def by hand (energy, shape=[1], magnetic=True, etc.). Whether these two paths produce equivalent defs for a spin model is not asserted anywhere — the existing test_deep_eval_spin.py only loads full archives.

Likewise, get_use_spin's metadata fallback (`[bool(v) for v in self.metadata.get("use_spin", [])]`) only fires when a spin archive is loaded metadata-only, so it's currently dead from the test suite's perspective.

Suggestion: reuse the .pte fixture in test_deep_eval_spin.py, strip extra/model.json (same pattern as test_deep_eval_metadata_only.py::_strip_extra_model_json), reload, and assert:
- bitwise numeric parity of eval (energy / force / virial / magnetic force) against the full archive,
- get_has_spin(), get_use_spin(), get_ntypes_spin() agree,
- the reconstructed _model_output_def matches the hand-built one (e.g. compare def_outp.get_data() keys + per-variable magnetic / r_differentiable / c_differentiable flags).
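The zip-strip surgery referenced above (mirroring the tests' _strip_extra_model_json pattern) can be sketched as a small standalone helper; the function name and signature here are illustrative, not the test suite's actual code:

```python
import zipfile


def strip_extra_model_json(src_path: str, dst_path: str) -> None:
    """Copy a .pt2/.pte zip archive, dropping any 'extra/model.json' entry.

    The suffix match covers both the old root-level 'extra/model.json'
    and the new 'model/extra/model.json' layout.
    """
    with zipfile.ZipFile(src_path, "r") as src, zipfile.ZipFile(
        dst_path, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for entry in src.infolist():
            if entry.filename.endswith("extra/model.json"):
                continue  # omit the full dpmodel serialization
            dst.writestr(entry, src.read(entry.filename))
```

Because the predicate is a suffix match, the same helper would serve the .pte, .pt2, and spin fixtures alike, which is the consistency the shared-helper suggestion is after.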
🧹 Nitpick comments (2)

deepmd/pt_expt/infer/deep_eval.py (1)

145-162: Dual code paths for spin `model_output_def` — confirm they stay in sync.

`_init_from_model_json` constructs a fixed `OutputVariableDef("energy", ..., magnetic=True)` for spin models, while `_init_from_metadata` (line 204) reconstructs whatever `_collect_metadata` serialized from `model.atomic_output_def()`. The new `_assert_fitting_output_defs_match` test pins these as equivalent today, but if `SpinModel.atomic_output_def()` ever gains additional fields (e.g. an extra output, or a flag flip), only the metadata path will pick them up, silently diverging from the model.json path.

Consider unifying both paths through `_collect_metadata`-style reconstruction (or, conversely, always derive from `atomic_output_def()` when available) so there is a single source of truth.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@deepmd/pt_expt/infer/deep_eval.py` around lines 145-162: the code builds _model_output_def in two different ways (_init_from_model_json uses a hardcoded OutputVariableDef("energy", magnetic=True) while _init_from_metadata uses _collect_metadata/model.atomic_output_def()), which can diverge; update the model-json path in _init_from_model_json to derive the fitting output definition from the model itself (call SpinModel.atomic_output_def() or deserialize via _collect_metadata) so both paths use the same source of truth for _model_output_def; adjust the branch that sets self._model_output_def (the block creating ModelOutputDef(FittingOutputDef([...]))) to use the dpmodel.atomic_output_def() or the metadata deserializer and keep the _assert_fitting_output_defs_match test unchanged.

source/tests/pt_expt/infer/test_deep_eval.py (1)
554-573: Optional: extract the metadata-only zip surgery into a shared helper.

The same archive-stripping pattern appears in test_deep_eval_spin.py (`_strip_extra_model_json`, using `endswith`) and test_deep_eval_metadata_only.py. Consolidating into a single helper (e.g. under source/tests/pt_expt/infer/_metadata_only_utils.py) would keep the matching predicate consistent (`endswith("extra/model.json")` works for both .pte and .pt2) and avoid drift across the three call sites.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@source/tests/pt_expt/infer/test_deep_eval.py` around lines 554-573: extract the metadata-only zip surgery used in test_deep_eval.py (the block creating cls.meta_tmpfile and copying entries except "model/extra/model.json") into a shared helper function (e.g., _strip_extra_model_json in a new source/tests/pt_expt/infer/_metadata_only_utils.py) and replace the duplicated logic in test_deep_eval.py, test_deep_eval_spin.py, and test_deep_eval_metadata_only.py with calls to that helper; ensure the helper takes the source zip path and target tmpfile path, uses a stable predicate like entry.filename.endswith("extra/model.json") to skip the extra model.json, and preserves creation of the temporary file and subsequent DeepPot/deserialize_to_file usage by the callers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e27497ba-ad1c-44bc-a345-96c043daa787
📒 Files selected for processing (5)
- deepmd/pt_expt/infer/deep_eval.py
- deepmd/pt_expt/utils/serialization.py
- source/tests/pt_expt/infer/test_deep_eval.py
- source/tests/pt_expt/infer/test_deep_eval_metadata_only.py
- source/tests/pt_expt/infer/test_deep_eval_spin.py
🧹 Nitpick comments (1)

deepmd/pt_expt/utils/serialization.py (1)

258-282: Spin output-def + mask filter + category coercion look correct.

- For spin models, sourcing from `model.model_output_def().def_outp` correctly preserves the `magnetic=True` flag that the spin override sets on the energy variable; `atomic_output_def()` would have dropped it.
- Filtering `vdef.name == "mask"` is needed because `ModelOutputDef` retains the backbone's full atomic output (including `mask`) even though spin strips it from `model_output_type`.
- `int(vdef.category)` is the right move for an `IntEnum` to keep JSON output stable across Python versions.

Minor style nit (optional): the spin/non-spin branches could be unified by always using `model.model_output_def().def_outp`, since for non-spin `ModelOutputDef` simply wraps `atomic_output_def()`. Feel free to ignore if you prefer the explicit branching.

♻️ Optional unification:

```diff
-    if is_spin:
-        fitting_output_def = model.model_output_def().def_outp
-    else:
-        fitting_output_def = model.atomic_output_def()
+    fitting_output_def = model.model_output_def().def_outp
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@deepmd/pt_expt/utils/serialization.py` around lines 258-282: the current branching between is_spin and non-spin to set fitting_output_def is redundant; always use model.model_output_def().def_outp (which wraps atomic_output_def() for non-spin) so remove the if/else and assign fitting_output_def = model.model_output_def().def_outp, keeping the existing mask filter (if is_spin and vdef.name == "mask": continue) and the rest of the loop unchanged; update any comments if desired to reflect the unified source.
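To illustrate the `int(vdef.category)` point: coercing the IntEnum member to a plain int before JSON dumping guarantees the serialized token is just the number, independent of how the enum's str/repr behaves on a given Python version. The enum below is a stand-in with made-up values, not the real deepmd category enum:

```python
import json
from enum import IntEnum


class OutputVariableCategory(IntEnum):
    # Illustrative values only; the real enum lives in deepmd.dpmodel.
    OUT = 0
    REDU = 1


def category_to_json(category: IntEnum) -> str:
    # Coerce to a plain int so the serialized form is just the number.
    return json.dumps(int(category))
```

With this coercion, metadata.json stores `1` rather than anything enum-flavoured, so byte-for-byte comparisons of exported archives stay stable.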
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 614c407b-a536-47ea-a624-eca9d58b086a
📒 Files selected for processing (2)
- deepmd/pt_expt/infer/deep_eval.py
- deepmd/pt_expt/utils/serialization.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt_expt/infer/deep_eval.py
🧹 Nitpick comments (1)

deepmd/pt_expt/infer/deep_eval.py (1)

1111-1127: Clean guard, but consider centralising the metadata-only message.

The `NotImplementedError` text is duplicated exactly in three call sites via this helper, which is good. As a small follow-up, mention the public symptom (e.g. "loaded from a metadata-only .pt2/.pte archive") rather than the internal "dpmodel instance" so that end users hitting this through `dp test` / ASE see actionable wording. Non-blocking.

♻️ Optional wording tweak:

```diff
-        if self._dpmodel is None:
-            raise NotImplementedError(
-                f"{feature} requires the dpmodel instance, which is only "
-                "available when the .pt2 / .pte archive contains "
-                "'model.json'. The loaded archive is metadata-only; "
-                "re-export with the full dpmodel serialisation to enable "
-                "this feature."
-            )
+        if self._dpmodel is None:
+            raise NotImplementedError(
+                f"{feature} is unavailable: the loaded .pt2/.pte archive "
+                "is metadata-only (no 'model.json' embedded). Re-export "
+                "the model with full dpmodel serialisation to enable "
+                f"{feature}."
+            )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@deepmd/pt_expt/infer/deep_eval.py` around lines 1111-1127: replace the internal-implementation phrasing in _require_dpmodel with a user-facing message that says the feature is unavailable because the model was "loaded from a metadata-only .pt2/.pte archive" (and instructs to re-export with the full dpmodel including model.json); centralise that text into a module-level constant (e.g. METADATA_ONLY_MSG) and use it from _require_dpmodel and the callers that reference eval_descriptor / eval_typeebd / eval_fitting_last_layer so all three places share the same public-facing wording.
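A minimal sketch of the centralisation this nitpick suggests; the constant name, message wording, and class are placeholders standing in for the actual DeepEval code:

```python
# Hypothetical module-level constant; the real code may phrase this differently.
METADATA_ONLY_MSG = (
    "{feature} is unavailable: the model was loaded from a metadata-only "
    ".pt2/.pte archive (no 'model.json' embedded). Re-export with the full "
    "dpmodel serialisation to enable it."
)


class DeepEvalGuardSketch:
    """Stand-in for DeepEval, showing only the guard pattern."""

    def __init__(self, dpmodel=None):
        self._dpmodel = dpmodel

    def _require_dpmodel(self, feature: str) -> None:
        # One shared message for all dpmodel-only entry points.
        if self._dpmodel is None:
            raise NotImplementedError(METADATA_ONLY_MSG.format(feature=feature))

    def eval_descriptor(self, *args):
        self._require_dpmodel("eval_descriptor")
        ...
```

The point of the constant is that `eval_descriptor`, `eval_typeebd`, and `eval_fitting_last_layer` would all raise with identical, user-facing wording.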
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b5a5d28e-bba0-4f15-93c5-7a6a3d34fe17
📒 Files selected for processing (1)
- deepmd/pt_expt/infer/deep_eval.py
The pt_expt DeepEval's inference path runs through aoti_load_package / the exported module; `self._dpmodel` is only used to resolve metadata (rcut / type_map / atomic_output_def / dim_fparam / ...), which is already available in extra/metadata.json (the contract the C++ reader DeepPotPTExpt enforces). Drop the requirement that extra/model.json be present:

* _load_pt2 / _load_pte: model.json is optional; metadata.json is now the minimum contract.
* _init_from_metadata: reconstructs ModelOutputDef from the serialised fitting_output_defs and hoists sel / mixed_types to plain attributes.
* get_dim_fparam / get_dim_aparam / get_sel_type / model_type / get_use_spin: fall back to metadata when _dpmodel is None.
* eval_descriptor / eval_typeebd / eval_fitting_last_layer: raise a descriptive NotImplementedError in metadata-only mode (they inspect the dpmodel instance directly).

Also fixes two metadata-completeness gaps so metadata-only load is exact:

* _collect_metadata: add the `sel_type` field so get_sel_type works without a dpmodel round-trip (relevant for dipole / polar / wfc).
* _collect_metadata: force vdef.category to plain int for deterministic JSON serialisation across Python versions.

Archives produced by existing pt_expt serialisation still contain model.json and continue to use the dpmodel path unchanged. Regression covered by 77 existing tests in test_deep_eval.py + a dedicated new suite (test_deep_eval_metadata_only.py) that strips extra/model.json and asserts bitwise parity against the full archive.
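In sketch form, the metadata fallback described above amounts to the following; the class and key names are illustrative, following the description rather than the actual source:

```python
class DeepEvalMetadataSketch:
    """Stand-in for DeepEval: dpmodel when present, hoisted metadata otherwise."""

    def __init__(self, dpmodel=None, metadata=None):
        self._dpmodel = dpmodel
        self.metadata = metadata or {}

    def get_dim_fparam(self) -> int:
        if self._dpmodel is not None:
            return self._dpmodel.get_dim_fparam()
        # Metadata-only archive: read the hoisted field.
        return int(self.metadata.get("dim_fparam", 0))

    def get_use_spin(self) -> list:
        if self._dpmodel is not None:
            return self._dpmodel.get_use_spin()
        # Mirrors the fallback quoted in the review thread.
        return [bool(v) for v in self.metadata.get("use_spin", [])]
```

Each accessor follows the same two-branch shape, which is why the metadata-only contract only needs the fields that these accessors read.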
… 2.11 layout
``aoti_compile_and_package`` writes every entry of the compiled ``.pt2``
archive under a top-level ``model/`` directory; deepmd-kit then appended
its own metadata JSON blobs (``metadata.json``, ``model.json``,
``model_def_script.json``) at the root-level ``extra/`` path via
``zipfile``. Starting with PyTorch 2.11, the strict single-model
loader ``torch.export.pt2_archive._package.load_pt2`` refuses archives
that carry files outside ``model/``:
    RuntimeError: [enforce fail at inline_container.cc:340] .
    file in archive is not in a subdirectory model/: extra/metadata.json
``torch._inductor.package.package.load_package`` catches this error and
falls back to the legacy C++ loader, but prints the misleading warning
``Loading outdated pt2 file. Please regenerate your package.`` every
time the archive is opened -- even though the archive version itself
(``archive_version == '0'``) is already current.
Move the deepmd-kit metadata blobs into ``model/extra/`` so that the
fast path through ``load_pt2`` accepts the archive cleanly and the
misleading warning disappears. A module-level constant
``PT2_EXTRA_PREFIX`` in ``deepmd.pt_expt.utils.serialization`` is the
single source of truth for the prefix; both the writer
(``_deserialize_to_file_pt2``) and the readers
(``_serialize_from_file_pt2``, ``DeepEval._load_pt2``) derive their
entry names from it.
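A minimal sketch of the writer side under the new layout, assuming `PT2_EXTRA_PREFIX` resolves to `"model/extra/"` (the actual writer is `_deserialize_to_file_pt2`; the helper below is illustrative):

```python
import json
import zipfile

# Assumed value of the single-source-of-truth constant described above.
PT2_EXTRA_PREFIX = "model/extra/"


def append_extra_blobs(archive_path: str, blobs: dict) -> None:
    """Append deepmd-kit JSON blobs under model/extra/ inside a .pt2 zip."""
    with zipfile.ZipFile(archive_path, "a") as zf:
        for name, obj in blobs.items():
            # e.g. "metadata.json" -> "model/extra/metadata.json"
            zf.writestr(PT2_EXTRA_PREFIX + name, json.dumps(obj))
```

Because every appended entry now lives under `model/`, the strict PyTorch 2.11 `load_pt2` path accepts the archive without falling back to the legacy loader.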
The C++ reader in ``source/api_cc/src/commonPTExpt.h::read_zip_entry``
needs no changes: it already matches ``entry_name`` as a
``/``-delimited suffix, so ``"extra/metadata.json"`` resolves against
both the old root-level and the new ``model/extra/`` location
transparently. The ``test_pt2_has_metadata`` assertion in
``source/tests/pt_expt/infer/test_deep_eval.py`` is updated to expect
the new paths.
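The suffix-matching behaviour attributed to ``read_zip_entry`` can be expressed in a few lines. This Python sketch only models the matching rule, not the C++ implementation:

```python
def matches_entry(archive_name: str, entry_name: str) -> bool:
    """Match entry_name as a '/'-delimited suffix of an archive path.

    'extra/metadata.json' thus resolves against both the old root-level
    path and the new 'model/extra/metadata.json' location, while partial
    filename overlaps (e.g. 'mymetadata.json') do not match.
    """
    return archive_name == entry_name or archive_name.endswith("/" + entry_name)
```

The delimiter in the suffix check is what keeps the match path-component-aligned, which is why the C++ reader needed no change for the layout migration.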
Head branch was pushed to by a user without write access