Skip to content

Improve robustness of openmx/md cell parsing and support new output format#967

Open
shigeandtomo wants to merge 10 commits intodeepmodeling:masterfrom
shigeandtomo:openmx-fix
Open

Improve robustness of openmx/md cell parsing and support new output format#967
shigeandtomo wants to merge 10 commits intodeepmodeling:masterfrom
shigeandtomo:openmx-fix

Conversation

@shigeandtomo
Copy link
Copy Markdown
Contributor

@shigeandtomo shigeandtomo commented Apr 18, 2026

Dear developers,

This PR improves the robustness of the openmx/md parser to support multiple output file formats, including both newer and older versions.

Main changes:

  • Improve robustness of Cell_Vectors parsing by removing dependency on fixed column positions
  • Add support for the scf_conv flag introduced in OpenMX version 4.0 (upcoming)

Thank you for your consideration.

Summary by CodeRabbit

  • New Features

    • Emits explicit SCF-convergence warnings when parsing OpenMX trajectories.
  • Improvements

    • More robust parsing of cell vectors across OpenMX MD output variants.
    • Simplified handling of atomic species ordering/deduplication.
  • Tests

    • Added new OpenMX test module and Au111Surface test snapshot.
    • Removed legacy Methane2 snapshot and an older convergence test module.

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. dpdata enhancement New feature or request labels Apr 18, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Apr 18, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks


Comparing shigeandtomo:openmx-fix (a290ae4) with master (e0f56ac)

Open in CodSpeed

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 18, 2026

Warning

Rate limit exceeded

@shigeandtomo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 21 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 14 minutes and 21 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3d716a46-2d19-40e2-ad19-a8ebacf7f27d

📥 Commits

Reviewing files that changed from the base of the PR and between 627076f and a290ae4.

📒 Files selected for processing (1)
  • tests/test_openmx_multiple_formats.py
📝 Walkthrough

Walkthrough

Refactored OpenMX parsing: simplified atom-species handling, reworked cell parsing to extract nine floats after "Cell_Vectors=", added SCF convergence detection via warnings, removed redundant SCF checks in coordinate/force loaders, replaced Methane2-based test with new Au111Surface test artifacts and test module.

Changes

Cohort / File(s) Summary
OpenMX Parsing Logic
dpdata/openmx/omx.py
Simplified atom-species de-duplication; removed redundant natoms assignment and unused enumerate variables; replaced token-count branching in load_cells with parsing of the substring after "Cell_Vectors=", converting the first nine floats into a 3×3 cell and removing the previous unsupported-format RuntimeError path; scan for scf_conv=<int> on the same line and emit warnings.warn("SCF not converged!") when scf_conv == 0; removed SCF-nonconvergence check from load_coords; updated __main__ test filename to Au111Surface.
Test Artifacts
tests/openmx/Au111Surface.md, tests/openmx/Methane2.md
Added tests/openmx/Au111Surface.md containing two OpenMX snapshots (time=0.000 fs and 1.000 fs) with energies, cell vectors, per-atom lines, and differing scf_conv flags; removed all contents from tests/openmx/Methane2.md.
Test Infrastructure
tests/test_openmx_check_convergence.py, tests/test_openmx_multiple_formats.py
Deleted tests/test_openmx_check_convergence.py (previous Methane2-based unit tests); added tests/test_openmx_multiple_formats.py implementing a shared assertion mixin and two test cases (TestOPENMXTraj, TestOPENMXLabeledTraj) that validate atom metadata, cell matrices, and coordinates for frames from openmx/Au111Surface using fmt="openmx/md".

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main changes: improving robustness of cell parsing and supporting a new OpenMX output format (scf_conv flag).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_openmx_multiple_formats.py (1)

49-56: Assert the expected non-convergence warning.

The new fixture intentionally includes scf_conv=0; wrapping construction with assertWarnsRegex both verifies the new behavior and avoids unasserted warning noise.

Proposed test update
 class TestOPENMXTraj(unittest.TestCase, TestOPENMXTRAJProps):
     def setUp(self):
-        self.system = dpdata.System("openmx/Au111Surface", fmt="openmx/md")
+        with self.assertWarnsRegex(UserWarning, "SCF not converged"):
+            self.system = dpdata.System("openmx/Au111Surface", fmt="openmx/md")
 
 
 class TestOPENMXLabeledTraj(unittest.TestCase, TestOPENMXTRAJProps):
     def setUp(self):
-        self.system = dpdata.LabeledSystem("openmx/Au111Surface", fmt="openmx/md")
+        with self.assertWarnsRegex(UserWarning, "SCF not converged"):
+            self.system = dpdata.LabeledSystem("openmx/Au111Surface", fmt="openmx/md")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_openmx_multiple_formats.py` around lines 49 - 56, The tests create
systems in TestOPENMXTraj.setUp and TestOPENMXLabeledTraj.setUp using
dpdata.System and dpdata.LabeledSystem but do not assert the new non-convergence
warning produced by the fixture with scf_conv=0; update both setUp methods to
wrap the construction in self.assertWarnsRegex(WarningClassOrBase,
r"non-converge|non[- ]converg", msg=None) (use the actual warning class emitted
by the OpenMX loader if available, otherwise Warning) so the tests explicitly
expect and match the non-convergence warning and suppress unasserted warning
noise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dpdata/openmx/omx.py`:
- Around line 77-85: The SCF warning uses an unnecessary f-string and omits the
required stacklevel when calling warnings.warn; inside the loop that detects
tokens starting with "scf_conv=" (the code that sets scf_conv and checks if
scf_conv == 0) replace warnings.warn(f"SCF not converged!") with
warnings.warn("SCF not converged!", stacklevel=2) (or an appropriate stacklevel)
so the message is a plain string and includes the stacklevel argument.

In `@tests/test_openmx_multiple_formats.py`:
- Around line 19-32: The test_cell assertion uses the wrong indexing for the
expected array: it compares self.system["cells"][ff][ii][jj] against
cells[ii][jj] which ignores the frame axis and can index out of bounds; update
the expected access to include the frame dimension (use cells[ff][ii][jj]) so
the test compares corresponding frames and scalar elements; modify the assertion
inside test_cell accordingly (referencing test_cell, self.system["cells"], and
the local variable cells).

---

Nitpick comments:
In `@tests/test_openmx_multiple_formats.py`:
- Around line 49-56: The tests create systems in TestOPENMXTraj.setUp and
TestOPENMXLabeledTraj.setUp using dpdata.System and dpdata.LabeledSystem but do
not assert the new non-convergence warning produced by the fixture with
scf_conv=0; update both setUp methods to wrap the construction in
self.assertWarnsRegex(WarningClassOrBase, r"non-converge|non[- ]converg",
msg=None) (use the actual warning class emitted by the OpenMX loader if
available, otherwise Warning) so the tests explicitly expect and match the
non-convergence warning and suppress unasserted warning noise.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9dabf793-73ed-46d2-a0ce-01eb96e8ee39

📥 Commits

Reviewing files that changed from the base of the PR and between e0f56ac and 36fa629.

⛔ Files ignored due to path filters (2)
  • tests/openmx/Au111Surface.dat is excluded by !**/*.dat
  • tests/openmx/Methane2.dat is excluded by !**/*.dat
📒 Files selected for processing (5)
  • dpdata/openmx/omx.py
  • tests/openmx/Au111Surface.md
  • tests/openmx/Methane2.md
  • tests/test_openmx_check_convergence.py
  • tests/test_openmx_multiple_formats.py
💤 Files with no reviewable changes (2)
  • tests/openmx/Methane2.md
  • tests/test_openmx_check_convergence.py

Comment thread dpdata/openmx/omx.py Outdated
Comment thread tests/test_openmx_multiple_formats.py
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 86.70%. Comparing base (e0f56ac) to head (a290ae4).

Files with missing lines Patch % Lines
dpdata/openmx/omx.py 93.33% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #967   +/-   ##
=======================================
  Coverage   86.70%   86.70%           
=======================================
  Files          86       86           
  Lines        8039     8034    -5     
=======================================
- Hits         6970     6966    -4     
+ Misses       1069     1068    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dpdata/openmx/omx.py`:
- Around line 75-80: The convergence check was moved to look for "scf_conv=" on
the Cell_Vectors line but omitted the legacy per-coordinate flag in load_coords;
update load_coords() to first look for the new "scf_conv=" token (as currently
done) and if not found, also check parts[14] on coordinate lines for the legacy
convergence flag and issue warnings.warn("SCF not converged!", stacklevel=2)
when that legacy flag indicates non-convergence; also add unit tests that feed
both legacy-format input (with parts[14] non-zero) and new-format input (with
scf_conv=0 token) and assert that a warning is raised in each case to cover both
formats.

In `@tests/test_openmx_multiple_formats.py`:
- Around line 51-57: The setup methods for TestOPENMXTRAJProps.setUp and
TestOPENMXLabeledTraj.setUp call dpdata.System and dpdata.LabeledSystem which
emit a UserWarning for a non-converged SCF; wrap those constructor calls in a
warnings.catch_warnings() context and use warnings.simplefilter("ignore",
UserWarning) (or record and assert if you prefer an explicit check) so the
warning is captured/suppressed during setup rather than leaking outside tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ee5affe4-ea88-489a-afdf-acf06e9c8293

📥 Commits

Reviewing files that changed from the base of the PR and between 36fa629 and 627076f.

📒 Files selected for processing (2)
  • dpdata/openmx/omx.py
  • tests/test_openmx_multiple_formats.py

Comment thread dpdata/openmx/omx.py
Comment thread tests/test_openmx_multiple_formats.py Outdated
@shigeandtomo
Copy link
Copy Markdown
Contributor Author

@njzjz

Dear Jinzhe Zeng,

This PR introduces support for the upcoming OpenMX version 4.0 output format.

Since the new version is expected to be released soon, it would be very helpful if this PR could be reviewed and merged when possible.

Thank you very much for your time and consideration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dpdata enhancement New feature or request size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant