Refactor: convert ia-it nested loops to iat flat loops with OpenMP in ESolver_DP by chengleizheng · Pull Request #7394 · deepmodeling/abacus-develop

chengleizheng · 2026-05-29T08:58:02Z

Replaced ia-it nested loops with flat iat loops using ucell.iat2it/iat2ia lookup arrays and added #pragma omp parallel for guarded by #ifdef _OPENMP in runner() coord building, runner() force assignment, and type_map() atype assignment.


mohanchen · 2026-05-30T23:30:43Z

Nice try, you can do more, and put your test and analysis here.

mohanchen · 2026-05-30T23:50:30Z

+#ifdef _OPENMP
+#pragma omp parallel for
+#endif
+    for (int iat = 0; iat < ucell.nat; ++iat)


I recommend default(none) because it requires explicit variable scoping and avoids hidden parallel errors.
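Here is a minimal sketch of the default(none) style the reviewer recommends, written outside ABACUS (scale_all is a hypothetical function). Every variable used in the loop body has to appear in a clause, and locals declared inside the body are private automatically. The pointers and the size are deliberately non-const locals. As we read the GCC 9 porting notes, GCC before 9 treated const-qualified variables as predetermined shared and rejected them in shared(), while GCC 9 and later require them to be listed. Const values are therefore most portable in firstprivate().

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: out[i] = in[i] * factor, parallelized with default(none).
void scale_all(const std::vector<double>& in, std::vector<double>& out, double factor)
{
    out.resize(in.size());
    // Non-const locals, so they can be listed in shared() with any GCC version.
    const double* src = in.data();   // pointer to const data; the pointer itself is not const
    double* dst = out.data();
    int n = static_cast<int>(in.size());
#ifdef _OPENMP
    // default(none): every variable referenced in the loop body must be scoped explicitly.
#pragma omp parallel for default(none) shared(src, dst, n, factor)
#endif
    for (int i = 0; i < n; ++i)           // loop index: private automatically
    {
        double scaled = src[i] * factor;  // declared in the body: private automatically
        dst[i] = scaled;
    }
    assert(out.size() == in.size());
}
```

If you drop, say, dst from the shared list, an OpenMP-enabled build should fail to compile, which is exactly what default(none) is for. Without -fopenmp the pragma is compiled out.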

Thanks for the recommendation!😊


chengleizheng · 2026-06-01T02:40:01Z

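2. Code Change Details

Overview of Changes

Change	Location	Description
`runner()` coordinate-construction loop	L73-84	nested `ia-it` double loop → flat `iat` loop + OpenMP
`runner()` force-assignment loop	L103-105	added `#pragma omp parallel for`
`type_map()` atype-assignment loop	L189-209	validation separated from assignment + flat `iat` loop + OpenMP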

2.1 `runner()` coordinate construction

Before (nested ia-it loop):

int iat = 0;
for (int it = 0; it < ucell.ntype; ++it)          // outer: element types (1 to 3 here)
{
    for (int ia = 0; ia < ucell.atoms[it].na; ++ia) // inner: atoms of that type (hundreds)
    {
        coord[3 * iat]     = ucell.atoms[it].tau[ia].x * ucell.lat0_angstrom;
        coord[3 * iat + 1] = ucell.atoms[it].tau[ia].y * ucell.lat0_angstrom;
        coord[3 * iat + 2] = ucell.atoms[it].tau[ia].z * ucell.lat0_angstrom;
        iat++;  // shared counter: a data race if the loop were run by multiple threads
    }
}
assert(ucell.nat == iat);

After (flat iat loop + OpenMP):

#ifdef _OPENMP
#pragma omp parallel for default(none) shared(ucell, coord)
#endif
for (int iat = 0; iat < ucell.nat; ++iat)
{
    int it = ucell.iat2it[iat];  // prebuilt index table, O(1) lookup
    int ia = ucell.iat2ia[iat];  // prebuilt index table, O(1) lookup
    coord[3 * iat]     = ucell.atoms[it].tau[ia].x * ucell.lat0_angstrom;
    coord[3 * iat + 1] = ucell.atoms[it].tau[ia].y * ucell.lat0_angstrom;
    coord[3 * iat + 2] = ucell.atoms[it].tau[ia].z * ucell.lat0_angstrom;
}

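Key points:

The shared counter iat++ is gone. Each iteration writes only to positions determined by its own loop variable iat, so there is no data race.
The iat2it[] / iat2ia[] index arrays, which UnitCell builds at initialization (O(1) lookup), replace the per-type traversal of the nested loop.
For the 864-atom test system, the 864 iterations are split evenly across threads. Every iteration does the same work, so the load is naturally balanced.
The now-redundant assert(nat == iat) is removed, because the flat loop no longer accumulates a counter that needs checking.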
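To check that the flat loop visits atoms in the same order as the nested one, here is a self-contained sketch that uses simplified, hypothetical stand-ins for UnitCell (Cell, AtomType, Vec3 and build_index are not ABACUS names). It assumes that iat2it/iat2ia are filled in type-major order. That ordering is what makes the two loops produce identical coord arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for UnitCell and Atom (not the ABACUS types).
struct Vec3 { double x, y, z; };
struct AtomType { std::vector<Vec3> tau; };  // positions of all atoms of one type

struct Cell
{
    std::vector<AtomType> atoms;  // ntype entries
    std::vector<int> iat2it;      // global atom index -> type index
    std::vector<int> iat2ia;      // global atom index -> index within its type
    double lat0_angstrom = 1.0;
    int nat = 0;
};

// Fill the lookup tables once, in type-major order (the nested loop's order).
void build_index(Cell& c)
{
    c.iat2it.clear();
    c.iat2ia.clear();
    for (int it = 0; it < static_cast<int>(c.atoms.size()); ++it)
        for (int ia = 0; ia < static_cast<int>(c.atoms[it].tau.size()); ++ia)
        {
            c.iat2it.push_back(it);
            c.iat2ia.push_back(ia);
        }
    c.nat = static_cast<int>(c.iat2it.size());
}

// Nested version: the running counter iat makes each write depend on loop order.
std::vector<double> coord_nested(const Cell& c)
{
    std::vector<double> coord(3 * static_cast<std::size_t>(c.nat));
    int iat = 0;
    for (const AtomType& t : c.atoms)
        for (const Vec3& p : t.tau)
        {
            coord[3 * iat]     = p.x * c.lat0_angstrom;
            coord[3 * iat + 1] = p.y * c.lat0_angstrom;
            coord[3 * iat + 2] = p.z * c.lat0_angstrom;
            ++iat;
        }
    assert(iat == c.nat);
    return coord;
}

// Flat version: iteration iat writes only coord[3*iat] .. coord[3*iat+2].
std::vector<double> coord_flat(const Cell& c)
{
    std::vector<double> coord(3 * static_cast<std::size_t>(c.nat));
    const Cell* pc = &c;
    double* out = coord.data();
    int nat = c.nat;
#ifdef _OPENMP
#pragma omp parallel for default(none) shared(pc, out, nat)
#endif
    for (int iat = 0; iat < nat; ++iat)
    {
        const int it = pc->iat2it[iat];  // O(1) table lookups
        const int ia = pc->iat2ia[iat];
        const Vec3& p = pc->atoms[it].tau[ia];
        out[3 * iat]     = p.x * pc->lat0_angstrom;
        out[3 * iat + 1] = p.y * pc->lat0_angstrom;
        out[3 * iat + 2] = p.z * pc->lat0_angstrom;
    }
    return coord;
}
```

In ABACUS the tables come from UnitCell itself. The sketch only demonstrates that the flat loop preserves the nested loop's atom ordering.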

2.2 `runner()` force assignment

#ifdef _OPENMP
#pragma omp parallel for default(none) shared(ucell, f, fact_f)
#endif
for (int i = 0; i < ucell.nat; ++i)
{
    dp_force(i, 0) = f[3 * i] * fact_f;
    dp_force(i, 1) = f[3 * i + 1] * fact_f;
    dp_force(i, 2) = f[3 * i + 2] * fact_f;
}

This loop was already a single flat iat loop, so only the OpenMP pragma was added. The change is minimal.
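The force loop can be sketched the same way. In this hypothetical version, ForceMatrix and assign_forces are illustrative and not the ABACUS ModuleBase::matrix API. Each iteration writes a single row, so the iterations are independent. If fact_f is declared const in the real code, listing it in firstprivate rather than shared is, as we read the GCC 9 porting notes, the form that compiles under default(none) with GCC releases both before and after 9.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major nat x 3 matrix standing in for the force container.
struct ForceMatrix
{
    int nr;
    int nc;
    std::vector<double> c;
    ForceMatrix(int rows, int cols)
        : nr(rows), nc(cols), c(static_cast<std::size_t>(rows) * cols, 0.0) {}
    double& operator()(int i, int j) { return c[static_cast<std::size_t>(i) * nc + j]; }
    double operator()(int i, int j) const { return c[static_cast<std::size_t>(i) * nc + j]; }
};

// Copy a flat [x0, y0, z0, x1, ...] force array into out, scaled by a unit factor.
void assign_forces(const std::vector<double>& f, double factor, ForceMatrix& out)
{
    assert(out.nc == 3);
    assert(f.size() >= 3 * static_cast<std::size_t>(out.nr));
    const double fact_f = factor;   // const, like a unit-conversion constant
    const double* src = f.data();
    ForceMatrix* dst = &out;
    int nat = out.nr;
#ifdef _OPENMP
    // fact_f is const, so it is listed in firstprivate rather than shared.
#pragma omp parallel for default(none) shared(src, dst, nat) firstprivate(fact_f)
#endif
    for (int i = 0; i < nat; ++i)   // iteration i touches row i only
    {
        (*dst)(i, 0) = src[3 * i] * fact_f;
        (*dst)(i, 1) = src[3 * i + 1] * fact_f;
        (*dst)(i, 2) = src[3 * i + 2] * fact_f;
    }
}
```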

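2.3 `type_map()`: separating validation from assignment

Before: the label check (which calls WARNING_QUIT) and the atype assignment were interleaved in the double loop, so the check was needlessly repeated for every atom.

After: the work is split into two phases: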

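// Phase 1: validation (serial, only ntype iterations)
for (int it = 0; it < ucell.ntype; ++it)
    if (label.find(ucell.atoms[it].label) == label.end())
        WARNING_QUIT(...);

// Phase 2: assignment (parallel, nat iterations)
// label.at() instead of label[]: operator[] on a std::map may insert, so concurrent
// calls are a data race even though phase 1 guarantees every key is present.
#ifdef _OPENMP
#pragma omp parallel for default(none) shared(ucell, label)
#endif
for (int iat = 0; iat < ucell.nat; ++iat)
    atype[iat] = label.at(ucell.atoms[ucell.iat2it[iat]].label);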
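The two-phase split can be pushed one step further. The version below is a variant, not the PR's code. Phase 1 resolves each type label to its integer id once, so phase 2 reads only plain arrays and never touches the map from several threads. map_types and its parameters are hypothetical, and an exception stands in for WARNING_QUIT.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical inputs: label maps a type name to its model type id, type_labels holds
// one name per type (in type order), and iat2it maps each atom to its type index
// (assumed to contain only valid type indices).
std::vector<int> map_types(const std::map<std::string, int>& label,
                           const std::vector<std::string>& type_labels,
                           const std::vector<int>& iat2it)
{
    // Phase 1 (serial, ntype iterations): validate each label once and cache its id.
    std::vector<int> type_id;
    type_id.reserve(type_labels.size());
    for (const std::string& name : type_labels)
    {
        const auto pos = label.find(name);
        if (pos == label.end())
            throw std::runtime_error("label " + name + " is not in the type map");
        type_id.push_back(pos->second);
    }
    assert(type_id.size() == type_labels.size());

    // Phase 2 (parallel, nat iterations): plain array reads, no map access at all.
    std::vector<int> atype(iat2it.size());
    const int* ids = type_id.data();
    const int* it_of = iat2it.data();
    int* out = atype.data();
    int nat = static_cast<int>(iat2it.size());
#ifdef _OPENMP
#pragma omp parallel for default(none) shared(ids, it_of, out, nat)
#endif
    for (int iat = 0; iat < nat; ++iat)
        out[iat] = ids[it_of[iat]];
    return atype;
}
```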

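Engineering details

All #pragma omp directives are guarded by #ifdef _OPENMP, following the project convention (see esolver_of_tddft.cpp).
The virial assignment (a 3×3 loop, 9 iterations in total) is not parallelized, because the thread start-up overhead would exceed the work.
No schedule clause is given, so the default schedule applies. The default is implementation-defined but typically static, which suits these loops because every iteration does the same amount of work.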
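The points above can be combined in one guarded loop. This sketch uses the hypothetical names scale_in_place and kMinParallelIters. It spells out schedule(static) instead of relying on the implementation-defined default. It also uses OpenMP's if clause to run short loops on a single thread, an alternative to leaving small loops such as the 3×3 virial unannotated. The 1000-iteration cutoff is an arbitrary placeholder, not a measured value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cutoff: below it, thread start-up is assumed to cost more than the loop.
constexpr int kMinParallelIters = 1000;

// Multiply every element of v by factor; builds with or without -fopenmp.
void scale_in_place(std::vector<double>& v, double factor)
{
    double* p = v.data();
    int n = static_cast<int>(v.size());
#ifdef _OPENMP
    // schedule(static): equal-cost iterations split into contiguous per-thread chunks.
    // if(...): for small n the parallel region runs on a single thread.
#pragma omp parallel for default(none) shared(p, n, factor) schedule(static) if(n >= kMinParallelIters)
#endif
    for (int i = 0; i < n; ++i)
        p[i] *= factor;
}
```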

3. Performance Test Results

Test environment:

System: 864-atom Al (abacus-user-guide/examples/md/3_DPMD)
Potential: Al-SCAN.pb (DeepMD model)
MD type: MSST, 10 steps
Build: cmake .. -DDeePMD_DIR=/home/chenglei/miniconda3/envs/deepmd

Before: mpirun -np 1. The code contains no #pragma omp, so the loops run serially.

 TIME STATISTICS
-------------------------------------------------------
 CLASS_NAME     NAME           TIME/s  CALLS   AVG/s  PER/%
-------------------------------------------------------
            total              10.23  1        10.23  100.00
 Driver     atomic_world       10.23  1        10.23  100.00
 Run_MD     md_line             9.14  1         9.14   89.36
 MD_func    force_virial        2.64  10        0.26   25.79
 ESolver_DP runner              2.60  10        0.26   25.42
-------------------------------------------------------
 TOTAL  Time  : 10s

After: mpirun -np 1. OMP_NUM_THREADS is not set, so OpenMP uses all cores on the machine by default.

 TIME STATISTICS
-------------------------------------------------------
 CLASS_NAME     NAME           TIME/s  CALLS   AVG/s  PER/%
-------------------------------------------------------
            total               6.28  1         6.28  100.00
 Run_MD     md_line             5.71  1         5.71   90.90
 MD_func    force_virial        2.59  10        0.26   41.18
 ESolver_DP runner              2.55  10        0.25   40.54
-------------------------------------------------------
 TOTAL  Time  : 6s
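The two runs differ in thread count. When OMP_NUM_THREADS is unset, OpenMP leaves the default team size implementation-defined (often one thread per available core). It may therefore help to print the actual OpenMP configuration next to the timings. This is a minimal sketch. omp_config_summary is a hypothetical helper, while omp_get_max_threads and omp_get_num_procs are standard OpenMP runtime calls.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// One-line description of the OpenMP setup, meant to be printed next to timing
// results so that runs with different thread counts are not compared blindly.
std::string omp_config_summary()
{
    const char* env = std::getenv("OMP_NUM_THREADS");
    std::string s = "OMP_NUM_THREADS=";
    s += env ? env : "(unset)";
#ifdef _OPENMP
    s += ", omp_get_max_threads()=" + std::to_string(omp_get_max_threads());
    s += ", omp_get_num_procs()=" + std::to_string(omp_get_num_procs());
#else
    s += ", built without OpenMP";
#endif
    return s;
}
```

Running both builds with the same OMP_NUM_THREADS value (for example 1) would help separate the effect of these loop changes from threading inside linked math libraries.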

Comparison

Metric	Before	After	Change
Total time	10.23 s	6.28 s	-38.6%
Run_MD md_line	9.14 s	5.71 s	-37.5%
ESolver_DP runner	2.60 s	2.55 s	-1.9%
runner average per step	0.260 s	0.255 s	-1.9%
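The Change column is the relative difference with respect to the "Before" run. Two rows, worked through:

```latex
\Delta = \frac{t_{\text{after}} - t_{\text{before}}}{t_{\text{before}}} \times 100\%,
\qquad
\Delta_{\text{total}} = \frac{6.28 - 10.23}{10.23} \times 100\% \approx -38.6\%,
\qquad
\Delta_{\text{runner}} = \frac{2.55 - 2.60}{2.60} \times 100\% \approx -1.9\%
```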

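Analysis:

The time spent in ESolver_DP::runner itself drops only slightly (2.60 s → 2.55 s). At the 864-atom scale, the coordinate-construction and force-assignment loops are cheap to begin with.
The large reduction in total time (10.23 s → 6.28 s) is attributed mainly to math libraries such as MKL/BLAS automatically benefiting from multithreading once OpenMP is enabled. Almost none of the saving is inside runner: the part of md_line spent outside force_virial falls from 9.14 − 2.64 = 6.50 s to 5.71 − 2.59 = 3.12 s.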

Commit eb9114b: Replaced ia-it nested loops with flat iat loops using ucell.iat2it/iat2ia lookup arrays and added #pragma omp parallel for guarded by #ifdef _OPENMP in runner() coord building, runner() force assignment, and type_map() atype assignment.

mohanchen requested a review from 19hello May 29, 2026 09:50

mohanchen added the Feature Discussed and project_learning labels May 29, 2026

mohanchen reviewed May 30, 2026


Commit 006d439: Add default(none) to explicitly distinguish private variables from shared variables.

chengleizheng wants to merge 2 commits into deepmodeling:develop from chengleizheng:develop
