Describe the bug
Build: ABACUS DSP build, run on multiple DSP nodes.
Calculation: sDFT scf
When many nodes are used, a segmentation fault is raised after TIME STATISTICS is written to running_scf.log.
With 3, 4, or 5 nodes, abacus_dsp terminates without error; with 6, 7, or 8 nodes, it segfaults.
In the failing case, running_scf.log is truncated at the TIME STATISTICS block:
TIME STATISTICS
---------------------------------------------------------------
CLASS_NAME NAME TIME/s CALLS AVG/s PER/%
---------------------------------------------------------------
...
---------------------------------------------------------------
# no output below!
The block starting with
NAME-------------------------|MEMORY(MB)------------------
is never written.
In this case, stderr gives a segmentation fault:
[cn4053:721248:0:721248] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x6c656e72656b57)
==== backtrace (tid: 721248) ====
0 /usr/local/ucx/lib/libucs.so.0(ucs_handle_error+0x2d4) [0x40003df0979c]
1 /usr/local/ucx/lib/libucs.so.0(+0x2a92c) [0x40003df0992c]
2 /usr/local/ucx/lib/libucs.so.0(+0x2acd4) [0x40003df09cd4]
3 linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x400039ed65b8]
4 /lib/aarch64-linux-gnu/libc.so.6(cfree+0x24) [0x40003d468a2c]
5 /abacus-develop/build_dsp/abacus_dsp(+0x6e704c) [0xaaaadf64204c]
6 /abacus-develop/build_dsp/abacus_dsp(+0x55f8d8) [0xaaaadf4ba8d8]
7 /abacus-develop/build_dsp/abacus_dsp(+0x3769cc) [0xaaaadf2d19cc]
8 /abacus-develop/build_dsp/abacus_dsp(+0x376220) [0xaaaadf2d1220]
9 /abacus-develop/build_dsp/abacus_dsp(+0x3763dc) [0xaaaadf2d13dc]
10 /abacus-develop/build_dsp/abacus_dsp(+0x99d48) [0xaaaadeff4d48]
11 /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe8) [0x40003d40fe10]
12 /abacus-develop/build_dsp/abacus_dsp(+0x99bc8) [0xaaaadeff4bc8]
=================================
srun: error: cn4053: task 20: Segmentation fault
slurmstepd: error: mpi/pmix_v3: _errhandler: cn4053 [5]: pmixp_client_v2.c:211: Error handler invoked: status = -25, source = [slurm.pmix.5500412.0:20]
slurmstepd: error: *** STEP 5500412.0 ON cn4048 CANCELLED AT 2026-04-19T21:23:54 ***
srun: Job step aborted: Waiting up to 302 seconds for job step to finish.
srun: error: cn4053: tasks 21-23: Killed
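For triage, the stderr output above can be processed with a small script. The sketch below is not part of ABACUS; the helper names `fault_address_as_text` and `binary_offsets` are hypothetical. It does two things: it checks whether the faulting address consists of printable ASCII bytes (read in little-endian order, the usual case on aarch64), and it collects the `+0x...` offsets of the `abacus_dsp` frames into an `addr2line` command. Printable bytes in a faulting address can hint that a pointer was overwritten by string data before `free` was called, but that is only a guess here. The `addr2line` output is useful only if the binary was built with debug symbols, and this assumes the offsets in the backtrace are relative to the load base of the executable.

```python
import re

# Copied verbatim from the stderr output in this report.
FAULT_LINE = (
    "[cn4053:721248:0:721248] Caught signal 11 (Segmentation fault: "
    "address not mapped to object at address 0x6c656e72656b57)"
)
BACKTRACE = """\
0 /usr/local/ucx/lib/libucs.so.0(ucs_handle_error+0x2d4) [0x40003df0979c]
1 /usr/local/ucx/lib/libucs.so.0(+0x2a92c) [0x40003df0992c]
2 /usr/local/ucx/lib/libucs.so.0(+0x2acd4) [0x40003df09cd4]
3 linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x400039ed65b8]
4 /lib/aarch64-linux-gnu/libc.so.6(cfree+0x24) [0x40003d468a2c]
5 /abacus-develop/build_dsp/abacus_dsp(+0x6e704c) [0xaaaadf64204c]
6 /abacus-develop/build_dsp/abacus_dsp(+0x55f8d8) [0xaaaadf4ba8d8]
7 /abacus-develop/build_dsp/abacus_dsp(+0x3769cc) [0xaaaadf2d19cc]
8 /abacus-develop/build_dsp/abacus_dsp(+0x376220) [0xaaaadf2d1220]
9 /abacus-develop/build_dsp/abacus_dsp(+0x3763dc) [0xaaaadf2d13dc]
10 /abacus-develop/build_dsp/abacus_dsp(+0x99d48) [0xaaaadeff4d48]
11 /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe8) [0x40003d40fe10]
12 /abacus-develop/build_dsp/abacus_dsp(+0x99bc8) [0xaaaadeff4bc8]
"""
BINARY = "/abacus-develop/build_dsp/abacus_dsp"


def fault_address_as_text(line):
    """Return the faulting address as ASCII text if every byte is printable.

    Bytes are taken in little-endian order (an assumption about the target).
    Returns None if no address is found or any byte is not printable.
    """
    match = re.search(r"at address (0x[0-9a-fA-F]+)", line)
    if match is None:
        return None
    value = int(match.group(1), 16)
    raw = value.to_bytes((value.bit_length() + 7) // 8, "little")
    if raw and all(32 <= b < 127 for b in raw):
        return raw.decode("ascii")
    return None


def binary_offsets(backtrace, binary):
    """Collect the '+0x...' offsets of frames that belong to `binary`."""
    frame = re.compile(r"^\s*\d+\s+(\S+)\(\+(0x[0-9a-fA-F]+)\)")
    offsets = []
    for line in backtrace.splitlines():
        match = frame.match(line)
        if match and match.group(1) == binary:
            offsets.append(match.group(2))
    return offsets


if __name__ == "__main__":
    print("fault address as text:", fault_address_as_text(FAULT_LINE))
    offsets = binary_offsets(BACKTRACE, BINARY)
    # -f prints function names, -C demangles C++ symbols, -e selects the file.
    print("addr2line -f -C -e", BINARY, " ".join(offsets))
```

Running the printed `addr2line` command on the machine that holds the same `abacus_dsp` build should map frames 5 to 10 and frame 12 to source locations, which would show which deallocation after the TIME STATISTICS output triggers the fault.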
Expected behavior
The run should finish without a segfault, and running_scf.log should contain a block like the following after TIME STATISTICS:
NAME-------------------------|MEMORY(MB)------------------
total 6221.1736
SDFT::chi0_cpu 2120.1782
...
------------- < 1.0 MB has been ignored ----------------
----------------------------------------------------------
To Reproduce
ABACUS Release v3.9.0.27
Environment
No response
Additional Context
CASE: sdft scf, C-sdft-8atom-700eV-3.5η
issue7269.tar.gz