Parallel model sometimes hangs
Description
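Some parallel model runs hang: the processes wait forever and the simulation never starts. In the case below, the run with 4 partitions hangs, while runs with 2, 8, and 16 partitions work as expected.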
Steps to Reproduce the Problem
- Generate the bulk mesh with
generateStructuredMesh -e hex --nx 100 --lx 1 --ny 100 --ly 1 --nz 100 --lz 1 -o cube_1x1x1_hex_1e6.vtu
- Run partition.sh with argument 4 to create the partitioned model (runs with 2, 8, and 16 partitions work as expected).
- Use the usual cube_1x1x1.gml.
- The project file is cube_hex_1e6.prj.
- Start each of the 4 processes under gdb in its own xterm:
mpirun -np 4 xterm -fa 'Monospace' -fs 10 -e gdb --args ~/w/o/debug_petsc/bin/ogs -l debug cube_hex_1e6.prj -o results
Backtrace (gdb) of a waiting process:
#0 0x00007fffec1139df in __GI___poll (fds=0x61100004c440, nfds=4, timeout=0) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff7878550 in __interceptor_poll (fds=0x61100004c440, nfds=4, timeout=0)
at /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4044
#2 0x00007fffeadf196a in poll (__timeout=<optimized out>, __nfds=4, __fds=0x61100004c440) at /usr/include/bits/poll2.h:39
#3 poll_dispatch (base=0x61700001ea80, tv=0x7fffffff6ad0) at /usr/src/debug/libevent/libevent-2.1.12-stable/poll.c:182
#4 0x00007fffeadee185 in event_base_loop (base=0x61700001ea80, flags=2)
at /usr/src/debug/libevent/libevent-2.1.12-stable/event.c:1992
#5 0x00007fffea3cab2d in opal_progress_events () at runtime/opal_progress.c:191
#6 opal_progress_events.isra.0 () at runtime/opal_progress.c:172
#7 0x00007fffea37d774 in opal_progress () at runtime/opal_progress.c:247
#8 0x00007ffff4323b16 in ompi_request_wait_completion (req=0x61f000038a00) at ../ompi/request/request.h:440
#9 ompi_request_default_wait (req_ptr=0x7fffffff6ca0, status=0x0) at request/req_wait.c:42
#10 0x00007ffff437dfe7 in ompi_coll_base_bcast_intra_generic (buffer=buffer@entry=0x7fffffff6e14,
original_count=original_count@entry=1, datatype=datatype@entry=0x7ffff43de600 <ompi_mpi_int>, root=root@entry=0,
comm=comm@entry=0x7ffff43ef780 <ompi_mpi_comm_world>, module=module@entry=0x61a00001b680, count_by_segment=1,
tree=0x60e00002b380) at mca/coll/base/coll_base_bcast.c:163
#11 0x00007ffff437e843 in ompi_coll_base_bcast_intra_bintree (buffer=0x7fffffff6e14, count=1,
datatype=0x7ffff43de600 <ompi_mpi_int>, root=0, comm=0x7ffff43ef780 <ompi_mpi_comm_world>, module=0x61a00001b680,
segsize=0) at mca/coll/base/coll_base_bcast.c:272
#12 0x00007fffd49feddf in ompi_coll_tuned_bcast_intra_do_this (buf=<optimized out>, count=<optimized out>,
dtype=<optimized out>, root=<optimized out>, comm=<optimized out>, module=<optimized out>, algorithm=<optimized out>,
faninout=0, segsize=0) at /usr/src/debug/openmpi/openmpi-4.1.5/ompi/mca/coll/tuned/coll_tuned_bcast_decision.c:157
#13 0x00007fffd49fee51 in ompi_coll_tuned_bcast_intra_dec_fixed (buff=<optimized out>, count=<optimized out>,
datatype=<optimized out>, root=<optimized out>, comm=<optimized out>, module=<optimized out>)
at /usr/src/debug/openmpi/openmpi-4.1.5/ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:649
#14 0x00007ffff4334230 in PMPI_Bcast (buffer=0x7fffffff6e14, count=1, datatype=0x7ffff43de600 <ompi_mpi_int>, root=0,
comm=0x7ffff43ef780 <ompi_mpi_comm_world>) at mpi/c/profile/pbcast.c:114
#15 0x00007ffff4b01227 in PetscOptionsGetenv () from /opt/petsc/linux-c-opt/lib/libpetsc.so.3.18
#16 0x00007ffff4b0025d in PetscStrreplace () from /opt/petsc/linux-c-opt/lib/libpetsc.so.3.18
#17 0x00007ffff4b191e9 in PetscOptionsInsertFile () from /opt/petsc/linux-c-opt/lib/libpetsc.so.3.18
#18 0x00007ffff4b1b302 in PetscOptionsInsert () from /opt/petsc/linux-c-opt/lib/libpetsc.so.3.18
#19 0x00007ffff4b3178a in ?? () from /opt/petsc/linux-c-opt/lib/libpetsc.so.3.18
#20 0x00007ffff4b31e3e in PetscInitialize () from /opt/petsc/linux-c-opt/lib/libpetsc.so.3.18
#21 0x000055556ddde552 in ApplicationsLib::LinearSolverLibrarySetup::LinearSolverLibrarySetup (this=0x7fffffffcc50, argc=6,
argv=0x7fffffffcfc8) at /home/fischeth/w/o/s/Applications/ApplicationsLib/LinearSolverLibrarySetup.h:34
#22 0x000055556ddab050 in Simulation::Simulation (this=0x7fffffffcc50, argc=6, argv=0x7fffffffcfc8)
at /home/fischeth/w/o/s/Applications/ApplicationsLib/Simulation.cpp:28
#23 0x000055556dbb6d2c in main (argc=6, argv=0x7fffffffcfc8) at /home/fischeth/w/o/s/Applications/CLI/ogs.cpp:93
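To compare where the other ranks are waiting, the backtraces of all ranks could also be collected non-interactively instead of through one xterm per process. The following is only a minimal sketch under assumptions that are not part of the report: all ranks run on the local machine, gdb is allowed to attach to them, and the processes are named ogs. The function name dump_backtraces and the output file names bt_<pid>.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: write one backtrace file per given process ID.
# GDB may be overridden from the environment; defaults to gdb from PATH.
GDB="${GDB:-gdb}"

dump_backtraces() {
    for pid in "$@"; do
        # -p attaches to the running process, -batch exits after the
        # commands given with -ex have run (the process is then detached).
        "$GDB" -p "$pid" -batch -ex "thread apply all bt" \
            > "bt_${pid}.txt" 2>&1
    done
}

# Usage (assumption: all ranks are local processes named ogs):
#   dump_backtraces $(pgrep -x ogs)
```

If the resulting files show the ranks blocked in different calls, that would indicate where the processes diverge before the broadcast in frame #14.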
Specifications
- Version: OGS, current master
- Platform: Linux on envinf2
- Libraries (per the backtrace paths): Open MPI 4.1.5, PETSc 3.18