[CHiPPS-tickets] [COIN-OR High-Performance Parallel Search] #23: MPI BLIS hangs during termination check...
COIN-OR High-Performance Parallel Search
coin-trac at coin-or.org
Mon Jan 26 16:02:10 EST 2009
#23: MPI BLIS hangs during termination check...
-----------------------+----------------------------------------------------
Reporter: nedwards     | Owner: yanxu
Type: defect           | Status: new
Priority: major        | Component: ALPS
Version: stable/0.9    | Keywords:
-----------------------+----------------------------------------------------
I am getting fairly regular hangs during the termination check of MPI BLIS
(stable 0.9). I suspect a race condition, because the hang occurs more often
when I use more processors, and I can often get a successful completion by
reducing the number of processors. I typically run MPI BLIS with about 40
MPI processes on 5 machines with multiple dual- and quad-core processors.
When the hang occurs, it is quite repeatable: BLIS hangs on every run until
I change the number of MPI processes.
The last messages are:
{{{
Alps0190I Master[0] is doing termination check
Alps0192I Master[0] asked other processes to stop searching
}}}
I've attached a relatively small example MPS file that is currently
generating the problem...
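A hang that depends on the number of MPI processes can be narrowed down by
running the same job at several process counts under a wall-clock limit. The
sketch below is one possible way to do that and is not part of CHiPPS: the
helper names `run_with_timeout` and `sweep` are hypothetical, and the timeout
is an arbitrary threshold for separating a hang from a slow run.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the result.

    Returns "ok" for exit status 0, "failed" for a non-zero exit status,
    and "hang" if the command is still running after timeout_s seconds.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "hang"
    return "ok" if result.returncode == 0 else "failed"


def sweep(make_cmd, process_counts, timeout_s):
    """Map each process count n to the outcome of running make_cmd(n)."""
    return {n: run_with_timeout(make_cmd(n), timeout_s) for n in process_counts}
```

A call might look like
`sweep(lambda n: ["mpirun", "-np", str(n)] + blis_args, range(8, 41, 8), 600)`,
where `blis_args` is a placeholder list holding the BLIS executable and its
arguments. Process counts reported as "hang" are the ones to rerun when
checking whether the hang is repeatable.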
--
Ticket URL: <https://projects.coin-or.org/CHiPPS/ticket/23>
COIN-OR High-Performance Parallel Search <http://projects.coin-or.org/CHiPPS>
A framework for data-intensive tree-search algorithms.