[CHiPPS-tickets] [COIN-OR High-Performance Parallel Search] #23: MPI BLIS hangs during termination check...

COIN-OR High-Performance Parallel Search coin-trac at coin-or.org
Mon Jan 26 16:02:10 EST 2009


#23: MPI BLIS hangs during termination check...
-----------------------+----------------------------------------------------
Reporter:  nedwards    |       Owner:  yanxu
    Type:  defect      |      Status:  new  
Priority:  major       |   Component:  ALPS 
 Version:  stable/0.9  |    Keywords:       
-----------------------+----------------------------------------------------
 MPI BLIS (stable/0.9) hangs fairly regularly during the termination check.
 I suspect a race condition, because the hang happens more often when I use
 more processors, and I can often get a successful completion by reducing
 the number of processors I use. I typically run MPI BLIS with about 40 MPI
 processes on 5 machines, each with multiple dual- and quad-core processors.
 When the hang occurs, it is quite repeatable: BLIS hangs on every run until
 I change the number of MPI processes.

 The last messages printed before the hang are:

 {{{
 Alps0190I Master[0] is doing termination check
 Alps0192I Master[0] asked other processes to stop searching
 }}}
 I've attached a relatively small example MPS file that currently
 reproduces the problem.
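
 Since the hang depends on the number of MPI processes, one way to narrow it
 down is to sweep the process count and flag every run that exceeds a time
 limit. The following is only a minimal sketch of such a sweep, not part of
 CHiPPS. The launcher invocation, the `./blis` path, the `example.mps` file
 name, and the helper names are all assumptions to be replaced with the
 actual setup. Note that the timeout only kills the launcher process itself;
 depending on the MPI implementation, hung ranks may need separate cleanup.

```python
import subprocess
import sys


def find_hanging_counts(build_command, process_counts, timeout_s):
    """Run one command per process count; return the counts that timed out.

    build_command: callable mapping a process count to an argv list.
    timeout_s: seconds to wait before a run is treated as hung.
    """
    hanging = []
    for n in process_counts:
        try:
            subprocess.run(
                build_command(n),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,  # the child is killed when this expires
                check=False,        # a non-zero exit is not a hang
            )
        except subprocess.TimeoutExpired:
            hanging.append(n)
    return hanging


def blis_command(n):
    # ASSUMPTION: hypothetical command line; the executable path, its
    # arguments, and the instance file name are placeholders.
    return ["mpirun", "-np", str(n), "./blis", "example.mps"]


# Example use (not executed here, since it needs an MPI installation):
#   print(find_hanging_counts(blis_command, range(2, 41), timeout_s=600))
```

 A timeout cannot distinguish a genuine hang from a slow run, so the limit
 should be well above the longest successful run time observed.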

-- 
Ticket URL: <https://projects.coin-or.org/CHiPPS/ticket/23>
COIN-OR High-Performance Parallel Search <http://projects.coin-or.org/CHiPPS>
A framework for data-intensive tree-search algorithms.


