[Cbc] Cbc segfault after many hours of CPU time

acw at ascent.com
Mon Mar 11 09:43:22 EDT 2013


It's probable that the debug version is imposing some constraint on
thread-switching that prevents the problem from occurring.

I ran the problem over the weekend with no -threads option (I think this
is equivalent to -threads 0, right?), and indeed, when I came in this
morning it had happily chugged through 40 million nodes without faulting.
So all the experimental evidence points to a concurrency bug.

For now I will use -threads 104 and take the mild performance hit, but we
will be waiting eagerly for your "trivial feature"; a speedup by a factor
of several thousand sounds lovely!

Thank you very much for your efforts.

Allan Wechsler



From: John Forrest <john.forrest at fastercoin.com>
To: acw at ascent.com
Cc: cbc at list.coin-or.org
Date: 03/11/2013 04:44 AM
Subject: Re: [Cbc] Cbc segfault after many hours of CPU time



Status report on segfault.

I am unable to reproduce it with the debug version.

I have reproduced the segfault three times with the optimized version. It
is always to do with multi-threading. Once I could see something I didn't
like, so I have modified that. The other two times, however, the cause was
not obvious. Looking at the registers and the disassembled code I could see
the segfault, but tracing back a few instructions, it should have worked.
So it is classic overwriting due to threads. But I can't see what is wrong
with the locking/unlocking of threads that should stop the overwriting. I
will continue looking, slowly.

However, if you use -thread 104 instead of 4, that switches on
deterministic parallel mode. This is not quite as efficient (it would be a
lot better if the effort per node were better determined, e.g. as in
Cplex's "ticks"), but it does not have the same problem. That has been
running for a long time without a problem (>63 million nodes).

Using my trunk (not in svn yet) and throwing every cut at the problem, I
can prove that the solution of 499243.8 is optimal after 23 million nodes.

However, after looking at the problem, I tried another trivial experimental
feature on it. There may be bugs, but I don't think so. That took 15,368
nodes and 54 seconds.

John Forrest


