[Coin-discuss] problem with CoinMessage while running CLP/CBC

John J Forrest jjforre at us.ibm.com
Thu Jun 14 04:52:35 EDT 2007


Kish,

I had in fact found that bug a week or so ago, but forgot to commit the
fix.  It only occurs on 64-bit computers and if copies of Cbc/Cbc models
are made in certain situations.  An offset into an array was being
computed, and this should have been a long int, not just an int.
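
To illustrate the kind of truncation described above, here is a minimal
sketch.  It is not the actual CoinUtils code: offsetWide and offsetNarrow
are hypothetical names, and fixed-width integers stand in for addresses so
that the example behaves the same on any machine.  On 64-bit Linux an int
is 32 bits while long and pointers are 64 bits, so an offset between two
addresses more than 4 GB apart loses its high bits when stored in an int.

```cpp
#include <cstdint>

// Hypothetical helper: distance between two addresses kept at full
// 64-bit width (what a long holds on 64-bit Linux).
std::int64_t offsetWide(std::uint64_t base, std::uint64_t addr) {
    return static_cast<std::int64_t>(addr - base);
}

// Hypothetical helper: the same distance squeezed through 32 bits
// (what an int holds on 64-bit Linux).  The high 32 bits are dropped.
std::int32_t offsetNarrow(std::uint64_t base, std::uint64_t addr) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(addr - base));
}
```

With base 0x1000 and addr 0x100001010, offsetWide gives 0x100000010 while
offsetNarrow gives only 0x10, so anything rebased with the narrow offset
points to the wrong place.  On a 32-bit machine the two widths coincide,
which would be consistent with the bug showing up only on 64-bit systems.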

Change committed.

John


From:     Kish Shen <kish.shen at crosscoreop.com>
Sent by:  coin-discuss-bounces at list.coin-or.org
Date:     06/13/07 10:43 PM
To:       coin-discuss at list.coin-or.org
cc:       Kish Shen <kish at crosscoreop.com>
Subject:  [Coin-discuss] problem with CoinMessage while running CLP/CBC
Please respond to: Discussions about open source software for Operations
          Research <coin-discuss at list.coin-or.org>

Hi,

I have been running into a problem with CLP/CBC where I get a SIGSEGV that
seems to happen in the CoinMessage routines while solving an LP problem
using CLP.  I am not sure exactly what the problem is (CLP, CBC, my own
code, other system software, or a combination...), so I thought this was
the best list to post to, and hopefully I can get some help with it.

I use CLP/CBC to provide an interface to these solvers (eplex) in our
programming language ECLiPSe: users specify their problem in our language,
and can then invoke CLP/CBC (via the eplex interface) to solve it. As part
of our testing procedures, we run nightly builds and tests, and recently
some of the tests that exercise the eplex interface have started producing
a SIGSEGV.

The problem only occurs on x86_64 running Linux, and only when we run
ECLiPSe embedded in Java. Here is an example error log from Java:

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x00002aaabbebf026, pid=28950, tid=47940664790736
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_07-b03 mixed mode)
# Problematic frame:
# C  [seosiclpcbc.so+0x335026]  _ZN14CoinOneMessageaSERKS_+0x1e
#

---------------  T H R E A D  ---------------

Current thread (0x0000000040115c70):  JavaThread "main" [_thread_in_native, id=2
....
Stack: [0x00007fff9c959000,0x00007fff9cb59000),  sp=0x00007fff9cb51ac0,  free space=2018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [seosiclpcbc.so+0x335026]  _ZN14CoinOneMessageaSERKS_+0x1e
C  [seosiclpcbc.so+0x3352ab]  _ZN18CoinMessageHandler7messageEiRK12CoinMessages+0x6b
C  [seosiclpcbc.so+0x1a17bb]  _ZN11ClpPresolve9postsolveEb+0x88f
C  [seosiclpcbc.so+0xdc70b]  _ZN21OsiClpSolverInterface12initialSolveEv+0x959

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.parctechnologies.eclipse.NativeEclipse.HandleEvents(Ljava/lang/Integer;)I+0
j  com.parctechnologies.eclipse.EmbeddedEclipse.getNextControlSignal(ZZ)Lcom/par

---
So the problem seems to be in CoinOneMessage.
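
The symbol names in the native frames above are mangled C++ names.  As a
reading aid, here is a small sketch of how they can be decoded with the
demangler that ships with GCC and Clang (abi::__cxa_demangle from
<cxxabi.h>).  The demangle wrapper is a hypothetical helper and not part
of CLP/CBC; the c++filt command-line tool does the same job.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI
// symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With null buffer arguments, __cxa_demangle mallocs the result.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);
    return result;
}
```

For example, demangle("_ZN14CoinOneMessageaSERKS_") yields
CoinOneMessage::operator=(CoinOneMessage const&), so the top frame is the
assignment operator of CoinOneMessage, called from
CoinMessageHandler::message(int, CoinMessages const&).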

I found that if I compile CBC with --enable-debug, then I can get an
assertion error instead:

java: ../../../CoinUtils/src/CoinMessageHandler.cpp:123:
CoinMessages::CoinMessages(const CoinMessages&): Assertion
`newAddress-temp<lengthMessages_' failed.
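
The assertion compares a pointer difference against the length of a
message block, which suggests that the copy constructor rebases pointers
into a block it has just copied.  The following is only a sketch of that
general technique under that assumption, not the CoinMessageHandler.cpp
source: PackedMessages and copyPacked are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a message table whose text lives in one block.
struct PackedMessages {
    std::vector<char> block;         // all message text, NUL-terminated
    std::vector<const char*> entry;  // one pointer into block per message

    void build(const std::vector<std::string>& texts) {
        std::size_t total = 0;
        for (const auto& t : texts) total += t.size() + 1;
        block.resize(total);
        entry.clear();
        std::size_t pos = 0;
        for (const auto& t : texts) {
            std::memcpy(block.data() + pos, t.c_str(), t.size() + 1);
            entry.push_back(block.data() + pos);
            pos += t.size() + 1;
        }
    }
};

// Copy by duplicating the block and rebasing every pointer into the copy.
PackedMessages copyPacked(const PackedMessages& src) {
    PackedMessages dst;
    dst.block = src.block;
    for (const char* p : src.entry) {
        // Keep the offset at full pointer width (ptrdiff_t, not int).
        std::ptrdiff_t offset = p - src.block.data();
        // Same idea as the failing assertion: the rebased address must
        // fall inside the copied block.
        assert(offset >= 0 &&
               static_cast<std::size_t>(offset) < dst.block.size());
        dst.entry.push_back(dst.block.data() + offset);
    }
    return dst;
}
```

If the offset were computed at the wrong width, the range check is where
it would be caught in a debug build; without the check, the rebased
pointer would simply be dereferenced later.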

This problem only occurs if I compile my code against relatively recent CBC
source. We use the trunk branch of CBC: I don't have this problem with the
trunk branch I downloaded on 5 Dec 2006, but I do get it with the next
trunk branch I downloaded, on 5 March 2007. I have also tried the trunk
branch I downloaded on 11 June, and I still have the same problem.
[Thanks to John for fixing the problem with compiling the trunk branch of
CBC last week!]

I cannot compile my code against the stable branch (missing functionality).


In our tests, we repeatedly run CLP/CBC to solve many test problems. The
SIGSEGV seems to happen only occasionally, and not in the same place in
the test, although it always seems to happen in CoinOneMessage. It seems
to happen if we call CLP to solve an LP problem frequently enough: to test
this, I ran a loop that repeatedly solves the same LP problem (bell3a, but
I don't think it matters much which problem I am solving), and the crash
happens anywhere from a few hundred solves in to more than 10,000.

I am wondering if the problem is somehow connected with filling some
buffer: the assertion seems to suggest a memory problem, and the output of
our tests is sent through a (Unix) pipe. Is there some buffer that may be
full?

Unfortunately, I have not been able to reproduce the problem without
running embedded in Java, but I am hoping that someone can give me some
help on why this is happening, or suggest how I can find out more about
it. Does the assertion failure provide useful information? I have been
trying to track down this problem for several weeks now. For the moment it
is proving a blocker, and I have had to revert to using CBC source from
last year (which does not seem to have this problem).

Thanks in advance for any help!

Yours sincerely,

Kish Shen




