[Coin-discuss] problem with CoinMessage while running CLP/CBC

Kish Shen kish.shen at crosscoreop.com
Wed Jun 13 22:43:38 EDT 2007


Hi,

I have been running into a problem with CLP/CBC: I get a SIGSEGV that seems to happen
in the CoinMessage routines while solving an LP problem with CLP. I am not sure exactly where
the problem lies (CLP, CBC, my own code, other system software, or a combination), so
I thought this was the best list to post to, and hopefully someone can help me with it.

I use CLP/CBC to provide an interface to these solvers (eplex) in our programming language
ECLiPSe: users specify their problem in our language and can then invoke CLP/CBC
(via the eplex interface) to solve it. As part of our testing procedures we run nightly
builds and tests, and recently some of these tests, those exercising the eplex interface,
have been producing a SIGSEGV.

The problem only occurs on x86_64 running Linux, and only when we run ECLiPSe embedded in
Java. Here is an example error log from Java:

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x00002aaabbebf026, pid=28950, tid=47940664790736
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_07-b03 mixed mode)
# Problematic frame:
# C  [seosiclpcbc.so+0x335026]  _ZN14CoinOneMessageaSERKS_+0x1e
#

---------------  T H R E A D  ---------------

Current thread (0x0000000040115c70):  JavaThread "main" [_thread_in_native, id=2
....
Stack: [0x00007fff9c959000,0x00007fff9cb59000),  sp=0x00007fff9cb51ac0,  free space=2018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [seosiclpcbc.so+0x335026]  _ZN14CoinOneMessageaSERKS_+0x1e
C  [seosiclpcbc.so+0x3352ab]  _ZN18CoinMessageHandler7messageEiRK12CoinMessages+0x6b
C  [seosiclpcbc.so+0x1a17bb]  _ZN11ClpPresolve9postsolveEb+0x88f
C  [seosiclpcbc.so+0xdc70b]  _ZN21OsiClpSolverInterface12initialSolveEv+0x959

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.parctechnologies.eclipse.NativeEclipse.HandleEvents(Ljava/lang/Integer;)I+0
j  com.parctechnologies.eclipse.EmbeddedEclipse.getNextControlSignal(ZZ)Lcom/par

---
So the problem seems to be in CoinOneMessage.

I found that if I compile CBC with --enable-debug, I get an assertion failure instead:

java: ../../../CoinUtils/src/CoinMessageHandler.cpp:123: CoinMessages::CoinMessages(const CoinMessages&): Assertion `newAddress-temp<lengthMessages_' failed.

This problem only occurs when I compile my code against relatively recent CBC source. We use the
trunk of CBC: I don't have this problem with a trunk checkout I downloaded on 5 Dec 2006,
but I do get it with the next checkout, downloaded on 5 March 2007. I have also tried a trunk
checkout from 11 June, and it still has the same problem.
[thanks to John for fixing the problem with compiling the trunk branch CBC last week!]

I cannot compile my code against the stable branch, because it lacks functionality I need.

In our tests, we repeatedly run CLP/CBC to solve many test problems. The SIGSEGV seems to
happen only occasionally, and not at the same point in the tests, although it always seems to
occur in CoinOneMessage. It seems to happen if we call CLP to solve an LP problem often
enough: to test this, I ran a loop that repeatedly solves the same LP problem (bell3a,
but I don't think it matters much which problem I solve), and the crash happens
anywhere from after a few hundred solves to after more than 10,000 solves.

I am wondering whether the problem is somehow connected with some buffer filling up. The
assertion seems to suggest a memory problem, and the output in our tests is sent through a
(Unix) pipe. Is there some buffer that could become full?

Unfortunately, I have not been able to reproduce the problem without running ECLiPSe embedded
in Java, but I am hoping that someone can tell me why this is happening, or suggest how I can
find out more about it. Does the assertion failure provide useful information? I have been
trying to track down this problem for several weeks now; for the moment it is a blocker,
and I have had to revert to using CBC source from last year (which does not seem to have
this problem).

Thanks in advance for any help!

Yours sincerely,

Kish Shen


