[Coin-discuss] problem with CoinMessage while running CLP/CBC
John J Forrest
jjforre at us.ibm.com
Thu Jun 14 04:52:35 EDT 2007
Kish,
I had in fact found that bug a week or so ago, but forgot to commit the fix.
It only occurs on 64-bit computers and if copies of Cbc/Cbc models are made
in certain situations. An offset into an array was being computed and this
should have been a long int, not just an int.
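To illustrate the kind of bug described here, the following is a minimal sketch and not the actual CoinUtils code, which is not shown in this thread. It assumes the offset is the byte distance between two addresses and that it is later re-applied to a base address. The function names and the address values are hypothetical. On x86_64 Linux an int is 32 bits and a long int is 64 bits, so an offset larger than 4 GiB loses its upper bits when stored in an int.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for two addresses inside a 64-bit process.
// The values are illustrative; they are not taken from CoinUtils.

// Bug pattern: the byte offset between two addresses is stored in a
// 32-bit int. Going through uint32_t makes the loss of the upper
// 32 bits explicit.
std::int32_t offsetAsInt(std::uint64_t base, std::uint64_t target) {
    return static_cast<std::int32_t>(
        static_cast<std::uint32_t>(target - base));
}

// Fixed pattern: the offset is kept in a 64-bit signed type
// (the width of "long int" on x86_64 Linux).
std::int64_t offsetAsLong(std::uint64_t base, std::uint64_t target) {
    return static_cast<std::int64_t>(target - base);
}

// Re-apply a stored offset to a base address, as a copy routine might
// do when relocating internal pointers into a new block.
std::uint64_t relocate(std::uint64_t base, std::int64_t offset) {
    return base + static_cast<std::uint64_t>(offset);
}

void demoOffsetTruncation() {
    const std::uint64_t base = 0x00002aaab0000000ULL;
    const std::uint64_t target = base + 0x100000010ULL;  // over 4 GiB away

    // The narrow offset silently drops the upper bits:
    // 0x100000010 becomes 0x10.
    assert(offsetAsInt(base, target) == 0x10);
    assert(relocate(base, offsetAsInt(base, target)) != target);

    // The wide offset round-trips correctly.
    assert(relocate(base, offsetAsLong(base, target)) == target);
}
```

On a 32-bit build both types have the same width, which would be consistent with the bug showing up only on 64-bit machines.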
Change committed.
John
Kish Shen <kish.shen at crosscoreop.com>
Sent by: coin-discuss-bounces at list.coin-or.org
06/13/07 10:43 PM
Please respond to: Discussions about open source software for Operations
Research <coin-discuss at list.coin-or.org>
To: coin-discuss at list.coin-or.org
cc: Kish Shen <kish at crosscoreop.com>
Subject: [Coin-discuss] problem with CoinMessage while running CLP/CBC
Hi,
I have been running into a problem with CLP/CBC where I get a SIGSEGV that
seems to happen in the CoinMessage routines while solving an LP problem
using CLP. I am not sure exactly what the problem is (CLP, CBC, my own code,
other system software, or a combination...), so I thought this was the best
list to post to, and hopefully I can get some help with the problem.

My use of CLP/CBC is to provide an interface (eplex) to these solvers in our
programming language ECLiPSe: users specify their problem in our language,
and can then invoke CLP/CBC (via the eplex interface) to solve it. As part
of our testing procedures, we run nightly builds and tests, and recently
some of the tests for the eplex interface have been producing SIGSEGVs.

The problem only occurs on x86_64 running Linux, and only when we run
ECLiPSe embedded in Java. Here is an example error log from Java:
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0x00002aaabbebf026, pid=28950, tid=47940664790736
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_07-b03 mixed mode)
# Problematic frame:
# C [seosiclpcbc.so+0x335026] _ZN14CoinOneMessageaSERKS_+0x1e
#
--------------- T H R E A D ---------------
Current thread (0x0000000040115c70): JavaThread "main" [_thread_in_native, id=2
....
Stack: [0x00007fff9c959000,0x00007fff9cb59000), sp=0x00007fff9cb51ac0, free space=2018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [seosiclpcbc.so+0x335026] _ZN14CoinOneMessageaSERKS_+0x1e
C [seosiclpcbc.so+0x3352ab] _ZN18CoinMessageHandler7messageEiRK12CoinMessages+0x6b
C [seosiclpcbc.so+0x1a17bb] _ZN11ClpPresolve9postsolveEb+0x88f
C [seosiclpcbc.so+0xdc70b] _ZN21OsiClpSolverInterface12initialSolveEv+0x959
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j com.parctechnologies.eclipse.NativeEclipse.HandleEvents(Ljava/lang/Integer;)I+0
j com.parctechnologies.eclipse.EmbeddedEclipse.getNextControlSignal(ZZ)Lcom/par
---
So the problem seems to be in CoinOneMessage.
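The frame names in the log are mangled C++ symbols. As a small sketch of how they can be decoded, the helper below wraps `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and Clang (Itanium C++ ABI). The helper name `demangle` is hypothetical and is not part of any COIN-OR library.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI
// symbol, or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so the caller must release it with free.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Applied to the top frame, `demangle("_ZN14CoinOneMessageaSERKS_")` gives `CoinOneMessage::operator=(CoinOneMessage const&)`, so the crash is in the assignment operator of CoinOneMessage. The command-line tool `c++filt` performs the same decoding.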
I found that if I compile CBC with --enable-debug, then I can get an
assertion error instead:
java: ../../../CoinUtils/src/CoinMessageHandler.cpp:123:
CoinMessages::CoinMessages(const CoinMessages&): Assertion
`newAddress-temp<lengthMessages_' failed.

This problem only occurs if I compile my code against relatively recent CBC
source. We use the trunk branch of CBC, and I don't have this problem with a
copy of the trunk I downloaded on 5 Dec 2006, but I do get it with the next
copy of the trunk I downloaded on 5 March 2007. I have also tried a copy of
the trunk I downloaded on 11 June, and I still have the same problem.
[Thanks to John for fixing the problem with compiling the trunk branch of
CBC last week!] I cannot compile my code against the stable branch (missing
functionality).

In our tests, we repeatedly run CLP/CBC to solve many test problems. The
SIGSEGV seems to happen only occasionally, and not in the same place in the
tests, although it always seems to happen in CoinOneMessage. It seems to
happen if we call CLP to solve an LP problem often enough: to test this, I
ran a loop that repeatedly solves the same LP problem (bell3a, but I don't
think it matters much which problem I am solving), and the crash happens
anywhere from a few hundred solves in to more than 10,000 solves in.

I am wondering if the problem is somehow connected with filling some
buffers -- the assertion seems to suggest some memory problem, and the
output in our tests is sent through a (Unix) pipe -- is there some buffer
that may be full?

Unfortunately, I have not been able to reproduce the problem without
running embedded in Java, but I am hoping that someone can give me some
help on why this is happening, or suggest how I can find out more about the
problem. Does the assertion failure provide useful information? I have been
trying to track down this problem for several weeks now, and for the moment
it is proving a blocker, so I have had to revert to using CBC source from
last year (which does not seem to have this problem).

Thanks in advance for any help!
Yours sincerely,
Kish Shen