[Clp] Possible bug in setObjCoeff

Aleksandr M. Kazachkov akazachk at cmu.edu
Fri Oct 6 02:12:41 EDT 2017


Hi John (and the rest of the mailing list),

I am having a hard time tracking down the bug exactly, as it disappears
after seemingly irrelevant changes to my code. I do have some more
information and a guess, and maybe, with this information, you (or someone
else) will be able to help point me in the right direction for further
debugging.

What I am seeing now is a memory error that occurs during postsolve, at
line 3041 of ClpPrimalColumnSteepest.cpp, which is
"alternateWeights_->clear();".

Specifically, in this part of the code, there are two ClpSimplex*
variables: model and model_. After the call to alternateWeights_->clear(),
model_ becomes NULL, while model remains as it was. As a result,
afterwards, the following piece of code (in ClpPrimalColumnSteepest.cpp)
causes a memory error:

3044 // Save size of factorization
3045 if (!model->factorization()->pivots())
3046   sizeFactorization_ = model_->factorization()->numberElements();

This is because the condition at line 3045 gets satisfied, but at 3046, the
reference is to model_, not model.

I am not clear on why the call to clear() affects model_, but here is what
I see in gdb:

(gdb) p alternateWeights_->nElements_
$158 = 16
(gdb) p alternateWeights_->capacity_
$159 = 481
(gdb) p *alternateWeights_->indices_ at 16
$160 = {979, 2, 573, 585, 703, 704, 582, 571, 586, 708, 581, 576, 588, 575,
589, 574}

We see that the first index in alternateWeights_ is 979. It is during the
call to clear() from ClpPrimalColumnSteepest.cpp:3041, when we are at line
50 of CoinIndexedVector.cpp ("elements_[i0]=0.0;"), that the pointer to
model_ gets nulled (where "int i0 = indices_[i]" = 979). It seems there is
an error here, in that indices_ is referring to the index of something
outside the size allocated for elements_.

The reason that alternateWeights_->indices_ contains 979 is in lines 2902
to 2904 of ClpPrimalColumnSteepest.cpp:

2902 // save pivot order
2903 CoinMemcpyN(pivotVariable,
2904     numberRows, alternateWeights_->getIndices());

Before that call, alternateWeights_->indices_ = {125, 70, 44, 78, 55, 7,
45, 29, 27, 34, 42, 56, 25, 31, 60, 0} (all within capacity_). What I
understand is that the pivotVariable in row 0 is 979, which is a valid
index; for this instance, # rows = 281, # cols = 699 (though only 223
columns are non-empty). However, this is not a valid index for the vector
alternateWeights->elements_, which I think is what causes the problem
during the clear() call. The reason that alternateWeights_->capacity_ = 481
is that this is # rows + model_->factorization()->maximumPivots()
(ClpPrimalColumnSteepest.cpp:2939).

If my analysis is incomplete, I would appreciate any pointers on what
further information I can provide / debugging I can pursue.

Thank you for your time,
Aleksandr Kazachkov

P.S. In terms of code to reproduce the behavior: I have tried making a
minimal working example by pulling the relevant parts out of my larger code
base, but the bug then disappears. I would be happy to give you access to
my code (containing my dissertation work), but I would understand if you
prefer not to parse someone else’s code.

P.P.S. Here is the abridged stack trace at the moment the error occurs:
#0  CoinIndexedVector::clear (this=0x1532610) at
../../../CoinUtils/src/CoinIndexedVector.cpp:50
#1  0x00000000008b7402 in ClpPrimalColumnSteepest::saveWeights
(this=0x109ff90, model=0x10a8580, mode=2) at
../../../Clp/src/ClpPrimalColumnSteepest.cpp:3041
#2  0x0000000000959372 in ClpSimplexPrimal::statusOfProblemInPrimal
(this=0x10a8580, lastCleaned=@0x7fffffff826c: 0, type=1,
progress=0x10a8af8, doFactorization=true, ifValuesPass=0, originalModel=0x0)
    at ../../../Clp/src/ClpSimplexPrimal.cpp:1636
#3  0x0000000000953fa3 in ClpSimplexPrimal::primal (this=0x10a8580,
ifValuesPass=0, startFinishOptions=0) at
../../../Clp/src/ClpSimplexPrimal.cpp:361
#4  0x00000000008dc423 in ClpSimplex::primal (this=0x10a8580,
ifValuesPass=1, startFinishOptions=0) at
../../../Clp/src/ClpSimplex.cpp:5971
#5  0x000000000070a3bb in OsiClpSolverInterface::resolve (this=0x1359510)
at ../../../../Clp/src/OsiClp/OsiClpSolverInterface.cpp:1056

On Tue, Oct 3, 2017 at 4:40 AM John Forrest <john.forrest at fastercoin.com>
wrote:

> Aleksandr,
>
> There may not be any bug in disableFactorization - it is used in
> CglRedSplit cuts.  It is just that it looks like the culprit.  For Gomory
> cuts, I used CoinFactorization directly as I thought that was cleaner.  I
> can find out the problem if I have code that reproduces it.
>
>
> John Forrest
>
>
> On 02/10/17 19:56, Aleksandr M. Kazachkov wrote:
>
> Hi John, thank you for the quick response. One follow-up, regarding
> disableFactorization: are you suggesting not to use this at all? I was
> under the impression that if I use enableFactorization to access getBInv
> and such, then I should call disableFactorization before making any changes
> to the model, resolving, etc. You say it is rarely used, so I wonder if my
> impression was wrong. I use disableFactorization in other parts of my code
> too, because I think I had run into "strange" behavior without it, but I
> may be misremembering.
>
> On Mon, Oct 2, 2017 at 2:19 PM John Forrest <john.forrest at fastercoin.com>
> wrote:
>
>> Aleksandr,
>>
>> I put
>>
>> modelPtr_->whatsChanged_ &= (0xffff&~64);
>>
>> into code anyway to make it match with other calls.
>>
>> I would think it was the disableFactorization that was the problem -
>> there could easily be a bug and it is rarely used.
>>
>> As to the second part of your original post,  I would think that normally
>> preprocessing it once would normally be faster.
>>
>> John Forrest
>>
>>
>> On 02/10/17 18:25, Aleksandr M. Kazachkov wrote:
>>
>> I am not 100% sure of what the error was, but I believe I have solved my
>> memory issue (at least valgrind says so), and maybe someone will see where
>> is/was the bug based on my fix. Please let me know if you have an idea, as
>> I would like to know to prevent future mistakes on my part, and it may also
>> help others.
>>
>> My hunch is that my mistake had to do with some interaction between
>> factorization and other parts of the code.
>>
>> My process was, when inputting / solving the problem:
>> 1. Set up a new row-ordered CoinPackedMatrix. I initially have
>> setDimensions(0, num_cols), where num_cols is the known total # of columns.
>> I also reserve space for the rows using the "reserve" method. I have an
>> estimate for the max number of rows and maxSize.
>> 2. Input rows one at a time into the matrix by "appendRow" (the reason
>> for this, instead of putting the matrix in all at once, is that the rows
>> will be sorted in a special order that is useful to me).
>> 3. "New" an OsiClpSolverInterface* instance and use "loadProblem" to load
>> the problem from the constructed matrix and lower and upper bounds on the
>> rows/columns.
>> 4. Call the method disableFactorization().
>> 5. Call the method "getModelPtr()->cleanMatrix()" to clean the matrix.
>> 6. With the 0 objective function still, call initialSolve() to check
>> feasibility.
>> 7. Next, set each objective coefficient, one at a time, to 1 (what I
>> actually do is set only the coefficients of the columns for which
>> getVectorSize is > 0, but the memory corruption happened as long as all the
>> coefficients were being set one at a time).
>> 8. Call resolve().
>> 9. Delete the solver we created and exit out.
>>
>> This process caused some memory problem. Any (i.e., just one) of the
>> following changes made valgrind happy:
>> 1. Do not call disableFactorization. That seems unnecessary anyway, as
>> factorization would be disabled by default, I think. This was probably left
>> from some earlier version of my code. Though I don't quite understand why
>> it would cause a problem.
>> 2. Make a call to getMatrixByRow() after step 5. (I have no idea why this
>> helps.)
>> 3. Replace step 7 by a call to setObjective(...) where we set up in
>> advance a non-sparse vector for inputting the objective coefficients.
>> 4. Replace step 8 by an initialSolve().
>> 5. Instead of step 5, call cleanMatrix() directly on the row-ordered
>> matrix before inputting it into the OsiClpSolverInterface instance. This
>> makes more sense, in any case.
>>
>> I would guess fixes #1 or #5 are the important ones, with regards to
>> understanding the problem.
>>
>> Again, if anyone has an idea on where in particular I went wrong, and/or
>> why it was wrong, please let me know.
>>
>> Thanks again, and sorry for the barrage of emails,
>> Aleksandr Kazachkov
>>
>> On Mon, Oct 2, 2017 at 11:16 AM Aleksandr M. Kazachkov <akazachk at cmu.edu>
>> wrote:
>>
>>> I apologize; I am not sure my report is a bug. In the case of changing a
>>> single objective coefficient (at a time), the proper modification to
>>> whatsChanged_ seems to be done in ClpSimplex (I had been looking at
>>> ClpModel). I am still getting a memory error, and I am trying to figure out
>>> how it happens.
>>>
>>> In case someone has any suggestions, below is the (abridged) valgrind
>>> output, which says that memory is being written to after it has been
>>> deleted. In particular, the issue appears to be with the call
>>> "alternateWeights_->clear();" at ClpPrimalColumnSteepest.cpp:3041, which
>>> seems to be accessing memory freed via a conditionalDelete() of
>>> "nextCount_" at CoinFactorization1.cpp:1734 (and 1735, for lastCount_). I
>>> am not sure how these arrays are connected.
>>>
>>> I would appreciate any advice, and thank you,
>>> Alex
>>>
>>> ==22103== 7 errors in context 1 of 2:
>>> ==22103== Invalid write of size 8
>>> ==22103==    at 0xA39E10: CoinIndexedVector::clear()
>>> (CoinIndexedVector.cpp:51)
>>> ==22103==    by 0x8B742D:
>>> ClpPrimalColumnSteepest::saveWeights(ClpSimplex*, int)
>>> (ClpPrimalColumnSteepest.cpp:3041)
>>> ==22103==    by 0x95939D:
>>> ClpSimplexPrimal::statusOfProblemInPrimal(int&, int, ClpSimplexProgress*,
>>> bool, int, ClpSimplex*) (ClpSimplexPrimal.cpp:1636)
>>> ==22103==    by 0x953FCE: ClpSimplexPrimal::primal(int, int)
>>> (ClpSimplexPrimal.cpp:361)
>>> ==22103==    by 0x8DC44E: ClpSimplex::primal(int, int)
>>> (ClpSimplex.cpp:5971)
>>> ==22103==    by 0x70A3E6: OsiClpSolverInterface::resolve()
>>> (OsiClpSolverInterface.cpp:1056)
>>> // abridged
>>> ==22103==  Address 0x784dbc8 is 744 bytes inside a block of size 2,248
>>> free'd
>>> ==22103==    at 0x4A07D8E: operator delete[](void*) (in
>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==22103==    by 0xA42F12: CoinArrayWithLength::conditionalDelete()
>>> (CoinIndexedVector.cpp:1841)
>>> ==22103==    by 0x9E9CE2: CoinFactorization::cleanup()
>>> (CoinFactorization1.cpp:1734)
>>> ==22103==    by 0x9E7E63: CoinFactorization::factor()
>>> (CoinFactorization1.cpp:1184)
>>> ==22103==    by 0x8575AD: ClpFactorization::factorize(ClpSimplex*, int,
>>> bool) (ClpFactorization.cpp:2255)
>>> ==22103==    by 0x8C8254: ClpSimplex::internalFactorize(int)
>>> (ClpSimplex.cpp:1992)
>>> ==22103==    by 0x9554CF:
>>> ClpSimplexPrimal::statusOfProblemInPrimal(int&, int, ClpSimplexProgress*,
>>> bool, int, ClpSimplex*) (ClpSimplexPrimal.cpp:855)
>>> ==22103==    by 0x953FCE: ClpSimplexPrimal::primal(int, int)
>>> (ClpSimplexPrimal.cpp:361)
>>> ==22103==    by 0x8DC44E: ClpSimplex::primal(int, int)
>>> (ClpSimplex.cpp:5971)
>>> ==22103==    by 0x70A3E6: OsiClpSolverInterface::resolve()
>>> (OsiClpSolverInterface.cpp:1056)
>>>
>>> ==22103== 8 errors in context 2 of 2:
>>> ==22103== Invalid write of size 8
>>> ==22103==    at 0xA39DF3: CoinIndexedVector::clear()
>>> (CoinIndexedVector.cpp:50)
>>> ==22103==    by 0x8B742D:
>>> ClpPrimalColumnSteepest::saveWeights(ClpSimplex*, int)
>>> (ClpPrimalColumnSteepest.cpp:3041)
>>> ==22103==    by 0x95939D:
>>> ClpSimplexPrimal::statusOfProblemInPrimal(int&, int, ClpSimplexProgress*,
>>> bool, int, ClpSimplex*) (ClpSimplexPrimal.cpp:1636)
>>> ==22103==    by 0x953FCE: ClpSimplexPrimal::primal(int, int)
>>> (ClpSimplexPrimal.cpp:361)
>>> ==22103==    by 0x8DC44E: ClpSimplex::primal(int, int)
>>> (ClpSimplex.cpp:5971)
>>> ==22103==    by 0x70A3E6: OsiClpSolverInterface::resolve()
>>> (OsiClpSolverInterface.cpp:1056)
>>> // abridged
>>> ==22103==  Address 0x784e818 is 1,576 bytes inside a block of size 2,248
>>> free'd
>>> ==22103==    at 0x4A07D8E: operator delete[](void*) (in
>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==22103==    by 0xA42F12: CoinArrayWithLength::conditionalDelete()
>>> (CoinIndexedVector.cpp:1841)
>>> ==22103==    by 0x9E9CF7: CoinFactorization::cleanup()
>>> (CoinFactorization1.cpp:1735)
>>> ==22103==    by 0x9E7E63: CoinFactorization::factor()
>>> (CoinFactorization1.cpp:1184)
>>> ==22103==    by 0x8575AD: ClpFactorization::factorize(ClpSimplex*, int,
>>> bool) (ClpFactorization.cpp:2255)
>>> ==22103==    by 0x8C8254: ClpSimplex::internalFactorize(int)
>>> (ClpSimplex.cpp:1992)
>>> ==22103==    by 0x9554CF:
>>> ClpSimplexPrimal::statusOfProblemInPrimal(int&, int, ClpSimplexProgress*,
>>> bool, int, ClpSimplex*) (ClpSimplexPrimal.cpp:855)
>>> ==22103==    by 0x953FCE: ClpSimplexPrimal::primal(int, int)
>>> (ClpSimplexPrimal.cpp:361)
>>> ==22103==    by 0x8DC44E: ClpSimplex::primal(int, int)
>>> (ClpSimplex.cpp:5971)
>>> ==22103==    by 0x70A3E6: OsiClpSolverInterface::resolve()
>>> (OsiClpSolverInterface.cpp:1056)
>>>
>>> On Mon, Oct 2, 2017 at 2:11 AM Aleksandr M. Kazachkov <akazachk at cmu.edu>
>>> wrote:
>>>
>>>> Hi all, I have a possible bug report, as well as a (related) question.
>>>>
>>>> 1. In OsiClpSolverInterface::setObjCoeff (when setting just one
>>>> coefficient), I think (unless I am misunderstanding something, in which
>>>> case I apologize) that line 6125
>>>>
>>>>   modelPtr_->whatsChanged_ &= 0xffff;
>>>>
>>>> should be
>>>>
>>>>   modelPtr_->whatsChanged_ &= (0xffff&~64);
>>>>
>>>> same as in OsiClpSolverInterface::setObjective, as the 64 bit
>>>> corresponds to "OBJECTIVE_SAME". This was (ultimately) causing a memory
>>>> corruption error for me after I would set the objective (coefficient by
>>>> coefficient, because my objective is sparse), resolve, then delete my
>>>> solver object.
>>>>
>>>> 2. I am working with an instance in n-dimensional space, but the
>>>> majority of these columns are empty. In my context, I will be solving the
>>>> instance repeatedly with different objective functions. The first solve is
>>>> an "initialSolve" and subsequent solves, unless some issue is encountered,
>>>> are "resolve" calls.
>>>>
>>>> Is it better (faster in the long run, given the multiple resolves) to
>>>> preprocess the instance in advance to have no empty columns, or is that a
>>>> waste of time? My first thought was that it would not make much difference
>>>> since internally the matrix is kept in sparse form and anyway presolve
>>>> would catch this, but I am not sure I am right.
>>>>
>>>> Thank you in advance for your input,
>>>> Aleksandr Kazachkov
>>>>
>>>
>>
>> _______________________________________________
>> Clp mailing listClp at list.coin-or.orghttps://urldefense.proofpoint.com/v2/url?u=https-3A__list.coin-2Dor.org_mailman_listinfo_clp&d=DwICAg&c=Ngd-ta5yRYsqeUsEDgxhcqsYYY1Xs5ogLxWPA_2Wlc4&r=js2M0T-3OIMIVDvokcKjokJbk0F8QOCd0mT4FsVFE88&m=44uzzR183Kli2FgqxthADCaew--5xHJeS3nKJLYUVZI&s=8eJH_mllKWgOQUaXosOa-DyBp4vzagFhEkszZeSTGBA&e=
>>
>>
>> _______________________________________________
>> Clp mailing list
>> Clp at list.coin-or.org
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__list.coin-2Dor.org_mailman_listinfo_clp&d=DwICAg&c=Ngd-ta5yRYsqeUsEDgxhcqsYYY1Xs5ogLxWPA_2Wlc4&r=S0ppFBpGWf1xOsmm_XdTdA&m=7zj5sW5Y0KTFAvNS6yD-HKZ67pNaJ4klIXuT4GwR_ms&s=qS69eiS-NyaJZsAvZYMIx8TuxS215bVfQ7Bb-RnK1ls&e=
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://list.coin-or.org/pipermail/clp/attachments/20171006/20926baa/attachment-0001.html>


More information about the Clp mailing list