[Clp] Possible bug in setObjCoeff

Fri Oct 6 06:12:00 EDT 2017

Aleksandr,

In primal steepest edge, some weights for basic variables need to be 
saved round a re-factorization.  As the basic variables will not pivot 
on same rows the row indices are changed to the model sequence numbers 
before factorization and then converted back after.

Somehow this was interrupted, so the indices were used to zero out 
vector.  This vector is normally fairly dense so being in the wrong 
state would not matter as the entire vector is zeroed. However in your 
case the number of entries must have been less than a third of the 
capacity so the indices were used.

What is the value of savedPivotSequence_ when the error occurs? If >= 0 
then I can use that to decide that the indices are incorrect.

I will also send you an updated ClpPrimalColumnSteepest.cpp which should 
also fix the problem - does it? If so, I will probably check that in.

John Forrest
On 06/10/17 07:12, Aleksandr M. Kazachkov wrote:
> Hi John (and the rest of the mailing list),
>
> I am having a hard time tracking down the bug exactly, as it 
> disappears after seemingly irrelevant changes to my code. I do have 
> some more information and a guess, and maybe, with this information, 
> you (or someone else) will be able to help point me in the right 
> direction for further debugging.
>
> What I am seeing now is a memory error that occurs during postsolve, 
> at line 3041 of ClpPrimalColumnSteepest.cpp, which is 
> "alternateWeights_->clear();".
>
> Specifically, in this part of the code, there are two ClpSimplex* 
> variables: model and model_. After the call to 
> alternateWeights_->clear(), model_ becomes NULL, while model remains 
> as it was. As a result, afterwards, the following piece of code (in 
> ClpPrimalColumnSteepest.cpp) causes a memory error:
>
> 3044 // Save size of factorization
> 3045 if (!model->factorization()->pivots())
> 3046   sizeFactorization_ = model_->factorization()->numberElements();
>
> This is because the condition at line 3045 gets satisfied, but at 
> 3046, the reference is to model_, not model.
>
> I am not clear on why the call to clear() affects model_, but here is 
> what I see in gdb:
>
> (gdb) p alternateWeights_->nElements_
> $158 = 16
> (gdb) p alternateWeights_->capacity_
> $159 = 481
> (gdb) p *alternateWeights_->indices_ at 16
> $160 = {979, 2, 573, 585, 703, 704, 582, 571, 586, 708, 581, 576, 588, 
> 575, 589, 574}
>
> We see that the first index in alternateWeights_ is 979. It is during 
> the call to clear() from ClpPrimalColumnSteepest.cpp:3041, when we are 
> at line 50 of CoinIndexedVector.cpp ("elements_[i0]=0.0;"), that the 
> pointer to model_ gets nulled (where "int i0 = indices_[i]" = 979). It 
> seems there is an error here, in that indices_ is referring to the 
> index of something outside the size allocated for elements_.
>
> The reason that alternateWeights_->indices_ contains 979 is in lines 
> 2902 to 2904 of ClpPrimalColumnSteepest.cpp:
>
> 2902 // save pivot order
> 2903 CoinMemcpyN(pivotVariable,
> 2904     numberRows, alternateWeights_->getIndices());
>
> Before that call, alternateWeights_->indices_ = {125, 70, 44, 78, 55, 
> 7, 45, 29, 27, 34, 42, 56, 25, 31, 60, 0} (all within capacity_). What 
> I understand is that the pivotVariable in row 0 is 979, which is a 
> valid index; for this instance, # rows = 281, # cols = 699 (though 
> only 223 columns are non-empty). However, this is not a valid index 
> for the vector alternateWeights->elements_, which I think is what 
> causes the problem during the clear() call. The reason that 
> alternateWeights_->capacity_ = 481 is that this is # rows + 
> model_->factorization()->maximumPivots() 
> (ClpPrimalColumnSteepest.cpp:2939).
>
> If my analysis is incomplete, I would appreciate any pointers on what 
> further information I can provide / debugging I can pursue.
>
> Thank you for your time,
> Aleksandr Kazachkov
>
> P.S. In terms of code to reproduce the behavior: I have tried making a 
> minimal working example by pulling the relevant parts out of my larger 
> code base, but the bug then disappears. I would be happy to give you 
> access to my code (containing my dissertation work), but I would 
> understand if you prefer not to parse someone else’s code.
>
> P.P.S. Here is the abridged stack trace at the moment the error occurs:
> #0  CoinIndexedVector::clear (this=0x1532610) at 
> ../../../CoinUtils/src/CoinIndexedVector.cpp:50
> #1  0x00000000008b7402 in ClpPrimalColumnSteepest::saveWeights 
> (this=0x109ff90, model=0x10a8580, mode=2) at 
> ../../../Clp/src/ClpPrimalColumnSteepest.cpp:3041
> #2  0x0000000000959372 in ClpSimplexPrimal::statusOfProblemInPrimal 
> (this=0x10a8580, lastCleaned=@0x7fffffff826c: 0, type=1, 
> progress=0x10a8af8, doFactorization=true, ifValuesPass=0, 
> originalModel=0x0)
>     at ../../../Clp/src/ClpSimplexPrimal.cpp:1636
> #3  0x0000000000953fa3 in ClpSimplexPrimal::primal (this=0x10a8580, 
> ifValuesPass=0, startFinishOptions=0) at 
> ../../../Clp/src/ClpSimplexPrimal.cpp:361
> #4  0x00000000008dc423 in ClpSimplex::primal (this=0x10a8580, 
> ifValuesPass=1, startFinishOptions=0) at 
> ../../../Clp/src/ClpSimplex.cpp:5971
> #5  0x000000000070a3bb in OsiClpSolverInterface::resolve 
> (this=0x1359510) at 
> ../../../../Clp/src/OsiClp/OsiClpSolverInterface.cpp:1056
>
> On Tue, Oct 3, 2017 at 4:40 AM John Forrest 
> <john.forrest at fastercoin.com <mailto:john.forrest at fastercoin.com>> wrote:
>
>     Aleksandr,
>
>     There may not be any bug in disableFactorization - it is used in
>     CglRedSplit cuts.  It is just that it looks like the culprit.  For
>     Gomory cuts, I used CoinFactorization directly as I thought that
>     was cleaner.  I can find out the problem if I have code that
>     reproduces it.
>
>
>     John Forrest
>
>
>     On 02/10/17 19:56, Aleksandr M. Kazachkov wrote:
>>     Hi John, thank you for the quick response. One follow-up,
>>     regarding disableFactorization: are you suggesting not to use
>>     this at all? I was under the impression that if I use
>>     enableFactorization to access getBInv and such, then I should
>>     call disableFactorization before making any changes to the model,
>>     resolving, etc. You say it is rarely used, so I wonder if my
>>     impression was wrong. I use disableFactorization in other parts
>>     of my code too, because I think I had run into "strange" behavior
>>     without it, but I may be misremembering.
>>
>>     On Mon, Oct 2, 2017 at 2:19 PM John Forrest
>>     <john.forrest at fastercoin.com
>>     <mailto:john.forrest at fastercoin.com>> wrote:
>>
>>         Aleksandr,
>>
>>         I put
>>
>>         modelPtr_->whatsChanged_ &= (0xffff&~64);
>>
>>         into code anyway to make it match with other calls.
>>
>>         I would think it was the disableFactorization that was the
>>         problem - there could easily be a bug and it is rarely used.
>>
>>         As to the second part of your original post,  I would think
>>         that normally preprocessing it once would normally be faster.
>>
>>         John Forrest
>>
>>
>>         On 02/10/17 18:25, Aleksandr M. Kazachkov wrote:
>>>         I am not 100% sure of what the error was, but I believe I
>>>         have solved my memory issue (at least valgrind says so), and
>>>         maybe someone will see where is/was the bug based on my fix.
>>>         Please let me know if you have an idea, as I would like to
>>>         know to prevent future mistakes on my part, and it may also
>>>         help others.
>>>
>>>         My hunch is that my mistake had to do with some interaction
>>>         between factorization and other parts of the code.
>>>
>>>         My process was, when inputting / solving the problem:
>>>         1. Set up a new row-ordered CoinPackedMatrix. I initially
>>>         have setDimensions(0, num_cols), where num_cols is the known
>>>         total # of columns. I also reserve space for the rows using
>>>         the "reserve" method. I have an estimate for the max number
>>>         of rows and maxSize.
>>>         2. Input rows one at a time into the matrix by "appendRow"
>>>         (the reason for this, instead of putting the matrix in all
>>>         at once, is that the rows will be sorted in a special order
>>>         that is useful to me).
>>>         3. "New" an OsiClpSolverInterface* instance and use
>>>         "loadProblem" to load the problem from the constructed
>>>         matrix and lower and upper bounds on the rows/columns.
>>>         4. Call the method disableFactorization().
>>>         5. Call the method "getModelPtr()->cleanMatrix()" to clean
>>>         the matrix.
>>>         6. With the 0 objective function still, call initialSolve()
>>>         to check feasibility.
>>>         7. Next, set each objective coefficient, one at a time, to 1
>>>         (what I actually do is set only the coefficients of the
>>>         columns for which getVectorSize is > 0, but the memory
>>>         corruption happened as long as all the coefficients were
>>>         being set one at a time).
>>>         8. Call resolve().
>>>         9. Delete the solver we created and exit out.
>>>
>>>         This process caused some memory problem. Any (i.e., just
>>>         one) of the following changes made valgrind happy:
>>>         1. Do not call disableFactorization. That seems unnecessary
>>>         anyway, as factorization would be disabled by default, I
>>>         think. This was probably left from some earlier version of
>>>         my code. Though I don't quite understand why it would cause
>>>         a problem.
>>>         2. Make a call to getMatrixByRow() after step 5. (I have no
>>>         idea why this helps.)
>>>         3. Replace step 7 by a call to setObjective(...) where we
>>>         set up in advance a non-sparse vector for inputting the
>>>         objective coefficients.
>>>         4. Replace step 8 by an initialSolve().
>>>         5. Instead of step 5, call cleanMatrix() directly on the
>>>         row-ordered matrix before inputting it into the
>>>         OsiClpSolverInterface instance. This makes more sense, in
>>>         any case.
>>>
>>>         I would guess fixes #1 or #5 are the important ones, with
>>>         regards to understanding the problem.
>>>
>>>         Again, if anyone has an idea on where in particular I went
>>>         wrong, and/or why it was wrong, please let me know.
>>>
>>>         Thanks again, and sorry for the barrage of emails,
>>>         Aleksandr Kazachkov
>>>
>>>         On Mon, Oct 2, 2017 at 11:16 AM Aleksandr M. Kazachkov
>>>         <akazachk at cmu.edu <mailto:akazachk at cmu.edu>> wrote:
>>>
>>>             I apologize; I am not sure my report is a bug. In the
>>>             case of changing a single objective coefficient (at a
>>>             time), the proper modification to whatsChanged_ seems to
>>>             be done in ClpSimplex (I had been looking at ClpModel).
>>>             I am still getting a memory error, and I am trying to
>>>             figure out how it happens.
>>>
>>>             In case someone has any suggestions, below is the
>>>             (abridged) valgrind output, which says that memory is
>>>             being written to after it has been deleted. In
>>>             particular, the issue appears to be with the call
>>>             "alternateWeights_->clear();" at
>>>             ClpPrimalColumnSteepest.cpp:3041, which seems to be
>>>             accessing memory freed via a conditionalDelete() of
>>>             "nextCount_" at CoinFactorization1.cpp:1734 (and 1735,
>>>             for lastCount_). I am not sure how these arrays are
>>>             connected.
>>>
>>>             I would appreciate any advice, and thank you,
>>>             Alex
>>>
>>>             ==22103== 7 errors in context 1 of 2:
>>>             ==22103== Invalid write of size 8
>>>             ==22103==    at 0xA39E10: CoinIndexedVector::clear()
>>>             (CoinIndexedVector.cpp:51)
>>>             ==22103==    by 0x8B742D:
>>>             ClpPrimalColumnSteepest::saveWeights(ClpSimplex*, int)
>>>             (ClpPrimalColumnSteepest.cpp:3041)
>>>             ==22103==    by 0x95939D:
>>>             ClpSimplexPrimal::statusOfProblemInPrimal(int&, int,
>>>             ClpSimplexProgress*, bool, int, ClpSimplex*)
>>>             (ClpSimplexPrimal.cpp:1636)
>>>             ==22103==    by 0x953FCE: ClpSimplexPrimal::primal(int,
>>>             int) (ClpSimplexPrimal.cpp:361)
>>>             ==22103==    by 0x8DC44E: ClpSimplex::primal(int, int)
>>>             (ClpSimplex.cpp:5971)
>>>             ==22103==    by 0x70A3E6:
>>>             OsiClpSolverInterface::resolve()
>>>             (OsiClpSolverInterface.cpp:1056)
>>>             // abridged
>>>             ==22103==  Address 0x784dbc8 is 744 bytes inside a block
>>>             of size 2,248 free'd
>>>             ==22103==    at 0x4A07D8E: operator delete[](void*) (in
>>>             /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>             ==22103==    by 0xA42F12:
>>>             CoinArrayWithLength::conditionalDelete()
>>>             (CoinIndexedVector.cpp:1841)
>>>             ==22103==    by 0x9E9CE2: CoinFactorization::cleanup()
>>>             (CoinFactorization1.cpp:1734)
>>>             ==22103==    by 0x9E7E63: CoinFactorization::factor()
>>>             (CoinFactorization1.cpp:1184)
>>>             ==22103==    by 0x8575AD:
>>>             ClpFactorization::factorize(ClpSimplex*, int, bool)
>>>             (ClpFactorization.cpp:2255)
>>>             ==22103==    by 0x8C8254:
>>>             ClpSimplex::internalFactorize(int) (ClpSimplex.cpp:1992)
>>>             ==22103==    by 0x9554CF:
>>>             ClpSimplexPrimal::statusOfProblemInPrimal(int&, int,
>>>             ClpSimplexProgress*, bool, int, ClpSimplex*)
>>>             (ClpSimplexPrimal.cpp:855)
>>>             ==22103==    by 0x953FCE: ClpSimplexPrimal::primal(int,
>>>             int) (ClpSimplexPrimal.cpp:361)
>>>             ==22103==    by 0x8DC44E: ClpSimplex::primal(int, int)
>>>             (ClpSimplex.cpp:5971)
>>>             ==22103==    by 0x70A3E6:
>>>             OsiClpSolverInterface::resolve()
>>>             (OsiClpSolverInterface.cpp:1056)
>>>
>>>             ==22103== 8 errors in context 2 of 2:
>>>             ==22103== Invalid write of size 8
>>>             ==22103==    at 0xA39DF3: CoinIndexedVector::clear()
>>>             (CoinIndexedVector.cpp:50)
>>>             ==22103==    by 0x8B742D:
>>>             ClpPrimalColumnSteepest::saveWeights(ClpSimplex*, int)
>>>             (ClpPrimalColumnSteepest.cpp:3041)
>>>             ==22103==    by 0x95939D:
>>>             ClpSimplexPrimal::statusOfProblemInPrimal(int&, int,
>>>             ClpSimplexProgress*, bool, int, ClpSimplex*)
>>>             (ClpSimplexPrimal.cpp:1636)
>>>             ==22103==    by 0x953FCE: ClpSimplexPrimal::primal(int,
>>>             int) (ClpSimplexPrimal.cpp:361)
>>>             ==22103==    by 0x8DC44E: ClpSimplex::primal(int, int)
>>>             (ClpSimplex.cpp:5971)
>>>             ==22103==    by 0x70A3E6:
>>>             OsiClpSolverInterface::resolve()
>>>             (OsiClpSolverInterface.cpp:1056)
>>>             // abridged
>>>             ==22103==  Address 0x784e818 is 1,576 bytes inside a
>>>             block of size 2,248 free'd
>>>             ==22103==    at 0x4A07D8E: operator delete[](void*) (in
>>>             /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>             ==22103==    by 0xA42F12:
>>>             CoinArrayWithLength::conditionalDelete()
>>>             (CoinIndexedVector.cpp:1841)
>>>             ==22103==    by 0x9E9CF7: CoinFactorization::cleanup()
>>>             (CoinFactorization1.cpp:1735)
>>>             ==22103==    by 0x9E7E63: CoinFactorization::factor()
>>>             (CoinFactorization1.cpp:1184)
>>>             ==22103==    by 0x8575AD:
>>>             ClpFactorization::factorize(ClpSimplex*, int, bool)
>>>             (ClpFactorization.cpp:2255)
>>>             ==22103==    by 0x8C8254:
>>>             ClpSimplex::internalFactorize(int) (ClpSimplex.cpp:1992)
>>>             ==22103==    by 0x9554CF:
>>>             ClpSimplexPrimal::statusOfProblemInPrimal(int&, int,
>>>             ClpSimplexProgress*, bool, int, ClpSimplex*)
>>>             (ClpSimplexPrimal.cpp:855)
>>>             ==22103==    by 0x953FCE: ClpSimplexPrimal::primal(int,
>>>             int) (ClpSimplexPrimal.cpp:361)
>>>             ==22103==    by 0x8DC44E: ClpSimplex::primal(int, int)
>>>             (ClpSimplex.cpp:5971)
>>>             ==22103==    by 0x70A3E6:
>>>             OsiClpSolverInterface::resolve()
>>>             (OsiClpSolverInterface.cpp:1056)
>>>
>>>             On Mon, Oct 2, 2017 at 2:11 AM Aleksandr M. Kazachkov
>>>             <akazachk at cmu.edu <mailto:akazachk at cmu.edu>> wrote:
>>>
>>>                 Hi all, I have a possible bug report, as well as a
>>>                 (related) question.
>>>
>>>                 1. In OsiClpSolverInterface::setObjCoeff (when
>>>                 setting just one coefficient), I think (unless I am
>>>                 misunderstanding something, in which case I
>>>                 apologize) that line 6125
>>>
>>>                 modelPtr_->whatsChanged_ &= 0xffff;
>>>
>>>                 should be
>>>
>>>                 modelPtr_->whatsChanged_ &= (0xffff&~64);
>>>
>>>                 same as in OsiClpSolverInterface::setObjective, as
>>>                 the 64 bit corresponds to "OBJECTIVE_SAME". This was
>>>                 (ultimately) causing a memory corruption error for
>>>                 me after I would set the objective (coefficient by
>>>                 coefficient, because my objective is sparse),
>>>                 resolve, then delete my solver object.
>>>
>>>                 2. I am working with an instance in n-dimensional
>>>                 space, but the majority of these columns are empty.
>>>                 In my context, I will be solving the instance
>>>                 repeatedly with different objective functions. The
>>>                 first solve is an "initialSolve" and subsequent
>>>                 solves, unless some issue is encountered, are
>>>                 "resolve" calls.
>>>
>>>                 Is it better (faster in the long run, given the
>>>                 multiple resolves) to preprocess the instance in
>>>                 advance to have no empty columns, or is that a waste
>>>                 of time? My first thought was that it would not make
>>>                 much difference since internally the matrix is kept
>>>                 in sparse form and anyway presolve would catch this,
>>>                 but I am not sure I am right.
>>>
>>>                 Thank you in advance for your input,
>>>                 Aleksandr Kazachkov
>>>
>>>
>>>
>>>         _______________________________________________
>>>         Clp mailing list
>>>         Clp at list.coin-or.org <mailto:Clp at list.coin-or.org>
>>>         https://urldefense.proofpoint.com/v2/url?u=https-3A__list.coin-2Dor.org_mailman_listinfo_clp&d=DwICAg&c=Ngd-ta5yRYsqeUsEDgxhcqsYYY1Xs5ogLxWPA_2Wlc4&r=js2M0T-3OIMIVDvokcKjokJbk0F8QOCd0mT4FsVFE88&m=44uzzR183Kli2FgqxthADCaew--5xHJeS3nKJLYUVZI&s=8eJH_mllKWgOQUaXosOa-DyBp4vzagFhEkszZeSTGBA&e=  
>>
>>
>>         _______________________________________________
>>         Clp mailing list
>>         Clp at list.coin-or.org <mailto:Clp at list.coin-or.org>
>>         https://urldefense.proofpoint.com/v2/url?u=https-3A__list.coin-2Dor.org_mailman_listinfo_clp&d=DwICAg&c=Ngd-ta5yRYsqeUsEDgxhcqsYYY1Xs5ogLxWPA_2Wlc4&r=S0ppFBpGWf1xOsmm_XdTdA&m=7zj5sW5Y0KTFAvNS6yD-HKZ67pNaJ4klIXuT4GwR_ms&s=qS69eiS-NyaJZsAvZYMIx8TuxS215bVfQ7Bb-RnK1ls&e=
>>
>
>
>
> _______________________________________________
> Clp mailing list
> Clp at list.coin-or.org
> https://urldefense.proofpoint.com/v2/url?u=https-3A__list.coin-2Dor.org_mailman_listinfo_clp&d=DwICAg&c=Ngd-ta5yRYsqeUsEDgxhcqsYYY1Xs5ogLxWPA_2Wlc4&r=js2M0T-3OIMIVDvokcKjokJbk0F8QOCd0mT4FsVFE88&m=REb6dr9M4_AkavjZBds1JXEwx3jWXX1ku6wfqC2qGFA&s=xNyqghQrborScSimwjtzyBqZGS9yBm1jcEmDIgrmf1s&e=

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://list.coin-or.org/pipermail/clp/attachments/20171006/a7007322/attachment-0001.html>