<div dir="ltr"><div>Hi John (and the rest of the mailing list),</div><div><br></div><div>I am having a hard time tracking down the bug exactly, as it disappears after seemingly irrelevant changes to my code. I do have some more information and a guess, and maybe, with this information, you (or someone else) will be able to help point me in the right direction for further debugging. </div><div><br></div><div>What I am seeing now is a memory error that occurs during postsolve, at line 3041 of ClpPrimalColumnSteepest.cpp, which is "alternateWeights_->clear();".</div><div><br></div><div>Specifically, in this part of the code, there are two ClpSimplex* variables: model and model_. After the call to alternateWeights_->clear(), model_ becomes NULL, while model remains as it was. As a result, afterwards, the following piece of code (in ClpPrimalColumnSteepest.cpp) causes a memory error:</div><div><br></div><div>3044 // Save size of factorization</div><div>3045 if (!model->factorization()->pivots())</div><div>3046   sizeFactorization_ = model_->factorization()->numberElements();</div><div><br></div><div>This is because the condition at line 3045 gets satisfied, but at 3046, the reference is to model_, not model.</div><div><br></div><div>I am not clear on why the call to clear() affects model_, but here is what I see in gdb:</div><div><br></div><div>(gdb) p alternateWeights_->nElements_</div><div>$158 = 16</div><div>(gdb) p alternateWeights_->capacity_</div><div>$159 = 481</div><div>(gdb) p *alternateWeights_->indices_@16</div><div>$160 = {979, 2, 573, 585, 703, 704, 582, 571, 586, 708, 581, 576, 588, 575, 589, 574}</div><div><br></div><div>We see that the first index in alternateWeights_ is 979. It is during the call to clear() from ClpPrimalColumnSteepest.cpp:3041, when we are at line 50 of CoinIndexedVector.cpp ("elements_[i0]=0.0;"), that the pointer to model_ gets nulled (where "int i0 = indices_[i]" = 979). 
It seems there is an error here, in that indices_ holds an index that lies outside the size allocated for elements_.</div><div><br></div><div>The reason that alternateWeights_->indices_ contains 979 is in lines 2902 to 2904 of ClpPrimalColumnSteepest.cpp:</div><div><br></div><div>2902 // save pivot order</div><div>2903 CoinMemcpyN(pivotVariable,</div><div>2904     numberRows, alternateWeights_->getIndices());</div><div><br></div><div>Before that call, alternateWeights_->indices_ = {125, 70, 44, 78, 55, 7, 45, 29, 27, 34, 42, 56, 25, 31, 60, 0} (all within capacity_). As I understand it, the pivotVariable in row 0 is 979, which is a valid index; for this instance, # rows = 281, # cols = 699 (though only 223 columns are non-empty). However, this is not a valid index for the vector alternateWeights_->elements_, which I think is what causes the problem during the clear() call. The reason that alternateWeights_->capacity_ = 481 is that this is # rows + model_->factorization()->maximumPivots() (ClpPrimalColumnSteepest.cpp:2939).</div><div><br></div><div>If my analysis is incomplete, I would appreciate any pointers on what further information I can provide / debugging I can pursue.</div><div><br></div><div>Thank you for your time,</div><div>Aleksandr Kazachkov</div><div><br></div><div>P.S. In terms of code to reproduce the behavior: I have tried making a minimal working example by pulling the relevant parts out of my larger code base, but the bug then disappears. I would be happy to give you access to my code (containing my dissertation work), but I would understand if you prefer not to parse someone else’s code.</div><div><br></div><div>P.P.S. 
Here is the abridged stack trace at the moment the error occurs:</div><div>#0  CoinIndexedVector::clear (this=0x1532610) at ../../../CoinUtils/src/CoinIndexedVector.cpp:50</div><div>#1  0x00000000008b7402 in ClpPrimalColumnSteepest::saveWeights (this=0x109ff90, model=0x10a8580, mode=2) at ../../../Clp/src/ClpPrimalColumnSteepest.cpp:3041</div><div>#2  0x0000000000959372 in ClpSimplexPrimal::statusOfProblemInPrimal (this=0x10a8580, lastCleaned=@0x7fffffff826c: 0, type=1, progress=0x10a8af8, doFactorization=true, ifValuesPass=0, originalModel=0x0)</div><div>    at ../../../Clp/src/ClpSimplexPrimal.cpp:1636</div><div>#3  0x0000000000953fa3 in ClpSimplexPrimal::primal (this=0x10a8580, ifValuesPass=0, startFinishOptions=0) at ../../../Clp/src/ClpSimplexPrimal.cpp:361</div><div>#4  0x00000000008dc423 in ClpSimplex::primal (this=0x10a8580, ifValuesPass=1, startFinishOptions=0) at ../../../Clp/src/ClpSimplex.cpp:5971</div><div>#5  0x000000000070a3bb in OsiClpSolverInterface::resolve (this=0x1359510) at ../../../../Clp/src/OsiClp/OsiClpSolverInterface.cpp:1056</div><br><div class="gmail_quote"><div dir="ltr">On Tue, Oct 3, 2017 at 4:40 AM John Forrest <<a href="mailto:john.forrest@fastercoin.com">john.forrest@fastercoin.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    <div class="m_1109080980157113429moz-cite-prefix">Aleksandr,<br>
      <br>
      There may not be any bug in disableFactorization - it is used in
      CglRedSplit cuts.  It is just that it looks like the culprit.  For
      Gomory cuts, I used CoinFactorization directly as I thought that
      was cleaner.  I can find out the problem if I have code that
      reproduces it.</div></div><div text="#000000" bgcolor="#FFFFFF"><div class="m_1109080980157113429moz-cite-prefix"><br>
      <br>
      John Forrest</div></div><div text="#000000" bgcolor="#FFFFFF"><div class="m_1109080980157113429moz-cite-prefix"><br>
      <br>
      On 02/10/17 19:56, Aleksandr M. Kazachkov wrote:<br>
    </div></div><div text="#000000" bgcolor="#FFFFFF">
    <blockquote type="cite">
      <div dir="ltr">
        <div>Hi John, thank you for the quick response. One follow-up,
          regarding disableFactorization: are you suggesting not to use
          this at all? I was under the impression that if I use
          enableFactorization to access getBInv and such, then I should
          call disableFactorization before making any changes to the
          model, resolving, etc. You say it is rarely used, so I wonder
          if my impression was wrong. I use disableFactorization in
          other parts of my code too, because I think I had run into
          "strange" behavior without it, but I may be misremembering.</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr">On Mon, Oct 2, 2017 at 2:19 PM John Forrest <<a href="mailto:john.forrest@fastercoin.com" target="_blank">john.forrest@fastercoin.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div text="#000000" bgcolor="#FFFFFF">
            <div class="m_1109080980157113429m_-6758304210812810348moz-cite-prefix">Aleksandr,<br>
              <br>
              I put <br>
              <br>
              modelPtr_->whatsChanged_ &= (0xffff&~64);<br>
              <br>
              into code anyway to make it match with other calls.<br>
              <br>
              I would think it was the disableFactorization that was the
              problem - there could easily be a bug and it is rarely
              used.<br>
              <br>
              As to the second part of your original post, I would
              think that preprocessing it once would normally
              be faster.<br>
              <br>
              John Forrest</div>
          </div>
          <div text="#000000" bgcolor="#FFFFFF">
            <div class="m_1109080980157113429m_-6758304210812810348moz-cite-prefix"><br>
              <br>
              On 02/10/17 18:25, Aleksandr M. Kazachkov wrote:<br>
            </div>
          </div>
          <div text="#000000" bgcolor="#FFFFFF">
            <blockquote type="cite">
              <div dir="ltr">I am not 100% sure of what the error was,
                but I believe I have solved my memory issue (at least
                valgrind says so), and maybe someone will see where
                the bug is/was based on my fix. Please let me know if
                you have an idea, as I would like to know to prevent
                future mistakes on my part, and it may also help
                others. 
                <div><br>
                </div>
                <div>My hunch is that my mistake had to do with some
                  interaction between factorization and other parts of
                  the code.
                  <div><br>
                  </div>
                  <div>My process was, when inputting / solving the
                    problem:</div>
                  <div>1. Set up a new row-ordered CoinPackedMatrix. I
                    initially have setDimensions(0, num_cols), where
                    num_cols is the known total # of columns. I also
                    reserve space for the rows using the "reserve"
                    method. I have an estimate for the max number of
                    rows and maxSize.</div>
                  <div>2. Input rows one at a time into the matrix by
                    "appendRow" (the reason for this, instead of putting
                    the matrix in all at once, is that the rows will be
                    sorted in a special order that is useful to me).</div>
                  <div>3. "New" an OsiClpSolverInterface* instance and
                    use "loadProblem" to load the problem from the
                    constructed matrix and lower and upper bounds on the
                    rows/columns.</div>
                  <div>4. Call the method disableFactorization().</div>
                  <div>5. Call the method
                    "getModelPtr()->cleanMatrix()" to clean the
                    matrix.</div>
                  <div>6. With the 0 objective function still, call
                    initialSolve() to check feasibility.</div>
                  <div>7. Next, set each objective coefficient, one at a
                    time, to 1 (what I actually do is set only the
                    coefficients of the columns for which getVectorSize
                    is > 0, but the memory corruption happened as
                    long as all the coefficients were being set one at a
                    time).</div>
                  <div>8. Call resolve().</div>
                  <div>9. Delete the solver we created and exit out.</div>
                  <div><br>
                  </div>
                </div>
                <div>This process caused some memory problem. Any (i.e.,
                  just one) of the following changes made valgrind
                  happy:</div>
                <div>1. Do not call disableFactorization. That seems
                  unnecessary anyway, as factorization would be disabled
                  by default, I think. This was probably left from some
                  earlier version of my code. Though I don't quite
                  understand why it would cause a problem.</div>
                <div>2. Make a call to getMatrixByRow() after step 5. (I
                  have no idea why this helps.)<br>
                </div>
                <div>3. Replace step 7 by a call to setObjective(...)
                  where we set up in advance a non-sparse vector for
                  inputting the objective coefficients.</div>
                <div>4. Replace step 8 by an initialSolve().</div>
                <div>5. Instead of step 5, call cleanMatrix() directly
                  on the row-ordered matrix before inputting it into the
                  OsiClpSolverInterface instance. This makes more sense,
                  in any case.</div>
                <div><br>
                </div>
                <div>I would guess fixes #1 or #5 are the important
                  ones, with regard to understanding the problem.</div>
                <div><br>
                </div>
                <div>Again, if anyone has an idea on where in particular
                  I went wrong, and/or why it was wrong, please let me
                  know.</div>
                <div><br>
                </div>
                <div>Thanks again, and sorry for the barrage of emails,</div>
                <div>Aleksandr Kazachkov</div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr">On Mon, Oct 2, 2017 at 11:16 AM Aleksandr
                  M. Kazachkov <<a href="mailto:akazachk@cmu.edu" target="_blank">akazachk@cmu.edu</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div dir="ltr">I apologize; I am not sure my report is
                    a bug. In the case of changing a single objective
                    coefficient (at a time), the proper modification to
                    whatsChanged_ seems to be done in ClpSimplex (I had
                    been looking at ClpModel). I am still getting a
                    memory error, and I am trying to figure out how it
                    happens.
                    <div><br>
                    </div>
                    <div>In case someone has any suggestions, below is
                      the (abridged) valgrind output, which says that
                      memory is being written to after it has been
                      deleted. In particular, the issue appears to be
                      with the call "alternateWeights_->clear();" at
                      ClpPrimalColumnSteepest.cpp:3041, which seems to
                      be accessing memory freed via a
                      conditionalDelete() of "nextCount_" at
                      CoinFactorization1.cpp:1734 (and 1735, for
                      lastCount_). I am not sure how these arrays are
                      connected. </div>
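                    <div><br>
                    </div>
                    <div>One way an invalid write inside clear() could arise, sketched under stated assumptions: a sparse "indexed vector" may clear itself by zeroing only the positions named in its index list, so an index at or beyond the allocated length would land in whatever memory happens to follow. The code below is a hypothetical, simplified stand-in (IndexedVec, countOutOfRange and checkedClear are illustrative names, not the CoinIndexedVector source), and it assumes that the capacity equals the number of allocated elements. The sample indices and the capacity of 481 are the gdb values quoted in the newest message of this thread.</div>

```cpp
#include <cstddef>
#include <vector>

// gdb values quoted in this thread: 16 stored indices, capacity_ = 481.
constexpr int kReportedIndices[16] = {979, 2,   573, 585, 703, 704, 582, 571,
                                      586, 708, 581, 576, 588, 575, 589, 574};
constexpr int kReportedCapacity = 481;

// Counts the indices that fall outside the half-open range [0, capacity).
constexpr int countOutOfRange(const int* indices, int n, int capacity) {
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (indices[i] < 0 || indices[i] >= capacity) {
            ++bad;
        }
    }
    return bad;
}

// Hypothetical, simplified sparse vector (an assumption for illustration).
struct IndexedVec {
    std::vector<double> elements;  // dense storage of length "capacity"
    std::vector<int> indices;      // positions believed to hold nonzeros

    explicit IndexedVec(int capacity)
        : elements(static_cast<std::size_t>(capacity), 0.0) {}

    // Sparse clear with a guard: zero only the in-range positions and
    // return how many indices were skipped. Without the guard, each
    // skipped index would be a write outside `elements`.
    int checkedClear() {
        int skipped = 0;
        for (int idx : indices) {
            if (idx >= 0 && static_cast<std::size_t>(idx) < elements.size()) {
                elements[static_cast<std::size_t>(idx)] = 0.0;
            } else {
                ++skipped;
            }
        }
        indices.clear();
        return skipped;
    }
};
```

                    <div>With the reported values, countOutOfRange(kReportedIndices, 16, kReportedCapacity) evaluates to 15: only index 2 fits within 481 elements. A guard like the one in checkedClear is a debugging aid for locating the bad indices, not a proposed fix.</div>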
                    <div><br>
                    </div>
                    <div>I would appreciate any advice, and thank you,</div>
                    <div>Alex</div>
                    <div><br>
                    </div>
                    <div>
                      <div>==22103== 7 errors in context 1 of 2:</div>
                      <div>==22103== Invalid write of size 8</div>
                      <div>==22103==    at 0xA39E10:
                        CoinIndexedVector::clear()
                        (CoinIndexedVector.cpp:51)</div>
                      <div>==22103==    by 0x8B742D:
                        ClpPrimalColumnSteepest::saveWeights(ClpSimplex*,
                        int) (ClpPrimalColumnSteepest.cpp:3041)</div>
                      <div>==22103==    by 0x95939D:
                        ClpSimplexPrimal::statusOfProblemInPrimal(int&,
                        int, ClpSimplexProgress*, bool, int,
                        ClpSimplex*) (ClpSimplexPrimal.cpp:1636)</div>
                      <div>==22103==    by 0x953FCE:
                        ClpSimplexPrimal::primal(int, int)
                        (ClpSimplexPrimal.cpp:361)</div>
                      <div>==22103==    by 0x8DC44E:
                        ClpSimplex::primal(int, int)
                        (ClpSimplex.cpp:5971)</div>
                      <div>==22103==    by 0x70A3E6:
                        OsiClpSolverInterface::resolve()
                        (OsiClpSolverInterface.cpp:1056)</div>
                      <div>// abridged</div>
                      <div>==22103==  Address 0x784dbc8 is 744 bytes
                        inside a block of size 2,248 free'd</div>
                      <div>==22103==    at 0x4A07D8E: operator
                        delete[](void*) (in
                        /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)</div>
                      <div>==22103==    by 0xA42F12:
                        CoinArrayWithLength::conditionalDelete()
                        (CoinIndexedVector.cpp:1841)</div>
                      <div>==22103==    by 0x9E9CE2:
                        CoinFactorization::cleanup()
                        (CoinFactorization1.cpp:1734)</div>
                      <div>==22103==    by 0x9E7E63:
                        CoinFactorization::factor()
                        (CoinFactorization1.cpp:1184)</div>
                      <div>==22103==    by 0x8575AD:
                        ClpFactorization::factorize(ClpSimplex*, int,
                        bool) (ClpFactorization.cpp:2255)</div>
                      <div>==22103==    by 0x8C8254:
                        ClpSimplex::internalFactorize(int)
                        (ClpSimplex.cpp:1992)</div>
                      <div>==22103==    by 0x9554CF:
                        ClpSimplexPrimal::statusOfProblemInPrimal(int&,
                        int, ClpSimplexProgress*, bool, int,
                        ClpSimplex*) (ClpSimplexPrimal.cpp:855)</div>
                      <div>==22103==    by 0x953FCE:
                        ClpSimplexPrimal::primal(int, int)
                        (ClpSimplexPrimal.cpp:361)</div>
                      <div>==22103==    by 0x8DC44E:
                        ClpSimplex::primal(int, int)
                        (ClpSimplex.cpp:5971)</div>
                      <div>==22103==    by 0x70A3E6:
                        OsiClpSolverInterface::resolve()
                        (OsiClpSolverInterface.cpp:1056)</div>
                    </div>
                    <div><br>
                    </div>
                    <div>
                      <div>==22103== 8 errors in context 2 of 2:</div>
                      <div>==22103== Invalid write of size 8</div>
                      <div>==22103==    at 0xA39DF3:
                        CoinIndexedVector::clear()
                        (CoinIndexedVector.cpp:50)</div>
                      <div>==22103==    by 0x8B742D:
                        ClpPrimalColumnSteepest::saveWeights(ClpSimplex*,
                        int) (ClpPrimalColumnSteepest.cpp:3041)</div>
                      <div>==22103==    by 0x95939D:
                        ClpSimplexPrimal::statusOfProblemInPrimal(int&,
                        int, ClpSimplexProgress*, bool, int,
                        ClpSimplex*) (ClpSimplexPrimal.cpp:1636)</div>
                      <div>==22103==    by 0x953FCE:
                        ClpSimplexPrimal::primal(int, int)
                        (ClpSimplexPrimal.cpp:361)</div>
                      <div>==22103==    by 0x8DC44E:
                        ClpSimplex::primal(int, int)
                        (ClpSimplex.cpp:5971)</div>
                      <div>==22103==    by 0x70A3E6:
                        OsiClpSolverInterface::resolve()
                        (OsiClpSolverInterface.cpp:1056)</div>
                      <div>// abridged</div>
                      <div>==22103==  Address 0x784e818 is 1,576 bytes
                        inside a block of size 2,248 free'd</div>
                      <div>==22103==    at 0x4A07D8E: operator
                        delete[](void*) (in
                        /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)</div>
                      <div>==22103==    by 0xA42F12:
                        CoinArrayWithLength::conditionalDelete()
                        (CoinIndexedVector.cpp:1841)</div>
                      <div>==22103==    by 0x9E9CF7:
                        CoinFactorization::cleanup()
                        (CoinFactorization1.cpp:1735)</div>
                      <div>==22103==    by 0x9E7E63:
                        CoinFactorization::factor()
                        (CoinFactorization1.cpp:1184)</div>
                      <div>==22103==    by 0x8575AD:
                        ClpFactorization::factorize(ClpSimplex*, int,
                        bool) (ClpFactorization.cpp:2255)</div>
                      <div>==22103==    by 0x8C8254:
                        ClpSimplex::internalFactorize(int)
                        (ClpSimplex.cpp:1992)</div>
                      <div>==22103==    by 0x9554CF:
                        ClpSimplexPrimal::statusOfProblemInPrimal(int&,
                        int, ClpSimplexProgress*, bool, int,
                        ClpSimplex*) (ClpSimplexPrimal.cpp:855)</div>
                      <div>==22103==    by 0x953FCE:
                        ClpSimplexPrimal::primal(int, int)
                        (ClpSimplexPrimal.cpp:361)</div>
                      <div>==22103==    by 0x8DC44E:
                        ClpSimplex::primal(int, int)
                        (ClpSimplex.cpp:5971)</div>
                      <div>==22103==    by 0x70A3E6:
                        OsiClpSolverInterface::resolve()
                        (OsiClpSolverInterface.cpp:1056)</div>
                    </div>
                  </div>
                  <div dir="ltr">
                    <div>
                      <div><br>
                        <div class="gmail_quote">
                          <div dir="ltr">On Mon, Oct 2, 2017 at 2:11 AM
                            Aleksandr M. Kazachkov <<a href="mailto:akazachk@cmu.edu" target="_blank">akazachk@cmu.edu</a>>
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                            <div dir="ltr">Hi all, I have a possible bug
                              report, as well as a (related) question.
                              <div><br>
                              </div>
                              <div>1. In
                                OsiClpSolverInterface::setObjCoeff (when
                                setting just one coefficient), I think
                                (unless I am misunderstanding something,
                                in which case I apologize) that line
                                6125</div>
                              <div><br>
                              </div>
                              <div>  modelPtr_->whatsChanged_ &=
                                0xffff;</div>
                              <div><br>
                              </div>
                              <div>should be</div>
                              <div><br>
                              </div>
                              <div>  modelPtr_->whatsChanged_ &=
                                (0xffff&~64);<br>
                              </div>
                              <div><br>
                              </div>
                              <div>same as in
                                OsiClpSolverInterface::setObjective, as
                                the 64 bit corresponds to
                                "OBJECTIVE_SAME". This was (ultimately)
                                causing a memory corruption error for me
                                after I would set the objective
                                (coefficient by coefficient, because my
                                objective is sparse), resolve, then
                                delete my solver object.</div>
                              <div><br>
                              </div>
                              <div>2. I am working with an instance in
                                n-dimensional space, but the majority of
                                the columns are empty. In my context,
                                I will be solving the instance
                                repeatedly with different objective
                                functions. The first solve is an
                                "initialSolve" and subsequent solves,
                                unless some issue is encountered, are
                                "resolve" calls.</div>
                              <div><br>
                              </div>
                              <div>Is it better (faster in the long run,
                                given the multiple resolves) to
                                preprocess the instance in advance to
                                have no empty columns, or is that a
                                waste of time? My first thought was that
                                it would not make much difference since
                                internally the matrix is kept in sparse
                                form and anyway presolve would catch
                                this, but I am not sure I am right.</div>
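                              <div><br>
                              </div>
                              <div>To make the two masks in point 1 concrete, here is a small sketch of what each expression does to a flags word. It assumes only what is stated above, namely that the bit with value 64 means "OBJECTIVE_SAME". The constant and function names are hypothetical, and nothing here is taken from the Clp sources.</div>

```cpp
// Assumption from the message above: value 64 is the "OBJECTIVE_SAME" bit.
constexpr unsigned int kObjectiveSame = 64u;

// Existing mask: keeps the low 16 bits, so the 64 bit survives if it was set.
constexpr unsigned int maskKeepLow16(unsigned int flags) {
    return flags & 0xffffu;
}

// Proposed mask: keeps the low 16 bits and also clears the 64 bit.
constexpr unsigned int maskClearObjectiveSame(unsigned int flags) {
    return flags & (0xffffu & ~kObjectiveSame);
}

// True when the "objective unchanged" bit is still set in flags.
constexpr bool objectiveMarkedSame(unsigned int flags) {
    return (flags & kObjectiveSame) != 0u;
}
```

                              <div>Starting from flags = 0x1ffff, the first mask yields 0xffff (the 64 bit is still set) and the second yields 0xffbf (the 64 bit is cleared), which is the only difference between the two lines.</div>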
                              <div><br>
                              </div>
                              <div>Thank you in advance for your input,</div>
                              <div>Aleksandr Kazachkov</div>
                            </div>
                          </blockquote>
                        </div>
                      </div>
                    </div>
                  </div>
                </blockquote>
              </div>
              <br>
              <br>
            </blockquote>
          </div>
          <div text="#000000" bgcolor="#FFFFFF">
            <blockquote type="cite">
              <pre>_______________________________________________
Clp mailing list
<a class="m_1109080980157113429m_-6758304210812810348moz-txt-link-abbreviated" href="mailto:Clp@list.coin-or.org" target="_blank">Clp@list.coin-or.org</a>
<a class="m_1109080980157113429m_-6758304210812810348moz-txt-link-freetext" href="https://urldefense.proofpoint.com/v2/url?u=https-3A__list.coin-2Dor.org_mailman_listinfo_clp&d=DwICAg&c=Ngd-ta5yRYsqeUsEDgxhcqsYYY1Xs5ogLxWPA_2Wlc4&r=js2M0T-3OIMIVDvokcKjokJbk0F8QOCd0mT4FsVFE88&m=44uzzR183Kli2FgqxthADCaew--5xHJeS3nKJLYUVZI&s=8eJH_mllKWgOQUaXosOa-DyBp4vzagFhEkszZeSTGBA&e=" target="_blank">https://urldefense.proofpoint.com/v2/url?u=https-3A__list.coin-2Dor.org_mailman_listinfo_clp&d=DwICAg&c=Ngd-ta5yRYsqeUsEDgxhcqsYYY1Xs5ogLxWPA_2Wlc4&r=js2M0T-3OIMIVDvokcKjokJbk0F8QOCd0mT4FsVFE88&m=44uzzR183Kli2FgqxthADCaew--5xHJeS3nKJLYUVZI&s=8eJH_mllKWgOQUaXosOa-DyBp4vzagFhEkszZeSTGBA&e=</a> 
</pre>
            </blockquote>
            <p><br>
            </p>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <p><br>
    </p>
  </div></blockquote></div></div>