[Clp] question about performance on a strange problem

Ivo Stefanov istefanov87 at abv.bg
Wed Dec 12 08:04:35 EST 2018


John, 
  Thank you for this. I think I linked MKL in my build, but I didn't see much (if any) improvement - which may mean that I have not linked things properly; I will check in more detail.
  
  I also ran a VS profiler session (on a much smaller version of the problem, where the actual solver call completes in about a minute) to get an idea of where most of the time is spent, and I found the following:

  1. I ignored the time it takes to read the .lp file.

  2. Most calculation time is (as expected) spent in ClpSimplexDual::gutsOfDual and is split between two of its calls - ClpSimplexDual::whileIterating (45%) and ClpSimplexDual::statusOfProblemInDual (27%).

  3. ClpSimplexDual::whileIterating spends the majority of its time in ClpDualRowSteepest::updateWeights (36%) -> ClpFactorization::updateTwoColumnsFT (35%) -> CoinFactorization::updateColumnU (34%) -> CoinFactorization::updateColumnUSparsish (15%) + CoinFactorization::updateColumnUDensish (15%).

  4. ClpSimplexDual::statusOfProblemInDual spends the majority of its time in ClpFactorization::internalFactorize (25%) -> ClpFactorization::factorSparseSmall (21%) -> CoinFactorization::pivotColumnSingleton (21%).
  Does this mean that I have not linked MKL properly, or that I am not calling the solver in the right way? Or can nothing be done for this particular problem, and this is simply how long the calculation is expected to take?
  Thank you very much - my understanding of the library has improved a lot thanks to your comments so far!
  
  Best,
  Ivo

>-------- Original message --------
>From: John Forrest john.forrest at fastercoin.com
>Subject: Re: [Clp] question about performance on a strange problem
>To: Ivo Stefanov
>Sent on: 11.12.2018 13:12

Ivo,

If you define COIN_FACTORIZATION_DENSE_CODE=1 in your Visual Studio build and tell it about MKL, that should give you a good dense factorization. I am not sure whether you would get much extra by going parallel - I just tried with mkl_sequential.
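
In Visual Studio terms that means something like the following (a sketch only - the project names and the exact set of MKL import libraries depend on your solution layout and MKL version):

    C/C++ -> Preprocessor -> Preprocessor Definitions:
        COIN_FACTORIZATION_DENSE_CODE=1    (for the CoinUtils and Clp projects)
    Linker -> Input -> Additional Dependencies:
        mkl_intel_lp64.lib;mkl_sequential.lib;mkl_core.lib
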
Also try -barrier for your solve. It may be faster, and even if it is slower it does go parallel well. To use MKL there you have to use Clp trunk and the Pardiso code in MKL. I have not tried getting that working on Windows.
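
Since you are calling Clp through the C++ API, the equivalent of -barrier should be the useBarrier solve type - a sketch (untested by me on Windows):

    #include "ClpSimplex.hpp"
    #include "ClpSolve.hpp"

    ClpSimplex model;
    // ... loadProblem(...) from your raw arrays as before ...
    ClpSolve options;
    // interior point, with crossover to simplex at the end
    options.setSolveType(ClpSolve::useBarrier);
    model.initialSolve(options);
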
John Forrest

On 09/12/2018 18:15, Ivo Stefanov wrote:

Hi John,

Thank you for your comments. After some help from Julian Hall it turned out that my main problem was the presolve - it somehow failed and was followed by primal instead of dual simplex, which seems to be quite a lot slower for my particular problem. Now that this is fixed things go much faster (the 10 000 size problem still takes hours, but it is at least tractable); however, you raise another very interesting point.

I am actually not seeing the log you mention, so my initial guess is that I should be able to speed things up even more. However, the way I am using the solver (and the only way I am at least partially familiar with) is as follows:

1. I have my main project (mostly related to the actual generation of the "dense block") in a Visual Studio project.

2. I have built the .lib files for the solver from the existing Visual Studio projects and linked them to my main project.

3. I am using the C++ API in the following way:

    #include "ClpSimplex.hpp"
    #include "ClpSolve.hpp"

    ClpSimplex model;
    // load the model from the raw arrays I have generated - loadProblem, etc.

    ClpSolve options;
    options.setSolveType(ClpSolve::SolveType::useDual);
    options.setPresolveType(ClpSolve::PresolveType::presolveOff);

    model.initialSolve(options);

    Could you please give me some additional details on the following:
     
    
1. What changes should I make to a Visual Studio build to replicate the --disable-blas and CLP_USE_OPENBLAS behaviour and rebuild the .lib files? I can use a 32+ core machine for my experiments, so it should be possible to speed things up significantly if I can get this going.

2. Is it possible to use an external implementation for those? For example, I am pretty sure I can get access to Intel's MKL, which should be very solid in terms of performance. Is that compatible with Clp, or does the trick work with openblas only?

On the other question - I do need the exact solution, and I can definitely build a smaller version. However, if you can give me tips on the above, I would prefer not to take up more of your time: I will first try to implement the openblas setup, see how it works, and maybe come back after that with more information on the problem.

    Thank you very much for your time!

Best,
Ivo

>-------- Original message --------
>From: John Forrest john.forrest at fastercoin.com
>Subject: Re: [Clp] question about performance on a strange problem
>To: clp at list.coin-or.org
>Sent on: 04.12.2018 16:03

Ivo,

It may not be possible to do much, but here are a few suggestions.

First make sure you have dense factorization switched on. On a typical problem, run for a short while with -loglevel 3. You should see output like -

      length of U 24097, length of L 23279 plus 92416 from 304 dense rows 
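
(With the standalone clp driver that would be something along these lines - a sketch, so check the exact parameter spellings against the driver's built-in help:)

    clp yourmodel.mps -loglevel 3 -dualsimplex
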
If not, then make sure you get LAPACK and BLAS.

You can do better using openblas - add --disable-blas to configure and add -DCLP_USE_OPENBLAS=1 to CXXDEFS (if the dense part is very large it might even be better to use -DCLP_USE_OPENBLAS=4 to use 4 CPUs).
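
For example, something like this (a sketch - adjust to however you normally invoke configure):

    ./configure --disable-blas CXXDEFS="-DCLP_USE_OPENBLAS=1"
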
If you can send me a smaller version which shows some of the same characteristics, I can look at it and see if there are other approaches. Do you need an exact solution?

John Forrest

On 02/12/2018 22:08, Ivo Stefanov wrote:

      Hello, 
      
       I am trying to use Clp to solve a linear problem, but it seems to take too long (I have not actually received a solution in several days). 
       
      
       The problem is not too big (~30 000 variables and about the same number of constraints), but it has a fairly large dense square block of about 10 000 x 10 000 variables. 
       
      
Because of this block it takes a huge amount of memory, but that is manageable for now - the real problem is that I don't seem to be getting anywhere near a solution.

       I have tried both the primal() and dual() algorithms with no success. 
       
       
       
I know it is hard to tell anything without the exact problem being available, but since the .lp file is 700 MB+ I do not have it uploaded anywhere at the moment (and it takes quite a lot of time to load anyway).

What I have noticed so far while working with this problem is that the performance of the dual algorithm gets worse as the size of the dense block increases, though not always in the same way.

       
Certain types of input data may grow the block in terms of rows (number of constraints), and others in terms of columns (number of variables). The increase in columns seems to be the more problematic part: for example, 10 000 x 500 is fairly trivial, while 10 000 x 10 000 has been impossible so far; on the other hand, 50 000 x 500 is still solvable in a very reasonable timeframe - faster than, for example, 10 000 x 2500.
        
       
       
      
 
       
      
I am wondering which of the two options is more likely to be correct:
       
      
1. I am using the solver improperly and there is a way to make this run much faster.
       
      
       2. I should focus on ways to reformulate the problem (maybe find a way to break the dense block into something more sparse?) rather than trying to fine-tune the settings of the solver. 
       
       
      
 
       
      
Any tips on solver usage for such a problem would be greatly appreciated.
       
       
      
 
       
      
       Thank you very much! 
       
     
 
        
      _______________________________________________
Clp mailing list
 Clp at list.coin-or.org 
 https://list.coin-or.org/mailman/listinfo/clp 