[Ipopt] IPOPT performance (and impact of BLAS library)
Joel Andersson
j.a.e.andersson at gmail.com
Tue Sep 9 06:38:39 EDT 2014
If you're using IPOPT via a modelling system with good support for
algorithmic differentiation (such as our tool CasADi, available for
Python), you should be able to speed up function evaluation relative to
hand-written code (especially if you generate C code for the
Jacobian/Hessian).
Also, if you use an exact Hessian (the default in CasADi), you might see a
further speedup, thanks to fewer iterations, no BFGS update logic, and a
sparser linear system.
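The advantage of algorithmic differentiation over finite differences can be illustrated with a minimal forward-mode sketch in plain Python. This is purely illustrative and is not CasADi's API (CasADi instead builds sparse expression graphs and can generate efficient C code); the `Dual` class and `f` below are made up for the example:

```python
# Minimal forward-mode algorithmic differentiation: every value carries
# a derivative along with it, so gradients come out exact (to machine
# precision) instead of being approximated by finite differences.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val  # function value
        self.dot = dot  # derivative w.r.t. the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def f(x, y):
    # example objective: f(x, y) = x*y + x*x
    return x * y + x * x

# derivative of f w.r.t. x at (3, 4): seed x with dot = 1
x = Dual(3.0, 1.0)
y = Dual(4.0, 0.0)
out = f(x, y)
print(out.val)  # 21.0
print(out.dot)  # df/dx = y + 2x = 10.0
```

The value and the derivative come out of one pass through the same code, which is why a tool that does this (and generates C for it) can beat separately hand-coded derivative routines.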
Best regards,
Joel
2014-09-09 11:08 GMT+02:00 Jonathan Hogg <jonathan.hogg at stfc.ac.uk>:
> As has been pointed out - your function evaluations are expensive.
> Of the 6.8 seconds of wallclock time in Ipopt below, the breakdown is:
> 2.4 in function evaluations
> 1.5 in sparse linear factor
> 1.2 in sparse linear solve
> 1.7 elsewhere
>
> An observation is that you're spending almost as much time in the solve as
> in the factorization, and throwing more threads at that is unlikely to help
> as it's constrained by memory throughput - we've found in the past that a
> single core is often capable of saturating the available bandwidth, and
> adding more doesn't get you much. You need to tackle the memory usage at an
> algorithmic level by fiddling with the ordering (try both amd and metis)
> and the supernode amalgamation strategy (try doubling or halving nemin
> [ma97_nemin in Ipopt naming, I think]). If you're getting a lot of delayed
> pivots reported, you can try fiddling with scaling strategies too.
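The tuning knobs mentioned above map onto an ipopt.opt fragment along these lines. The option names and values are written from memory of recent Ipopt releases; verify them against the Ipopt options documentation for your build (in particular, try both amd and metis for ma97_order, try halving and doubling ma97_nemin from its default, and experiment with ma97_scaling if many delayed pivots are reported):

```
linear_solver ma97
ma97_order metis
ma97_nemin 16
ma97_scaling dynamic
```

Ipopt reads an ipopt.opt file from the working directory at startup, so these can be changed without recompiling anything.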
>
> Solver-wise, I'd expect ma27 to win on small problems
> and ma97/metis/threading to win on big ones. If you use the
> ma97_dump_matrix option to output .rb files I'm happy to take a quick look
> at a few (for a typical sized problem, go for an iteration at the start,
> middle and end of factorization) and advise on parameters that might help.
>
> Regards,
>
> Jonathan.
>
>
> On 08/09/14 22:24, Jon Herman wrote:
>
> I've copied below the timing output from one of the moderately sized
> examples I've looked at, using ma27. I haven't taken a look at these
> outputs before (thanks for the recommendation!), so I'll study this a
> little more, but any thoughts are welcome.
> This solves in 130 iterations (142 objective/constraint evaluations, 131
> gradient evaluations), so about 0.2 CPU seconds per iteration (this is
> running on 4 cores).
>
> Using metis ordering doesn't seem to significantly affect performance. I
> haven't tried using ma86 or ma97 with OpenMP enabled, I'll go and give that
> a shot.
>
> For Tony Kelman: what do you mean by "unless my function evaluations are
> implemented inefficiently"? At this point they are a minority of the
> run-time, so any inefficiency there does not seem to be the problem? Or are
> you getting at something else?
>
> Thank you for the quick responses so far!
>
> Timing Statistics:
>
> OverallAlgorithm....................: 26.471 (sys: 0.922 wall: 6.861)
> PrintProblemStatistics.............: 0.001 (sys: 0.000 wall: 0.000)
> InitializeIterates.................: 0.175 (sys: 0.004 wall: 0.062)
> UpdateHessian......................: 0.467 (sys: 0.013 wall: 0.120)
> OutputIteration....................: 0.005 (sys: 0.001 wall: 0.002)
> UpdateBarrierParameter.............: 8.311 (sys: 0.309 wall: 2.153)
> ComputeSearchDirection.............: 6.042 (sys: 0.191 wall: 1.557)
> ComputeAcceptableTrialPoint........: 1.658 (sys: 0.059 wall: 0.429)
> AcceptTrialPoint...................: 1.943 (sys: 0.063 wall: 0.501)
> CheckConvergence...................: 7.860 (sys: 0.282 wall: 2.034)
> PDSystemSolverTotal.................: 12.647 (sys: 0.417 wall: 3.264)
> PDSystemSolverSolveOnce............: 11.446 (sys: 0.378 wall: 2.954)
> ComputeResiduals...................: 0.997 (sys: 0.030 wall: 0.257)
> StdAugSystemSolverMultiSolve.......: 10.953 (sys: 0.379 wall: 2.831)
> LinearSystemScaling................: 0.000 (sys: 0.000 wall: 0.000)
> LinearSystemSymbolicFactorization..: 0.018 (sys: 0.000 wall: 0.005)
> LinearSystemFactorization..........: 5.611 (sys: 0.195 wall: 1.451)
> LinearSystemBackSolve..............: 4.692 (sys: 0.169 wall: 1.215)
> LinearSystemStructureConverter.....: 0.000 (sys: 0.000 wall: 0.000)
> LinearSystemStructureConverterInit: 0.000 (sys: 0.000 wall: 0.000)
> QualityFunctionSearch...............: 1.581 (sys: 0.077 wall: 0.414)
> TryCorrector........................: 0.000 (sys: 0.000 wall: 0.000)
> Task1...............................: 0.363 (sys: 0.018 wall: 0.096)
> Task2...............................: 0.567 (sys: 0.022 wall: 0.147)
> Task3...............................: 0.076 (sys: 0.005 wall: 0.020)
> Task4...............................: 0.000 (sys: 0.000 wall: 0.000)
> Task5...............................: 0.507 (sys: 0.020 wall: 0.132)
> Function Evaluations................: 9.348 (sys: 0.328 wall: 2.417)
> Objective function.................: 0.240 (sys: 0.009 wall: 0.062)
> Objective function gradient........: 4.316 (sys: 0.150 wall: 1.116)
> Equality constraints...............: 0.316 (sys: 0.012 wall: 0.082)
> Inequality constraints.............: 0.000 (sys: 0.000 wall: 0.000)
> Equality constraint Jacobian.......: 4.477 (sys: 0.157 wall: 1.157)
> Inequality constraint Jacobian.....: 0.000 (sys: 0.000 wall: 0.000)
> Lagrangian Hessian.................: 0.000 (sys: 0.000 wall: 0.000)
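As a quick sanity check on the per-iteration figure quoted above, dividing the totals from the timing statistics by the 130 iterations:

```python
# Totals taken from the Ipopt timing statistics above.
cpu_total = 26.471   # user CPU seconds (summed across the 4 cores)
wall_total = 6.861   # wall-clock seconds
iterations = 130

print(round(cpu_total / iterations, 3))   # 0.204, i.e. "about 0.2 CPU s/iter"
print(round(wall_total / iterations, 3))  # 0.053 wall s/iter on 4 cores
```

The gap between the CPU and wall figures is simply the 4-core parallelism (plus some system time), which is worth keeping in mind when comparing the two columns.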
>
>
>
> On 09/08/2014 03:02 PM, Greg Horn wrote:
>
> My usual answer for increasing efficiency is to use HSL (ma86/ma97) with
> metis ordering and OpenMP. How expensive are your function evaluations?
> What is your normal time per iteration, and how many iterations does it
> take to solve? What sort of problem are you solving?
>
> On Mon, Sep 8, 2014 at 10:53 PM, Jon Herman <jon.herman at colorado.edu>
> wrote:
>
>> Hello,
>>
>> I am working on implementing IPOPT in a piece of software that has a need
>> for very good performance. Unfortunately, it seems that right now my total
>> run-time is about 80% in IPOPT (that number excludes the function
>> evaluations, as well as any time setting up the problem, etc.). For me to
>> put IPOPT to good use, I'm hoping to make it run more efficiently, and even
>> out the workload between IPOPT and the function evaluations, preferably
>> shifting the work to the function evaluations as much as possible.
>>
>> Originally, I was using the BLAS/LAPACK that can be installed with IPOPT.
>> In an attempt to improve performance, I switched to OpenBLAS. To my
>> confusion, performance did not change at all. This is leading me to believe
>> that something other than the BLAS library is dominating the cost. (I am
>> certain I properly removed the old libraries when switching BLAS
>> implementations.) I'm not sure how to effectively narrow down where IPOPT
>> is spending most of its time, and how to subsequently improve that
>> performance.
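Besides Ipopt's own timing report, one generic way to narrow this down from the Python side is to wrap each callback in a timer before handing it to the solver wrapper, so the share of time spent in user code is measured directly. A minimal sketch (the `timed` helper and the example `objective` are hypothetical, not part of PyIPOPT):

```python
import time
from collections import defaultdict

# Cumulative wall time and call counts, keyed by callback name.
timings = defaultdict(float)
calls = defaultdict(int)

def timed(name, fn):
    # Wrap a solver callback so its cumulative run time is recorded.
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            timings[name] += time.perf_counter() - t0
            calls[name] += 1
    return wrapper

# Example: wrap an objective before registering it with the solver.
def objective(x):
    return sum(xi * xi for xi in x)

objective = timed("objective", objective)
objective([1.0, 2.0, 3.0])

for name in timings:
    print(f"{name}: {calls[name]} calls, {timings[name]:.6f} s total")
```

Comparing these totals against the solver's overall wall time makes the "80% inside IPOPT" claim directly verifiable for each callback.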
>>
>> I've made sure to try the ma27, ma57, ma77, ma86, ma97, and mumps
>> solvers. Performance varies among them, but 80% of the time spent in IPOPT
>> is the best result I achieve (typically with ma27 or ma57; the other
>> solvers are closer to 90%). I've also made sure to try problems as
>> small as 500 variables and 400 constraints, to as large as 110 000
>> variables and 80 000 constraints (and many points in between those
>> extremes). Performance is very consistent across that range (for a given
>> solver), again regardless of the BLAS library being used. I've been doing
>> this using the quasi-Newton approximation for the Hessian, which I was
>> hoping to get away with, but I suppose this may put a lot of work into
>> IPOPT's side of the court. I'll also mention that I'm calling IPOPT through
>> the PyIPOPT module (though I'm expecting this to create only a small, fixed
>> overhead).
>>
>> If you have any thoughts on why IPOPT might be hogging such a large
>> fraction of my total run-time, and/or how I could improve this (or
>> determining if this might be entirely unavoidable), I would greatly
>> appreciate it! (and of course I'd be happy to provide additional
>> information if that would be useful)
>>
>> Best regards,
>>
>> Jon
>>
>> _______________________________________________
>> Ipopt mailing list
>> Ipopt at list.coin-or.org
>> http://list.coin-or.org/mailman/listinfo/ipopt
>>
>>
>
>
>
>
--
Joel Andersson, PhD
Ptge. Busquets 11-13, atico 3
E-08940 Cornella de Llobregat, Spain
Home: +34-93-6034011
Mobile: +32-486-672874 (Belgium) / +34-63-4408800 (Spain) / +46-707-360512
(Sweden)