<div dir="ltr">If you're using IPOPT via a modelling system with good support for algorithmic differentiation (such as our tool CasADi, available for Python), you should be able to speed up function evaluation over hand-written code (especially if you are generating C-code for Jacobian/Hessian).<div><br></div><div>Also, if you use exact Hessian (default in CasADi), you might also see a lot of speed up, thanks to fewer iterations, no logic for BFGS and more sparse linear system.</div><div><br></div><div>Best regards,</div><div>Joel</div><div class="gmail_extra"><br><div class="gmail_quote">2014-09-09 11:08 GMT+02:00 Jonathan Hogg <span dir="ltr"><<a href="mailto:jonathan.hogg@stfc.ac.uk" target="_blank">jonathan.hogg@stfc.ac.uk</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
2014-09-09 11:08 GMT+02:00 Jonathan Hogg <jonathan.hogg@stfc.ac.uk>:
As has been pointed out - your function evaluations are expensive. Of the 6.8 seconds wallclock in Ipopt below, it breaks down as:

  2.4s in function evaluations
  1.5s in sparse linear factorization
  1.2s in sparse linear solve
  1.7s elsewhere
An observation is that you're spending almost as much time in the solve as in the factorization, and throwing more threads at that is unlikely to help, as it's constrained by memory throughput - we've found in the past that a single core is often capable of saturating the available bandwidth, and adding more doesn't get you much. You need to tackle the memory usage at an algorithmic level by fiddling with the ordering (try both amd and metis) and the supernode amalgamation strategy (try doubling or halving nemin [ma97_nemin in Ipopt naming, I think]). If you're getting a lot of delayed pivots reported, you can try fiddling with scaling strategies too.
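For concreteness, all of these knobs can be set without recompiling anything, e.g. via an ipopt.opt file in the working directory (the option names below are Ipopt's standard ma97 ones; treat the values as starting points to experiment with, not a recipe):

    linear_solver ma97
    # ordering: also try amd; the ordering drives fill-in and memory use
    ma97_order metis
    # supernode amalgamation: default is 8; try halving or doubling
    ma97_nemin 16
    # scaling: worth trying if many delayed pivots are reported
    ma97_scaling mc64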
Solver-wise, I'd expect ma27 to win on small problems and ma97/metis/threading to win on big ones. If you use the ma97_dump_matrix option to output .rb files, I'm happy to take a quick look at a few (for a typical sized problem, go for an iteration at the start, middle and end of the run) and advise on parameters that might help.

Regards,

Jonathan.

On 08/09/14 22:24, Jon Herman wrote:
I've copied below the timing output from one of the moderately sized examples I've looked at, using ma27. I haven't taken a look at these outputs before (thanks for the recommendation!), so I'll study this a little more, but any thoughts are welcome. This solves in 130 iterations (142 objective/constraint evaluations, 131 gradient evaluations), so about 0.2 CPU seconds per iteration (this is running on 4 cores).
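In case it helps anyone reproduce this: the table comes from Ipopt's print_timing_statistics option, which (if I have the PyIPOPT wrapper right) is set like so; the equivalent ipopt.opt line is "print_timing_statistics yes":

    # Ask Ipopt to print the per-component timing breakdown after the solve.
    # 'nlp' is the problem object returned by pyipopt.create(...).
    nlp.str_option('print_timing_statistics', 'yes')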
<br>
Using metis ordering doesn't seem to significantly affect
performance. I haven't tried using ma86 or ma97 with OpenMP
enabled, I'll go and give that a shot.<br>
<br>
For Tony Kelman: what do you mean by "unless my function
evaluations are implemented inefficiently"? At this point they are
a minority of the run-time, so any efficiency there does not seem
to be the problem? Or are you getting at something else?<br>
<br>
Thank you for the quick responses so far!<br>
<br>
Timing Statistics:

OverallAlgorithm....................:  26.471 (sys: 0.922 wall: 6.861)
 PrintProblemStatistics.............:   0.001 (sys: 0.000 wall: 0.000)
 InitializeIterates.................:   0.175 (sys: 0.004 wall: 0.062)
 UpdateHessian......................:   0.467 (sys: 0.013 wall: 0.120)
 OutputIteration....................:   0.005 (sys: 0.001 wall: 0.002)
 UpdateBarrierParameter.............:   8.311 (sys: 0.309 wall: 2.153)
 ComputeSearchDirection.............:   6.042 (sys: 0.191 wall: 1.557)
 ComputeAcceptableTrialPoint........:   1.658 (sys: 0.059 wall: 0.429)
 AcceptTrialPoint...................:   1.943 (sys: 0.063 wall: 0.501)
 CheckConvergence...................:   7.860 (sys: 0.282 wall: 2.034)
PDSystemSolverTotal.................:  12.647 (sys: 0.417 wall: 3.264)
 PDSystemSolverSolveOnce............:  11.446 (sys: 0.378 wall: 2.954)
 ComputeResiduals...................:   0.997 (sys: 0.030 wall: 0.257)
 StdAugSystemSolverMultiSolve.......:  10.953 (sys: 0.379 wall: 2.831)
 LinearSystemScaling................:   0.000 (sys: 0.000 wall: 0.000)
 LinearSystemSymbolicFactorization..:   0.018 (sys: 0.000 wall: 0.005)
 LinearSystemFactorization..........:   5.611 (sys: 0.195 wall: 1.451)
 LinearSystemBackSolve..............:   4.692 (sys: 0.169 wall: 1.215)
 LinearSystemStructureConverter.....:   0.000 (sys: 0.000 wall: 0.000)
  LinearSystemStructureConverterInit:   0.000 (sys: 0.000 wall: 0.000)
QualityFunctionSearch...............:   1.581 (sys: 0.077 wall: 0.414)
TryCorrector........................:   0.000 (sys: 0.000 wall: 0.000)
Task1...............................:   0.363 (sys: 0.018 wall: 0.096)
Task2...............................:   0.567 (sys: 0.022 wall: 0.147)
Task3...............................:   0.076 (sys: 0.005 wall: 0.020)
Task4...............................:   0.000 (sys: 0.000 wall: 0.000)
Task5...............................:   0.507 (sys: 0.020 wall: 0.132)
Function Evaluations................:   9.348 (sys: 0.328 wall: 2.417)
 Objective function.................:   0.240 (sys: 0.009 wall: 0.062)
 Objective function gradient........:   4.316 (sys: 0.150 wall: 1.116)
 Equality constraints...............:   0.316 (sys: 0.012 wall: 0.082)
 Inequality constraints.............:   0.000 (sys: 0.000 wall: 0.000)
 Equality constraint Jacobian.......:   4.477 (sys: 0.157 wall: 1.157)
 Inequality constraint Jacobian.....:   0.000 (sys: 0.000 wall: 0.000)
 Lagrangian Hessian.................:   0.000 (sys: 0.000 wall: 0.000)
On 09/08/2014 03:02 PM, Greg Horn wrote:
<div dir="ltr">My usual answer to increasing efficiency is using
HSL (ma86/ma97) with metis ordering and openmp. How expensive
are your function evaluations? What is your normal time per
iteration, and how many iterations does it take to solve? What
sort of problem are you solving?</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Sep 8, 2014 at 10:53 PM, Jon
Herman <span dir="ltr"><<a href="mailto:jon.herman@colorado.edu" target="_blank">jon.herman@colorado.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hello,<br>
<br>
I am working on integrating IPOPT into a piece of software that needs very good performance. Unfortunately, it seems that right now about 80% of my total run-time is spent inside IPOPT (that number excludes the function evaluations, as well as any time spent setting up the problem, etc.). For me to put IPOPT to good use, I'm hoping to make it run more efficiently and even out the workload between IPOPT and the function evaluations, preferably shifting the work to the function evaluations as much as possible.
Originally, I was using the BLAS/LAPACK that can be installed with IPOPT. In an attempt to improve performance, I switched to OpenBLAS. To my confusion, performance did not change at all. This leads me to believe that something other than the BLAS library is dominating the cost. (I am certain I properly removed the old libraries when switching BLAS implementations.) I'm not sure how to effectively narrow down where IPOPT is spending most of its time, and how to subsequently improve that performance.
<br>
I've made sure to try the ma27, ma57, ma77, ma86, ma97,
and mumps solvers. Performance varies among them, but
80% of the time spent in IPOPT is the best result I
achieve (which is typically with ma27 or ma57, the other
solvers are closer to 90%). I've also made sure to try
problems as small as 500 variables and 400 constraints,
to as large as 110 000 variables and 80 000 constraints
(and many points in between those extremes). Performance
is very consistent across that range (for a given
solver), again regardless of the BLAS library being
used. I've been doing this using the quasi-Newton
approximation for the Hessian, which I was hoping to get
away with, but I suppose this may put a lot of work into
IPOPT's side of the court. I'll also mention that I'm
calling IPOPT through the PyIPOPT module (though I'm
expecting this to create only a small, fixed overhead).
<br>
<br>
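Concretely, that means I'm running with Ipopt's limited-memory mode rather than supplying a Hessian callback, set (if I have the PyIPOPT wrapper right) along these lines:

    # Select Ipopt's L-BFGS quasi-Newton approximation of the Lagrangian
    # Hessian instead of requiring an exact Hessian callback.
    nlp.str_option('hessian_approximation', 'limited-memory')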
If you have any thoughts on why IPOPT might be hogging such a large fraction of my total run-time, and/or how I could improve this (or determine whether this might be entirely unavoidable), I would greatly appreciate it! (And of course I'd be happy to provide additional information if that would be useful.)

Best regards,

Jon
_______________________________________________
Ipopt mailing list
Ipopt@list.coin-or.org
http://list.coin-or.org/mailman/listinfo/ipopt
--
Joel Andersson, PhD
Ptge. Busquets 11-13, atico 3
E-08940 Cornella de Llobregat, Spain
Home: +34-93-6034011
Mobile: +32-486-672874 (Belgium) / +34-63-4408800 (Spain) / +46-707-360512 (Sweden)