<div dir="ltr">If you're using IPOPT via a modelling system with good support for algorithmic differentiation (such as our tool CasADi, available for Python), you should be able to speed up function evaluation over hand-written code (especially if you are generating C-code for Jacobian/Hessian).<div><br></div><div>Also, if you use exact Hessian (default in CasADi), you might also see a lot of speed up, thanks to fewer iterations, no logic for BFGS and more sparse linear system.</div><div><br></div><div>Best regards,</div><div>Joel</div><div class="gmail_extra"><br><div class="gmail_quote">2014-09-09 11:08 GMT+02:00 Jonathan Hogg <span dir="ltr"><<a href="mailto:jonathan.hogg@stfc.ac.uk" target="_blank">jonathan.hogg@stfc.ac.uk</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
2014-09-09 11:08 GMT+02:00 Jonathan Hogg <jonathan.hogg@stfc.ac.uk>:
As has been pointed out - your function evaluations are expensive. Of the 6.8 seconds wallclock in Ipopt below, it breaks down as:

  2.4s in function evaluations
  1.5s in sparse linear factorization
  1.2s in sparse linear solve
  1.7s elsewhere
An observation is that you're spending almost as much time in the solve as in the factorization, and throwing more threads at that is unlikely to help, as it's constrained by memory throughput - we've found in the past that a single core is often capable of saturating the available bandwidth, and adding more doesn't get you much. You need to tackle the memory usage at an algorithmic level by fiddling with the ordering (try both amd and metis) and the supernode amalgamation strategy (try doubling or halving nemin [ma97_nemin in Ipopt naming, I think]). If you're getting a lot of delayed pivots reported, you can try fiddling with scaling strategies too.
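For concreteness, all of these knobs can be set without recompiling anything, e.g. via an ipopt.opt file in the working directory (the option names below are Ipopt's standard ma97 ones; treat the values as starting points to experiment with, not a recipe):

    linear_solver ma97
    # ordering: also try amd; the ordering drives fill-in and memory use
    ma97_order metis
    # supernode amalgamation: default is 8; try halving or doubling
    ma97_nemin 16
    # scaling: worth trying if many delayed pivots are reported
    ma97_scaling mc64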
Solver-wise, I'd expect ma27 to win on small problems and ma97/metis/threading to win on big ones. If you use the ma97_dump_matrix option to output .rb files, I'm happy to take a quick look at a few (for a typical sized problem, go for an iteration at the start, middle and end of the run) and advise on parameters that might help.

Regards,

Jonathan.

On 08/09/14 22:24, Jon Herman wrote:
I've copied below the timing output from one of the moderately sized examples I've looked at, using ma27. I haven't taken a look at these outputs before (thanks for the recommendation!), so I'll study this a little more, but any thoughts are welcome. This solves in 130 iterations (142 objective/constraint evaluations, 131 gradient evaluations), so about 0.2 CPU seconds per iteration (this is running on 4 cores).
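In case it helps anyone reproduce this: the table comes from Ipopt's print_timing_statistics option, which (if I have the PyIPOPT wrapper right) is set like so; the equivalent ipopt.opt line is "print_timing_statistics yes":

    # Ask Ipopt to print the per-component timing breakdown after the solve.
    # 'nlp' is the problem object returned by pyipopt.create(...).
    nlp.str_option('print_timing_statistics', 'yes')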
<br>
Using metis ordering doesn't seem to significantly affect
performance. I haven't tried using ma86 or ma97 with OpenMP
enabled, I'll go and give that a shot.<br>
<br>
For Tony Kelman: what do you mean by "unless my function
evaluations are implemented inefficiently"? At this point they are
a minority of the run-time, so any efficiency there does not seem
to be the problem? Or are you getting at something else?<br>
<br>
Thank you for the quick responses so far!<br>
<br>
Timing Statistics:

OverallAlgorithm....................:  26.471 (sys: 0.922 wall: 6.861)
 PrintProblemStatistics.............:   0.001 (sys: 0.000 wall: 0.000)
 InitializeIterates.................:   0.175 (sys: 0.004 wall: 0.062)
 UpdateHessian......................:   0.467 (sys: 0.013 wall: 0.120)
 OutputIteration....................:   0.005 (sys: 0.001 wall: 0.002)
 UpdateBarrierParameter.............:   8.311 (sys: 0.309 wall: 2.153)
 ComputeSearchDirection.............:   6.042 (sys: 0.191 wall: 1.557)
 ComputeAcceptableTrialPoint........:   1.658 (sys: 0.059 wall: 0.429)
 AcceptTrialPoint...................:   1.943 (sys: 0.063 wall: 0.501)
 CheckConvergence...................:   7.860 (sys: 0.282 wall: 2.034)
PDSystemSolverTotal.................:  12.647 (sys: 0.417 wall: 3.264)
 PDSystemSolverSolveOnce............:  11.446 (sys: 0.378 wall: 2.954)
 ComputeResiduals...................:   0.997 (sys: 0.030 wall: 0.257)
 StdAugSystemSolverMultiSolve.......:  10.953 (sys: 0.379 wall: 2.831)
 LinearSystemScaling................:   0.000 (sys: 0.000 wall: 0.000)
 LinearSystemSymbolicFactorization..:   0.018 (sys: 0.000 wall: 0.005)
 LinearSystemFactorization..........:   5.611 (sys: 0.195 wall: 1.451)
 LinearSystemBackSolve..............:   4.692 (sys: 0.169 wall: 1.215)
 LinearSystemStructureConverter.....:   0.000 (sys: 0.000 wall: 0.000)
  LinearSystemStructureConverterInit:   0.000 (sys: 0.000 wall: 0.000)
QualityFunctionSearch...............:   1.581 (sys: 0.077 wall: 0.414)
TryCorrector........................:   0.000 (sys: 0.000 wall: 0.000)
Task1...............................:   0.363 (sys: 0.018 wall: 0.096)
Task2...............................:   0.567 (sys: 0.022 wall: 0.147)
Task3...............................:   0.076 (sys: 0.005 wall: 0.020)
Task4...............................:   0.000 (sys: 0.000 wall: 0.000)
Task5...............................:   0.507 (sys: 0.020 wall: 0.132)
Function Evaluations................:   9.348 (sys: 0.328 wall: 2.417)
 Objective function.................:   0.240 (sys: 0.009 wall: 0.062)
 Objective function gradient........:   4.316 (sys: 0.150 wall: 1.116)
 Equality constraints...............:   0.316 (sys: 0.012 wall: 0.082)
 Inequality constraints.............:   0.000 (sys: 0.000 wall: 0.000)
 Equality constraint Jacobian.......:   4.477 (sys: 0.157 wall: 1.157)
 Inequality constraint Jacobian.....:   0.000 (sys: 0.000 wall: 0.000)
 Lagrangian Hessian.................:   0.000 (sys: 0.000 wall: 0.000)
On 09/08/2014 03:02 PM, Greg Horn wrote:
<div dir="ltr">My usual answer to increasing efficiency is using
HSL (ma86/ma97) with metis ordering and openmp. How expensive
are your function evaluations? What is your normal time per
iteration, and how many iterations does it take to solve? What
sort of problem are you solving?</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Sep 8, 2014 at 10:53 PM, Jon
Herman <span dir="ltr"><<a href="mailto:jon.herman@colorado.edu" target="_blank">jon.herman@colorado.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hello,<br>
<br>
I am working on integrating IPOPT into a piece of software that needs very good performance. Unfortunately, it seems that right now about 80% of my total run-time is spent inside IPOPT (that number excludes the function evaluations, as well as any time spent setting up the problem, etc.). For me to put IPOPT to good use, I'm hoping to make it run more efficiently and even out the workload between IPOPT and the function evaluations, preferably shifting the work to the function evaluations as much as possible.
Originally, I was using the BLAS/LAPACK that can be installed with IPOPT. In an attempt to improve performance, I switched to OpenBLAS. To my confusion, performance did not change at all. This leads me to believe that something other than the BLAS library is dominating the cost. (I am certain I properly removed the old libraries when switching BLAS implementations.) I'm not sure how to effectively narrow down where IPOPT is spending most of its time, and how to subsequently improve that performance.
<br>
I've made sure to try the ma27, ma57, ma77, ma86, ma97,
and mumps solvers. Performance varies among them, but
80% of the time spent in IPOPT is the best result I
achieve (which is typically with ma27 or ma57, the other
solvers are closer to 90%). I've also made sure to try
problems as small as 500 variables and 400 constraints,
to as large as 110 000 variables and 80 000 constraints
(and many points in between those extremes). Performance
is very consistent across that range (for a given
solver), again regardless of the BLAS library being
used. I've been doing this using the quasi-Newton
approximation for the Hessian, which I was hoping to get
away with, but I suppose this may put a lot of work into
IPOPT's side of the court. I'll also mention that I'm
calling IPOPT through the PyIPOPT module (though I'm
expecting this to create only a small, fixed overhead).
<br>
<br>
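Concretely, that means I'm running with Ipopt's limited-memory mode rather than supplying a Hessian callback, set (if I have the PyIPOPT wrapper right) along these lines:

    # Select Ipopt's L-BFGS quasi-Newton approximation of the Lagrangian
    # Hessian instead of requiring an exact Hessian callback.
    nlp.str_option('hessian_approximation', 'limited-memory')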
If you have any thoughts on why IPOPT might be hogging such a large fraction of my total run-time, and/or how I could improve this (or determine whether this might be entirely unavoidable), I would greatly appreciate it! (And of course I'd be happy to provide additional information if that would be useful.)

Best regards,

Jon
_______________________________________________
Ipopt mailing list
Ipopt@list.coin-or.org
http://list.coin-or.org/mailman/listinfo/ipopt
--
Joel Andersson, PhD
Ptge. Busquets 11-13, atico 3
E-08940 Cornella de Llobregat, Spain
Home: +34-93-6034011
Mobile: +32-486-672874 (Belgium) / +34-63-4408800 (Spain) / +46-707-360512 (Sweden)