The problem I'm working with is very sparse, so your suggestion was very applicable. I re-compiled OpenBLAS to be single-threaded (and then IPOPT with OpenMP). Unfortunately, performance remains identical.

To clarify: I see the software I'm writing as having two distinct parts: (1) IPOPT, which is code I did not write and cannot easily adapt (other than through its input options), and (2) the problem-specific code that I wrote, which computes the objective, constraints, Jacobian, etc., and which I can adjust very easily. Without going into the details of my own code, I already know that it has room for substantial improvement (a factor of 10 at least, though only through substantial effort). However, before I direct my efforts there (again, this is a large amount of work), I want to make sure that IPOPT itself can also improve over its current performance. Improving my own code by a factor of 10 only buys me a ~30% improvement in total run-time in the current situation (rather than the 90% I'm looking for!). My first objective is to cut a good chunk out of the 4.4 seconds that are not spent in the problem-specific code.
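
To spell out the arithmetic behind that ~30% figure, here is a quick sketch using the wall-clock numbers from the timing output quoted further down (OverallAlgorithm: 6.861 s, Function Evaluations: 2.417 s):

    # Wall times taken from the Ipopt timing statistics quoted below.
    total = 6.861                  # OverallAlgorithm wall time [s]
    funcs = 2.417                  # time spent in my own callbacks [s]
    ipopt_only = total - funcs     # ~4.44 s I cannot touch from my side

    speedup = 10.0                 # hoped-for improvement of my own code
    new_total = ipopt_only + funcs / speedup
    print(f"new total:    {new_total:.3f} s")            # ~4.69 s
    print(f"overall gain: {1 - new_total / total:.0%}")  # ~32%, nowhere near 90%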

I apologize if my terminology is confusing, but I hope this is clearer.

Thank you again for your patient responses!
    <div class="moz-cite-prefix">On 09/08/2014 04:51 PM, Tony Kelman
      wrote:<br>
    </div>
    <blockquote cite="mid:A5303125C52C4A6DA343A2DBA99E5987@TKsamsung"
      type="cite">
      <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
      <div dir="ltr">
        <div style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR:
          #000000">

If your functions are actually in C, then there's not much use in going through the Python interface to Ipopt; it adds more moving parts, and for all I know there could be some strange threading interaction with the Python runtime libraries. Still, your function evaluations took 2.417 s of wall time out of an OverallAlgorithm wall time of 6.861 s, so there is some room for improvement there.

I'm confused by why you're focusing on the "fraction of the run-time" being spent in Ipopt. I think we're both getting confused by using the same terms to refer to different things. We have no idea what your application is doing outside of Ipopt, so let's just talk about the absolute time required by Ipopt to solve a given optimization problem. The breakdown within the time taken by Ipopt to solve an optimization problem can vary, but there is a normal expectation for what it should look like in most cases.

OpenBLAS can have significant overhead for starting up its threading system, especially on small problems. It's probably best to set OPENBLAS_NUM_THREADS to 1 and allocate threads instead to the multithreaded sparse solvers (MA86, MA97, WSMP, etc.). An optimized BLAS doesn't help Ipopt as much as you might hope based on the difference in dense performance between reference and optimized BLAS. MA57, Mumps, and newer sparse solvers aggregate small dense sub-blocks during the sparse factorization and send those off to BLAS; unless your problem is very dense to start with, those blocks are rarely all that large. Multithreading in BLAS really only helps for large dense problems that do enough work on each thread to make up for the synchronization overhead.
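
As a minimal sketch of that thread allocation (the environment variables have to be set before the BLAS and solver libraries are loaded, so they belong at the very top of the script; the thread count and the pyipopt import are just placeholders for your setup):

    import os

    # Keep OpenBLAS single-threaded so it does not compete for cores, and
    # hand the threads to the OpenMP-parallel HSL solver (MA86/MA97) instead.
    # This must happen before the solver libraries are loaded.
    os.environ["OPENBLAS_NUM_THREADS"] = "1"
    os.environ["OMP_NUM_THREADS"] = "4"

    import pyipopt  # import the solver stack only after the environment is set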

From: Jon Herman <jon.herman@Colorado.EDU>
Sent: Monday, September 08, 2014 3:19 PM
To: Tony Kelman <kelman@berkeley.edu>; Greg Horn <gregmainland@gmail.com>; Jon Herman <jon.herman@colorado.edu>
Cc: ipopt mailing list <ipopt@list.coin-or.org>
Subject: Re: [Ipopt] IPOPT performance (and impact of BLAS library)
          <div style="FONT-SIZE: small; TEXT-DECORATION: none;
            FONT-FAMILY: &quot;Calibri&quot;; FONT-WEIGHT: normal;
            COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline">Actually,
            that's a misunderstanding. The user functions are in C,
            Python is just used as a top layer, outside of the
            optimization (but I do initialize IPOPT through this
            interface).<br>
            <br>
            I'm now running on multiple cores through OpenBLAS, and from
            what I understand the ma86 solver accomplishes this through
            OpenMP. I can see on the system monitor that all cores are
            indeed being used, though it again hasn't had a significant
            impact on the total run-time...this does not seem to be
            where the hold-up was in the first place.<br>
            <br>
            Are my expectations unreasonable, and would IPOPT only take
            a lower fraction of the run-time for a system requiring more
            costly function evaluations?<br>
            And what do you mean by those processes taking so much time
            not making sense? Is there any chance this is due to me
            incorrectly utilizing IPOPT?<br>
            <br>
            <br>
            <div class="moz-cite-prefix">On 09/08/2014 03:41 PM, Tony
              Kelman wrote:<br>
            </div>
            <blockquote
              cite="mid:EA9948BCAAFF46B9ADFC41503ECF627A@TKsamsung"
              type="cite">
              <div dir="ltr">
                <div style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri';
                  COLOR: #000000">

If you're using PyIpopt, then presumably you're writing your function callbacks in Python, which is not exactly a recipe for speed. According to that timing output they're not completely negligible: the gradient and Jacobian are taking almost as much time as LinearSystemFactorization and LinearSystemBackSolve. I'm surprised to see UpdateBarrierParameter through CheckConvergence taking that much time; that doesn't make much sense.

In what way are you running on 4 cores? OpenBLAS? MA27 doesn't even use BLAS.
                  <div style="FONT-SIZE: small; TEXT-DECORATION: none;
                    FONT-FAMILY: &quot;Calibri&quot;; FONT-WEIGHT:
                    normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY:
                    inline">
                    <div style="FONT: 10pt tahoma">
                      <div> </div>
                      <div style="BACKGROUND: #f5f5f5">
                        <div style="font-color: black"><b>From:</b> <a
                            title="jon.herman@colorado.edu"
                            href="mailto:jon.herman@colorado.edu"
                            moz-do-not-send="true">Jon Herman</a> </div>
                        <div><b>Sent:</b> Monday, September 08, 2014
                          2:24 PM</div>
                        <div><b>To:</b> <a
                            title="gregmainland@gmail.com"
                            href="mailto:gregmainland@gmail.com"
                            moz-do-not-send="true">Greg Horn</a> ; <a
                            title="jon.herman@colorado.edu"
                            href="mailto:jon.herman@colorado.edu"
                            moz-do-not-send="true">Jon Herman</a> </div>
                        <div><b>Cc:</b> <a
                            title="ipopt@list.coin-or.org"
                            href="mailto:ipopt@list.coin-or.org"
                            moz-do-not-send="true">ipopt mailing list</a>
                        </div>
                        <div><b>Subject:</b> Re: [Ipopt] IPOPT
                          performance (and impact of BLAS library)</div>
                      </div>
                    </div>
                    <div> </div>
                  </div>
                  <div style="FONT-SIZE: small; TEXT-DECORATION: none;
                    FONT-FAMILY: &quot;Calibri&quot;; FONT-WEIGHT:
                    normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY:
                    inline">I've copied below the timing output from one
                    of the moderately sized examples I've looked at,
                    using ma27. I haven't taken a look at these outputs
                    before (thanks for the recommendation!), so I'll
                    study this a little more, but any thoughts are
                    welcome.<br>
                    This solves in 130 iterations (142
                    objective/constraint evaluations, 131 gradient
                    evaluations)<big>, so about 0.2 CPU seconds per
                      iteration (this is running on 4 cores)</big>.<br>

Using metis ordering doesn't seem to significantly affect performance. I haven't tried using ma86 or ma97 with OpenMP enabled; I'll go and give that a shot.
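
For reference, a minimal sketch of how I would switch to that, assuming the standard ipopt.opt options file (read automatically from the working directory at startup) and option names as given in the Ipopt options reference:

    # Create an ipopt.opt file next to the script; Ipopt picks it up at startup.
    # Option names assumed from the Ipopt options reference.
    with open("ipopt.opt", "w") as f:
        f.write("linear_solver ma86\n")            # OpenMP-parallel HSL solver
        f.write("ma86_order metis\n")              # metis fill-reducing ordering
        f.write("print_timing_statistics yes\n")   # produces the breakdown below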

For Tony Kelman: what do you mean by "unless my function evaluations are implemented inefficiently"? At this point they are a minority of the run-time, so any inefficiency there does not seem to be the problem. Or are you getting at something else?

Thank you for the quick responses so far!

Timing Statistics:

OverallAlgorithm....................:     26.471 (sys:      0.922 wall:      6.861)
 PrintProblemStatistics.............:      0.001 (sys:      0.000 wall:      0.000)
 InitializeIterates.................:      0.175 (sys:      0.004 wall:      0.062)
 UpdateHessian......................:      0.467 (sys:      0.013 wall:      0.120)
 OutputIteration....................:      0.005 (sys:      0.001 wall:      0.002)
 UpdateBarrierParameter.............:      8.311 (sys:      0.309 wall:      2.153)
 ComputeSearchDirection.............:      6.042 (sys:      0.191 wall:      1.557)
 ComputeAcceptableTrialPoint........:      1.658 (sys:      0.059 wall:      0.429)
 AcceptTrialPoint...................:      1.943 (sys:      0.063 wall:      0.501)
 CheckConvergence...................:      7.860 (sys:      0.282 wall:      2.034)
PDSystemSolverTotal.................:     12.647 (sys:      0.417 wall:      3.264)
 PDSystemSolverSolveOnce............:     11.446 (sys:      0.378 wall:      2.954)
 ComputeResiduals...................:      0.997 (sys:      0.030 wall:      0.257)
 StdAugSystemSolverMultiSolve.......:     10.953 (sys:      0.379 wall:      2.831)
 LinearSystemScaling................:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemSymbolicFactorization..:      0.018 (sys:      0.000 wall:      0.005)
 LinearSystemFactorization..........:      5.611 (sys:      0.195 wall:      1.451)
 LinearSystemBackSolve..............:      4.692 (sys:      0.169 wall:      1.215)
 LinearSystemStructureConverter.....:      0.000 (sys:      0.000 wall:      0.000)
  LinearSystemStructureConverterInit:      0.000 (sys:      0.000 wall:      0.000)
QualityFunctionSearch...............:      1.581 (sys:      0.077 wall:      0.414)
TryCorrector........................:      0.000 (sys:      0.000 wall:      0.000)
Task1...............................:      0.363 (sys:      0.018 wall:      0.096)
Task2...............................:      0.567 (sys:      0.022 wall:      0.147)
Task3...............................:      0.076 (sys:      0.005 wall:      0.020)
Task4...............................:      0.000 (sys:      0.000 wall:      0.000)
Task5...............................:      0.507 (sys:      0.020 wall:      0.132)
Function Evaluations................:      9.348 (sys:      0.328 wall:      2.417)
 Objective function.................:      0.240 (sys:      0.009 wall:      0.062)
 Objective function gradient........:      4.316 (sys:      0.150 wall:      1.116)
 Equality constraints...............:      0.316 (sys:      0.012 wall:      0.082)
 Inequality constraints.............:      0.000 (sys:      0.000 wall:      0.000)
 Equality constraint Jacobian.......:      4.477 (sys:      0.157 wall:      1.157)
 Inequality constraint Jacobian.....:      0.000 (sys:      0.000 wall:      0.000)
 Lagrangian Hessian.................:      0.000 (sys:      0.000 wall:      0.000)

On 09/08/2014 03:02 PM, Greg Horn wrote:
                      <div dir="ltr">My usual answer to increasing
                        efficiency is using HSL (ma86/ma97) with metis
                        ordering and openmp. How expensive are your
                        function evaluations? What is your normal time
                        per iteration, and how many iterations does it
                        take to solve? What sort of problem are you
                        solving?</div>
                      <div class="gmail_extra">
                        <div> </div>
                        <div class="gmail_quote">On Mon, Sep 8, 2014 at
                          10:53 PM, Jon Herman <span dir="ltr">&lt;<a
                              href="mailto:jon.herman@colorado.edu"
                              target="_blank" moz-do-not-send="true">jon.herman@colorado.edu</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="PADDING-LEFT: 1ex; MARGIN: 0px 0px
                            0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
                            <div text="#000000" bgcolor="#FFFFFF">Hello,<br>
                              <br>
                              I am working on implementing IPOPT in a
                              piece of software that has a need for very
                              good performance. Unfortunately, it seems
                              that right now my total run-time is about
                              80% in IPOPT (that number excludes the
                              function evaluations, as well as any time
                              setting up the problem, etc.). For me to
                              put IPOPT to good use, I'm hoping to make
                              it run more efficiently, and even out the
                              workload between IPOPT and the function
                              evaluations, preferably shifting the work
                              to the function evaluations as much as
                              possible.<br>

Originally, I was using the BLAS/LAPACK that can be installed with IPOPT. In an attempt to improve performance, I switched to OpenBLAS. To my confusion, performance did not change at all. This leads me to believe that something other than the BLAS library is dominating the cost. (I am certain I properly removed the old libraries when switching BLAS implementations.) I'm not sure how to effectively narrow down where IPOPT is spending most of its time, and how to subsequently improve that performance.
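
One crude way I could at least separate my own callback time from the time spent inside IPOPT itself is sketched below; it wraps the Python-level callbacks before handing them to PyIPOPT (the names eval_f, eval_grad_f, nlp, and x0 are placeholders for whatever the problem actually uses):

    import time
    import functools

    def timed(fn, bucket):
        """Wrap a callback so its cumulative run time is recorded in `bucket`."""
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                bucket[fn.__name__] = bucket.get(fn.__name__, 0.0) + time.perf_counter() - t0
        return wrapper

    callback_time = {}
    # eval_f = timed(eval_f, callback_time)   # likewise eval_grad_f, eval_g, eval_jac_g
    # ... build the PyIPOPT problem with the wrapped callbacks, then:
    # t0 = time.perf_counter(); nlp.solve(x0); total = time.perf_counter() - t0
    # print("inside IPOPT:", total - sum(callback_time.values()), "s")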

I've made sure to try the ma27, ma57, ma77, ma86, ma97, and mumps solvers. Performance varies among them, but 80% of the time spent in IPOPT is the best result I achieve (typically with ma27 or ma57; the other solvers are closer to 90%). I've also made sure to try problems as small as 500 variables and 400 constraints, and as large as 110,000 variables and 80,000 constraints (and many points in between those extremes). Performance is very consistent across that range (for a given solver), again regardless of the BLAS library being used. I've been doing this using the quasi-Newton approximation for the Hessian, which I was hoping to get away with, but I suppose this may put a lot of work into IPOPT's side of the court. I'll also mention that I'm calling IPOPT through the PyIPOPT module (though I expect this to add only a small, fixed overhead).
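
For reference, that quasi-Newton choice corresponds to Ipopt's hessian_approximation option; a minimal sketch of the relevant line in an ipopt.opt options file (read automatically from the working directory):

    # Limited-memory quasi-Newton Hessian, as currently used; switching to
    # "exact" requires a Hessian callback but usually reduces iteration counts.
    with open("ipopt.opt", "a") as f:
        f.write("hessian_approximation limited-memory\n")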

If you have any thoughts on why IPOPT might be hogging such a large fraction of my total run-time, and/or how I could improve this (or how to determine whether it is entirely unavoidable), I would greatly appreciate it! (And of course I'd be happy to provide additional information if that would be useful.)

Best regards,

Jon