If your functions are actually in C, then there's not much use in going through the Python interface to Ipopt; it adds more moving parts, and for all I know there could be some strange threading interactions with the Python runtime libraries. Still, your function evaluations took wall: 2.417 out of OverallAlgorithm wall: 6.861, so there is some room for improvement there.

I'm confused by why you're focusing on the "fraction of the run-time" being spent in Ipopt. I think we're both getting confused by using the same terms to refer to different things. We have no idea what your application is doing outside of Ipopt, so let's just talk about the absolute time required by Ipopt to solve a given optimization problem. The breakdown within the time taken by Ipopt to solve an optimization problem can vary, but there is a normal expectation for what it should look like in most cases.

OpenBLAS can have significant overhead for starting up its threading system, especially on small problems. It's probably best to set OPENBLAS_NUM_THREADS to 1 and allocate threads instead to the multithreaded sparse solvers (MA86, MA97, WSMP, etc.). An optimized BLAS doesn't help Ipopt as much as you might hope based on the difference in dense performance between the reference and optimized BLAS. MA57, MUMPS, and the newer sparse solvers aggregate small dense sub-blocks during the sparse factorization and send those off to BLAS. Unless your problem is very dense to start with, the blocks that get sent to BLAS are rarely all that large. Multithreading in BLAS really only helps for large dense problems that do enough work on each thread to make up for the synchronization overhead.
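
For example, from the Python layer, something along these lines (just a sketch; OPENBLAS_NUM_THREADS and OMP_NUM_THREADS are standard environment variables for OpenBLAS and OpenMP, but they need to be set before those libraries are loaded, and the multithreaded HSL solvers only use the OpenMP threads if the HSL code was built with OpenMP enabled):

    import os

    # Pin OpenBLAS to one thread so its thread startup/synchronization
    # overhead isn't paid on every small dense kernel call.
    os.environ["OPENBLAS_NUM_THREADS"] = "1"

    # Give the threads to the multithreaded sparse solver instead
    # (MA86/MA97 use OpenMP when built with it).
    os.environ["OMP_NUM_THREADS"] = "4"

    # Ipopt reads an ipopt.opt file from the working directory at startup,
    # which is one way to select the linear solver regardless of wrapper.
    with open("ipopt.opt", "w") as f:
        f.write("linear_solver ma86\n")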

From: Jon Herman <jon.herman@Colorado.EDU>
Sent: Monday, September 08, 2014 3:19 PM
To: Tony Kelman <kelman@berkeley.edu>; Greg Horn <gregmainland@gmail.com>; Jon Herman <jon.herman@colorado.edu>
Cc: ipopt mailing list <ipopt@list.coin-or.org>
Subject: Re: [Ipopt] IPOPT performance (and impact of BLAS library)

Actually, that's a misunderstanding. The user functions are in C; Python is just used as a top layer, outside of the optimization (but I do initialize IPOPT through this interface).

I'm now running on multiple cores through OpenBLAS, and from what I understand the ma86 solver accomplishes this through OpenMP. I can see on the system monitor that all cores are indeed being used, though again it hasn't had a significant impact on the total run-time, so this does not seem to be where the hold-up was in the first place.

Are my expectations unreasonable, and would IPOPT only take a lower fraction of the run-time for a system requiring more costly function evaluations? And what do you mean when you say it doesn't make sense for those processes to take so much time? Is there any chance this is due to me using IPOPT incorrectly?

On 09/08/2014 03:41 PM, Tony Kelman wrote:

If you're using PyIpopt, then presumably you're writing your function callbacks in Python, which is not exactly a recipe for speed. According to that timing they're not completely negligible; the gradient and Jacobian are taking almost as much time as LinearSystemFactorization and LinearSystemBackSolve. I'm surprised to see UpdateBarrierParameter through CheckConvergence taking that much time; that doesn't make much sense.
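
If you want to see how much of that is the Python layer itself, a quick check is to wrap each callback with a timer and compare the accumulated totals against the "Function Evaluations" lines in Ipopt's timing statistics. A rough sketch (eval_grad_f below is just a placeholder for whatever gradient callback you hand to the wrapper):

    import time

    callback_stats = {}  # name -> (call count, accumulated wall seconds)

    def timed(fn):
        # Accumulate wall time per callback so the totals can be compared
        # against the "Function Evaluations" breakdown Ipopt prints.
        def wrapper(*args, **kwargs):
            t0 = time.time()
            try:
                return fn(*args, **kwargs)
            finally:
                count, total = callback_stats.get(fn.__name__, (0, 0.0))
                callback_stats[fn.__name__] = (count + 1, total + time.time() - t0)
        return wrapper

    # Wrap each callback before handing it to the wrapper, e.g.:
    # eval_grad_f = timed(eval_grad_f)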

In what way are you running on 4 cores? OpenBLAS? MA27 doesn't even use BLAS.

From: Jon Herman <jon.herman@colorado.edu>
Sent: Monday, September 08, 2014 2:24 PM
To: Greg Horn <gregmainland@gmail.com>; Jon Herman <jon.herman@colorado.edu>
Cc: ipopt mailing list <ipopt@list.coin-or.org>
Subject: Re: [Ipopt] IPOPT performance (and impact of BLAS library)

I've copied below the timing output from one of the moderately sized examples I've looked at, using ma27. I haven't looked at these outputs before (thanks for the recommendation!), so I'll study this a little more, but any thoughts are welcome. This solves in 130 iterations (142 objective/constraint evaluations, 131 gradient evaluations), so about 0.2 CPU seconds per iteration (this is running on 4 cores).

Using metis ordering doesn't seem to significantly affect performance. I haven't tried using ma86 or ma97 with OpenMP enabled; I'll go and give that a shot.

For Tony Kelman: what do you mean by "unless my function evaluations are implemented inefficiently"? At this point they are a minority of the run-time, so efficiency there does not seem to be the problem. Or are you getting at something else?

Thank you for the quick responses so far!

Timing Statistics:

OverallAlgorithm....................: 26.471 (sys: 0.922 wall: 6.861)
PrintProblemStatistics..............:  0.001 (sys: 0.000 wall: 0.000)
InitializeIterates..................:  0.175 (sys: 0.004 wall: 0.062)
UpdateHessian.......................:  0.467 (sys: 0.013 wall: 0.120)
OutputIteration.....................:  0.005 (sys: 0.001 wall: 0.002)
UpdateBarrierParameter..............:  8.311 (sys: 0.309 wall: 2.153)
ComputeSearchDirection..............:  6.042 (sys: 0.191 wall: 1.557)
ComputeAcceptableTrialPoint.........:  1.658 (sys: 0.059 wall: 0.429)
AcceptTrialPoint....................:  1.943 (sys: 0.063 wall: 0.501)
CheckConvergence....................:  7.860 (sys: 0.282 wall: 2.034)
PDSystemSolverTotal.................: 12.647 (sys: 0.417 wall: 3.264)
PDSystemSolverSolveOnce.............: 11.446 (sys: 0.378 wall: 2.954)
ComputeResiduals....................:  0.997 (sys: 0.030 wall: 0.257)
StdAugSystemSolverMultiSolve........: 10.953 (sys: 0.379 wall: 2.831)
LinearSystemScaling.................:  0.000 (sys: 0.000 wall: 0.000)
LinearSystemSymbolicFactorization...:  0.018 (sys: 0.000 wall: 0.005)
LinearSystemFactorization...........:  5.611 (sys: 0.195 wall: 1.451)
LinearSystemBackSolve...............:  4.692 (sys: 0.169 wall: 1.215)
LinearSystemStructureConverter......:  0.000 (sys: 0.000 wall: 0.000)
LinearSystemStructureConverterInit..:  0.000 (sys: 0.000 wall: 0.000)
QualityFunctionSearch...............:  1.581 (sys: 0.077 wall: 0.414)
TryCorrector........................:  0.000 (sys: 0.000 wall: 0.000)
Task1...............................:  0.363 (sys: 0.018 wall: 0.096)
Task2...............................:  0.567 (sys: 0.022 wall: 0.147)
Task3...............................:  0.076 (sys: 0.005 wall: 0.020)
Task4...............................:  0.000 (sys: 0.000 wall: 0.000)
Task5...............................:  0.507 (sys: 0.020 wall: 0.132)
Function Evaluations................:  9.348 (sys: 0.328 wall: 2.417)
Objective function..................:  0.240 (sys: 0.009 wall: 0.062)
Objective function gradient.........:  4.316 (sys: 0.150 wall: 1.116)
Equality constraints................:  0.316 (sys: 0.012 wall: 0.082)
Inequality constraints..............:  0.000 (sys: 0.000 wall: 0.000)
Equality constraint Jacobian........:  4.477 (sys: 0.157 wall: 1.157)
Inequality constraint Jacobian......:  0.000 (sys: 0.000 wall: 0.000)
Lagrangian Hessian..................:  0.000 (sys: 0.000 wall: 0.000)

On 09/08/2014 03:02 PM, Greg Horn wrote:

My usual answer for increasing efficiency is to use HSL (ma86/ma97) with metis ordering and OpenMP. How expensive are your function evaluations? What is your normal time per iteration, and how many iterations does it take to solve? What sort of problem are you solving?
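
Concretely, that mostly comes down to a couple of lines in ipopt.opt plus OMP_NUM_THREADS. A sketch from the Python side (the exact option names, ma97_order in particular, are worth double-checking against the options reference for your Ipopt build):

    import os

    # OpenMP thread count used by the multithreaded HSL solvers.
    os.environ["OMP_NUM_THREADS"] = "4"

    # Ipopt picks up ipopt.opt from the working directory at startup.
    with open("ipopt.opt", "w") as f:
        f.write("linear_solver ma97\n")
        f.write("ma97_order metis\n")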

On Mon, Sep 8, 2014 at 10:53 PM, Jon Herman <jon.herman@colorado.edu> wrote:

<DIV text="#000000" bgcolor="#FFFFFF">Hello,<BR><BR>I am working on
implementing IPOPT in a piece of software that has a need for very good
performance. Unfortunately, it seems that right now my total run-time is
about 80% in IPOPT (that number excludes the function evaluations, as well
as any time setting up the problem, etc.). For me to put IPOPT to good
use, I'm hoping to make it run more efficiently, and even out the workload
between IPOPT and the function evaluations, preferably shifting the work
to the function evaluations as much as possible.<BR><BR>Originally, I was
using the BLAS/LAPACK that can be installed with IPOPT. In an attempt to
improve performance, I switched to OpenBLAS. To my confusion, performance
did not change at all. This is leading me to believe that something other
than the BLAS library is dominating the cost. (I am certain I properly
removed the old libraries when switching BLAS implementation) I'm not sure
how to effectively narrow down where IPOPT is spending most of it's time,
and how to subsequently improve that performance.<BR><BR>I've made sure to
try the ma27, ma57, ma77, ma86, ma97, and mumps solvers. Performance
varies among them, but 80% of the time spent in IPOPT is the best result I
achieve (which is typically with ma27 or ma57, the other solvers are
closer to 90%). I've also made sure to try problems as small as 500
variables and 400 constraints, to as large as 110 000 variables and 80 000
constraints (and many points in between those extremes). Performance is
very consistent across that range (for a given solver), again regardless
of the BLAS library being used. I've been doing this using the
quasi-Newton approximation for the Hessian, which I was hoping to get away
with, but I suppose this may put a lot of work into IPOPT's side of the
court. I'll also mention that I'm calling IPOPT through the PyIPOPT module
(though I'm expecting this to create only a small, fixed overhead).
<BR><BR>If you have any thoughts on why IPOPT might be hogging such a
large fraction of my total run-time, and/or how I could improve this (or
determining if this might be entirely unavoidable), I would greatly
appreciate it! (and of course I'd be happy to provide additional
information if that would be useful)<BR><BR>Best
regards,<BR><BR>Jon<BR></DIV><BR>_______________________________________________<BR>Ipopt