[Ipopt] IPOPT performance (and impact of BLAS library)

Tony Kelman kelman at berkeley.edu
Mon Sep 8 18:51:29 EDT 2014


If your functions are actually in C, then there’s not much use in going through the Python interface to Ipopt; it adds more moving parts, and for all I know there could be some strange threading interaction with the Python runtime libraries. Still, your function evaluations took 2.417 seconds of wall time out of 6.861 for OverallAlgorithm, so there’s some room for improvement there.

I’m confused by why you’re focusing on the “fraction of the run-time” being spent in Ipopt. I think we’re both getting confused using the same terms to refer to different things. We have no idea what your application is doing outside of Ipopt - let’s just talk about absolute time required by Ipopt to solve a given optimization problem. The breakdown within the time taken by Ipopt to solve an optimization problem can vary, but there is a normal expectation for what it should look like in most cases.

OpenBLAS can have significant overhead for starting up its threading system, especially on small problems. It’s probably best to set OPENBLAS_NUM_THREADS to 1 and allocate threads instead to the multithreaded sparse solvers (MA86, MA97, WSMP, etc.). An optimized BLAS doesn’t help Ipopt as much as you might hope based on the difference in dense performance between reference and optimized BLAS. MA57, MUMPS, and newer sparse solvers aggregate small dense sub-blocks during the sparse factorization and send those off to BLAS, but unless your problem is very dense to start with, the blocks that get sent to BLAS are rarely all that large. Multithreading in BLAS really only helps for large dense problems that do enough work on each thread to make up for the synchronization overhead.
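
For instance, in the Python script that drives PyIpopt you could do something along these lines (a rough sketch: the thread counts are placeholders, I’m assuming the module is imported as pyipopt, and the variables have to be set before the BLAS/OpenMP libraries are first loaded):

import os

# Keep OpenBLAS single-threaded; its thread startup/synchronization overhead
# tends to outweigh any gain on the small dense blocks the sparse solver
# hands to BLAS.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

# Give the cores to the multithreaded sparse solver instead (MA86/MA97 use
# OpenMP); 4 is just an example, match it to your machine.
os.environ["OMP_NUM_THREADS"] = "4"

# Only import PyIpopt (and thereby load Ipopt/BLAS) after the environment
# is set, so the libraries pick the values up when they initialize.
import pyipopt

Setting the same variables in the shell before launching your program works just as well.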



From: Jon Herman 
Sent: Monday, September 08, 2014 3:19 PM
To: Tony Kelman ; Greg Horn ; Jon Herman 
Cc: ipopt mailing list 
Subject: Re: [Ipopt] IPOPT performance (and impact of BLAS library)

Actually, that's a misunderstanding. The user functions are in C; Python is just used as a top layer, outside of the optimization (though I do initialize IPOPT through this interface).

I'm now running on multiple cores through OpenBLAS, and from what I understand the ma86 solver accomplishes this through OpenMP. I can see on the system monitor that all cores are indeed being used, though again it hasn't had a significant impact on the total run-time, so this does not seem to be where the hold-up was in the first place.

Are my expectations unreasonable, and would IPOPT only take a lower fraction of the run-time for a system requiring more costly function evaluations?
And what do you mean when you say it doesn't make sense for those processes to take so much time? Is there any chance this is due to me using IPOPT incorrectly?



On 09/08/2014 03:41 PM, Tony Kelman wrote:

  If you’re using PyIpopt, then presumably you’re writing your function callbacks in Python, which is not exactly a recipe for speed. According to that timing output they’re not completely negligible; the gradient and Jacobian are taking almost as much time as LinearSystemFactorization and LinearSystemBackSolve. I’m surprised to see UpdateBarrierParameter through CheckConvergence taking that much time; that doesn’t make much sense.

  In what way are you running on 4 cores? OpenBLAS? MA27 doesn’t even use BLAS.



  From: Jon Herman 
  Sent: Monday, September 08, 2014 2:24 PM
  To: Greg Horn ; Jon Herman 
  Cc: ipopt mailing list 
  Subject: Re: [Ipopt] IPOPT performance (and impact of BLAS library)

  I've copied below the timing output from one of the moderately sized examples I've looked at, using ma27. I haven't taken a look at these outputs before (thanks for the recommendation!), so I'll study this a little more, but any thoughts are welcome.
  This solves in 130 iterations (142 objective/constraint evaluations, 131 gradient evaluations), so about 0.2 CPU seconds per iteration (this is running on 4 cores).

  Using metis ordering doesn't seem to significantly affect performance. I haven't tried using ma86 or ma97 with OpenMP enabled, I'll go and give that a shot.

  For Tony Kelman: what do you mean by "unless my function evaluations are implemented inefficiently"? At this point they are a minority of the run-time, so inefficiency there does not seem to be the problem. Or are you getting at something else?

  Thank you for the quick responses so far!

  Timing Statistics:

  OverallAlgorithm....................:     26.471 (sys:      0.922 wall:      6.861)
   PrintProblemStatistics.............:      0.001 (sys:      0.000 wall:      0.000)
   InitializeIterates.................:      0.175 (sys:      0.004 wall:      0.062)
   UpdateHessian......................:      0.467 (sys:      0.013 wall:      0.120)
   OutputIteration....................:      0.005 (sys:      0.001 wall:      0.002)
   UpdateBarrierParameter.............:      8.311 (sys:      0.309 wall:      2.153)
   ComputeSearchDirection.............:      6.042 (sys:      0.191 wall:      1.557)
   ComputeAcceptableTrialPoint........:      1.658 (sys:      0.059 wall:      0.429)
   AcceptTrialPoint...................:      1.943 (sys:      0.063 wall:      0.501)
   CheckConvergence...................:      7.860 (sys:      0.282 wall:      2.034)
  PDSystemSolverTotal.................:     12.647 (sys:      0.417 wall:      3.264)
   PDSystemSolverSolveOnce............:     11.446 (sys:      0.378 wall:      2.954)
   ComputeResiduals...................:      0.997 (sys:      0.030 wall:      0.257)
   StdAugSystemSolverMultiSolve.......:     10.953 (sys:      0.379 wall:      2.831)
   LinearSystemScaling................:      0.000 (sys:      0.000 wall:      0.000)
   LinearSystemSymbolicFactorization..:      0.018 (sys:      0.000 wall:      0.005)
   LinearSystemFactorization..........:      5.611 (sys:      0.195 wall:      1.451)
   LinearSystemBackSolve..............:      4.692 (sys:      0.169 wall:      1.215)
   LinearSystemStructureConverter.....:      0.000 (sys:      0.000 wall:      0.000)
    LinearSystemStructureConverterInit:      0.000 (sys:      0.000 wall:      0.000)
  QualityFunctionSearch...............:      1.581 (sys:      0.077 wall:      0.414)
  TryCorrector........................:      0.000 (sys:      0.000 wall:      0.000)
  Task1...............................:      0.363 (sys:      0.018 wall:      0.096)
  Task2...............................:      0.567 (sys:      0.022 wall:      0.147)
  Task3...............................:      0.076 (sys:      0.005 wall:      0.020)
  Task4...............................:      0.000 (sys:      0.000 wall:      0.000)
  Task5...............................:      0.507 (sys:      0.020 wall:      0.132)
  Function Evaluations................:      9.348 (sys:      0.328 wall:      2.417)
   Objective function.................:      0.240 (sys:      0.009 wall:      0.062)
   Objective function gradient........:      4.316 (sys:      0.150 wall:      1.116)
   Equality constraints...............:      0.316 (sys:      0.012 wall:      0.082)
   Inequality constraints.............:      0.000 (sys:      0.000 wall:      0.000)
   Equality constraint Jacobian.......:      4.477 (sys:      0.157 wall:      1.157)
   Inequality constraint Jacobian.....:      0.000 (sys:      0.000 wall:      0.000)
   Lagrangian Hessian.................:      0.000 (sys:      0.000 wall:      0.000)




  On 09/08/2014 03:02 PM, Greg Horn wrote:

    My usual answer for increasing efficiency is using HSL (ma86/ma97) with METIS ordering and OpenMP. How expensive are your function evaluations? What is your normal time per iteration, and how many iterations does it take to solve? What sort of problem are you solving?
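
    If it helps, a minimal way to try that from Python is to write an ipopt.opt file into the directory you run from (Ipopt picks it up automatically); I'm quoting the option names from memory, so double-check them against the Ipopt options documentation:

    # Sketch: write an ipopt.opt options file (you can also just create it by hand).
    with open("ipopt.opt", "w") as f:
        f.write("linear_solver ma86\n")  # or ma97
        f.write("ma86_order metis\n")    # METIS ordering for HSL_MA86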

    On Mon, Sep 8, 2014 at 10:53 PM, Jon Herman <jon.herman at colorado.edu> wrote:

      Hello,

      I am working on implementing IPOPT in a piece of software that has a need for very good performance. Unfortunately, it seems that right now my total run-time is about 80% in IPOPT (that number excludes the function evaluations, as well as any time setting up the problem, etc.). For me to put IPOPT to good use, I'm hoping to make it run more efficiently, and even out the workload between IPOPT and the function evaluations, preferably shifting the work to the function evaluations as much as possible.

      Originally, I was using the BLAS/LAPACK that can be installed with IPOPT. In an attempt to improve performance, I switched to OpenBLAS. To my confusion, performance did not change at all, which leads me to believe that something other than the BLAS library is dominating the cost. (I am certain I properly removed the old libraries when switching BLAS implementations.) I'm not sure how to effectively narrow down where IPOPT is spending most of its time, and how to subsequently improve that performance.

      I've made sure to try the ma27, ma57, ma77, ma86, ma97, and mumps solvers. Performance varies among them, but 80% of the run-time being spent in IPOPT is the best result I achieve (typically with ma27 or ma57; the other solvers are closer to 90%). I've also made sure to try problems as small as 500 variables and 400 constraints, and as large as 110 000 variables and 80 000 constraints (and many points in between those extremes). Performance is very consistent across that range (for a given solver), again regardless of the BLAS library being used. I've been doing this using the quasi-Newton approximation for the Hessian, which I was hoping to get away with, but I suppose this may put a lot of work into IPOPT's side of the court. I'll also mention that I'm calling IPOPT through the PyIPOPT module (though I'm expecting this to create only a small, fixed overhead).

      If you have any thoughts on why IPOPT might be hogging such a large fraction of my total run-time, and/or how I could improve this (or determine whether this might be entirely unavoidable), I would greatly appreciate it! (And of course I'd be happy to provide additional information if that would be useful.)

      Best regards,

      Jon







------------------------------------------------------------------------------
  _______________________________________________
  Ipopt mailing list
  Ipopt at list.coin-or.org
  http://list.coin-or.org/mailman/listinfo/ipopt



