[Ipopt] IPOPT performance (and impact of BLAS library)

Jon Herman jon.herman at colorado.edu
Mon Sep 8 19:55:20 EDT 2014


The problem I'm working with is very sparse, so your suggestion was very 
applicable. I recompiled OpenBLAS to be single-threaded (and then rebuilt 
IPOPT with OpenMP enabled). Performance remains identical, unfortunately.
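
As an aside: I believe the same single-threaded behavior can also be 
forced through the OPENBLAS_NUM_THREADS environment variable, without 
recompiling, as long as it is set before the library is loaded. A 
minimal sketch in Python (the pyipopt import is only there to 
illustrate the ordering):

    import os
    # Must be set before anything loads the OpenBLAS shared library:
    os.environ['OPENBLAS_NUM_THREADS'] = '1'
    import pyipopt  # or whatever module ends up loading BLAS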

To clarify: I see the software I'm writing as having two distinct parts: 
(1) IPOPT, which is code I did not write and cannot easily adapt (other 
than through its input options), and (2) the problem-specific code that 
I wrote, which computes the objective, constraints, Jacobian, etc., and 
which I can adjust very easily. Without going into the details of my own 
code, I already know it has room for substantial improvement (a factor 
of 10 at least, though only through substantial effort). However, before 
I direct my efforts there (again, a large amount of work), I want to 
make sure that IPOPT itself can also improve over its current 
performance. Improving my own code by a factor of 10 only buys me a ~30% 
improvement in total run-time in the current situation (rather than the 
90% I'm looking for!). My first objective is to cut a good chunk out of 
those 4.4 seconds that are not spent in the problem-specific code. I 
apologize if my terminology was confusing; I hope this is clearer.
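
For reference, the ~30% figure comes straight from the wall-clock 
numbers in the ma27 timing output below; a quick back-of-the-envelope 
check in Python:

    total_wall = 6.861                    # OverallAlgorithm wall time
    func_wall  = 2.417                    # Function Evaluations wall time
    other_wall = total_wall - func_wall   # ~4.44 s outside my own code
    new_total  = other_wall + func_wall / 10.0  # my code sped up 10x
    print(1.0 - new_total / total_wall)   # ~0.32, i.e. roughly a 30% gain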

Thank you again for your patient responses!



On 09/08/2014 04:51 PM, Tony Kelman wrote:
> If your functions are actually in C, then there's not much use in 
> going through the Python interface to Ipopt; it adds more moving parts, 
> and there could be some strange threading interaction with the Python 
> runtime libraries for all I know. Still, your function evaluations 
> took wall: 2.417 out of OverallAlgorithm wall: 6.861, so there is some 
> room for improvement there.
> I’m confused by why you’re focusing on the “fraction of the run-time” 
> being spent in Ipopt. I think we’re both getting confused using the 
> same terms to refer to different things. We have no idea what your 
> application is doing outside of Ipopt - let’s just talk about absolute 
> time required by Ipopt to solve a given optimization problem. The 
> breakdown within the time taken by Ipopt to solve an optimization 
> problem can vary, but there is a normal expectation for what it should 
> look like in most cases.
> OpenBLAS can have significant overhead for starting up its threading 
> system, especially on small problems. It's probably best to set 
> OPENBLAS_NUM_THREADS to 1 and allocate threads instead to the 
> multithreaded sparse solvers (MA86, MA97, WSMP, etc.). An optimized 
> BLAS doesn't really help Ipopt as much as you might hope based on the 
> difference in dense performance between reference and optimized BLAS. 
> MA57, MUMPS, and the newer sparse solvers aggregate small dense 
> sub-blocks during the sparse factorization and send those off to the 
> BLAS. Unless your problem is very dense to start with, the blocks that 
> get sent to the BLAS are rarely all that large, and multithreading in 
> BLAS really only helps for large dense problems that do enough work on 
> each thread to make up for the synchronization overhead.
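> Through whatever interface you use to set Ipopt options, that thread 
> allocation would look roughly like the sketch below (untested; it 
> assumes PyIpopt's str_option wrapper for Ipopt's documented 
> linear_solver option, and that MA86 picks up its thread count from 
> OMP_NUM_THREADS):
>
>     import os
>     os.environ['OPENBLAS_NUM_THREADS'] = '1'  # keep the BLAS serial
>     os.environ['OMP_NUM_THREADS'] = '4'       # threads for MA86/MA97
>     import pyipopt
>     # nlp = pyipopt.create(...)   # problem setup elided
>     nlp.str_option('linear_solver', 'ma86')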
> *From:* Jon Herman <mailto:jon.herman at Colorado.EDU>
> *Sent:* Monday, September 08, 2014 3:19 PM
> *To:* Tony Kelman <mailto:kelman at berkeley.edu> ; Greg Horn 
> <mailto:gregmainland at gmail.com> ; Jon Herman 
> <mailto:jon.herman at colorado.edu>
> *Cc:* ipopt mailing list <mailto:ipopt at list.coin-or.org>
> *Subject:* Re: [Ipopt] IPOPT performance (and impact of BLAS library)
> Actually, that's a misunderstanding. The user functions are in C; 
> Python is just used as a top layer, outside of the optimization (but I 
> do initialize IPOPT through this interface).
>
> I'm now running on multiple cores through OpenBLAS, and from what I 
> understand the ma86 solver accomplishes this through OpenMP. I can see 
> on the system monitor that all cores are indeed being used, though it 
> again hasn't had a significant impact on the total run-time...this 
> does not seem to be where the hold-up was in the first place.
>
> Are my expectations unreasonable, and would IPOPT only take a lower 
> fraction of the run-time for a system requiring more costly function 
> evaluations?
> And what do you mean when you say it doesn't make sense for those 
> processes to take so much time? Is there any chance this is due to me 
> using IPOPT incorrectly?
>
>
> On 09/08/2014 03:41 PM, Tony Kelman wrote:
>> If you're using PyIpopt, then presumably you're writing your function 
>> callbacks in Python, which is not exactly a recipe for speed. 
>> According to that timing they're not completely negligible; the 
>> gradient and Jacobian are taking almost as much time as 
>> LinearSystemFactorization and LinearSystemBackSolve. I'm surprised to 
>> see UpdateBarrierParameter through CheckConvergence taking that much 
>> time; that doesn't make much sense.
>> In what way are you running on 4 cores? OpenBLAS? MA27 doesn't even 
>> use BLAS.
>> *From:* Jon Herman <mailto:jon.herman at colorado.edu>
>> *Sent:* Monday, September 08, 2014 2:24 PM
>> *To:* Greg Horn <mailto:gregmainland at gmail.com> ; Jon Herman 
>> <mailto:jon.herman at colorado.edu>
>> *Cc:* ipopt mailing list <mailto:ipopt at list.coin-or.org>
>> *Subject:* Re: [Ipopt] IPOPT performance (and impact of BLAS library)
>> I've copied below the timing output from one of the moderately sized 
>> examples I've looked at, using ma27. I haven't taken a look at these 
>> outputs before (thanks for the recommendation!), so I'll study this a 
>> little more, but any thoughts are welcome.
>> This solves in 130 iterations (142 objective/constraint evaluations, 
>> 131 gradient evaluations), so about 0.2 CPU seconds per iteration 
>> (this is running on 4 cores).
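>>
>> (For anyone reading along: the timing output below comes from Ipopt's 
>> print_timing_statistics option, which I set before solving; roughly 
>> as in this sketch, assuming PyIpopt's str_option/solve interface, 
>> with the problem setup elided:)
>>
>>     nlp.str_option('print_timing_statistics', 'yes')
>>     nlp.str_option('linear_solver', 'ma27')
>>     result = nlp.solve(x0)  # x0 is the starting point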
>>
>> Using metis ordering doesn't seem to significantly affect 
>> performance. I haven't tried using ma86 or ma97 with OpenMP enabled, 
>> I'll go and give that a shot.
>>
>> For Tony Kelman: what do you mean by "unless my function evaluations 
>> are implemented inefficiently"? At this point they account for a 
>> minority of the run-time, so inefficiency there does not seem to be 
>> the problem? Or are you getting at something else?
>>
>> Thank you for the quick responses so far!
>>
>> Timing Statistics:
>>
>> OverallAlgorithm....................:     26.471  (sys: 0.922, wall: 6.861)
>> PrintProblemStatistics.............:      0.001  (sys: 0.000, wall: 0.000)
>> InitializeIterates.................:      0.175  (sys: 0.004, wall: 0.062)
>> UpdateHessian......................:      0.467  (sys: 0.013, wall: 0.120)
>> OutputIteration....................:      0.005  (sys: 0.001, wall: 0.002)
>> UpdateBarrierParameter.............:      8.311  (sys: 0.309, wall: 2.153)
>> ComputeSearchDirection.............:      6.042  (sys: 0.191, wall: 1.557)
>> ComputeAcceptableTrialPoint........:      1.658  (sys: 0.059, wall: 0.429)
>> AcceptTrialPoint...................:      1.943  (sys: 0.063, wall: 0.501)
>> CheckConvergence...................:      7.860  (sys: 0.282, wall: 2.034)
>> PDSystemSolverTotal.................:    12.647  (sys: 0.417, wall: 3.264)
>> PDSystemSolverSolveOnce............:     11.446  (sys: 0.378, wall: 2.954)
>> ComputeResiduals...................:      0.997  (sys: 0.030, wall: 0.257)
>> StdAugSystemSolverMultiSolve.......:     10.953  (sys: 0.379, wall: 2.831)
>> LinearSystemScaling................:      0.000  (sys: 0.000, wall: 0.000)
>> LinearSystemSymbolicFactorization..:      0.018  (sys: 0.000, wall: 0.005)
>> LinearSystemFactorization..........:      5.611  (sys: 0.195, wall: 1.451)
>> LinearSystemBackSolve..............:      4.692  (sys: 0.169, wall: 1.215)
>> LinearSystemStructureConverter.....:      0.000  (sys: 0.000, wall: 0.000)
>>  LinearSystemStructureConverterInit:      0.000  (sys: 0.000, wall: 0.000)
>> QualityFunctionSearch...............:     1.581  (sys: 0.077, wall: 0.414)
>> TryCorrector........................:      0.000  (sys: 0.000, wall: 0.000)
>> Task1...............................:     0.363  (sys: 0.018, wall: 0.096)
>> Task2...............................:     0.567  (sys: 0.022, wall: 0.147)
>> Task3...............................:     0.076  (sys: 0.005, wall: 0.020)
>> Task4...............................:     0.000  (sys: 0.000, wall: 0.000)
>> Task5...............................:     0.507  (sys: 0.020, wall: 0.132)
>> Function Evaluations................:     9.348  (sys: 0.328, wall: 2.417)
>> Objective function.................:      0.240  (sys: 0.009, wall: 0.062)
>> Objective function gradient........:      4.316  (sys: 0.150, wall: 1.116)
>> Equality constraints...............:      0.316  (sys: 0.012, wall: 0.082)
>> Inequality constraints.............:      0.000  (sys: 0.000, wall: 0.000)
>> Equality constraint Jacobian.......:      4.477  (sys: 0.157, wall: 1.157)
>> Inequality constraint Jacobian.....:      0.000  (sys: 0.000, wall: 0.000)
>> Lagrangian Hessian.................:      0.000  (sys: 0.000, wall: 0.000)
>>
>>
>>
>> On 09/08/2014 03:02 PM, Greg Horn wrote:
>>> My usual answer to increasing efficiency is using HSL (ma86/ma97) 
>>> with metis ordering and openmp. How expensive are your function 
>>> evaluations? What is your normal time per iteration, and how many 
>>> iterations does it take to solve? What sort of problem are you solving?
>>> On Mon, Sep 8, 2014 at 10:53 PM, Jon Herman <jon.herman at colorado.edu 
>>> <mailto:jon.herman at colorado.edu>> wrote:
>>>
>>>     Hello,
>>>
>>>     I am working on implementing IPOPT in a piece of software that
>>>     has a need for very good performance. Unfortunately, it seems
>>>     that right now my total run-time is about 80% in IPOPT (that
>>>     number excludes the function evaluations, as well as any time
>>>     setting up the problem, etc.). For me to put IPOPT to good use,
>>>     I'm hoping to make it run more efficiently, and even out the
>>>     workload between IPOPT and the function evaluations, preferably
>>>     shifting the work to the function evaluations as much as possible.
>>>
>>>     Originally, I was using the BLAS/LAPACK that can be installed
>>>     with IPOPT. In an attempt to improve performance, I switched to
>>>     OpenBLAS. To my confusion, performance did not change at all.
>>>     This leads me to believe that something other than the BLAS
>>>     library is dominating the cost. (I am certain I properly removed
>>>     the old libraries when switching BLAS implementations.) I'm not
>>>     sure how to effectively narrow down where IPOPT is spending most
>>>     of its time, nor how to subsequently improve that performance.
>>>
>>>     I've made sure to try the ma27, ma57, ma77, ma86, ma97, and
>>>     mumps solvers. Performance varies among them, but 80% of the
>>>     time spent in IPOPT is the best result I achieve (which is
>>>     typically with ma27 or ma57, the other solvers are closer to
>>>     90%). I've also made sure to try problems as small as 500
>>>     variables and 400 constraints, to as large as 110 000 variables
>>>     and 80 000 constraints (and many points in between those
>>>     extremes). Performance is very consistent across that range (for
>>>     a given solver), again regardless of the BLAS library being
>>>     used. I've been doing this using the quasi-Newton approximation
>>>     for the Hessian, which I was hoping to get away with, but I
>>>     suppose this may put a lot of work into IPOPT's side of the
>>>     court. I'll also mention that I'm calling IPOPT through the
>>>     PyIPOPT module (though I'm expecting this to create only a
>>>     small, fixed overhead).
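>>>
>>>     For completeness: that quasi-Newton mode is Ipopt's documented
>>>     hessian_approximation option. A one-line sketch, assuming
>>>     PyIPOPT exposes Ipopt's string options via a str_option method
>>>     on the problem object (here called nlp, setup elided):
>>>
>>>         nlp.str_option('hessian_approximation', 'limited-memory')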
>>>
>>>     If you have any thoughts on why IPOPT might be hogging such a
>>>     large fraction of my total run-time, and/or how I could improve
>>>     this (or determining if this might be entirely unavoidable), I
>>>     would greatly appreciate it! (and of course I'd be happy to
>>>     provide additional information if that would be useful)
>>>
>>>     Best regards,
>>>
>>>     Jon
>>>
>>
>


