[Ipopt] IPOPT performance (and impact of BLAS library)
Jon Herman
jon.herman at colorado.edu
Mon Sep 8 19:55:20 EDT 2014
The problem I'm working with is very sparse, so your suggestion was very
applicable. I re-compiled OpenBLAS to be single-threaded (and then IPOPT
with OpenMP). Performance remains identical, unfortunately.
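As an aside, the same effect can presumably also be forced without
recompiling, by pinning the thread counts before any of the shared
libraries load. A minimal sketch (it assumes OpenBLAS and the OpenMP
runtime read these environment variables at library load time, i.e.
before the PyIpopt import):

    import os
    # Set thread counts before the BLAS/OpenMP libraries are loaded:
    os.environ['OPENBLAS_NUM_THREADS'] = '1'  # keep BLAS single-threaded
    os.environ['OMP_NUM_THREADS'] = '4'       # give the cores to ma86/ma97 instead
    import pyipopt                            # libraries load after the env is set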
To clarify: I see the software I'm writing as having two distinct parts:
(1) IPOPT, which is code I did not write and cannot easily adapt (other
than through its input options), and (2) the problem-specific code that
I wrote, which computes the objective, constraints, Jacobian, etc., and
which I can adjust freely. Without going into the details of my own
code, I already know it has room for substantial improvement (a factor
of 10 at least, though only through substantial effort). However, before
I direct my efforts there (again, a large amount of work), I want to
make sure that IPOPT itself can also improve on its current performance.
Improving my own code by a factor of 10 only buys me a ~30% improvement
in total run-time in the current situation (rather than the 90% I'm
looking for!), as the quick check below shows. My first objective is to
cut a good chunk out of those 4.4 seconds that are not spent in the
problem-specific code.
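(A quick check of that estimate, using the wall-clock numbers from the
timing log quoted further down:)

    total_wall = 6.861  # OverallAlgorithm wall time (s)
    func_wall = 2.417   # Function Evaluations wall time (s)
    # A 10x speedup of my own code shrinks only the function-evaluation share:
    new_total = (total_wall - func_wall) + func_wall / 10.0
    print(1.0 - new_total / total_wall)  # -> ~0.32, i.e. roughly a 30% gain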
I apologize if my terminology is confusing, but I hope this is clearer.
Thank you again for your patient responses!
On 09/08/2014 04:51 PM, Tony Kelman wrote:
> If your functions are actually in C, then there’s not much use in
> going through the Python interface to Ipopt; it adds more moving parts,
> and there could be some strange threading interaction with the Python
> runtime libraries for all I know. Still, your function evaluations
> took wall: 2.417 out of OverallAlgorithm wall: 6.861, so there is some
> room for improvement there.
> I’m confused by why you’re focusing on the “fraction of the run-time”
> being spent in Ipopt. I think we’re both getting confused using the
> same terms to refer to different things. We have no idea what your
> application is doing outside of Ipopt - let’s just talk about absolute
> time required by Ipopt to solve a given optimization problem. The
> breakdown within the time taken by Ipopt to solve an optimization
> problem can vary, but there is a normal expectation for what it should
> look like in most cases.
> OpenBLAS can have significant overhead for starting up its threading
> system, especially on small problems. It’s probably best to set
> OPENBLAS_NUM_THREADS to 1 and allocate threads instead to the
> multithreaded sparse solvers (MA86, MA97, WSMP, etc.). An optimized
> BLAS doesn’t really help with Ipopt as much as you might hope based on
> the difference in dense performance between reference and optimized
> BLAS. MA57, MUMPS, and newer sparse solvers aggregate small dense
> sub-blocks during the sparse factorization and send those off to BLAS.
> Unless your problem is very dense to start with, the blocks that get
> sent to BLAS are rarely all that large. Multithreading in BLAS really
> only helps for large dense problems that do enough work on each thread
> to make up for the synchronization overhead.
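> Concretely, that would look something like this (a sketch; the
> str_option call assumes the PyIpopt interface, and the same key can go
> in an ipopt.opt file instead):
>
>     nlp.str_option('linear_solver', 'ma86')  # or 'ma97'
>
> with OPENBLAS_NUM_THREADS=1 and OMP_NUM_THREADS set to the core count
> in the environment before the libraries load.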
> *From:* Jon Herman <jon.herman at Colorado.EDU>
> *Sent:* Monday, September 08, 2014 3:19 PM
> *To:* Tony Kelman <kelman at berkeley.edu>; Greg Horn
> <gregmainland at gmail.com>; Jon Herman <jon.herman at colorado.edu>
> *Cc:* ipopt mailing list <ipopt at list.coin-or.org>
> *Subject:* Re: [Ipopt] IPOPT performance (and impact of BLAS library)
> Actually, that's a misunderstanding. The user functions are in C;
> Python is just used as a top layer, outside of the optimization (but I
> do initialize IPOPT through this interface).
>
> I'm now running on multiple cores through OpenBLAS, and from what I
> understand the ma86 solver accomplishes this through OpenMP. I can see
> on the system monitor that all cores are indeed being used, though this
> again hasn't had a significant impact on the total run-time... so this
> does not seem to be where the hold-up was in the first place.
>
> Are my expectations unreasonable, and would IPOPT only take a smaller
> fraction of the run-time for a system requiring more costly function
> evaluations?
> And what do you mean when you say that those processes taking so much
> time doesn't make sense? Is there any chance this is due to me using
> IPOPT incorrectly?
>
>
> On 09/08/2014 03:41 PM, Tony Kelman wrote:
>> If you’re using PyIpopt, then presumably you’re writing your function
>> callbacks in Python, which is not exactly a recipe for speed.
>> According to that timing they’re not completely negligible; the
>> gradient and Jacobian are taking almost as much time as
>> LinearSystemFactorization and LinearSystemBackSolve. I’m surprised to
>> see UpdateBarrierParameter through CheckConvergence taking that much
>> time; that doesn’t make much sense.
>> In what way are you running on 4 cores? OpenBLAS? MA27 doesn’t even
>> use BLAS.
>> *From:* Jon Herman <jon.herman at colorado.edu>
>> *Sent:* Monday, September 08, 2014 2:24 PM
>> *To:* Greg Horn <gregmainland at gmail.com>; Jon Herman
>> <jon.herman at colorado.edu>
>> *Cc:* ipopt mailing list <ipopt at list.coin-or.org>
>> *Subject:* Re: [Ipopt] IPOPT performance (and impact of BLAS library)
>> I've copied below the timing output from one of the moderately sized
>> examples I've looked at, using ma27. I haven't taken a look at these
>> outputs before (thanks for the recommendation!), so I'll study this a
>> little more, but any thoughts are welcome.
>> This solves in 130 iterations (142 objective/constraint evaluations,
>> 131 gradient evaluations), so about 0.2 CPU seconds per iteration
>> (running on 4 cores; roughly 0.05 wall-clock seconds per iteration).
>>
>> Using metis ordering doesn't seem to significantly affect performance.
>> I haven't tried using ma86 or ma97 with OpenMP enabled; I'll go and
>> give that a shot.
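>> (From the documentation, I gather that combination would look roughly
>> like this through PyIpopt's option calls, assuming its str_option
>> interface:)
>>
>>     nlp.str_option('linear_solver', 'ma97')
>>     nlp.str_option('ma97_order', 'metis')  # metis fill-reducing ordering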
>>
>> For Tony Kelman: what do you mean by "unless my function evaluations
>> are implemented inefficiently"? At this point they are a minority of
>> the run-time, so inefficiency there does not seem to be the problem.
>> Or are you getting at something else?
>>
>> Thank you for the quick responses so far!
>>
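>> (The report below is Ipopt's timing-statistics output, enabled through
>> the print_timing_statistics option; through PyIpopt presumably
>> something like:)
>>
>>     nlp.str_option('print_timing_statistics', 'yes')
>>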
>> Timing Statistics:
>>
>> OverallAlgorithm....................: 26.471 (sys: 0.922 wall: 6.861)
>>  PrintProblemStatistics.............:  0.001 (sys: 0.000 wall: 0.000)
>>  InitializeIterates.................:  0.175 (sys: 0.004 wall: 0.062)
>>  UpdateHessian......................:  0.467 (sys: 0.013 wall: 0.120)
>>  OutputIteration....................:  0.005 (sys: 0.001 wall: 0.002)
>>  UpdateBarrierParameter.............:  8.311 (sys: 0.309 wall: 2.153)
>>  ComputeSearchDirection.............:  6.042 (sys: 0.191 wall: 1.557)
>>  ComputeAcceptableTrialPoint........:  1.658 (sys: 0.059 wall: 0.429)
>>  AcceptTrialPoint...................:  1.943 (sys: 0.063 wall: 0.501)
>>  CheckConvergence...................:  7.860 (sys: 0.282 wall: 2.034)
>> PDSystemSolverTotal.................: 12.647 (sys: 0.417 wall: 3.264)
>>  PDSystemSolverSolveOnce............: 11.446 (sys: 0.378 wall: 2.954)
>>  ComputeResiduals...................:  0.997 (sys: 0.030 wall: 0.257)
>>  StdAugSystemSolverMultiSolve.......: 10.953 (sys: 0.379 wall: 2.831)
>>  LinearSystemScaling................:  0.000 (sys: 0.000 wall: 0.000)
>>  LinearSystemSymbolicFactorization..:  0.018 (sys: 0.000 wall: 0.005)
>>  LinearSystemFactorization..........:  5.611 (sys: 0.195 wall: 1.451)
>>  LinearSystemBackSolve..............:  4.692 (sys: 0.169 wall: 1.215)
>>  LinearSystemStructureConverter.....:  0.000 (sys: 0.000 wall: 0.000)
>>   LinearSystemStructureConverterInit:  0.000 (sys: 0.000 wall: 0.000)
>> QualityFunctionSearch...............:  1.581 (sys: 0.077 wall: 0.414)
>> TryCorrector........................:  0.000 (sys: 0.000 wall: 0.000)
>> Task1...............................:  0.363 (sys: 0.018 wall: 0.096)
>> Task2...............................:  0.567 (sys: 0.022 wall: 0.147)
>> Task3...............................:  0.076 (sys: 0.005 wall: 0.020)
>> Task4...............................:  0.000 (sys: 0.000 wall: 0.000)
>> Task5...............................:  0.507 (sys: 0.020 wall: 0.132)
>> Function Evaluations................:  9.348 (sys: 0.328 wall: 2.417)
>>  Objective function.................:  0.240 (sys: 0.009 wall: 0.062)
>>  Objective function gradient........:  4.316 (sys: 0.150 wall: 1.116)
>>  Equality constraints...............:  0.316 (sys: 0.012 wall: 0.082)
>>  Inequality constraints.............:  0.000 (sys: 0.000 wall: 0.000)
>>  Equality constraint Jacobian.......:  4.477 (sys: 0.157 wall: 1.157)
>>  Inequality constraint Jacobian.....:  0.000 (sys: 0.000 wall: 0.000)
>>  Lagrangian Hessian.................:  0.000 (sys: 0.000 wall: 0.000)
>>
>>
>>
>> On 09/08/2014 03:02 PM, Greg Horn wrote:
>>> My usual answer for increasing efficiency is to use HSL (ma86/ma97)
>>> with metis ordering and OpenMP. How expensive are your function
>>> evaluations? What is your normal time per iteration, and how many
>>> iterations does it take to solve? What sort of problem are you solving?
>>> On Mon, Sep 8, 2014 at 10:53 PM, Jon Herman
>>> <jon.herman at colorado.edu> wrote:
>>>
>>> Hello,
>>>
>>> I am working on implementing IPOPT in a piece of software that needs
>>> very good performance. Unfortunately, it seems that right now about
>>> 80% of my total run-time is spent in IPOPT (that number excludes the
>>> function evaluations, as well as any time spent setting up the
>>> problem, etc.). For me to put IPOPT to good use, I'm hoping to make
>>> it run more efficiently and to even out the workload between IPOPT
>>> and the function evaluations, preferably shifting as much work as
>>> possible to the function evaluations.
>>>
>>> Originally, I was using the BLAS/LAPACK that can be installed with
>>> IPOPT. In an attempt to improve performance, I switched to OpenBLAS.
>>> To my confusion, performance did not change at all. This leads me to
>>> believe that something other than the BLAS library is dominating the
>>> cost. (I am certain I properly removed the old libraries when
>>> switching BLAS implementations.) I'm not sure how to effectively
>>> narrow down where IPOPT is spending most of its time, or how to
>>> subsequently improve that performance.
>>>
>>> I've made sure to try the ma27, ma57, ma77, ma86, ma97, and mumps
>>> solvers. Performance varies among them, but 80% of the time spent in
>>> IPOPT is the best result I achieve (typically with ma27 or ma57; the
>>> other solvers are closer to 90%). I've also made sure to try problems
>>> as small as 500 variables and 400 constraints, and as large as
>>> 110 000 variables and 80 000 constraints (and many points in between
>>> those extremes). Performance is very consistent across that range
>>> (for a given solver), again regardless of the BLAS library being
>>> used. I've been doing this using the quasi-Newton approximation for
>>> the Hessian, which I was hoping to get away with, but I suppose this
>>> may put a lot of work on IPOPT's side of the court. I'll also mention
>>> that I'm calling IPOPT through the PyIPOPT module (though I expect
>>> this to create only a small, fixed overhead).
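>>> (For reference, the quasi-Newton setting I mean is Ipopt's
>>> limited-memory Hessian approximation, which I understand is selected
>>> with something like:)
>>>
>>>     nlp.str_option('hessian_approximation', 'limited-memory')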
>>>
>>> If you have any thoughts on why IPOPT might be hogging such a large
>>> fraction of my total run-time, and/or how I could improve this (or on
>>> determining whether this might be entirely unavoidable), I would
>>> greatly appreciate it! (And of course I'd be happy to provide
>>> additional information if that would be useful.)
>>>
>>> Best regards,
>>>
>>> Jon
>>>