<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
The problem I'm working with is very sparse, so your suggestion was
very applicable. I re-compiled OpenBLAS to be single-threaded (and
then IPOPT with OpenMP). Performance remains identical,
unfortunately.<br>
<br>
To clarify: I see the software I'm writing as having two distinct
parts: (1) IPOPT, which is code I did not write and cannot
easily adapt (other than through input options), and (2) the
problem-specific code that I wrote, which computes the objective,
constraints, Jacobian, etc., and which I can easily adjust. Without
going into the details of my own code, I already know that it has
room for substantial improvement (a factor of 10 at least, but only
through substantial effort). However, before I direct my efforts
there (again, this is a large amount of work), I want to make sure
that IPOPT itself can also improve over its current performance.
Improving my own code by a factor of 10 only buys me a ~30%
improvement in total performance in the current situation (rather
than the 90% I'm looking for!). My first objective is to cut a good
chunk out of those 4.4 seconds that are not spent in the
problem-specific code.<br>
I apologize if my terminology is confusing, but I hope this is
clearer.<br>
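<br>
To make that arithmetic concrete, here is a rough sketch using the
wall-clock numbers from the timing output quoted further down. The
helper name total_after_speedup is just something I made up for
illustration, and it assumes the function evaluations and the rest
of the run do not overlap in time.<br>
<pre>
```python
# Wall-clock times (seconds) from the Ipopt timing output quoted further down.
overall_wall = 6.861      # OverallAlgorithm
func_eval_wall = 2.417    # Function Evaluations

def total_after_speedup(total, part, factor):
    """Total time if `part` of `total` becomes `factor` times faster."""
    return (total - part) + part / factor

# Time that is not spent in the problem-specific code.
ipopt_only = overall_wall - func_eval_wall

# Hypothetical case: function evaluations become 10x faster, the rest is unchanged.
new_total = total_after_speedup(overall_wall, func_eval_wall, 10.0)
saving = 1.0 - new_total / overall_wall

print(f"time outside function evaluations: {ipopt_only:.3f} s")
print(f"total after 10x faster evaluations: {new_total:.3f} s")
print(f"reduction in total time: {saving:.1%}")
```
</pre>
This gives about 4.4 seconds outside the function evaluations and a
reduction of roughly 32% in total time, which is where the ~30%
figure comes from.<br>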
<br>
Thank you again for your patient responses!<br>
<br>
<br>
<br>
<div class="moz-cite-prefix">On 09/08/2014 04:51 PM, Tony Kelman
wrote:<br>
</div>
<blockquote cite="mid:A5303125C52C4A6DA343A2DBA99E5987@TKsamsung"
type="cite">
<div dir="ltr">
<div style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR:
#000000">
<div>If your functions are actually in C, then there’s not
much use in going through the Python interface to Ipopt. It
adds more moving parts, and there could be some strange
threading interaction with the Python runtime libraries for
all I know. Still, your function evaluations took wall:
2.417 out of OverallAlgorithm wall: 6.861. So there’s some
room for improvement there.</div>
<div> </div>
<div>I’m confused about why you’re focusing on the “fraction of
the run-time” being spent in Ipopt. I think we’re both
getting confused by using the same terms to refer to different
things. We have no idea what your application is doing
outside of Ipopt, so let’s just talk about the absolute time
required by Ipopt to solve a given optimization problem. The
breakdown within the time taken by Ipopt to solve an
optimization problem can vary, but there is a normal
expectation for what it should look like in most cases.</div>
<div> </div>
<div>OpenBLAS can have significant overhead for starting up
its threading system, especially on small problems. It’s
probably best to set OPENBLAS_NUM_THREADS to 1, and allocate
threads instead to the multithreaded sparse solvers (MA86,
MA97, WSMP, etc). An optimized BLAS doesn’t really help with
Ipopt as much as you might hope based on the difference in
dense performance between reference and optimized BLAS. MA57,
MUMPS, and newer sparse solvers aggregate small
dense sub-blocks during the sparse factorization and send
those off to BLAS. Unless your problem is very dense to
start with, the blocks that get sent to BLAS are rarely
all that large. Multithreading in BLAS really only helps for
large dense problems that do enough work on each thread to
make up for the synchronization overhead.</div>
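<div>A minimal sketch of that thread allocation, assuming the
solver is launched from a Python top layer and that the HSL
solver was built with OpenMP (so the standard OMP_NUM_THREADS
variable applies). The thread count of 4 and the commented-out
import are only illustrative.</div>
<pre>
```python
import os

# Set this before importing anything linked against OpenBLAS (for example
# NumPy or the Ipopt bindings), since OpenBLAS reads it when it initializes.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

# Assumption: MA86/MA97 were built with OpenMP, so OMP_NUM_THREADS sets
# their thread count. The value 4 is only an example.
os.environ["OMP_NUM_THREADS"] = "4"

# import pyipopt  # hypothetical: load the solver bindings only after this point

print(os.environ["OPENBLAS_NUM_THREADS"], os.environ["OMP_NUM_THREADS"])
```
</pre>
<div>The same two variables can of course be exported in the
shell before starting the program instead.</div>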
<div> </div>
<div> </div>
<div style="FONT-SIZE: small; TEXT-DECORATION: none;
FONT-FAMILY: 'Calibri'; FONT-WEIGHT: normal;
COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline">
<div style="FONT: 10pt tahoma">
<div> </div>
<div style="BACKGROUND: #f5f5f5">
<div style="font-color: black"><b>From:</b> <a
moz-do-not-send="true"
title="jon.herman@Colorado.EDU"
href="mailto:jon.herman@Colorado.EDU">Jon Herman</a>
</div>
<div><b>Sent:</b> Monday, September 08, 2014 3:19 PM</div>
<div><b>To:</b> <a moz-do-not-send="true"
title="kelman@berkeley.edu"
href="mailto:kelman@berkeley.edu">Tony Kelman</a> ;
<a moz-do-not-send="true"
title="gregmainland@gmail.com"
href="mailto:gregmainland@gmail.com">Greg Horn</a> ;
<a moz-do-not-send="true"
title="jon.herman@colorado.edu"
href="mailto:jon.herman@colorado.edu">Jon Herman</a>
</div>
<div><b>Cc:</b> <a moz-do-not-send="true"
title="ipopt@list.coin-or.org"
href="mailto:ipopt@list.coin-or.org">ipopt mailing
list</a> </div>
<div><b>Subject:</b> Re: [Ipopt] IPOPT performance (and
impact of BLAS library)</div>
</div>
</div>
<div> </div>
</div>
<div style="FONT-SIZE: small; TEXT-DECORATION: none;
FONT-FAMILY: 'Calibri'; FONT-WEIGHT: normal;
COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline">Actually,
that's a misunderstanding. The user functions are in C,
Python is just used as a top layer, outside of the
optimization (but I do initialize IPOPT through this
interface).<br>
<br>
I'm now running on multiple cores through OpenBLAS, and from
what I understand the ma86 solver accomplishes this through
OpenMP. I can see on the system monitor that all cores are
indeed being used, though it again hasn't had a significant
impact on the total run-time. This does not seem to be
where the hold-up was in the first place.<br>
<br>
Are my expectations unreasonable, and would IPOPT only take
a lower fraction of the run-time for a system requiring more
costly function evaluations?<br>
And what do you mean by those processes taking so much time
not making sense? Is there any chance this is due to me
incorrectly utilizing IPOPT?<br>
<br>
<br>
<div class="moz-cite-prefix">On 09/08/2014 03:41 PM, Tony
Kelman wrote:<br>
</div>
<blockquote
cite="mid:EA9948BCAAFF46B9ADFC41503ECF627A@TKsamsung"
type="cite">
<div dir="ltr">
<div style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri';
COLOR: #000000">
<div>If you’re using PyIpopt, then presumably you’re
writing your function callbacks in Python, which is
not exactly a recipe for speed. According to that
timing they’re not completely negligible: the
gradient and Jacobian are taking almost as much time
as LinearSystemFactorization and
LinearSystemBackSolve. I’m surprised to see
UpdateBarrierParameter through CheckConvergence
taking that much time; that doesn’t make much sense.</div>
<div> </div>
<div>In what way are you running on 4 cores? OpenBLAS?
MA27 doesn’t even use BLAS.</div>
<div> </div>
<div> </div>
<div style="FONT-SIZE: small; TEXT-DECORATION: none;
FONT-FAMILY: 'Calibri'; FONT-WEIGHT:
normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY:
inline">
<div style="FONT: 10pt tahoma">
<div> </div>
<div style="BACKGROUND: #f5f5f5">
<div style="font-color: black"><b>From:</b> <a
title="jon.herman@colorado.edu"
href="mailto:jon.herman@colorado.edu"
moz-do-not-send="true">Jon Herman</a> </div>
<div><b>Sent:</b> Monday, September 08, 2014
2:24 PM</div>
<div><b>To:</b> <a
title="gregmainland@gmail.com"
href="mailto:gregmainland@gmail.com"
moz-do-not-send="true">Greg Horn</a> ; <a
title="jon.herman@colorado.edu"
href="mailto:jon.herman@colorado.edu"
moz-do-not-send="true">Jon Herman</a> </div>
<div><b>Cc:</b> <a
title="ipopt@list.coin-or.org"
href="mailto:ipopt@list.coin-or.org"
moz-do-not-send="true">ipopt mailing list</a>
</div>
<div><b>Subject:</b> Re: [Ipopt] IPOPT
performance (and impact of BLAS library)</div>
</div>
</div>
<div> </div>
</div>
<div style="FONT-SIZE: small; TEXT-DECORATION: none;
FONT-FAMILY: 'Calibri'; FONT-WEIGHT:
normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY:
inline">I've copied below the timing output from one
of the moderately sized examples I've looked at,
using ma27. I haven't taken a look at these outputs
before (thanks for the recommendation!), so I'll
study this a little more, but any thoughts are
welcome.<br>
This solves in 130 iterations (142
objective/constraint evaluations, 131 gradient
evaluations)<big>, so about 0.2 CPU seconds per
iteration (this is running on 4 cores)</big>.<br>
<br>
Using metis ordering doesn't seem to significantly
affect performance. I haven't tried using ma86 or
ma97 with OpenMP enabled, I'll go and give that a
shot.<br>
<br>
For Tony Kelman: what do you mean by "unless my
function evaluations are implemented inefficiently"?
At this point they are a minority of the run-time,
so any efficiency there does not seem to be the
problem? Or are you getting at something else?<br>
<br>
Thank you for the quick responses so far!<br>
<br>
Timing Statistics:<br>
<br>
OverallAlgorithm....................: 26.471 (sys: 0.922 wall: 6.861)<br>
PrintProblemStatistics.............: 0.001 (sys: 0.000 wall: 0.000)<br>
InitializeIterates.................: 0.175 (sys: 0.004 wall: 0.062)<br>
UpdateHessian......................: 0.467 (sys: 0.013 wall: 0.120)<br>
OutputIteration....................: 0.005 (sys: 0.001 wall: 0.002)<br>
UpdateBarrierParameter.............: 8.311 (sys: 0.309 wall: 2.153)<br>
ComputeSearchDirection.............: 6.042 (sys: 0.191 wall: 1.557)<br>
ComputeAcceptableTrialPoint........: 1.658 (sys: 0.059 wall: 0.429)<br>
AcceptTrialPoint...................: 1.943 (sys: 0.063 wall: 0.501)<br>
CheckConvergence...................: 7.860 (sys: 0.282 wall: 2.034)<br>
PDSystemSolverTotal.................: 12.647 (sys: 0.417 wall: 3.264)<br>
PDSystemSolverSolveOnce............: 11.446 (sys: 0.378 wall: 2.954)<br>
ComputeResiduals...................: 0.997 (sys: 0.030 wall: 0.257)<br>
StdAugSystemSolverMultiSolve.......: 10.953 (sys: 0.379 wall: 2.831)<br>
LinearSystemScaling................: 0.000 (sys: 0.000 wall: 0.000)<br>
LinearSystemSymbolicFactorization..: 0.018 (sys: 0.000 wall: 0.005)<br>
LinearSystemFactorization..........: 5.611 (sys: 0.195 wall: 1.451)<br>
LinearSystemBackSolve..............: 4.692 (sys: 0.169 wall: 1.215)<br>
LinearSystemStructureConverter.....: 0.000 (sys: 0.000 wall: 0.000)<br>
LinearSystemStructureConverterInit: 0.000 (sys: 0.000 wall: 0.000)<br>
QualityFunctionSearch...............: 1.581 (sys: 0.077 wall: 0.414)<br>
TryCorrector........................: 0.000 (sys: 0.000 wall: 0.000)<br>
Task1...............................: 0.363 (sys: 0.018 wall: 0.096)<br>
Task2...............................: 0.567 (sys: 0.022 wall: 0.147)<br>
Task3...............................: 0.076 (sys: 0.005 wall: 0.020)<br>
Task4...............................: 0.000 (sys: 0.000 wall: 0.000)<br>
Task5...............................: 0.507 (sys: 0.020 wall: 0.132)<br>
Function Evaluations................: 9.348 (sys: 0.328 wall: 2.417)<br>
Objective function.................: 0.240 (sys: 0.009 wall: 0.062)<br>
Objective function gradient........: 4.316 (sys: 0.150 wall: 1.116)<br>
Equality constraints...............: 0.316 (sys: 0.012 wall: 0.082)<br>
Inequality constraints.............: 0.000 (sys: 0.000 wall: 0.000)<br>
Equality constraint Jacobian.......: 4.477 (sys: 0.157 wall: 1.157)<br>
Inequality constraint Jacobian.....: 0.000 (sys: 0.000 wall: 0.000)<br>
Lagrangian Hessian.................: 0.000 (sys: 0.000 wall: 0.000)<br>
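<br>
As a rough way to read these numbers, the sketch here parses a few
of the timing lines and reports each entry's share of the overall
wall time, plus the CPU-to-wall ratio. The regular expression and
the helper name parse_timings are my own guesses at the line
layout, not anything provided by IPOPT.<br>
<pre>
```python
import re

# Three lines copied from the timing statistics above.
sample = """\
OverallAlgorithm....................: 26.471 (sys: 0.922 wall: 6.861)
PDSystemSolverTotal.................: 12.647 (sys: 0.417 wall: 3.264)
Function Evaluations................: 9.348 (sys: 0.328 wall: 2.417)
"""

# Entry name, optional dot padding, colon, CPU seconds, then sys and wall seconds.
LINE = re.compile(
    r"^\s*(.+?)\.*:\s*([\d.]+)\s*\(sys:\s*([\d.]+)\s+wall:\s*([\d.]+)\)"
)

def parse_timings(text):
    """Return {name: (cpu, sys, wall)} for every line that matches LINE."""
    timings = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            name, cpu, sys_t, wall = match.groups()
            timings[name] = (float(cpu), float(sys_t), float(wall))
    return timings

timings = parse_timings(sample)
total_cpu, _, total_wall = timings["OverallAlgorithm"]
for name, (cpu, _, wall) in timings.items():
    print(f"{name:22s} wall {wall:6.3f} s  ({wall / total_wall:.1%} of total)")
print(f"CPU/wall ratio: {total_cpu / total_wall:.2f}")
```
</pre>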
<br>
<br>
<br>
<div class="moz-cite-prefix">On 09/08/2014 03:02 PM,
Greg Horn wrote:<br>
</div>
<blockquote
cite="mid:CAAr-h4um4EE5L8fFw7bsARAiEZte29MYuKjO+nntvYLMSiYdmg@mail.gmail.com"
type="cite">
<div dir="ltr">My usual answer to increasing
efficiency is using HSL (ma86/ma97) with metis
ordering and openmp. How expensive are your
function evaluations? What is your normal time
per iteration, and how many iterations does it
take to solve? What sort of problem are you
solving?</div>
<div class="gmail_extra">
<div> </div>
<div class="gmail_quote">On Mon, Sep 8, 2014 at
10:53 PM, Jon Herman <span dir="ltr">&lt;<a
href="mailto:jon.herman@colorado.edu"
target="_blank" moz-do-not-send="true">jon.herman@colorado.edu</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="PADDING-LEFT: 1ex; MARGIN: 0px 0px
0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<div text="#000000" bgcolor="#FFFFFF">Hello,<br>
<br>
I am working on implementing IPOPT in a
piece of software that has a need for very
good performance. Unfortunately, it seems
that right now my total run-time is about
80% in IPOPT (that number excludes the
function evaluations, as well as any time
setting up the problem, etc.). For me to
put IPOPT to good use, I'm hoping to make
it run more efficiently, and even out the
workload between IPOPT and the function
evaluations, preferably shifting the work
to the function evaluations as much as
possible.<br>
<br>
Originally, I was using the BLAS/LAPACK
that can be installed with IPOPT. In an
attempt to improve performance, I switched
to OpenBLAS. To my confusion, performance
did not change at all. This is leading me
to believe that something other than the
BLAS library is dominating the cost. (I am
certain I properly removed the old
libraries when switching BLAS
implementations.) I'm not sure how to
effectively narrow down where IPOPT is
spending most of its time, and how to
subsequently improve that performance.<br>
<br>
I've made sure to try the ma27, ma57,
ma77, ma86, ma97, and mumps solvers.
Performance varies among them, but 80% of
the time spent in IPOPT is the best result
I achieve (which is typically with ma27 or
ma57, the other solvers are closer to
90%). I've also made sure to try problems
as small as 500 variables and 400
constraints, to as large as 110 000
variables and 80 000 constraints (and many
points in between those extremes).
Performance is very consistent across that
range (for a given solver), again
regardless of the BLAS library being used.
I've been doing this using the
quasi-Newton approximation for the
Hessian, which I was hoping to get away
with, but I suppose this may put a lot of
work into IPOPT's side of the court. I'll
also mention that I'm calling IPOPT
through the PyIPOPT module (though I'm
expecting this to create only a small,
fixed overhead). <br>
<br>
If you have any thoughts on why IPOPT
might be hogging such a large fraction of
my total run-time, and/or how I could
improve this (or determine whether this might
be entirely unavoidable), I would greatly
appreciate it! (And of course I'd be happy
to provide additional information if that
would be useful.)<br>
<br>
Best regards,<br>
<br>
Jon<br>
</div>
<br>
_______________________________________________<br>
Ipopt mailing list<br>
<a href="mailto:Ipopt@list.coin-or.org"
moz-do-not-send="true">Ipopt@list.coin-or.org</a><br>
<a
href="http://list.coin-or.org/mailman/listinfo/ipopt"
target="_blank" moz-do-not-send="true">http://list.coin-or.org/mailman/listinfo/ipopt</a><br>
<br>
</blockquote>
</div>
<div> </div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>