<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-15"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
The test case
<br>
<a class="moz-txt-link-freetext"
href="http://www.coin-or.org/CppAD/Doc/example_a11c.cpp.xml">http://www.coin-or.org/CppAD/Doc/example_a11c.cpp.xml</a>
<br>
is an example from the OpenMP standards document. It does not use
CppAD at all and is intended to test the limitations of your system
and compiler. It is one of the cases run by openmp/run.sh. I have
found that, for some systems and compilers, it shows the type of
performance that you describe below.
<br>
<br>
On 2/18/2011 12:30 PM, <a class="moz-txt-link-abbreviated" href="mailto:schattenpflanze@arcor.de">schattenpflanze@arcor.de</a> wrote:
<blockquote cite="mid:4D5ED6FC.4070004@arcor.de" type="cite">Hello,
<br>
<br>
I have another question concerning the performance of CppAD when
OpenMP is enabled. It seems that CppAD scales very badly once the
number of threads in use exceeds a certain point. I have tried
to construct a minimal example that reproduces the issue. Running the
simple (and absolutely pointless) example code listed below on a
machine with 32 native cores (no hyperthreading, a single
workstation) yields the following results:
<br>
1 thread: 8.6 s
<br>
4 threads: 2.8 s
<br>
8 threads: 2.2 s
<br>
10 threads: 2.4 s
<br>
12 threads: 4.0 s
<br>
14 threads: 3.8 s
<br>
16 threads: 4.2 s
<br>
24 threads: 8.1 s (!)
<br>
28 threads: 9.5 s (!)
<br>
<br>
I am, of course, aware that additional threads cause additional
overhead, and that the performance does not necessarily increase
with the number of threads. However, this significant _decrease_
seems strange. In particular, if I remove the line
<br>
CppAD::Independent(x)
<br>
from the code, I obtain:
<br>
4 threads: 0.38 s
<br>
8 threads: 0.20 s
<br>
16 threads: 0.14 s
<br>
24 threads: 0.12 s,
<br>
which is the kind of scaling that I would have expected.
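<br>
(The timings above come from a sweep over thread counts; a minimal
omp_get_wtime harness for such a sweep might look like the sketch
below, where run_case is only a hypothetical stand-in for the loop
listed under "Test code".)
<pre>
#include <omp.h>
#include <cstdio>

// Hypothetical stand-in for the measured computation; the real test
// is the parallel loop listed under "Test code" below.
void run_case(int n_threads)
{
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum) num_threads(n_threads)
    for (int i = 0; i < 100000000; ++i)
        sum += 1.0 / (1.0 + i);
    if (sum < 0.0)                 // never true; keeps sum alive
        std::printf("%g\n", sum);
}

int main(void)
{
    // the thread counts used in the measurements above
    const int counts[] = { 1, 4, 8, 10, 12, 14, 16, 24, 28 };
    const int n_cases  = (int)(sizeof(counts) / sizeof(counts[0]));
    for (int c = 0; c < n_cases; ++c)
    {
        double start = omp_get_wtime();
        run_case(counts[c]);
        double sec = omp_get_wtime() - start;
        std::printf("%d threads: %.2f s\n", counts[c], sec);
    }
    return 0;
}
</pre>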
<br>
<br>
Memory consumption seems to be low. I have tried various
scheduling and variable-sharing policies, but the problem
persists. I have also attached the interesting results of the CppAD
openmp test script. What is the reason for this behaviour, and how
can I counter it?
<br>
<br>
Thank you and best regards,
<br>
Peter
<br>
<br>
<br>
Test code:
<br>
----------------------------------------------------
<pre>
int n_par = 45;
CppAD::vector<AD<double> > x(n_par);
for (int i = 0; i < n_par; ++i) {
    x[i] = i;
}

#pragma omp parallel for \
    firstprivate(x) \
    schedule(dynamic,1) \
    num_threads(global_paras->n_threads)
for (int i = 0; i < 1000; ++i) {
    CppAD::Independent(x);
    CppAD::vector<AD<double> > y(1);

    y[0] = 0.0;
    for (int k = 0; k < 1000; ++k) {
        for (int j = 0; j < (int)x.size(); ++j) {
            y[0] += CppAD::pow(x[j] - y[0], 0.1);
        }
    }
}
</pre>
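As listed, the snippet above is a fragment: the includes and
global_paras are omitted, and the tape started by
CppAD::Independent(x) is never closed (presumably the full program
also builds an ADFun, since CppAD stops with an error if Independent
is called again on a thread whose tape is still recording). A
self-contained sketch of the same loop follows; the fixed thread
count and the parallel_setup / parallel_ad calls, which come from
CppAD's current multi-threading API and may differ from the 2011
interface, are assumptions rather than part of the original post:
<pre>
#include <cppad/cppad.hpp>
#include <omp.h>

namespace {
    // tell CppAD how to identify threads; required before tapes
    // are used in parallel
    bool   in_parallel(void)   { return omp_in_parallel() != 0; }
    size_t thread_number(void) { return (size_t) omp_get_thread_num(); }
}

int main(void)
{
    using CppAD::AD;
    size_t n_threads = 4;      // stands in for global_paras->n_threads

    CppAD::thread_alloc::parallel_setup(n_threads, in_parallel, thread_number);
    CppAD::parallel_ad<double>();

    int n_par = 45;
    CppAD::vector< AD<double> > x(n_par);
    for (int i = 0; i < n_par; ++i)
        x[i] = double(i);

#pragma omp parallel for firstprivate(x) schedule(dynamic,1) \
        num_threads((int) n_threads)
    for (int i = 0; i < 1000; ++i)
    {
        CppAD::Independent(x);             // start recording a tape
        CppAD::vector< AD<double> > y(1);
        y[0] = 0.0;
        for (int k = 0; k < 1000; ++k)
            for (int j = 0; j < (int) x.size(); ++j)
                y[0] += CppAD::pow(x[j] - y[0], 0.1);
        CppAD::ADFun<double> f(x, y);      // close this thread's tape
    }
    return 0;
}
</pre>
<br>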
<pre wrap="">
_______________________________________________
CppAD mailing list
<a class="moz-txt-link-abbreviated" href="mailto:CppAD@list.coin-or.org">CppAD@list.coin-or.org</a>
<a class="moz-txt-link-freetext" href="http://list.coin-or.org/mailman/listinfo/cppad">http://list.coin-or.org/mailman/listinfo/cppad</a>
</pre>
</blockquote>
<br>
</body>
</html>