<div dir="ltr"><div>Hi All,</div><div><br></div><div>I've been using CLP for a little while and am finding it very good, providing fast and stable results.</div><div><br></div><div>One issue I've seen involves creating and executing many models in parallel. We build and execute perhaps a few hundred models per second across 16 cores in one program. When we run the same code on a 32-core machine with work spread evenly across all cores, the ratio of model-creation time to execution time goes from about 1:1 to 10:1 under heavy load. Execution times stay roughly constant. They increase only with model complexity, or when the machine is maxing out all available cores and the program becomes CPU-starved.</div><div><br></div><div>The model creation time is spent almost exclusively in this method:</div><div><code>CoinPackedMatrix::appendCol(const int vecsize, const int *vecind, const double *vecelem)</code></div><div><br></div><div>I'm wondering whether any of the libraries involved use internal global locks that might make performance worse as parallelism increases. Another theory: we only scale from 16 to 32 cores when we're under very heavy load, so the models themselves may also differ at those times, in particular in the size of the matrix we need to construct. Is there a more efficient way to build the matrix than calling appendCol once per column?</div><div><br></div><div>Any thoughts?</div><div><br></div><div>Thanks,</div><div>David Prime</div></div>
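<div>For the question of whether there's something cheaper than one appendCol call per column, here is a minimal sketch of one pattern worth trying. It is an assumption, not a confirmed fix: gather every column into flat column-major (start/index/value) arrays with capacity reserved up front, then hand the whole matrix over in a single call. The ColumnMajorBuilder type is hypothetical. The bulk calls named in the trailing comment are the CoinPackedMatrix constructor that takes elements/indices/starts/lengths and ClpSimplex::loadProblem. Both are assumptions to check against the CoinUtils/Clp headers for your version. CoinBigIndex is assumed to be the default <code>int</code>.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CoinUtils): collects columns in
// column-major (CSC) form so the whole matrix can be handed over in one call
// instead of one CoinPackedMatrix::appendCol() call per column.
// Assumes CoinBigIndex is the default 'int'.
struct ColumnMajorBuilder {
    int numRows = 0;
    std::vector<int> starts{0};    // starts[j] = offset of column j; numCols() + 1 entries
    std::vector<int> indices;      // row index of each nonzero
    std::vector<double> elements;  // value of each nonzero

    explicit ColumnMajorBuilder(int rows, std::size_t expectedCols = 0,
                                std::size_t expectedNonzeros = 0)
        : numRows(rows) {
        // Reserve once up front; if the estimates are large enough,
        // the buffers never need to grow while columns are added.
        starts.reserve(expectedCols + 1);
        indices.reserve(expectedNonzeros);
        elements.reserve(expectedNonzeros);
    }

    // Takes the same arguments as appendCol(vecsize, vecind, vecelem),
    // but only copies them into the flat arrays.
    void addColumn(int vecsize, const int* vecind, const double* vecelem) {
        assert(vecsize >= 0);
        indices.insert(indices.end(), vecind, vecind + vecsize);
        elements.insert(elements.end(), vecelem, vecelem + vecsize);
        starts.push_back(static_cast<int>(indices.size()));
    }

    int numCols() const { return static_cast<int>(starts.size()) - 1; }
    int numNonzeros() const { return static_cast<int>(indices.size()); }

    // Per-column lengths, for bulk constructors that take a 'len' array.
    std::vector<int> lengths() const {
        std::vector<int> len(static_cast<std::size_t>(numCols()));
        for (int j = 0; j < numCols(); ++j) len[j] = starts[j + 1] - starts[j];
        return len;
    }
};

// Assumed hand-off, not compiled here (check your CoinUtils/Clp headers):
//   std::vector<int> len = b.lengths();
//   CoinPackedMatrix m(true, b.numRows, b.numCols(), b.numNonzeros(),
//                      b.elements.data(), b.indices.data(),
//                      b.starts.data(), len.data());
// or pass b.starts / b.indices / b.elements to ClpSimplex::loadProblem(...).
```

<div>This only changes how the column data is gathered. Whether it closes the 1:1 vs 10:1 gap on the 32-core machine is untested, so it's worth profiling both versions under the same load.</div>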