<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252" />

  
  </head><body bgcolor="#FFFFFF" text="#000000">Dear All,<br>

<br>

Indeed, for LP problems where solution time is dominated by the matrix-vector product in PRICE, there is some scope for vectorization. This was a feature of CPLEX 10-20 years ago, though it has since fallen out of use.<br>

<br>
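As a rough sketch of the kind of product meant here (not code from Clp or CPLEX): with the matrix stored column-wise, pricing computes d_j = c_j - y^T a_j for each column j. The function name, argument names and the tiny example data below are all hypothetical.<br>
<br>
<pre>
```python
def price(col_start, row_index, value, cost, y):
    """Reduced costs d_j = c_j - y^T a_j for a matrix in compressed column form.

    All names here are illustrative assumptions, not Clp identifiers.
    col_start has one entry per column plus a final end marker.
    """
    reduced = []
    for j in range(len(cost)):
        dot = 0.0
        # One multiply-add per stored nonzero; y is read through an index (a gather).
        for k in range(col_start[j], col_start[j + 1]):
            dot += y[row_index[k]] * value[k]
        reduced.append(cost[j] - dot)
    return reduced


# Tiny 2-row, 3-column example:
#   A = [[1, 0, 2],
#        [0, 3, 4]]
col_start = [0, 1, 2, 4]
row_index = [0, 1, 0, 1]
value = [1.0, 3.0, 2.0, 4.0]
cost = [1.0, 1.0, 1.0]
y = [0.5, 0.25]
reduced_costs = price(col_start, row_index, value, cost, y)
```
</pre>
The inner loop is the part a vectorized implementation would target; the indexed read of y is what makes that less straightforward than for a dense product.<br>
<br>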

However, each memory access still feeds very little computation, so, particularly on modern architectures, the performance improvement is limited by the number of memory channels rather than the number of cores.<br>

<br>
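A back-of-envelope version of this argument, under stated assumptions: each stored nonzero costs one multiply and one add, and needs an 8-byte value plus a 4-byte index to be streamed from memory (reads of the dense vector are ignored, as if they always hit cache). The 60 GB/s figure is made up for illustration, and the names are hypothetical.<br>
<br>
<pre>
```python
# Assumed storage: 8-byte double per value, 4-byte int per row index.
BYTES_PER_NONZERO = 8 + 4
FLOPS_PER_NONZERO = 2  # one multiply and one add


def bandwidth_bound_gflops(bandwidth_gb_per_s):
    """Upper bound on GFLOP/s when every nonzero must be streamed from memory."""
    return bandwidth_gb_per_s * FLOPS_PER_NONZERO / BYTES_PER_NONZERO


# 60 GB/s is a made-up figure for illustration, not a measurement.
bound = bandwidth_bound_gflops(60.0)
```
</pre>
Under these assumptions the ceiling depends only on memory bandwidth, so wider vector units or more cores do not raise it once the memory channels are saturated.<br>
<br>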

For general LP problems, the amount of such task parallelism isn't enough to give a meaningful overall performance gain.<br>

<br>

Julian<br><br><div class="gmail_quote">On 25 February 2017 15:53:46 GMT+00:00, John Forrest &lt;john.forrest@fastercoin.com&gt; wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


    <div class="moz-cite-prefix">David,<br />

      <br />

      I have done a bit of work to use AVX2 on the Haswell architecture.  At

      present it is only in pricing out the matrix, as that seemed the best

      candidate for vector instructions.  I was partly just playing

      around and the code was aimed at primal simplex.  For dense

      factorizations, I am sure OpenBLAS etc. will be upgraded.<br />

      <br />

      There were alignment issues on Haswell, and I can look to see what

      is new on later architectures.<br />

      <br />

      The idea was to be able to generate a copy of the matrix suitable for

      vector instructions.  This was done using some dubious parameters

      and -DAVX2=4 (for Haswell) in configure to get a performance

      improvement.<br />

      <br />

      John Forrest<br />

      On 25/02/17 11:04, David Prime wrote:<br />

    </div>

    <blockquote cite="mid:CAKAJfxtF8oNhUF-u-mrCqfftV1nZECAJ-UsT38Fh1Ktx6uSevg@mail.gmail.com" type="cite">


      <div dir="ltr">Hi all,

        <div><br />

        </div>

        <div>With the release of the new Xeon Skylake CPUs imminent and

          Google compute cloud offering early access to servers running

          them, I'm wondering if there's any pre-existing work or

          thoughts on how CLP performs with these extra SIMD extensions?</div>

        <div><br />

        </div>

        <div>I'm planning to get hold of a few servers as soon as I can

          and compile the lib with various compilers/settings and see if

          there are any nice automatic optimisations to be had. I'll

          share any benchmarks I generate and maybe we can see if

          there's any further room for optimisation on these new

          architectures.</div>

        <div><br />

        </div>

        <div>Are there any Clp-specific build flags to enable/disable

          certain features that could be relevant here?</div>

        <div><br />

        </div>

        <div>Cheers,</div>

        <div>David</div>

      </div>

      <br />


      <br />

      <pre wrap="">_______________________________________________

Clp mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Clp@list.coin-or.org">Clp@list.coin-or.org</a>

<a class="moz-txt-link-freetext" href="http://list.coin-or.org/mailman/listinfo/clp">http://list.coin-or.org/mailman/listinfo/clp</a>

</pre>

    </blockquote>

    <p><br />

    </p>

  
</blockquote></div><br>

</body></html>