SIMD on Windows

Manu turkeyman at gmail.com
Sat Jun 29 19:22:10 PDT 2013


You should probably watch my talk again ;)
Most of the relevant points come towards the end, where I make the claim
"almost everyone who tries to use SIMD will see the same or slower
performance, and the reason is they have simply revealed other bottlenecks".
I also made the point "only by strictly applying ALL of the points I
demonstrated will you see significant performance improvement".

The problem with your code is that it doesn't do any real work. Your
operations are all dependent on the result of the previous operation. The
scalar operations have a shorter latency than the SIMD operations, and the
CPU executes them all in parallel.
This is exactly the pathological worst-case comparison that basically
everyone new to SIMD writes, and then wonders why it's slow.
I guess I should have demonstrated this point more clearly in my talk. It
was very rushed (actually, the script was basically made up on the spot),
sorry about that!
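
A minimal sketch of the dependency problem described above. It is written
in portable C++ rather than D and is not the code from the paste; the
function names and the multiply-add workload are made up for illustration.
The first function has the shape of the benchmark being criticised (each
operation needs the previous result), the second applies the same
arithmetic to independent elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical benchmark shape: every multiply-add needs the previous
// result, so the loop can go no faster than one operation latency per
// iteration, whether the operation is scalar or SIMD.
float dependent_chain(float x, float a, float b, std::size_t iterations) {
    for (std::size_t i = 0; i < iterations; ++i) {
        x = x * a + b;  // reads the x written by the previous iteration
    }
    return x;
}

// The same arithmetic applied to independent elements: no iteration waits
// on another, so the CPU (or a vectorizing compiler) is free to overlap them.
void independent_elements(const float* in, float* out, std::size_t count,
                          float a, float b) {
    assert(count == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = in[i] * a + b;  // depends only on in[i]
    }
}
```

Timing only the first function mostly measures instruction latency, which
is the comparison the paragraph above calls the pathological worst case.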

There's not enough code in those loops. You're basically profiling loop
iteration performance and the latency of a float opcode vs a SIMD opcode,
not any significant work.
You should see a big difference if you unroll the loop 4-8 times (or more
for such a short loop, depending on the CPU).
I also made the point that you should always avoid doing SIMD profiling on
x86, and certainly on x64, since it is both the most forgiving
architecture (it yields the smallest wins of any arch) and the hardest to
predict; the performance difference you see will almost certainly not be
the same on someone else's chip.
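
A rough sketch of what unrolling 4 times can look like, again in portable
C++ with hypothetical names rather than the code from the paste. The
assumption here is a simple float sum: the unrolled version keeps four
independent accumulators so that four additions can be in flight at once
instead of each waiting on the last.

```cpp
#include <cassert>
#include <cstddef>

// One accumulator: each add must wait for the previous add to complete.
float sum_rolled(const float* data, std::size_t count) {
    assert(count == 0 || data != nullptr);
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}

// Unrolled 4x with four independent accumulators. The four adds in the
// loop body do not depend on each other, only on their own previous value.
float sum_unrolled4(const float* data, std::size_t count) {
    assert(count == 0 || data != nullptr);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        s0 += data[i + 0];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    float total = (s0 + s1) + (s2 + s3);
    for (; i < count; ++i) {
        total += data[i];  // leftover elements when count is not a multiple of 4
    }
    return total;
}
```

Note that the unrolled version adds the values in a different order, so
for general floating-point input the result can differ in the last bits.
How much unrolling pays off depends on the CPU, as the text says.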

Look again at my points about latency, reducing the overall pipeline length
(demonstrated with the addition sequence), and unrolling the loops.
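
One possible reading of "reducing the overall pipeline length" with an
addition sequence, sketched in portable C++. This is an assumption about
what the talk showed, not a reproduction of it, and the function names are
hypothetical. Both functions add eight values; the first forms a chain of
seven dependent adds, the second arranges the same seven adds as a tree
that is only three adds deep.

```cpp
#include <cassert>

// Serial sequence: 7 adds, each depending on the one before it,
// so the critical path is 7 add latencies long.
float add8_serial(const float* v) {
    assert(v != nullptr);
    return ((((((v[0] + v[1]) + v[2]) + v[3]) + v[4]) + v[5]) + v[6]) + v[7];
}

// Tree sequence: still 7 adds, but the critical path is only 3 adds long
// because the adds on each level are independent of one another.
float add8_tree(const float* v) {
    assert(v != nullptr);
    // Level 1: four independent adds.
    float a = v[0] + v[1];
    float b = v[2] + v[3];
    float c = v[4] + v[5];
    float d = v[6] + v[7];
    // Level 2: two independent adds.
    float ab = a + b;
    float cd = c + d;
    // Level 3: the final add.
    return ab + cd;
}
```

As with unrolling, the reordering can change floating-point rounding for
general input, so it is a transformation the programmer usually has to
make by hand.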


On 30 June 2013 06:34, Jonathan Dunlap <jadit2 at gmail.com> wrote:

> I did watch Manu's talk a few days ago, which inspired me to start this
> project. With the updates in http://dpaste.dzfl.pl/fce2d93b, I'm still a
> bit clueless as to why there is almost zero performance difference,
> considering that it seems like an ideal setup to benefit from SIMD. I feel
> that if I can't see gains here, I shouldn't bother using SIMD in
> practice, where sometimes non-ideal operations must be done.