<div class="gmail_quote">On 16 January 2012 19:01, Timon Gehr <span dir="ltr"><<a href="mailto:timon.gehr@gmx.ch">timon.gehr@gmx.ch</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 01/16/2012 05:59 PM, Manu wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
On 16 January 2012 18:48, Andrei Alexandrescu<br></div>
<<a href="mailto:SeeWebsiteForEmail@erdani.org" target="_blank">SeeWebsiteForEmail@erdani.org</a> <mailto:<a href="mailto:SeeWebsiteForEmail@erdani.org" target="_blank">SeeWebsiteForEmail@<u></u>erdani.org</a>>><div>
<div class="h5"><br>
wrote:<br>
<br>
On 1/16/12 10:46 AM, Manu wrote:<br>
<br>
A function using float arrays and a function using hardware vectors<br>
should certainly not be the same speed.<br>
<br>
<br>
My point was that the version using float arrays should<br>
opportunistically use hardware ops whenever possible.<br>
<br>
<br></div></div><div class="im">
I think this is a mistake, because such a piece of code never exists<br>
outside of some context. If the context it exists within is all FPU code<br>
(and it is, it's a float array), then swapping between FPU and SIMD<br>
execution units will probably result in the function being slower than<br>
the original (also the float array is unaligned). The SIMD version<br>
however must exist within a SIMD context; since the API can't implicitly<br>
interact with floats, this guarantees that the context of each function<br>
matches that within which it lives.<br>
This is fundamental to fast vector performance. Using SIMD is an<br>
all-or-nothing decision; you can't just mix it in here and there.<br>
You don't go casting back and forth between floats and ints on every<br>
other line... obviously it's imprecise, but it's also a major<br>
performance hazard. There is no difference here, except the performance<br>
hazard is much worse.<br>
</div></blockquote>
<br>
I think DMD now uses XMM registers for scalar floating point arithmetic on x86_64.<br>
</blockquote></div><br><div>x64 can do the swapping too with no penalty, but it is the only architecture that can. So it might be a viable optimisation, but only for x64 codegen, which means any logic to detect and apply it should live in the back end, not in the front end as a higher-level semantic.</div>