<div class="gmail_quote">On 16 January 2012 18:48, Andrei Alexandrescu <span dir="ltr"><<a href="mailto:SeeWebsiteForEmail@erdani.org">SeeWebsiteForEmail@erdani.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 1/16/12 10:46 AM, Manu wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
A function using float arrays and a function using hardware vectors<br>
should certainly not be the same speed.<br>
</blockquote>
<br></div>
My point was that the version using float arrays should opportunistically use hardware ops whenever possible.</blockquote><div class="gmail_quote"><br></div>I think this is a mistake, because such a piece of code never exists outside of some context. If the context it exists within is all FPU code (and it is: it's a float array), then swapping between the FPU and SIMD execution units will probably make the function slower than the original (the float array is also unaligned). The SIMD version, however, must exist within a SIMD context; since its API can't implicitly interact with floats, this guarantees that each function's context matches that within which it lives.</div>
<div class="gmail_quote">This is fundamental to fast vector performance. Using SIMD is an all or nothing decision, you can't just mix it in here and there.</div><div class="gmail_quote">You don't go casting back and fourth between floats and ints on every other line... obviously it's imprecise, but it's also a major performance hazard. There is no difference here, except the performance hazard is much worse.</div>