SIMD benchmark

Michel Fortin michel.fortin at michelf.com
Mon Jan 16 09:32:54 PST 2012


On 2012-01-16 16:59:44 +0000, Manu <turkeyman at gmail.com> said:

> On 16 January 2012 18:48, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
> 
>> On 1/16/12 10:46 AM, Manu wrote:
>> 
>>> A function using float arrays and a function using hardware vectors
>>> should certainly not be the same speed.
>> 
>> My point was that the version using float arrays should opportunistically
>> use hardware ops whenever possible.
> 
> I think this is a mistake, because such a piece of code never exists
> outside of some context. If the context it exists within is all FPU code
> (and it is, it's a float array), then swapping between FPU and SIMD
> execution units will probably result in the function being slower than the
> original (also, the float array is unaligned). The SIMD version, however,
> must exist within a SIMD context: since the API can't implicitly interact
> with floats, this guarantees that the context of each function matches
> that within which it lives.
> This is fundamental to fast vector performance. Using SIMD is an
> all-or-nothing decision; you can't just mix it in here and there.
> You don't go casting back and forth between floats and ints on every other
> line... obviously it's imprecise, but it's also a major performance hazard.
> There is no difference here, except the performance hazard is much worse.

Andrei's idea could be valid as an optimization when the compiler can 
see that all the operations on the array can be performed with SIMD 
ops. In this particular case, that means when test1a(a) is inlined. But 
it can't work once the float[4] value crosses a function boundary.

Or instead the optimization could be performed at the semantic level, 
like this: try to change the type of a float[4] variable to float4, 
and if the code still compiles, use float4 instead. So if you have the 
same function working with a float[4] and a float4, and if all the 
functions you call on a given variable support float4, it'll go for 
float4. But doing that at the semantic level would be rather messy, 
not to mention the combinatorial explosion when multiple variables 
are in play.
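
To make the "if it can compile, use it" idea concrete, here is a rough 
C++17 sketch of the same selection done by hand with type traits. It 
is only an illustration under stated assumptions, not how a D compiler 
would do it: FloatArray4 and Float4 are hypothetical stand-ins for 
float[4] and float4, and scale, legacy_sum, ScaleOnly, ScaleThenSum, 
PreferredType and check_examples are names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins (assumptions, not types from the benchmark):
using FloatArray4 = std::array<float, 4>;      // plays the role of float[4]
struct alignas(16) Float4 { float lane[4]; };  // plays the role of float4

// scale() exists for both types; legacy_sum() only accepts the plain array.
inline FloatArray4 scale(FloatArray4 v, float k) {
    for (float& x : v) x *= k;
    return v;
}
inline Float4 scale(Float4 v, float k) {
    for (float& x : v.lane) x *= k;
    return v;
}
inline float legacy_sum(const FloatArray4& v) {
    return v[0] + v[1] + v[2] + v[3];
}

// Each "function body" is a functor whose trailing return type makes it
// callable only with types for which the whole expression compiles.
struct ScaleOnly {
    template <class V>
    auto operator()(V v) const -> decltype(scale(v, 2.0f)) {
        return scale(v, 2.0f);
    }
};
struct ScaleThenSum {
    template <class V>
    auto operator()(V v) const -> decltype(legacy_sum(scale(v, 2.0f))) {
        return legacy_sum(scale(v, 2.0f));
    }
};

// "Try float4; if the body compiles with it, use it, else keep float[4]."
template <class Body>
using PreferredType =
    std::conditional_t<std::is_invocable_v<Body, Float4>, Float4, FloatArray4>;

// Every operation in ScaleOnly supports Float4, so the vector type is chosen.
static_assert(std::is_same_v<PreferredType<ScaleOnly>, Float4>);
// legacy_sum() has no Float4 overload, so the body falls back to the array.
static_assert(std::is_same_v<PreferredType<ScaleThenSum>, FloatArray4>);

// Runtime sanity check that both bodies still compute the expected values.
inline void check_examples() {
    assert(ScaleThenSum{}(FloatArray4{1.0f, 2.0f, 3.0f, 4.0f}) == 20.0f);
    assert(ScaleOnly{}(Float4{{1.0f, 2.0f, 3.0f, 4.0f}}).lane[3] == 8.0f);
}
```

Even in this toy form the messiness shows: every body has to be 
written so that a failed substitution is detectable, and the choice is 
made for one variable only. With n such variables there are up to 2^n 
type assignments to try.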


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
