SIMD on Windows

jerro a at a.com
Sat Jun 22 09:04:25 PDT 2013


On Saturday, 22 June 2013 at 15:41:43 UTC, Benjamin Thaut wrote:
> Am 22.06.2013 15:53, schrieb jerro:
>>> In its current state you don't want to be using SIMD with dmd because
>>> the generated assembly will be significantly slower than if you just
>>> use the default FPU math.
>>
>> That may be true for some kinds of code, but it isn't true in general.
>> For example, see the comparison of pfft's performance when built with
>> 64-bit DMD using SIMD and without SIMD:
>>
>> http://i.imgur.com/kYYI9R9.png
>>
>> This benchmark was run on a Core i5-2500K on 64-bit Debian Wheezy.
>
> OK, I saw that you wrote quite a few critical functions in inline
> assembly. That's not really a good argument for DMD's codegen with
> SIMD intrinsics.
>
> Kind Regards
> Benjamin Thaut

I actually ran that benchmark with the code from this branch:

https://github.com/jerro/pfft/tree/experimental

The only function in sse_float.d on that branch that uses inline
assembly is scalar_to_vector. I used more inline assembly in the
master branch because, at the time, DMD didn't have intrinsics for
some instructions, such as shufps.
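As its name suggests, a scalar_to_vector helper presumably broadcasts one float into all four lanes of an SSE register, and shufps is the instruction that does this. The sketch below shows that operation in C with the standard SSE intrinsics from <xmmintrin.h>. It is not pfft's D code; the function name is borrowed only for illustration, and the broadcast behaviour is an assumption.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_set_ss, _mm_shuffle_ps */

/* Hypothetical stand-in for a scalar_to_vector helper (assumed to broadcast).
   _mm_set_ss puts x in lane 0 and zeroes lanes 1..3; shuffling with
   _MM_SHUFFLE(0, 0, 0, 0) then copies lane 0 into every lane. On plain SSE
   targets this shuffle typically compiles to a single SHUFPS instruction. */
static __m128 scalar_to_vector(float x)
{
    __m128 v = _mm_set_ss(x);
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
}
```

Storing the result with _mm_storeu_ps(out, scalar_to_vector(1.5f)) fills all four elements of out with 1.5f. Without an intrinsic for shufps, the only way to get this instruction is to write it in inline assembly.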

I'm not really arguing for DMD's codegen with SIMD intrinsics.
It's more that, from what I've seen, it doesn't produce very good
scalar floating point code either (at least compared to LDC or
GDC). Whether I use scalar floating point or SIMD, pfft is about
twice as slow when compiled with DMD as when compiled with GDC.
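Here is a concrete example of the "scalar floating point or SIMD" distinction. The sketch shows a hypothetical radix-2 FFT butterfly written both ways in C: once one float at a time, and once four floats at a time with SSE intrinsics. It is not taken from pfft. It only shows the two kinds of code a compiler has to handle well, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, ... */

/* Scalar butterfly: (a[i], b[i]) <- (a[i] + b[i], a[i] - b[i]),
   one element per iteration. */
static void butterfly_scalar(float *a, float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float s = a[i] + b[i];
        float d = a[i] - b[i];
        a[i] = s;
        b[i] = d;
    }
}

/* SSE butterfly: same computation, four elements per iteration.
   Assumes n is a multiple of 4; unaligned loads/stores are used so
   a and b do not need 16-byte alignment. */
static void butterfly_sse(float *a, float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
        _mm_storeu_ps(b + i, _mm_sub_ps(va, vb));
    }
}
```

Both functions produce identical results. An optimizing compiler may also auto-vectorize the scalar loop, so timing them compares the compiler's code generation as much as the source.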


More information about the Digitalmars-d mailing list