SIMD/intrinsics questions

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Nov 6 13:10:42 PST 2009


Don wrote:
> Mike Farnsworth wrote:
>> Don Wrote:
>>
>>> Mike Farnsworth wrote:
>>
>>>> In dmd and ldc, is there any support for SSE or other SIMD 
>>>> intrinsics?  I realize that I could write some asm blocks, but that 
>>>> means each operation (vector add, sub, mul, dot product, etc.) would 
>>>> probably need to include a prelude and postlude with loads and 
>>>> stores.  I worry that this will not get optimized away (unless I 
>>>> don't use 'naked'?).
>>>>
>>>> In the alternative, is it possible to support something along the 
>>>> lines of gcc's vector extensions:
>>>>
>>>> typedef int v4si __attribute__ ((vector_size (16)));
>>>> typedef float v4sf __attribute__ ((vector_size (16)));
>>>>
>>>> where the compiler will automatically generate opAdd, etc. for those 
>>>> types?  I'm not suggesting using gcc's syntax, of course, but you 
>>>> get the idea..  It would provide a very easy way for the compiler to 
>>>> prefer to keep 4-float vectors in SSE registers, pass them in 
>>>> registers where appropriate in function calls, nuke lots of loads 
>>>> and stores when inlining, etc.
>>>>
>>>> Having good, native SIMD support in D seems like a natural fit 
>>>> (heck, it's got complex numbers built-in).
>>>>
>>>> Of course, there are some operations that the available SSE 
>>>> intrinsics cover that the compiler can't expose via the typical 
>>>> operators, so those still need to be supported somehow.  Does anyone 
>>>> know if ldc or dmd has those, or if they'll optimize away SSE loads 
>>>> and stores if I roll my own structs with asm blocks?  I saw from the 
>>>> ldc source it had the usual llvm intrinsics, but as for 
>>>> hardware-specific codegen intrinsics, I couldn't spot any.
>>>>
>>>> Thanks,
>>>> Mike Farnsworth
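
As a rough, untested sketch of the asm-block approach described above (not
code from this thread; the Vec4 struct and its opAdd are invented purely for
illustration, using 32-bit register names), a hand-rolled wrapper in DMD's
inline assembler might look like this. The explicit movups loads and stores
around the single addps are exactly the prelude/postlude overhead in question:

    struct Vec4
    {
        float[4] data;

        // 2009-era operator overloading: a + b calls a.opAdd(b)
        Vec4 opAdd(Vec4 rhs)
        {
            Vec4 result;
            float* a = data.ptr;
            float* b = rhs.data.ptr;
            float* r = result.data.ptr;
            asm
            {
                mov EAX, a;          // prelude: fetch the three pointers
                mov ECX, b;
                mov EDX, r;
                movups XMM0, [EAX];  // load both operands from memory
                movups XMM1, [ECX];
                addps  XMM0, XMM1;   // the one instruction we actually want
                movups [EDX], XMM0;  // postlude: spill the result back
            }
            return result;
        }
    }

Unless the compiler inlines this and keeps values in XMM registers across
operations, every use of '+' pays for those loads and stores (and dmd, at
least, does not inline functions that contain asm blocks).
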
>>> Hi Mike, Welcome to D!
>>> In the latest compiler release (ie, this morning!), fixed-length 
>>> arrays have become value types. This is a big step: it means that 
>>> (eg) float[4] can be returned from a function for the first time. On 
>>> 32-bit, we're a bit limited in SSE support (eg, since *no* 32-bit AMD 
>>> processors have SSE2) -- but this will mean that on 64 bit, we'll be 
>>> able to define an ABI in which short static arrays are passed in SSE 
>>> registers.
>>>
>>> Also, D has array operations.  If x, y, and z are int[4], then
>>> x[] = y[]*3 + z[];
>>> corresponds directly to SIMD operations. DMD doesn't do much with 
>>> them yet (there have been so many language design issues that 
>>> optimisation hasn't received much attention), but the language has 
>>> definitely been planned with SIMD in mind.
>>
>>
>> Awesome, does this also apply to dynamic arrays?  And how far does 
>> that go?  E.g. if I were to do something odd like:
>>
>> x[] = ((y[] % 5) ^ 2) + z[];
> 
> Yes, that works, and it applies to dynamic arrays too. A key idea behind 
> this is that since modern machines support SIMD, it's quite ridiculous 
> for a high-level language not to be able to express it.
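
To make the above concrete, here is a small, hedged sketch (the function
names are invented for illustration, and whether any of it actually ends up
as SSE code depends on the compiler; as Don notes, dmd does little with
these yet):

    // fixed-length arrays are now value types, so a float[4] can be
    // returned from a function by value
    float[4] scaledSum(float[4] y, float[4] z)
    {
        float[4] x;
        x[] = y[] * 3.0f + z[];   // element-wise array operation
        return x;
    }

    // the same syntax works on dynamic arrays (slices), provided the
    // lengths match at run time
    void scaledSumInPlace(float[] x, float[] y, float[] z)
    {
        x[] = y[] * 3.0f + z[];
    }

    // the "odd" expression from the question is also within the array-op
    // grammar (note that ^ is bitwise xor in D, not exponentiation)
    void oddOp(int[] x, int[] y, int[] z)
    {
        x[] = ((y[] % 5) ^ 2) + z[];
    }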

Mike, for more info on the supported operations you may want to refer to 
the Thermopylae excerpt:

http://erdani.com/d/thermopylae.pdf


Andrei


