SIMD/intrinsics questions

Fri Nov 6 12:43:23 PST 2009

Mike Farnsworth wrote:
> Don wrote:
> 
>> Mike Farnsworth wrote:
> 
>>> In dmd and ldc, is there any support for SSE or other SIMD intrinsics?  I realize that I could write some asm blocks, but that means each operation (vector add, sub, mul, dot product, etc.) would probably need to include a prelude and postlude with loads and stores.  I worry that this will not get optimized away (unless I don't use 'naked'?).
>>>
>>> In the alternative, is it possible to support something along the lines of gcc's vector extensions:
>>>
>>> typedef int v4si __attribute__ ((vector_size (16)));
>>> typedef float v4sf __attribute__ ((vector_size (16)));
>>>
>>> where the compiler will automatically generate opAdd, etc. for those types?  I'm not suggesting using gcc's syntax, of course, but you get the idea.  It would provide a very easy way for the compiler to prefer to keep 4-float vectors in SSE registers, pass them in registers where appropriate in function calls, nuke lots of loads and stores when inlining, etc.
>>>
>>> Having good, native SIMD support in D seems like a natural fit (heck, it's got complex numbers built-in).
>>>
>>> Of course, there are some operations that the available SSE intrinsics cover that the compiler can't expose via the typical operators, so those still need to be supported somehow.  Does anyone know if ldc or dmd has those, or if they'll optimize away SSE loads and stores if I roll my own structs with asm blocks?  I saw from the ldc source it had the usual llvm intrinsics, but as far as hardware-specific codegen intrinsics I couldn't spot any.
>>>
>>> Thanks,
>>> Mike Farnsworth
>> Hi Mike, Welcome to D!
>> In the latest compiler release (i.e., this morning!), fixed-length arrays 
>> have become value types. This is a big step: it means that, e.g., float[4] 
>> can be returned from a function for the first time. On 32-bit, we're a 
>> bit limited in SSE support (e.g., since *no* 32-bit AMD processors have 
>> SSE2) -- but this will mean that on 64-bit, we'll be able to define an 
>> ABI in which short static arrays are passed in SSE registers.
>>
>> Also, D has array operations.  If x, y, and z are int[4], then
>> x[] = y[]*3 + z[];
>> corresponds directly to SIMD operations. DMD doesn't do much with them 
>> yet (there have been so many language design issues that optimisation 
>> hasn't received much attention), but the language has definitely been 
>> planned with SIMD in mind.
> 
> 
> Awesome, does this also apply to dynamic arrays?  And how far does that go?  E.g. if I were to do something odd like:
> 
> x[] = ((y[] % 5) ^ 2) + z[];

Yes, that works, and it applies to dynamic arrays too. A key idea behind 
this is that since modern machines support SIMD, it's quite ridiculous 
for a high-level language not to be able to express it.
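
For anyone following along without a compiler to hand, here is a minimal 
sketch of the two expressions discussed in this thread, assuming a D 
compiler recent enough to support these array operations. The helper name 
`combine` and the sample values are made up for illustration. Note that 
`^` is bitwise XOR in D, not exponentiation, and that the destination of 
an array operation must already have the right length.

```d
import std.stdio;

// Hypothetical helper (not from the thread): element-wise ((y % 5) ^ 2) + z
// on dynamic arrays.
int[] combine(const int[] y, const int[] z)
{
    assert(y.length == z.length);
    auto x = new int[](y.length);   // allocate the destination up front
    x[] = ((y[] % 5) ^ 2) + z[];    // ^ is XOR here, not a power operator
    return x;
}

void main()
{
    // Static arrays, as in the x[] = y[]*3 + z[] example.
    int[4] y = [1, 7, 12, 20];
    int[4] z = [10, 20, 30, 40];
    int[4] x;
    x[] = y[] * 3 + z[];
    writeln(x);

    // Dynamic arrays: slice the static arrays and pass them along.
    writeln(combine(y[], z[]));
}
```

Whether either statement actually compiles down to SSE instructions 
depends on the compiler and its flags; the sketch only shows that the 
language can express the operation.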

> Would that also work?  (Sorry, I should test it myself, but I'm at work and haven't had time to get D tools installed yet and so am flying blind.)
> 
> On another note, I'm aware that the latest gcc versions have pretty good SIMD auto-vectorization, so I assume that will eventually be in the cards for dmd.  As for ldc, that is pretty much dependent on llvm itself, and that doesn't have auto-vectorization of code yet AFAIK.
> 
> Anyone familiar with ldc have any idea about getting optimized asm and/or SSE intrinsics to do the right thing?  As soon as I have some time, I'll stop being lazy and actually go try some of this stuff out myself and see what the compiled asm looks like, but if anyone has already figured out the answers I can stay lazy.
> 
> If it comes down to me needing to create some x86 asm in structs to get some initial SSE-based vector types working, I'll do that and share with the class.  I'm not amazing with that stuff, but it could serve as a poor-man's stopgap until the compilers mature a bit in this regard.

Yes, lots of stuff that should work doesn't yet. The emphasis has been 
on getting the fundamentals solid. There's a lot of activity planned -- 
in fact I'm improving the compiler support for operator overloading right now.