SIMD/intrinsics questions

Fri Nov 6 11:29:31 PST 2009

Mike Farnsworth wrote:
> Hey all,
> 
> The other day someone pointed me to Andrei's article in DDJ, and I dove headlong into researching D and what it is capable of.  I had only seen it referred to a few times with respect to template metaprogramming and that crazy compile-time ray tracer, but I have to say I've been very impressed with what I've seen, especially with D2.
> 
> A bit of background:  I work in the movie VFX industry, and worked in games development previously, and I have my own ray tracer that I experiment with (see http://renderspud.blogspot.com/ for info).  Back in college the incarnation of it was C++, then I went with C# so I could rapidly prototype a better version, and now I've slowly been converting it to C++ again with SSE support (getting to the SOA ray packet form soon, I hope) so that it doesn't suck speed-wise.  Anyway, long story short, SIMD is really important to me.
> 
> In dmd and ldc, is there any support for SSE or other SIMD intrinsics?  I realize that I could write some asm blocks, but that means each operation (vector add, sub, mul, dot product, etc.) would need to probably include a prelude and postlude with loads and stores.  I worry that this will not get optimized away (unless I don't use 'naked'?).
> 
> In the alternative, is it possible to support something along the lines of gcc's vector extensions:
> 
> typedef int v4si __attribute__ ((vector_size (16)));
> typedef float v4sf __attribute__ ((vector_size (16)));
> 
> where the compiler will automatically generate opAdd, etc. for those types?  I'm not suggesting using gcc's syntax, of course, but you get the idea.  It would provide a very easy way for the compiler to prefer to keep 4-float vectors in SSE registers, pass them in registers where appropriate in function calls, nuke lots of loads and stores when inlining, etc.
> 
> Having good, native SIMD support in D seems like a natural fit (heck, it's got complex numbers built-in).
> 
> Of course, there are some operations that the available SSE intrinsics cover that the compiler can't expose via the typical operators, so those still need to be supported somehow.  Does anyone know if ldc or dmd has those, or if they'll optimize away SSE loads and stores if I roll my own structs with asm blocks?  I saw from the ldc source it had the usual llvm intrinsics, but as far as hardware-specific codegen intrinsics I couldn't spot any.
> 
> Thanks,
> Mike Farnsworth
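For reference, here is a minimal C sketch of the gcc vector extension quoted above, assuming gcc 4.6+ or clang (needed for subscripting individual lanes). The helper names `madd4`, `add4i` and `dot4` are hypothetical. The point is that `+`, `-` and `*` are provided by the compiler for these types, with no hand-written operator functions.

```c
/* The two typedefs from the question: 16-byte vectors of 4 ints / 4 floats. */
typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Lane-wise integer add; the compiler generates the operator. */
static v4si add4i(v4si a, v4si b)
{
    return a + b;
}

/* Lane-wise a*b + c; with SSE enabled this typically becomes mulps/addps. */
static v4sf madd4(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;
}

/* Dot product: multiply lane-wise, then sum the four lanes by subscripting. */
static float dot4(v4sf a, v4sf b)
{
    v4sf p = a * b;
    return p[0] + p[1] + p[2] + p[3];
}
```

Because these operators are ordinary expressions to the optimizer, inlined calls can usually keep the values in XMM registers instead of round-tripping through memory, which is the behaviour the question asks about.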

Hi Mike, welcome to D!
In the latest compiler release (i.e., this morning!), fixed-length arrays
have become value types. This is a big step: it means that (e.g.) float[4]
can be returned from a function for the first time. On 32-bit, we're a
bit limited in SSE support (e.g., since *no* 32-bit AMD processors have
SSE2), but this will mean that on 64-bit, we'll be able to define an
ABI in which short static arrays are passed in SSE registers.
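C still cannot return a bare `float[4]`, which is the restriction this D change lifts. As a C analogy (not D code), the sketch below shows the two usual workarounds: a hypothetical wrapper struct `float4` that is copied and returned by value, and a 16-byte gcc vector type. On the x86-64 System V ABI a 16-byte vector argument or return value is normally passed in an XMM register, which is the kind of ABI described above. The function names are illustrative only.

```c
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical wrapper: the usual C workaround for returning a float[4]. */
typedef struct { float v[4]; } float4;

/* 'a' is a copy, so scaling it does not affect the caller's value. */
static float4 scale4(float4 a, float s)
{
    for (int i = 0; i < 4; i++)
        a.v[i] *= s;
    return a;
}

/* On x86-64 System V, 'a' and the result are passed in XMM registers. */
static v4sf scale4v(v4sf a, float s)
{
    v4sf k = {s, s, s, s};   /* broadcast the scalar to all four lanes */
    return a * k;
}
```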

Also, D has array operations. If x, y, and z are int[4], then

    x[] = y[]*3 + z[];

corresponds directly to SIMD operations. DMD doesn't do much with them
yet (there have been so many language design issues that optimisation
hasn't received much attention), but the language has definitely been
planned with SIMD in mind.
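To make "corresponds directly to SIMD operations" concrete, here is a hedged C sketch of what `x[] = y[]*3 + z[];` amounts to with the standard SSE2 intrinsics from `<emmintrin.h>`, falling back to a scalar loop when `__SSE2__` is not defined. `scale3_add` is a hypothetical name, and this is hand-written code, not what DMD emits. Since SSE2 has no packed 32-bit multiply (`pmulld` arrived with SSE4.1), `y*3` is formed as `(y << 1) + y`.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* x[i] = y[i]*3 + z[i] for i = 0..3, the C analogue of x[] = y[]*3 + z[]. */
static void scale3_add(int x[4], const int y[4], const int z[4])
{
#if defined(__SSE2__)
    __m128i vy = _mm_loadu_si128((const __m128i *)y);   /* unaligned loads */
    __m128i vz = _mm_loadu_si128((const __m128i *)z);
    /* y*3 computed as (y << 1) + y, four lanes at once */
    __m128i y3 = _mm_add_epi32(_mm_slli_epi32(vy, 1), vy);
    _mm_storeu_si128((__m128i *)x, _mm_add_epi32(y3, vz));
#else
    /* Portable fallback for targets without SSE2 */
    for (size_t i = 0; i < 4; i++)
        x[i] = y[i] * 3 + z[i];
#endif
}
```

The explicit load/compute/store sequence is exactly what a compiler is free to generate on its own from the one-line D array operation.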