SSE in D
    bearophile 
    bearophileHUGS at lycos.com
       
    Sat Oct  2 18:34:16 PDT 2010
    
    
  
Emil Madsen:
You are asking many different things, so let's disentangle your questions a little.
>Is there a D equivalent of the "xmmintrin.h", or any other convenient way of doing SSE in D?<
The D2 language is not designed to be an academic language; it's designed to be a reasonably practical language (even though some of its features are not just buggy or unfinished, but also contain new design ideas that are far from being "battle tested", so no one knows whether they will actually turn out to be good in large or very large D2 programs).
But its implementation is not fully practical yet. In a compiler like GCC you may see a ton of dirty or smelly little features, absent from the C standard, that turn out to be practically useful or even almost necessary for real-world code. The D2 compiler lacks a large number of such dirty utility features. Even the (D1) compiler LDC has some of these necessary dirty little features, like the allow_inline pragma that allows inlining of functions that contain asm, and so on. I guess that when D2 is more finished, and some people write a more efficient implementation of D2, those little smelly things will be added in abundance.
The xmmintrin little dirty intrinsics are absent from DMD and D, both in practice and by design. GCC's C is not designed much: they just add those SIMD operations to the ball of mud named GNU C (plus handy operator overloading if you want to sum or multiply two registers represented as special arrays of doubles, floats or ints). D here is designed in a somewhat more idealistic way and tries to be semantically cleaner, so instead of those intrinsics you are supposed to use vector operations on arrays (both static and dynamic).
Many of these operations are already implemented and more or less work, but unless your arrays are large they usually slow down your code, because they are chunks of pre-written asm (that use SSE+ registers too) designed for large arrays, and they are not inlined. In theory, in the future the D front-end will be able to replace a sum of two 4-float static arrays with a single SSE instruction (or little more), if you have compiled the code for SSE-enabled CPUs. In practice DMD is far from this point, and the development efforts are (rightly!) focused on finishing core features and removing the worst implementation (or even design) bugs. Optimization of code generation is for later.
> - I've been looking into the Array Operators, but will those work, for
> instance if I'm doing something alike:
> a[3], b[4]
> c[4] = a+b;
The right D syntax is:
float[4] a, b, c;
c[] = a[] + b[];
You must always use [] after the array name. Arrays must have the same length.
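To make those two rules a little more concrete, here is a small sketch of array operations on both dynamic and static arrays. The function names addVectors and scaleAndAdd are made up for this example, and I am assuming a compiler where array operations mixed with a scalar work as documented:

```d
// Element-wise sum of two dynamic arrays (hypothetical helper name).
float[] addVectors(float[] a, float[] b) {
    // Array operations require operands of the same length.
    assert(a.length == b.length, "array operations need equal lengths");
    auto result = new float[a.length];
    result[] = a[] + b[];          // note the [] after every array name
    return result;
}

// Computes a * k + b element-wise on static arrays (hypothetical helper name).
float[4] scaleAndAdd(float[4] a, float k, float[4] b) {
    float[4] result;
    result[] = a[] * k + b[];      // a scalar can be mixed into the expression
    return result;
}

void main() {
    float[4] a = [1, 2, 3, 4];
    float[4] b = [10, 20, 30, 40];

    // a[] and b[] turn the static arrays into slices for the dynamic version.
    float[] sum = addVectors(a[], b[]);
    assert(sum == [11f, 22, 33, 44]);

    float[4] expected = [12, 24, 36, 48];
    assert(scaleAndAdd(a, 2, b) == expected);
}
```

The destination is always allocated first and then filled through its own [] slice.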
And currently you can't use this syntax:
void main() {
    float[4] a, b;
    float[4] c[] = a[] + b[];
}
That gives the error:
test.d(3): Error: cannot implicitly convert expression (a[] + b[]) of type float[] to float[4u][]
Probably because of an unforeseen design bug: a collision between the D syntax and the C-style declaration syntax that D still accepts.
See this bug report for more info about this design problem, which so far most people (including the main designers) seem to happily ignore:
http://d.puremagic.com/issues/show_bug.cgi?id=3971
Here I have suggested a possible solution, the introduction of a -cstyle compiler flag, which was ignored even more:
http://d.puremagic.com/issues/show_bug.cgi?id=4580
So this code works:
void main() {
    float[4] a, b, c;
    c[] = a[] + b[];
}
But it performs a call to a pre-written asm routine that computes the vector sum c = a + b, and that routine uses SSE registers too if your CPU (detected at runtime) supports them.
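Since the array operation goes through a non-inlined routine, for very short arrays you may want to compare it with a plain loop. This is only a sketch, with made-up function names addLoop and addArrayOp; which one is faster on your compiler is something to measure, not something I can promise:

```d
// Element-wise sum written as an explicit loop (hypothetical helper name).
// No library routine is involved, so the compiler is free to inline it.
float[4] addLoop(float[4] a, float[4] b) {
    float[4] c;
    foreach (i; 0 .. 4)
        c[i] = a[i] + b[i];
    return c;
}

// The same sum written as an array operation (hypothetical helper name).
float[4] addArrayOp(float[4] a, float[4] b) {
    float[4] c;
    c[] = a[] + b[];
    return c;
}

void main() {
    float[4] a = [1, 2, 3, 4];
    float[4] b = [4, 3, 2, 1];
    // Both forms must give the same result; only the generated code differs.
    assert(addLoop(a, b) == addArrayOp(a, b));
}
```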
> and when will the compiler write SSE asm for the array operators?
DMD currently never emits SSE instructions unless you write them yourself in inline asm code. The 64-bit DMD will probably be able to use those registers too, but I have no idea whether the 32-bit DMD will then use them as well. I hope so, but I have little hope. I'd like to know this.
D1 LDC now uses SSE registers for most of its floating point operations because LLVM is very bad at using the x86 floating point stack.
Low-level D code written for D1 LDC is usually about as efficient as C code written for GCC. This is a very good thing. But recently the development of LDC has slowed down a lot: there is no D2 version of it, it's not updated to the latest versions of LLVM, and there's no Windows support, because the LLVM devs are paid by Apple and don't care to make LLVM work fully (== with exceptions too) on Windows; they just need to give people the illusion that LLVM is multi-platform. I used to help LLVM development, but I have stopped until they add good support for exceptions on Windows.
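If you really want SSE today, the inline asm route looks roughly like this. Take it as a sketch under assumptions: addFour is a made-up name, I use unaligned loads (movups) so the arrays need no special alignment, and the code assumes a compiler that accepts DMD-style inline asm and a CPU with SSE. On other targets it falls back to the array operation:

```d
// Adds four floats with a single SSE addps (hypothetical helper name).
void addFour(ref float[4] a, ref float[4] b, ref float[4] c) {
    // Plain pointer locals are simple to load into registers from asm.
    float* pa = a.ptr;
    float* pb = b.ptr;
    float* pc = c.ptr;

    version (D_InlineAsm_X86_64) {
        asm {
            mov RAX, pa;
            mov RCX, pb;
            mov RDX, pc;
            movups XMM0, [RAX];   // unaligned load of a[0 .. 4]
            movups XMM1, [RCX];   // unaligned load of b[0 .. 4]
            addps XMM0, XMM1;     // four float additions at once
            movups [RDX], XMM0;   // store the result into c
        }
    } else version (D_InlineAsm_X86) {
        asm {
            mov EAX, pa;
            mov ECX, pb;
            mov EDX, pc;
            movups XMM0, [EAX];
            movups XMM1, [ECX];
            addps XMM0, XMM1;
            movups [EDX], XMM0;
        }
    } else {
        // No DMD-style inline asm available: use the array operation.
        c[] = a[] + b[];
    }
}

void main() {
    float[4] a = [1, 2, 3, 4];
    float[4] b = [10, 20, 30, 40];
    float[4] c;
    addFour(a, b, c);
    float[4] expected = [11, 22, 33, 44];
    assert(c == expected);
}
```

Keep in mind that a function containing asm is normally not inlined, so for a single 4-float sum the call overhead can eat much of the gain.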
There is a GCC-based D compiler too, named GDC, and I think it works, but I have never appreciated it much on Windows. Other people may give you more/better info on it.
> - is there a target=architecture for the compiler? or will it simply write
> SSE if one defines something alike -msse4? -
LDC D1 allows you to specify the target a little, while I think DMD always targets a Pentium1.
> I'm having a bit of trouble finding stuff
> about SSE for D, sources on the subject anyone?
There is not much to search :-)
Bye,
bearophile
    
    