Does dmd have SSE intrinsics?

Jeremie Pelletier jeremiep at gmail.com
Mon Sep 21 12:45:27 PDT 2009


Don wrote:
> dsimcha wrote:
>> == Quote from Don (nospam at nospam.com)'s article
>>> Jeremie Pelletier wrote:
>>>> While writing SSE assembly by hand in D is fun and works well, I'm
>>>> wondering if the compiler has intrinsics for its instruction set,
>>>> much like xmmintrin.h in C.
>>>> The reason is that the compiler can usually reorder the intrinsics
>>>> to optimize performance.
>>>> I could always use C code to implement my SSE routines but then I'd
>>>> lose the ability to inline them in D.
>>> I know this is an old post, but since it wasn't answered...
>>> Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
>>> I've read many complaints about how poorly they perform on all compilers
>>> -- the penalty for allowing them to be reordered is that extra
>>> instructions are often added, which means that straightforward C code is
>>> sometimes faster!
>>> In this regard, I'm personally excited about array operations. I think
>>> the need for SSE intrinsics and vectorisation is a result of abstraction
>>> inversion: the instruction set is higher-level than the "high level
>>> language"! Array operations allow D to catch up with asm again. When
>>> array operations get implemented properly, it'll be interesting to see
>>> how much need for SSE intrinsics remains.
>>
>> What's wrong with the current implementation of array ops (other than
>> a few misc. bugs that have already been filed)?  I thought they
>> already use SSE if available.
> 
> (1) They don't take advantage of fixed-length arrays. In particular, 
> operations on float[4] should be a single SSE instruction (no function 
> call, no loop, nothing). This will make a huge difference to game and 
> graphics programmers, I believe.
> (2) The operations don't block on cache size.
> (3) DMD doesn't allow you to generate code assuming minimum CPU
> capabilities. (In fact, when generating inline asm, the CPU type is
> 8086! This is in bugzilla.) This limits the possible use of (1).
> 
> It's issue (1) which is the killer.


I agree that an -arch switch of some sort would be the best thing to hit
dmd. The equivalent option (-march) is already very useful in gcc, which
supported targets up to core2 when I last used it.

I wrote a linear algebra module with support for 2D, 3D and 4D vectors,
quaternions, and 3x2 and 4x4 matrices, all as template structs so I can
declare them with float, double, or real components. I used SSE for the
bigger operations, which increased the module size considerably. This is
where I first started looking for SSE intrinsics. It would also be
greatly helpful if the compiler could generate SSE code by itself; it
would save a LOT of inline assembly for simple operations.
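
To make the last point concrete, here is a minimal sketch of what one of
those simple operations looks like when written with array operations
instead of inline assembly. The names Vec4 and add are made up for
illustration and are not taken from my module, and the syntax assumes a
reasonably recent D2 compiler. Whether this actually compiles down to a
single SSE instruction depends on the compiler and its flags, which is
exactly the fixed-length array issue Don describes.

```d
import std.stdio;

// Hypothetical 4-component vector; the name and layout are assumptions
// made for this example only.
struct Vec4(T)
{
    T[4] v;

    // Component-wise addition expressed as an array operation on a
    // fixed-length array. No inline asm is needed; the code generated
    // for this line is left entirely to the compiler.
    Vec4 add(Vec4 rhs) const
    {
        Vec4 result;
        result.v[] = v[] + rhs.v[];
        return result;
    }
}

void main()
{
    auto a = Vec4!float([1.0f, 2.0f, 3.0f, 4.0f]);
    auto b = Vec4!float([10.0f, 20.0f, 30.0f, 40.0f]);
    auto c = a.add(b);

    // All values are exactly representable, so exact comparison is safe.
    assert(c.v == [11.0f, 22.0f, 33.0f, 44.0f]);
    writeln(c.v);
}
```

The same struct can be instantiated as Vec4!double or Vec4!real without
touching the body of add, which is the main attraction over keeping one
hand-written asm block per component type.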


