Does dmd have SSE intrinsics?

Mon Sep 21 18:58:26 PDT 2009

On Mon, 21 Sep 2009 18:32:50 -0400, Jeremie Pelletier <jeremiep at gmail.com>  
wrote:

> bearophile wrote:
>> Don:
>>> (1) They don't take advantage of fixed-length arrays. In particular,  
>>> operations on float[4] should be a single SSE instruction (no function  
>>> call, no loop, nothing). This will make a huge difference to game and  
>>> graphics programmers, I believe.
>> [...]
>>> It's issue (1) which is the killer.
>>
>> In my answer I have forgotten to say another small thing.
>>
>> The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I
>> may like to add a second argument to such GC malloc, to specify the
>> alignment, this can be used to save some memory when the alignment
>> isn't necessary), while I think the std.c.stdlib.malloc() doesn't give
>> pointers aligned to 16 bytes.
>>
>> In the following code if you want to implement the last line with one
>> vector instruction then a and b arrays have to be aligned to 16 bytes.
>> I think that currently LDC doesn't align a and b to 16 bytes.
>>
>> float[4] a = [1.f, 2., 3., 4.];
>> float[4] b[] = 10f;
>> float[4] c[] = a[] + b[];
>>
>> So you may need a syntax like the following, that's not handy:
>>
>> align(16) float[4] a = [1.f, 2., 3., 4.];
>> align(16) float[4] b[] = 10f;
>> align(16) float[4] c[] = a[] + b[];
>>
>> A possible solution is to automatically align to 16 (by default, but
>> it can be changed to save stack space in specific situations) all
>> static arrays allocated on the stack too :-)
>> A note: in future probably CPU vector instructions will relax their
>> alignment requirements... it's already happening.
>>
>> Bye,
>> bearophile
>
> That 16-byte alignment is a restriction of the current usage of bit
> fields. Since every bit in the field indexes a single 16-byte block, a
> simple shift of 4 bits to the right translates a pointer into its index
> in the bit field. You could align on 4-byte boundaries, but at the cost
> of doubling the size of the bit fields and possibly having slower
> collection runs.
>
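
To make the shift concrete, here is a rough sketch of that mapping. It is not taken from the GC source: bitIndex and its base parameter are hypothetical names, and I am assuming the pointer is first made relative to the start of its pool and that there is one bit per 16-byte block.

```d
// Hypothetical helper, not the actual GC code.
// Assumes `base` is the start of the pool that contains `p`
// and that the bit field holds one bit per 16-byte block.
size_t bitIndex(void* base, void* p)
{
    // Offset into the pool, divided by 16 via a 4-bit right shift.
    return (cast(size_t) p - cast(size_t) base) >> 4;
}

unittest
{
    void* base = cast(void*) 0x1000;
    assert(bitIndex(base, cast(void*) 0x1000) == 0);
    assert(bitIndex(base, cast(void*) 0x1010) == 1);
    // Interior pointers land on the bit of the block that contains them.
    assert(bitIndex(base, cast(void*) 0x1025) == 2);
}
```

With a finer granularity the shift would simply be smaller, and the bit field correspondingly larger.
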
> Doesn't SSE have aligned and unaligned versions of its move
> instructions, like MOVAPS and MOVUPS?

Yes, but the unaligned version is slower, even for aligned data.
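
MOVAPS also requires its memory operand to sit on a 16-byte boundary, so code that wants the aligned form has to know (or check) that the address qualifies. A minimal sketch of such a check follows; isAligned16 is a hypothetical helper name, not something from Phobos or the compiler.

```d
// Hypothetical helper, not part of any library.
// Returns true when `p` is on a 16-byte boundary, i.e. when the
// low 4 bits of the address are all zero.
bool isAligned16(void* p)
{
    return (cast(size_t) p & 15) == 0;
}

unittest
{
    assert(isAligned16(cast(void*) 32));   // 32 is a multiple of 16
    assert(!isAligned16(cast(void*) 36));  // 36 is only 4-byte aligned
}
```

If the compiler can guarantee the alignment statically, as with the 16-byte-aligned stack arrays bearophile suggests, no runtime check like this is needed.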

Also, another issue for game, graphics and robotics programmers is the
ability to return fixed-length arrays from functions, though struct
wrappers mitigate this.
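
For anyone who has not seen the workaround, here is a small sketch of what I mean by a struct wrapper. Vec4 and add are hypothetical names made up for illustration, and the sketch assumes a compiler that supports array operations on slices.

```d
// Hypothetical wrapper type: the fixed-length array travels inside a
// struct, and structs can be returned by value.
struct Vec4
{
    float[4] v;
}

// Element-wise sum, returned by value through the wrapper.
Vec4 add(Vec4 a, Vec4 b)
{
    Vec4 r;
    r.v[] = a.v[] + b.v[];  // array operation over all four elements
    return r;
}

unittest
{
    Vec4 a, b;
    a.v = [1f, 2f, 3f, 4f];
    b.v[] = 10f;            // fill every element with 10
    Vec4 c = add(a, b);
    assert(c.v == [11f, 12f, 13f, 14f]);
}
```

It works, but every function signature has to go through the wrapper instead of using float[4] directly.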