Does dmd have SSE intrinsics?

Robert Jacques sandford at jhu.edu
Tue Sep 22 09:38:09 PDT 2009


On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier <jeremiep at gmail.com>  
wrote:

> #ponce wrote:
>>>> In practice it's about an 8X speed difference!
>>>>
>>>> On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
>>>> On i7, movups on aligned data is the same speed as movaps. It's still  
>>>> slower if it's an unaligned access.
>>>>
>>>> It all depends on how important you think performance on Core2 and  
>>>> earlier Intel processors is.
>>> I wasn't aware of that. Here I was wondering why my SSE code was  
>>> slower than the FPU in certain places on my Core 2 Quad; I now recall  
>>> using a lot of movups instructions. Thanks for the tip.
>>  Indeed, SSE is known to lose much of its advantage on unaligned data.
>> In C++, writing SSE code is so painful that you either have to use intrinsics  
>> or use libraries like Eigen (a SIMD vectorization library based on  
>> expression templates, which can generate SSE, AVX or FPU code). But  
>> using such a library is often far too intrusive, and alignment is not  
>> covered by standard C++.
>>  D already understands array operations, as Eigen does, in order to  
>> increase cacheability. It would be great if it could statically detect  
>> 16-byte aligned data and use SSE when possible (though there must  
>> be many other things to do :) ).
>
> The D memory manager already aligns data on 16-byte boundaries. The  
> only case I can think of right now is when data is in a struct or class:
>
> struct S {
> 	float[4] vec1; // aligned!
> 	int a;
> 	float[4] vec2; // unaligned!
> }
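A minimal sketch of the layout described in the quoted struct, assuming a current D compiler with default field alignment (a float[4] only needs 4-byte alignment). The struct name Pair is hypothetical, and the duplicate field is split into vec1/vec2 so it compiles. Printing the offsets shows that only the first array starts on a 16-byte boundary:

```d
import std.stdio;

// Hypothetical struct mirroring the quoted example, with the duplicate
// field name split into vec1/vec2 so that it compiles.
struct Pair
{
    float[4] vec1; // offset 0: 16-byte aligned whenever the struct is
    int a;         // offset 16
    float[4] vec2; // offset 20: float[4] only needs 4-byte alignment
}

// Checked at compile time: only vec1 sits on a 16-byte boundary.
static assert(Pair.vec1.offsetof % 16 == 0);
static assert(Pair.vec2.offsetof % 16 != 0);

void main()
{
    // Prints "0 16 20".
    writeln(Pair.vec1.offsetof, " ", Pair.a.offsetof, " ", Pair.vec2.offsetof);
}
```

An aligned load such as movaps on vec2 would fault, so the compiler has to fall back to movups there.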

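Yes, although classes have hidden fields (the vtable pointer and the  
monitor), which are runtime dependent and change the offsets of the  
declared fields. Structs may be embedded in other aggregates, which shifts  
their offsets. And then there's the whole issue of slicing an array: a  
slice that starts partway into an aligned block is no longer aligned.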
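To illustrate the slicing point, here is a minimal sketch that checks pointer alignment at runtime. The helper isAligned16 is hypothetical, and the sketch assumes the GC returns a 16-byte-aligned block for a small float array. A slice that starts one float into that block is then offset by 4 bytes:

```d
import std.stdio;

// Hypothetical helper: true if p lies on a 16-byte boundary, which is the
// requirement for aligned SSE loads such as movaps.
bool isAligned16(const(void)* p) pure nothrow @nogc
{
    return (cast(size_t) p & 15) == 0;
}

void main()
{
    float[] data = new float[](8); // GC block; assumed 16-byte aligned
    float[] tail = data[1 .. $];   // slice begins 4 bytes into the block

    writeln(isAligned16(data.ptr)); // true if the GC aligned the block
    writeln(isAligned16(tail.ptr)); // then false: tail.ptr == data.ptr + 1
}
```

Because a function taking a float[] cannot know where its slice came from, it would need a runtime check like this (or an unaligned load) before using aligned SSE instructions.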



More information about the Digitalmars-d mailing list