Does dmd have SSE intrinsics?

Don nospam at nospam.com
Tue Sep 22 07:51:18 PDT 2009


bearophile wrote:
> Robert Jacques:
> 
>> Yes, but the unaligned version is slower, even for aligned data.
> 
> This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.

The problem is that the difference today is so extreme. On Core2:
  movaps [mem128], xmm0; // aligned,   1 micro-op
  movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!
In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still 
slower if it's an unaligned access.

It all depends on how important you think performance on Core2 and 
earlier Intel processors is.
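
For anyone who wants to try this from C rather than inline asm, here is a rough sketch of picking the store at run time. It assumes the usual <xmmintrin.h> intrinsics, where _mm_store_ps normally compiles to movaps and _mm_storeu_ps to movups. The names is_aligned16 and store4 are made up for this illustration, and the memcpy fallback is only there so the sketch still builds on targets without SSE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Returns 1 if p sits on a 16-byte boundary, which movaps requires. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: store four floats to dst, taking the aligned
 * store when dst allows it. Returns 1 if dst was 16-byte aligned
 * (aligned path), 0 otherwise (unaligned path). */
static int store4(float *dst, const float src[4])
{
#if HAVE_SSE
    __m128 v = _mm_loadu_ps(src);   /* src alignment is not assumed */
    if (is_aligned16(dst)) {
        _mm_store_ps(dst, v);       /* normally emitted as movaps */
        return 1;
    }
    _mm_storeu_ps(dst, v);          /* normally emitted as movups */
    return 0;
#else
    /* Assumption: no SSE available, so fall back to a plain copy. */
    memcpy(dst, src, 4 * sizeof(float));
    return is_aligned16(dst);
#endif
}
```

Whether the extra branch pays for itself depends on the CPU: going by the numbers above it should matter most on Core2 and much less on i7. That is something to measure, not assume.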


