Does dmd have SSE intrinsics?
Jeremie Pelletier
jeremiep at gmail.com
Tue Sep 22 08:00:32 PDT 2009
Don wrote:
> bearophile wrote:
>> Robert Jacques:
>>
>>> Yes, but the unaligned version is slower, even for aligned data.
>>
>> This is true today, but in future it may become a little less true,
>> thanks to improvements in the CPUs.
>
> The problem is that the difference today is so extreme. On Core2:
> movaps [mem128], xmm0; // aligned, 1 micro-op
> movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!
> In practice it's about an 8X speed difference!
>
> On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
> On i7, movups on aligned data is the same speed as movaps. It's still
> slower if it's an unaligned access.
>
> It all depends on how important you think performance on Core2 and
> earlier Intel processors is.
I wasn't aware of that. Here I was wondering why my SSE code was
slower than the FPU in certain places on my Core2 Quad. I now recall
using a lot of movups instructions. Thanks for the tip.
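
For anyone hitting the same thing, here is a minimal sketch of the obvious
workaround: check the address once and take the movaps path whenever the
data really is 16-byte aligned. It is written in C, assuming a compiler
that ships <xmmintrin.h>. The names is_aligned16 and store4 are made up for
illustration and are not from any library. Whether the branch pays for
itself depends on the CPU numbers quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0   /* no SSE available: fall back to a plain copy */
#endif

/* Nonzero if p sits on a 16-byte boundary, which movaps requires. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: store four floats from src to dst.
 * Uses the aligned store when dst allows it, the unaligned one otherwise.
 * Returns 1 if dst was 16-byte aligned, 0 if not. */
static int store4(float *dst, const float src[4])
{
    assert(dst != NULL && src != NULL);
#if HAVE_SSE
    __m128 v = _mm_loadu_ps(src);   /* src alignment is not assumed */
    if (is_aligned16(dst)) {
        _mm_store_ps(dst, v);       /* typically compiles to movaps */
        return 1;
    }
    _mm_storeu_ps(dst, v);          /* typically compiles to movups */
    return 0;
#else
    memcpy(dst, src, 4 * sizeof(float));
    return is_aligned16(dst);
#endif
}
```

In a real loop the check would sit outside the loop, or the buffers would
simply be allocated aligned so that only the aligned path is ever needed.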