SIMD benchmark

Sun Jan 15 12:41:45 PST 2012

On 15 January 2012 19:01, bearophile <bearophileHUGS at lycos.com> wrote:
> Iain Buclaw:
>
>> Correction: 1.5x speedup without optimisation, 20x speedup with -O1,
>> 30x speedup with -O2 and above.  My oh my...
>
> Please show me the assembly code produced, along with the corresponding D source :-)
>
> Bye,
> bearophile

D code:
----
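import core.simd;

// Takes the vector by value and does nothing with it; the optimised
// assembly below contains no call to it.
void test2a(float4 a) { }

float4 test2()
{
   float4 a = 1.2;    // broadcast 1.2 into all four float lanes
   a = a * 3 + 7;     // lane-wise: every lane becomes 1.2 * 3 + 7
   test2a(a);
   return a;
}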
----

Relevant assembly:
----
.LC5:
        .long   1067030938
        .long   1067030938
        .long   1067030938
        .long   1067030938
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4

_D4test5test2FZNhG4f:
        .cfi_startproc
        movl    $3, %eax
        cvtsi2ss        %eax, %xmm0
        movb    $7, %al
        cvtsi2ss        %eax, %xmm1
        unpcklps        %xmm0, %xmm0
        unpcklps        %xmm1, %xmm1
        movlhps %xmm0, %xmm0
        movlhps %xmm1, %xmm1
        mulps   .LC5(%rip), %xmm0
        addps   %xmm1, %xmm0
        ret
        .cfi_endproc
----
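For anyone reading the assembly: the four .long values at .LC5 are the
constant 1.2 broadcast across the four lanes of the float4. 1067030938 is
0x3F99999A, the IEEE-754 single-precision bit pattern of 1.2f. The 3 and
the 7, by contrast, are converted from integers and broadcast at run time
(cvtsi2ss, then unpcklps/movlhps). The small D program below is only an
illustrative check, not part of the benchmark; it reinterprets the bits
of 1.2f as a uint to confirm the value in .LC5.

```d
import std.stdio;

void main()
{
    // The initialiser from test2(), as a single-precision float.
    float f = 1.2f;

    // Reinterpret the same 32 bits as an unsigned integer.
    uint bits = *cast(uint*)&f;

    writefln("%d = 0x%08X", bits, bits);

    // Matches each of the four .long entries at .LC5.
    assert(bits == 1067030938);
    assert(bits == 0x3F99999A);
}
```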

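As someone pointed out to me, the only optimisation missing here is
constant propagation: 1.2 * 3 + 7 could have been folded into a single
constant at compile time instead of being computed with cvtsi2ss, mulps
and addps at run time. That doesn't matter too much for now, though.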
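To make the missing optimisation concrete: each lane of test2() computes
1.2 * 3 + 7, which in single precision rounds to 10.6f (bit pattern
0x4129999A). The sketch below models one lane as scalar D code; the
function name oneLane is hypothetical and used only for illustration. A
compiler doing constant propagation could compute this value at compile
time and emit it as one vector constant, leaving just a load.

```d
import std.stdio;

// Hypothetical scalar model of one lane of test2():
// a = 1.2; a = a * 3 + 7;
// Evaluated at run time here purely for illustration.
float oneLane()
{
    float a = 1.2f;
    a = a * 3 + 7;
    return a;
}

void main()
{
    float r = oneLane();

    // IEEE-754 bit pattern of the result.
    uint bits = *cast(uint*)&r;

    writefln("%.7g (0x%08X)", r, bits);

    // 0x4129999A is the bit pattern of 10.6f.
    assert(bits == 0x4129999A);
}
```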

Regards
-- 
Iain Buclaw
