SIMD benchmark
Iain Buclaw
ibuclaw at ubuntu.com
Sun Jan 15 12:41:45 PST 2012
On 15 January 2012 19:01, bearophile <bearophileHUGS at lycos.com> wrote:
> Iain Buclaw:
>
>> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up
>> with -O2 and above. My oh my...
>
> Please, show me the assembly code produced, with its relative D source :-)
>
> Bye,
> bearophile
D code:
----
import core.simd;
void test2a(float4 a) { }
float4 test2()
{
    float4 a = 1.2;
    a = a * 3 + 7;
    test2a(a);
    return a;
}
----
Relevant assembly:
----
.LC5:
.long 1067030938
.long 1067030938
.long 1067030938
.long 1067030938
.section .rodata.cst4,"aM",@progbits,4
.align 4
_D4test5test2FZNhG4f:
.cfi_startproc
movl $3, %eax
cvtsi2ss %eax, %xmm0
movb $7, %al
cvtsi2ss %eax, %xmm1
unpcklps %xmm0, %xmm0
unpcklps %xmm1, %xmm1
movlhps %xmm0, %xmm0
movlhps %xmm1, %xmm1
mulps .LC5(%rip), %xmm0
addps %xmm1, %xmm0
ret
.cfi_endproc
----
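For reference, the four .long values at .LC5 appear to be the IEEE 754 single-precision bit pattern of 1.2, one copy per vector lane. A minimal sketch to check that reading; the helper name floatBits is hypothetical and not part of the code above:

```d
import std.stdio;

// Hypothetical helper: reinterpret the bits of a float as a 32-bit unsigned integer.
uint floatBits(float f)
{
    return *cast(uint*)&f;
}

void main()
{
    // Print the bit pattern of 1.2f in decimal and in hex,
    // for comparison with the .long entries at .LC5.
    writefln("%d 0x%08X", floatBits(1.2f), floatBits(1.2f));
}
```

If the reading is right, the decimal value printed should match the 1067030938 seen in the listing.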
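As someone pointed out to me, the only optimisation missing is
constant propagation: the scalars 3 and 7 are still converted with
cvtsi2ss and broadcast at run time instead of being folded into a
constant. That doesn't matter too much for now.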
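To illustrate what constant propagation would buy here, this is a rough scalar sketch of the arithmetic in a single lane. The function names laneUnfolded and laneFolded are hypothetical, and the folded value 10.6 is worked out by hand (1.2 * 3 + 7), not taken from any compiler output:

```d
import std.stdio;

// Per-lane arithmetic from test2, written as a plain scalar function.
float laneUnfolded()
{
    float a = 1.2f;
    return a * 3 + 7;
}

// What a constant-propagating compiler could reduce it to:
// one precomputed constant, with no multiply or add left at run time.
float laneFolded()
{
    return 10.6f; // assumption: hand-folded value of 1.2 * 3 + 7
}

void main()
{
    writefln("%.4f %.4f", laneUnfolded(), laneFolded());
}
```

In the vector case, the equivalent result would be a single load of a four-lane constant in place of the cvtsi2ss, unpcklps, movlhps, mulps and addps sequence.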
Regards
--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';