Optimizing for SIMD: best practices?(i.e. what features are allowed?)
tsbockman
thomas.bockman at gmail.com
Sun Mar 7 23:09:33 UTC 2021
On Sunday, 7 March 2021 at 22:54:32 UTC, tsbockman wrote:
> import std.meta : Repeat;
> void euclideanDistanceFixedSizeArray(V)(ref Repeat!(3,
> const(V)) a, ref Repeat!(3, const(V)) b, out V result)
> if(is(V : __vector(float[length]), size_t length))
> ...
>
> Resulting asm with is(V == __vector(float[16])):
>
> .LCPI1_0:
> .long 0x7fc00000
> pure nothrow @nogc void
> app.euclideanDistanceFixedSizeArray!(__vector(float[16])).euclideanDistanceFixedSizeArray(ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), out __vector(float[16])):
> mov rax, qword ptr [rsp + 8]
> vbroadcastss zmm0, dword ptr [rip + .LCPI1_0]
> ...
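Apparently the optimizer is too stupid to skip the redundant
float.nan broadcast when result is an `out` parameter. (An `out`
parameter is reset to its `.init` value on function entry, which
is NaN for floats; that is the 0x7fc00000 constant above.) So just
make it `ref V result` instead for better code gen: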
pure nothrow @nogc void
app.euclideanDistanceFixedSizeArray!(__vector(float[16])).euclideanDistanceFixedSizeArray(ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref const(__vector(float[16])), ref __vector(float[16])):
mov rax, qword ptr [rsp + 8]
vmovaps zmm0, zmmword ptr [rax]
vmovaps zmm1, zmmword ptr [r9]
vmovaps zmm2, zmmword ptr [r8]
vsubps zmm0, zmm0, zmmword ptr [rcx]
vmulps zmm0, zmm0, zmm0
vsubps zmm1, zmm1, zmmword ptr [rdx]
vsubps zmm2, zmm2, zmmword ptr [rsi]
vaddps zmm0, zmm0, zmm0
vfmadd231ps zmm0, zmm1, zmm1
vfmadd231ps zmm0, zmm2, zmm2
vmovaps zmmword ptr [rdi], zmm0
vsqrtps zmm0, zmm0
vmovaps zmmword ptr [rdi], zmm0
vzeroupper
ret
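
For anyone wondering where that broadcast comes from in the first
place, here is a minimal scalar sketch of the `out` vs. `ref`
difference. It is not the code from this thread: `sawNaNOnEntryOut`
and `sawNaNOnEntryRef` are made-up helper names, and it uses plain
`float` rather than `__vector` so that it should build on any target.

```d
import std.math : isNaN;

// Hypothetical helpers for illustration only; not from the original post.

// With `out`, `result` is reset to float.init (NaN) on function entry,
// no matter what the caller stored in it beforehand.
bool sawNaNOnEntryOut(out float result)
{
    const wasNaN = isNaN(result);
    result = 1.0f;
    return wasNaN;
}

// With `ref`, `result` still holds the caller's value on entry,
// so there is no implicit initialization for the compiler to emit.
bool sawNaNOnEntryRef(ref float result)
{
    const wasNaN = isNaN(result);
    result = 1.0f;
    return wasNaN;
}

void main()
{
    float r = 42.0f;
    assert(sawNaNOnEntryOut(r));   // the 42 was overwritten with NaN on entry
    assert(r == 1.0f);

    r = 42.0f;
    assert(!sawNaNOnEntryRef(r));  // the 42 was still there on entry
    assert(r == 1.0f);
}
```

The trade-off is that with `ref` the caller is responsible for the
function actually writing `result` on every path; `out` guarantees a
defined value even if it doesn't.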