Strange instruction sequence with DMD while calling functions with float parameters

Basile B. b2.temp at gmx.com
Sat Feb 15 02:43:12 UTC 2020


On Friday, 14 February 2020 at 22:36:20 UTC, PatateVerte wrote:
> Hello
> I noticed a strange behaviour of the DMD compiler when it has 
> to call a function with float arguments.
>
> I build with the flags "-mcpu=avx2 -O  -m64" under windows 64 
> bits using "DMD32 D Compiler v2.090.1-dirty"
>
> I have the following function :
>    float mul_add(float a, float b, float c); //Return a * b + c
>
> When I try to call it :
>    float f = d_mul_add(1.0, 2.0, 3.0);
>
> I tested with other functions with float parameters, and there 
> is the same problem.
>
> Then the following instructions are generated :
>    //Loads the values, as it can be expected
>    vmovss xmm2,dword [rel 0x64830]
>    vmovss xmm1,dword [rel 0x64834]
>    vmovss xmm0,dword [rel 0x64838]
>    //Why ?
>    movq r8,xmm2
>    movq rdx,xmm1
>    movq rcx,xmm0
>    //
>    call 0x400   //0x400 is where the mul_add function is located
>
> My questions are :
>  - Is there a reason why the registers xmm0/1/2 are saved in 
> rcx/rdx/r8 before calling ? The calling convention specifies 
> that the floating point parameters have to be put in xmm 
> registers, and not GPR, unless you are using your own calling 
> convention.
>  - Why is it done using non-avx instructions ? Mixing AVX and 
> non-AVX instructions may impact the speed greatly.
>
> Any idea ? Thank you in advance.

It's simply bad codegen (or rather a missed opportunity to 
optimize) from DMD: its backend doesn't see that the parameters 
are already in the right order and in the right registers, so it 
copies them and puts them in the regs for the inner func call.

I have observed this in the past too, i.e. unexplained round 
tripping between GP and SSE regs. For good FP codegen, use LDC2 
or GDC, or write inline asm (but lose inlining).

For other people who'd like to observe the problem: 
https://godbolt.org/z/gvqEqz.
By the way, I had to deactivate AVX2 targeting because otherwise 
the result is even weirder (https://godbolt.org/z/T9NwMc).
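
To reproduce this locally rather than on godbolt, here is a minimal 
sketch. It is not necessarily the code behind the links above; the 
body of mul_add is reconstructed from the comment in the question, 
and pragma(inline, false) is my own addition so that the call 
survives -O and the argument set-up stays visible.

```d
import std.stdio : writeln;

// Stand-in for the function from the question (returns a * b + c).
// pragma(inline, false) is an assumption added here: it asks the
// compiler not to inline the call, so the call sequence is emitted.
pragma(inline, false)
float mul_add(float a, float b, float c)
{
    return a * b + c;
}

void main()
{
    // Float literals, so all three arguments are passed as float.
    float f = mul_add(1.0f, 2.0f, 3.0f);
    assert(f == 5.0f); // 1 * 2 + 3 is exactly representable
    writeln(f);
}
```

Build it with the flags from the question (-mcpu=avx2 -O -m64) and 
disassemble the object file with whatever tool you have at hand 
(objdump -d, or dumpbin /disasm on Windows), then look at the 
instructions just before the call in main.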

