Struct copies

Sun Jan 26 23:00:17 PST 2014

On Sunday, 26 January 2014 at 13:02:50 UTC, bearophile wrote:
>
> In the case of swapping Foos why isn't LLVM optimizing the swap 
> function to a shorter asm like swap2? I have asked this on the 
> LLVM IRC channel, and aKor has told me that for similar C code 
> Clang swaps two Foos using a memcpy, and so uses a single 32-bit 
> copy. So perhaps ldc2 can do the same for this common case.
>

Hi bearophile!

In fact, ldc uses llvm.memcpy in the swap function. This is what 
I get with ldc 0.13.0-alpha1 using LLVM 3.4 on mingw32 with no 
optimization:

define weak_odr x86_stdcallcc void @"\01__D4swap20__T4swapTS4swap3FooZ4swapFNaNbNfKS4swap3FooKS4swap3FooZv"(%swap.Foo* inreg %y_arg, %swap.Foo* %x_arg) {
entry:
   %aux = alloca %swap.Foo, align 2
   %tmp = bitcast %swap.Foo* %aux to i8*
   %tmp1 = bitcast %swap.Foo* %x_arg to i8*
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* %tmp1, i32 4, i32 1, i1 false)
   %tmp2 = load %swap.Foo* %aux
   %tmp3 = bitcast %swap.Foo* %x_arg to i8*
   %tmp4 = bitcast %swap.Foo* %y_arg to i8*
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp3, i8* %tmp4, i32 4, i32 1, i1 false)
   %tmp5 = load %swap.Foo* %x_arg
   %tmp6 = bitcast %swap.Foo* %y_arg to i8*
   %tmp7 = bitcast %swap.Foo* %aux to i8*
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp6, i8* %tmp7, i32 4, i32 1, i1 false)
   %tmp8 = load %swap.Foo* %y_arg
   ret void
}

With -O2 or -O3, I get IR and asm similar to what you posted. 
I do not understand this yet. I'll check what clang is doing here.
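
For the comparison with clang, a C counterpart of the swap could look 
roughly like the sketch below. The layout of Foo is an assumption (the 
struct definition is not part of this message); two 16-bit fields were 
picked only because they give the 4-byte size and 2-byte alignment seen 
in the IR above. The names Foo, a, b and swap_foo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: two 16-bit fields, giving size 4 and alignment 2,
   which matches the "align 2" alloca and the 4-byte memcpy in the IR. */
typedef struct {
    uint16_t a;
    uint16_t b;
} Foo;

_Static_assert(sizeof(Foo) == 4, "Foo is expected to occupy 4 bytes");

/* Swap through a temporary using three whole-struct copies,
   mirroring the three llvm.memcpy calls in the unoptimized IR. */
void swap_foo(Foo *x, Foo *y)
{
    assert(x != NULL && y != NULL);
    Foo aux;
    memcpy(&aux, x, sizeof aux);  /* aux = *x */
    memcpy(x, y, sizeof *x);      /* *x  = *y */
    memcpy(y, &aux, sizeof *y);   /* *y  = aux */
}
```

Building this with something like "clang -O2 -S -emit-llvm" should show 
whether clang turns each copy into a single 32-bit load/store, which is 
the behaviour reported on IRC.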

Regards,
Kai