Struct copies
Kai Nacke
kai at redstar.de
Sun Jan 26 23:00:17 PST 2014
On Sunday, 26 January 2014 at 13:02:50 UTC, bearophile wrote:
>
> In the case of swapping Foos why isn't LLVM optimizing the swap
> function to a shorter asm like swap2? I have asked this on the
> LLVM IRC channel, and aKor has told me that for similar C code
> Clang swaps two Foos using a memcpy, and so uses a single 32 bit
> copy. So perhaps ldc2 can do the same for this common case.
>
Hi bearophile!
In fact, ldc uses llvm.memcpy in the swap function. This is what
I get with ldc 0.13.0-alpha1 using LLVM 3.4 on mingw32 with no
optimization:
define weak_odr x86_stdcallcc void @"\01__D4swap20__T4swapTS4swap3FooZ4swapFNaNbNfKS4swap3FooKS4swap3FooZv"(%swap.Foo* inreg %y_arg, %swap.Foo* %x_arg) {
entry:
  %aux = alloca %swap.Foo, align 2
  %tmp = bitcast %swap.Foo* %aux to i8*
  %tmp1 = bitcast %swap.Foo* %x_arg to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* %tmp1, i32 4, i32 1, i1 false)
  %tmp2 = load %swap.Foo* %aux
  %tmp3 = bitcast %swap.Foo* %x_arg to i8*
  %tmp4 = bitcast %swap.Foo* %y_arg to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp3, i8* %tmp4, i32 4, i32 1, i1 false)
  %tmp5 = load %swap.Foo* %x_arg
  %tmp6 = bitcast %swap.Foo* %y_arg to i8*
  %tmp7 = bitcast %swap.Foo* %aux to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp6, i8* %tmp7, i32 4, i32 1, i1 false)
  %tmp8 = load %swap.Foo* %y_arg
  ret void
}
With -O2 or -O3, I get IR and ASM similar to what you posted.
I do not understand this yet, so I'll check what clang is doing here.
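For the comparison with clang, a C version of the same three-copy swap could look roughly like the sketch below. The layout of Foo (two 16-bit fields) is only an assumption, chosen to match the 4-byte memcpy and the "align 2" alloca in the IR above; the name swap_foo is likewise made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 4 bytes with 2-byte alignment, mirroring
   "alloca %swap.Foo, align 2" and the i32 4 length in the IR. */
typedef struct {
    uint16_t a;
    uint16_t b;
} Foo;

/* Hypothetical C counterpart of the D swap: x -> aux, y -> x, aux -> y,
   the same three memcpy calls that appear in the unoptimized IR. */
void swap_foo(Foo *x, Foo *y) {
    Foo aux;
    assert(x != NULL && y != NULL);
    memcpy(&aux, x, sizeof(Foo));
    memcpy(x, y, sizeof(Foo));
    memcpy(y, &aux, sizeof(Foo));
}
```

Compiling this with clang at -O0 and at -O2 and looking at the emitted IR should show whether clang collapses each 4-byte memcpy into a single 32 bit load/store.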
Regards,
Kai