memcpy() comparison: C, Rust, and D

Tue Jan 31 05:50:33 PST 2017

On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright wrote:
> Just from D's type signature, we can know a lot about memcpy():
>
> 1. There are no side effects.
> 2. The return value is derived from s1.
> 3. Nothing s2 transitively points to is altered via s2.
> 4. Copies of s1 or s2 are not saved.
>
> The C declaration does not give us any of that info, although 
> the C description
> does give us 2, and the 'restrict' says that s1 and s2 do not 
> overlap.
>
> The Rust declaration does not give us 1, 2 or 4 (because it is 
> marked as unsafe). If it was safe, the declaration does not 
> give us 2.
>
> By this information being knowable from the declaration, the 
> compiler knows it too and can make use of it.

Well, I would not have chosen memcpy as an example in favor of D. 
Good C compilers (like gcc) know what memcpy does and are able to 
optimize it according to its arguments. DMD may know more about 
memcpy through its declaration, but it does not make any use of it.

A simple example:
// cmemcpy.c
#include <string.h>
#include <stdio.h>

int main(void) {
	char a[16] = "world hello";	
	char b[16] = "";

	memcpy(b, a, 12);
	memcpy(b, a + 6, 5);
	memcpy(b + 6, a, 5);
	printf("%s -> %s\n", a, b);
}
//------------

gcc -Ofast produces the following code:
main:
.LFB0:
	.cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	movl	$.LC0, %edi
	movabsq	$7307126011096887159, %rax
	movq	%rax, (%rsp)
	movq	%rsp, %rdx
	movq	%rax, 16(%rsp)
	leaq	16(%rsp), %rsi
	movq	$7302252, 24(%rsp)
	movl	22(%rsp), %eax
	movq	$0, 8(%rsp)
	movl	$7302252, 8(%rsp)
	movl	%eax, (%rsp)
	movzbl	26(%rsp), %eax
	movb	%al, 4(%rsp)
	movl	16(%rsp), %eax
	movl	%eax, 6(%rsp)
	movzbl	20(%rsp), %eax
	movb	%al, 10(%rsp)
	xorl	%eax, %eax
	call	printf
	xorl	%eax, %eax
	addq	$40, %rsp
	.cfi_def_cfa_offset 8
	ret

No call to memcpy, this has been optimized out by the compiler.

Now a D equivalent:
// dmemcpy.d
module dmemcpy;

import core.stdc.string, std.stdio;

void main() {
	char [16] a_ = "world hello", b_ = "";
	void* a = &a_[0], b = &b_[0];

	memcpy(b, a, 12);
	memcpy(b, a + 6, 5);
	memcpy(b + 6, a, 5);
	writefln("%s -> %s", a_, b_);
}
//--------------------

dmd -O -release -inline -boundscheck=off produces the following 
asm:
_Dmain:
		push	RBP
		mov	RBP,RSP
		sub	RSP,020h
		lea	RSI,_TMP0@PC32[RIP]
		lea	RDI,-020h[RBP]
		movsd
		movsd
		lea	RSI,_TMP0@PC32[RIP]
		lea	RDI,-010h[RBP]
		movsd
		movsd
		mov	EDX,0Ch
		lea	RSI,-020h[RBP]
		lea	RDI,-010h[RBP]
		call	  memcpy@PLT32
		mov	EDX,5
		lea	RSI,-01Ah[RBP]
		lea	RDI,-010h[RBP]
		call	  memcpy@PLT32
		mov	EDX,5
		lea	RSI,-020h[RBP]
		lea	RDI,-0Ah[RBP]
		call	  memcpy@PLT32
		lea	RDX,_TMP0@PC32[RIP]
		mov	EDI,8
		mov	RSI,RDX
		push	dword ptr -018h[RBP]
		push	dword ptr -020h[RBP]
		push	dword ptr -8[RBP]
		push	dword ptr -010h[RBP]
		call	  _D3std5stdio27__T8writeflnTAyaTG16aTG16aZ8writeflnFNfAyaG16aG16aZv@PLT32
		add	RSP,020h
		xor	EAX,EAX
		mov	RSP,RBP
		pop	RBP
		ret

So with DMD, the calls to memcpy are emitted verbatim, without any 
optimization :-(
To be fair, gdc will optimize the memcpy calls out too.
But my main argument here is that a good C compiler is able to 
do a very good job of optimizing memcpy, so the extra information 
brought by the D language is not so useful in practice.