Replacing C's memcpy with a D implementation

Sun Jun 10 22:39:26 UTC 2018

On Sunday, 10 June 2018 at 22:23:08 UTC, Walter Bright wrote:
> On 6/10/2018 11:16 AM, David Nadlinger wrote:
>> Because of the large amounts of noise, the only conclusion one 
>> can draw from this is that memcpyD is the slowest,
>
> Probably because it does a memory allocation.
>
>
>> followed by the ASM implementation.
>
> The CPU makers abandoned optimizing the REP instructions 
> decades ago, and just left the clunky implementations there for 
> backwards compatibility.
>
>
>> In fact, memcpyC and memcpyNaive produce exactly the same 
>> machine code (without bounds checking), as LLVM recognizes the 
>> loop and lowers it into a memcpy. memcpyDstdAlg instead gets 
>> turned into a vectorized loop, for reasons I didn't 
>> investigate any further.
>
> This amply illustrates my other point that looking at the 
> assembler generated is crucial to understanding what's 
> happening.

On some CPU architectures (for example, Intel Atom), rep movsb is 
the fastest memcpy.
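
For anyone who wants to measure this on their own hardware, here is a minimal sketch of a rep movsb copy. It is written in C with GCC/Clang inline assembly rather than D, and the name memcpy_rep_movsb is my own, not one of the functions benchmarked in this thread. It assumes an x86 or x86-64 target and falls back to a plain byte loop elsewhere.

```c
#include <stddef.h>

/* Hypothetical helper (not from the benchmark in this thread):
   copy n bytes from src to dst using REP MOVSB.
   Assumes GCC or Clang on x86/x86-64; other targets use a byte loop. */
static void *memcpy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst; /* memcpy returns the original destination pointer */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* RDI/EDI = destination, RSI/ESI = source, RCX/ECX = byte count.
       All three registers are modified by the instruction, hence "+".
       The ABI guarantees the direction flag is clear on function entry,
       so the copy runs forward. */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    /* Portable fallback so the sketch still compiles on other targets. */
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

It is called like memcpy, e.g. memcpy_rep_movsb(dst, src, len). Whether it beats the libc memcpy depends on the CPU and the copy size, so it needs to be timed on the target machine.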