Replacing C's memcpy with a D implementation
Walter Bright
newshound2 at digitalmars.com
Sun Jun 10 22:23:08 UTC 2018
On 6/10/2018 11:16 AM, David Nadlinger wrote:
> Because of the large amounts of noise, the only conclusion one can draw from
> this is that memcpyD is the slowest,
Probably because it does a memory allocation.
> followed by the ASM implementation.
The CPU makers abandoned optimizing the REP instructions decades ago, and just
left the clunky implementations there for backwards compatibility.
> In fact, memcpyC and memcpyNaive produce exactly the same machine code (without
> bounds checking), as LLVM recognizes the loop and lowers it into a memcpy.
> memcpyDstdAlg instead gets turned into a vectorized loop, for reasons I didn't
> investigate any further.
This amply illustrates my other point that looking at the generated assembly is
crucial to understanding what's happening.
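
For anyone who wants to try that kind of inspection, here is a minimal sketch in C.
The names copy_naive and copy_libc are hypothetical stand-ins for the thread's
memcpyNaive and memcpyC, not the benchmark code itself. David reports that LLVM
lowers the naive loop into a memcpy; whether a particular compiler and flag set
does the same is something to check in the output rather than assume.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memcpyNaive: a plain byte-by-byte copy loop.
   `restrict` tells the compiler the two buffers do not overlap, which is
   the same precondition memcpy has. */
void *copy_naive(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Hypothetical stand-in for memcpyC: forwards straight to the C library. */
void *copy_libc(void *restrict dst, const void *restrict src, size_t n)
{
    return memcpy(dst, src, n);
}
```

Compiling with something like `cc -O2 -S copy.c` writes the assembly to copy.s,
where the two function bodies can be compared side by side. The result depends
on the compiler, its version, and the optimization flags.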