Replacing C's memcpy with a D implementation
Temtaime
temtaime at gmail.com
Sun Jun 10 22:39:26 UTC 2018
On Sunday, 10 June 2018 at 22:23:08 UTC, Walter Bright wrote:
> On 6/10/2018 11:16 AM, David Nadlinger wrote:
>> Because of the large amounts of noise, the only conclusion one
>> can draw from this is that memcpyD is the slowest,
>
> Probably because it does a memory allocation.
>
>
>> followed by the ASM implementation.
>
> The CPU makers abandoned optimizing the REP instructions
> decades ago, and just left the clunky implementations there for
> backwards compatibility.
>
>
>> In fact, memcpyC and memcpyNaive produce exactly the same
>> machine code (without bounds checking), as LLVM recognizes the
>> loop and lowers it into a memcpy. memcpyDstdAlg instead gets
>> turned into a vectorized loop, for reasons I didn't
>> investigate any further.
>
> This amply illustrates my other point that looking at the
> assembler generated is crucial to understanding what's
> happening.
On some CPU architectures (for example, Intel Atom), rep movsb is
the fastest memcpy.
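
For anyone who wants to measure this themselves, a rep movsb copy is only a few lines. Below is a minimal sketch in C using GCC/Clang extended inline asm (a D version would have the same shape). The name memcpy_rep_movsb is made up for illustration and is not one of the benchmarked functions from this thread. Whether it beats the libc memcpy depends entirely on the CPU, so benchmark on the target before drawing conclusions.

```c
#include <stddef.h>

/* Hypothetical helper, not from the thread: copy n bytes with REP MOVSB.
 * Assumes the direction flag is clear on entry, as the x86 ABIs require,
 * and that the two buffers do not overlap (same contract as memcpy). */
static void *memcpy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* REP MOVSB copies (E/R)CX bytes from [(E/R)SI] to [(E/R)DI].
     * All three registers are modified, hence the "+" constraints. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable byte loop so the sketch still builds on other targets. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

Called like memcpy: memcpy_rep_movsb(dst, src, len) returns dst. As noted earlier in the thread, check the generated assembly, since the compiler may treat the fallback loop very differently from the asm path.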