Replacing C's memcpy with a D implementation
David Nadlinger
code at klickverbot.at
Sun Jun 10 23:39:13 UTC 2018
On Sunday, 10 June 2018 at 22:23:08 UTC, Walter Bright wrote:
> On 6/10/2018 11:16 AM, David Nadlinger wrote:
>> Because of the large amounts of noise, the only conclusion one
>> can draw from this is that memcpyD is the slowest,
>
> Probably because it does a memory allocation.
Of course; that was already pointed out earlier in the thread.
> The CPU makers abandoned optimizing the REP instructions
> decades ago, and just left the clunky implementations there for
> backwards compatibility.
That's not entirely true. Intel started optimising some of the
REP string instructions again on Ivy Bridge and above. There is a
CPUID bit to indicate that (ERMS, "Enhanced REP MOVSB/STOSB"); I'm
sure the Optimization Manual has further details. From what I
remember, `rep movsb` is supposed to beat an AVX loop on most
recent Intel µarchs if the destination is aligned and the data is
longer than a few cache lines. I've never measured that myself,
though.
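
For anyone who wants to try this, here is a rough sketch in C of how
the feature bit could be queried and how a bare `rep movsb` copy might
look. It assumes GCC or Clang on x86 (the `<cpuid.h>` helper
`__get_cpuid_count` and the extended asm syntax are toolchain
assumptions), and the names `has_erms` and `copy_rep_movsb` are made up
for illustration. It falls back to plain memcpy elsewhere and says
nothing about which variant is actually faster.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header (toolchain assumption) */
#endif

/* Hypothetical helper: returns 1 if the CPU reports ERMS
 * (CPUID leaf 7, subleaf 0, EBX bit 9), 0 otherwise. */
static int has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid_count returns 0 if leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 9) & 1u);
#else
    return 0;
#endif
}

/* Hypothetical helper: forward byte copy using `rep movsb` on x86.
 * Like memcpy, it does not handle overlapping buffers. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    void *ret = dst;
    /* RDI = destination, RSI = source, RCX = byte count; the ABI
     * guarantees the direction flag is clear on function entry. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return ret;
#else
    return memcpy(dst, src, n); /* non-x86 fallback */
#endif
}
```

A caller could check `has_erms()` once at startup and pick
`copy_rep_movsb` only for larger copies; where that crossover lies
would have to be measured on the target machine.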
— David