memcpy vs slice copy
Sergey Gromov
snake.scaly at gmail.com
Mon Mar 16 05:58:35 PDT 2009
Mon, 16 Mar 2009 10:34:33 +0100, Don wrote:
> Sergey Gromov wrote:
>> Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote:
>>
>>> On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote:
>>>
>>>> While doing some string processing I've seen some unusual timings
>>>> compared to the C code, so I have written this to see the situation
>>>> better. When USE_MEMCPY is false this little benchmark runs about 3+
>>>> times slower:
>>> I did a little benchmark:
>>>
>>> ldc -release -O5
>>> true: 0.51
>>> false: 0.63
>>>
>>> dmd -release -O
>>> true: 4.47
>>> false: 3.58
>>>
>>> I don't see a very big difference between slice copying and memcpy (but
>>> between compilers).
>>>
>>> Btw.: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.bugs&artnum=14933
>>
>> The original benchmark swapped insanely on my 1GB laptop so I've cut the
>> number of iterations in half, to 50_000_000. Compiled with -O -release
>> -inline. Results:
>>
>> slice: 2.31
>> memcpy: 0.73
>>
>> That's a threefold difference. Disassembly:
>>
>> slice:
>> L31: mov ECX,EDX
>> mov EAX,6
>> lea ESI,010h[ESP]
>> mov ECX,EAX
>> mov EDI,EDX
>> rep
>> movsb
>> add EDX,6
>> add EBX,6
>> cmp EBX,011E1A300h
>> jb L31
>>
>> memcpy:
>> L35: push 6
>> lea ECX,014h[ESP]
>> push ECX
>> push EBX
>> call near ptr _memcpy
>> add EBX,6
>> add ESI,6
>> add ESP,0Ch
>> cmp ESI,011E1A300h
>> jb L35
>>
>> Seems like rep movsb is /way/ sub-optimal for copying data.
>
> Definitely! The difference ought to be bigger than a factor of 3. Which
> means that memcpy probably isn't anywhere near optimal, either.
> rep movsd is always 4 times quicker than rep movsb. There's a range of
> lengths for which rep movsd is optimal; outside that range, there are
> other options which are even faster.
>
> So there's a factor of 4-8 speedup available on most memory copies.
> Low-hanging fruit! <g>
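Don't disregard the function call overhead. memcpy is called 50 million
times, copying only 6 bytes per call, so a good part of its measured time
is likely spent on the call itself rather than on moving bytes.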
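
To make the shape of the two loops concrete, here is a rough C analogue. It is only a sketch, not the original D benchmark: the names fill_memcpy, fill_bytewise and time_fill are made up for illustration, and the plain byte loop merely stands in for the inline rep movsb sequence. Note that 50_000_000 copies of 6 bytes give 300_000_000 bytes, which is the 011E1A300h loop bound in the disassembly. An optimizing C compiler may inline the memcpy call or turn the byte loop into a library call, so timings from this sketch depend heavily on compiler and flags.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy `count` chunks of `chunk` bytes from `src`
   into consecutive positions of `dst`, one memcpy call per chunk. */
void fill_memcpy(unsigned char *dst, const unsigned char *src,
                 size_t chunk, size_t count)
{
    for (size_t i = 0; i < count; i++)
        memcpy(dst + i * chunk, src, chunk);
}

/* Hypothetical helper: the same copy with a plain byte loop and no
   function call per chunk (a stand-in for the inline slice copy). */
void fill_bytewise(unsigned char *dst, const unsigned char *src,
                   size_t chunk, size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < chunk; j++)
            dst[i * chunk + j] = src[j];
}

typedef void (*fill_fn)(unsigned char *, const unsigned char *,
                        size_t, size_t);

/* Return the CPU time in seconds that one run of `fn` takes. */
double time_fill(fill_fn fn, unsigned char *dst, const unsigned char *src,
                 size_t chunk, size_t count)
{
    clock_t start = clock();
    fn(dst, src, chunk, count);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To mirror the numbers in this thread, allocate a 300_000_000-byte destination buffer and call time_fill once with each function, using chunk = 6 and count = 50000000. Repeating the run with a larger chunk should show how much of the memcpy figure is per-call cost.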