memcpy vs slice copy

Sergey Gromov snake.scaly at gmail.com
Mon Mar 16 05:58:35 PDT 2009


Mon, 16 Mar 2009 10:34:33 +0100, Don wrote:

> Sergey Gromov wrote:
>> Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote:
>> 
>>> On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote:
>>>
>>>> While doing some string processing I've seen some unusual timings
>>>> compared to the C code, so I have written this to see the situation
>>>> better. When USE_MEMCPY is false this little benchmark runs about 3+
>>>> times slower:
>>> I did a little benchmark:
>>>
>>> ldc -release -O5
>>> true: 0.51
>>> false: 0.63
>>>
>>> dmd -release -O
>>> true: 4.47
>>> false: 3.58
>>>
>>> I don't see a very big difference between slice copying and memcpy (but I do
>>> see one between compilers).
>>>
>>> Btw.: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.bugs&artnum=14933
>> 
>> The original benchmark swapped insanely on my 1GB laptop so I've cut the
>> number of iterations in half, to 50_000_000.  Compiled with -O -release
>> -inline.  Results:
>> 
>> slice: 2.31
>> memcpy: 0.73
>> 
>> That's 3 times difference.  Disassembly:
>> 
>> slice:
>> L31:            mov     ECX,EDX
>>                 mov     EAX,6
>>                 lea     ESI,010h[ESP]
>>                 mov     ECX,EAX
>>                 mov     EDI,EDX
>>                 rep
>>                 movsb
>>                 add     EDX,6
>>                 add     EBX,6
>>                 cmp     EBX,011E1A300h
>>                 jb      L31
>> 
>> memcpy:
>> L35:            push    6
>>                 lea     ECX,014h[ESP]
>>                 push    ECX
>>                 push    EBX
>>                 call    near ptr _memcpy
>>                 add     EBX,6
>>                 add     ESI,6
>>                 add     ESP,0Ch
>>                 cmp     ESI,011E1A300h
>>                 jb      L35
>> 
>> Seems like rep movsb is /way/ sub-optimal for copying data.
> 
> Definitely! The difference ought to be bigger than a factor of 3. Which 
> means that memcpy probably isn't anywhere near optimal, either.
> rep movsd is always 4 times quicker than rep movsb. There's a range of 
> lengths for which rep movsd is optimal; outside that range, there are
> other options which are even faster.
> 
> So there's a factor of 4-8 speedup available on most memory copies. 
> Low-hanging fruit! <g>
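A minimal portable-C sketch of the idea behind Don's point, assuming hypothetical helper names copy_bytes and copy_words. This is plain C, not the rep movsb / rep movsd instructions themselves: one loop moves a single byte per step, the other moves 4-byte words and finishes the 0-3 leftover bytes individually. For the 6-byte copies in this benchmark that is six stores versus three. An optimizing compiler may vectorize or otherwise rewrite either loop, so this illustrates the work per copy, not a timing result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy: the same work rep movsb does, one byte per step. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Word-at-a-time copy: move 4-byte chunks first (the idea behind rep movsd),
   then copy the 0-3 leftover bytes one at a time. Going through a uint32_t
   temporary with a fixed-size memcpy keeps the loads and stores well-defined
   even when the pointers are not 4-byte aligned. */
void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, 4);
        memcpy(dst + i, &w, 4);
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```

The fixed 4-byte memcpy calls are the usual way to express an unaligned word load/store in C; compilers typically turn each into a single move.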

Don't disregard the function call overhead.  memcpy is called 50 million
times, copying only 6 bytes per call.
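To make the call overhead concrete, here is a hedged C sketch of the benchmark's inner loop. The names fill_with_calls, fill_inline and opaque_memcpy are hypothetical, and the original D benchmark is not shown in this thread. Calling memcpy through a volatile function pointer forces a real call for every 6-byte copy, like the "call near ptr _memcpy" loop in the disassembly, while a plain memcpy with a constant size is something GCC and Clang commonly expand into a few inline moves at -O2. The static assertion checks the loop bound from the disassembly: 011E1A300h is 300,000,000 bytes, i.e. 50,000,000 copies of 6 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Loop bound from the disassembly: 011E1A300h == 300,000,000 bytes,
   i.e. 50,000,000 copies of 6 bytes each. */
_Static_assert(50000000ULL * 6 == 0x11E1A300ULL, "loop bound is 50M * 6 bytes");

#define CHUNK 6

/* Reading the function pointer through volatile stops the compiler from
   inlining memcpy, so every 6-byte copy pays for a real call. */
static void *(*volatile opaque_memcpy)(void *, const void *, size_t) = memcpy;

/* Append `count` copies of a 6-byte chunk; dst must hold count * CHUNK bytes. */
void fill_with_calls(unsigned char *dst, const unsigned char *chunk, size_t count)
{
    for (size_t i = 0; i < count; i++)
        opaque_memcpy(dst + i * CHUNK, chunk, CHUNK);
}

/* Same loop with a constant-size memcpy, which optimizing compilers
   commonly expand inline instead of emitting a call. */
void fill_inline(unsigned char *dst, const unsigned char *chunk, size_t count)
{
    for (size_t i = 0; i < count; i++)
        memcpy(dst + i * CHUNK, chunk, CHUNK);
}
```

Timing both functions with the same count separates the per-call cost from the cost of moving the bytes; the actual numbers will depend on the compiler and flags.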


