memcpy vs slice copy

Don nospam at nospam.com
Mon Mar 16 02:34:33 PDT 2009


Sergey Gromov wrote:
> Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote:
> 
>> On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote:
>>
>>> While doing some string processing I've seen some unusual timings
>>> compared to the C code, so I have written this to see the situation
>>> better. When USE_MEMCPY is false this little benchmark runs about 3+
>>> times slower:
>> I did a little benchmark:
>>
>> ldc -release -O5
>> true: 0.51
>> false: 0.63
>>
>> dmd -release -O
>> true: 4.47
>> false: 3.58
>>
>> I don't see a very big difference between slice copying and memcpy (but 
>> between compilers).
>>
>> Btw.: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.bugs&artnum=14933
> 
> The original benchmark swapped insanely on my 1GB laptop so I've cut the
> number of iterations in half, to 50_000_000.  Compiled with -O -release
> -inline.  Results:
> 
> slice: 2.31
> memcpy: 0.73
> 
> That's 3 times difference.  Disassembly:
> 
> slice:
> L31:            mov     ECX,EDX
>                 mov     EAX,6
>                 lea     ESI,010h[ESP]
>                 mov     ECX,EAX
>                 mov     EDI,EDX
>                 rep
>                 movsb
>                 add     EDX,6
>                 add     EBX,6
>                 cmp     EBX,011E1A300h
>                 jb      L31
> 
> memcpy:
> L35:            push    6
>                 lea     ECX,014h[ESP]
>                 push    ECX
>                 push    EBX
>                 call    near ptr _memcpy
>                 add     EBX,6
>                 add     ESI,6
>                 add     ESP,0Ch
>                 cmp     ESI,011E1A300h
>                 jb      L35
> 
> Seems like rep movsb is /way/ sub-optimal for copying data.

Definitely! The difference ought to be bigger than a factor of 3, which 
means that memcpy probably isn't anywhere near optimal, either.
rep movsd is always 4 times quicker than rep movsb. There's a range of 
lengths for which rep movsd is optimal; outside that range, there are 
other options which are even faster.

So there's a factor of 4-8 speedup available on most memory copies. 
Low-hanging fruit! <g>
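
To make the movsb/movsd distinction concrete, here is a rough sketch in portable C rather than assembly. It is only an illustration, not what DMD or any C runtime actually emits: copy_bytes, copy_words and self_check are hypothetical names made up for this example. copy_bytes moves one byte per iteration (the rep movsb shape), while copy_words moves 4 bytes per iteration and then mops up the 0-3 leftover bytes (the rep movsd shape). The fixed-size 4-byte memcpy calls inside copy_words are there only to keep unaligned access well-defined; compilers typically turn them into a single load and store, but that is an assumption about the optimizer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one byte per iteration: roughly the shape of `rep movsb`. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Copy 4 bytes per iteration, then the 0-3 leftover bytes:
   roughly `rep movsd` followed by a short byte-wise tail. */
void copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n >= 4; n -= 4, d += 4, s += 4) {
        uint32_t w;
        memcpy(&w, s, 4);   /* alignment-safe 32-bit load  */
        memcpy(d, &w, 4);   /* alignment-safe 32-bit store */
    }
    while (n--)             /* tail: at most 3 bytes remain */
        *d++ = *s++;
}

/* Sanity check (hypothetical helper): both routines must produce
   identical results for every length from 0 to 64 bytes. */
void self_check(void)
{
    unsigned char src[64], a[64], b[64];
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);
    for (size_t n = 0; n <= sizeof src; n++) {
        memset(a, 0, sizeof a);
        memset(b, 0, sizeof b);
        copy_bytes(a, src, n);
        copy_words(b, src, n);
        assert(memcmp(a, src, n) == 0);       /* first n bytes copied  */
        assert(memcmp(a, b, sizeof a) == 0);  /* both routines agree   */
    }
}
```

For the 6-byte copy in the benchmark, copy_words does one 4-byte move plus two single-byte moves instead of six single-byte moves. Whether that actually shows up as a speedup depends on the compiler and the CPU, so time it before drawing conclusions.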




