[Issue 2313] Poor array ~= append performance

Don nospam at nospam.com.au
Wed Aug 27 03:06:02 PDT 2008


d-bugmail at puremagic.com wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=2313
> 
> 
> 
> 
> 
> ------- Comment #4 from bugzilla at digitalmars.com  2008-08-27 01:05 -------
> Why is byte[] = byte[] slower than memcpy? The answer isn't very simple.
> Consider this program:
> 
> import std.c.string;
> 
> long timer()
> {
>     asm
>     {   naked                   ;
>         rdtsc                   ;
>         ret                     ;
>     }
> }
> 
> void test1(byte[] a, byte[] b)
> {
>     a[] = b[];
> }
> 
> void test2(byte[] a, byte[] b)
> {
>     memcpy(a.ptr, b.ptr, a.length);
> }
> 
> void main()
> {
>     for (int i = 4; i < 100_000_000; i *= 2)
>     {
>         auto a = new byte[i];
>         auto b = new byte[i];
> 
>         auto start = timer();
>         test1(a, b);
>         auto end = timer();
>         auto r1 = end - start;
> 
>         start = timer();
>         test2(a, b);
>         end = timer();
>         auto r2 = end - start;
> 
>         printf("i: %8d,\t[]=[]: %8lld,\tmemcpy: %8lld\n", i, r1, r2);
>     }
> }
> 
> Running this program produces:
> 
> i:        4,    []=[]:      144,        memcpy:      568
> i:        8,    []=[]:      144,        memcpy:      300
> i:       16,    []=[]:      172,        memcpy:      324
> i:       32,    []=[]:      204,        memcpy:      344
> i:       64,    []=[]:      288,        memcpy:      276
> i:      128,    []=[]:      288,        memcpy:      272
> i:      256,    []=[]:      352,        memcpy:      364
> i:      512,    []=[]:      372,        memcpy:      424
> i:     1024,    []=[]:      552,        memcpy:      564
> i:     2048,    []=[]:      684,        memcpy:     1384
> i:     4096,    []=[]:     1344,        memcpy:     1772
> i:     8192,    []=[]:     2900,        memcpy:     3216
> i:    16384,    []=[]:     5292,        memcpy:     5472
> i:    32768,    []=[]:    11496,        memcpy:    10388
> i:    65536,    []=[]:    29484,        memcpy:    27480
> i:   131072,    []=[]:   110464,        memcpy:    67784
> i:   262144,    []=[]:   655580,        memcpy:   562400
> i:   524288,    []=[]:  1204124,        memcpy:  1107256
> i:  1048576,    []=[]:  2364588,        memcpy:  2272552
> i:  2097152,    []=[]:  4516440,        memcpy:  4417764
> i:  4194304,    []=[]:  8996992,        memcpy:  8817176
> i:  8388608,    []=[]: 20223908,        memcpy: 17717748
> i: 16777216,    []=[]: 35774952,        memcpy: 36094652
> i: 33554432,    []=[]: 71008068,        memcpy: 71246896
> i: 67108864,    []=[]: 142982284,       memcpy: 145473300
> 
> There's not much of a consistent conclusion to be drawn from that.
Except, we can conclude that
(1) Walter's machine has a 64Kb L1 data cache. The penalty for a cache 
miss is 1.5 clocks. It's probably an AMD CPU. Judging by the timing, it 
looks like a K8 (Hammer) <g>
(2) neither a[] = b[], nor memcpy(), attempt to optimise for cache 
misses. Both look like rep movsd; to me.

BTW,
(3) rtdsc doesn't serialise, so the counts for low numbers are pretty 
much garbage. You need to stick a mov EAX, 0; cpuid; in there.
(4) cache effects are giving memcpy a big advantage. If you swap the 
order of test1 and test2, you'll probably find the order reverses.

There's potential to do something about (2). Not easy though.


More information about the Digitalmars-d-bugs mailing list