Array append performance

Sean Kelly sean at invisibleduck.org
Wed Aug 27 08:13:54 PDT 2008


== Quote from bearophile (bearophileHUGS at lycos.com)'s article
> Lionello Lunesu:
> > I think it safe to conclude that memcpy (T2) is the fastest all-round
> > solution. Only the inlined custom code can beat it.
> > What's more, to use memcpy only one line in gc.d needs to be changed.
> > Both T3 and T4 would need a change in the compiler.
> From more benchmarks I have seen that if you don't use ASM (with prefetching, etc) and/or more clever code, the following
is the faster from DataType from ubyte to ulong:
> template Range(int stop) {
>     static if (stop <= 0)
>         alias Tuple!() Range;
>     else
>         alias Tuple!(Range!(stop-1), stop-1) Range;
> }
> switch (i) {
>     case 0: break;
>     case 1: foreach (j; Range!(1)) a[j] = b[j]; break;
>     case 2: foreach (j; Range!(2)) a[j] = b[j]; break;
>     case 3: foreach (j; Range!(3)) a[j] = b[j]; break;
>     case 4: foreach (j; Range!(4)) a[j] = b[j]; break;
>     case 5: foreach (j; Range!(5)) a[j] = b[j]; break;
>     case 6: foreach (j; Range!(6)) a[j] = b[j]; break;
>     case 7: foreach (j; Range!(7)) a[j] = b[j]; break;
>     case 8: foreach (j; Range!(8)) a[j] = b[j]; break;
>     default: memcpy(a.ptr, b.ptr, a.length * DataType.sizeof);
> }

Why not just change that to this instead:

switch(i) {
    case 8: a[7] = b[7];
    case 7: a[6] = b[6];
    case 6: a[5] = b[5];
    case 5: a[4] = b[4];
    case 4: a[3] = b[3];
    case 3: a[2] = b[2];
    case 1: a[0] = b[0];
    case 0: break;
    default: memcpy( a.ptr, b.ptr, a.length * DataType.sizeof );
}

The Range stuff is cool and all, but not really necessary :-)


Sean
> A possible disadvantage may come from the fact that such code requires several instructions, so it increases the traffic in the
L1 code cache.
> Bye,
> bearophile





More information about the Digitalmars-d mailing list