Array append performance

Wed Aug 27 08:03:03 PDT 2008

Lionello Lunesu:

> I think it safe to conclude that memcpy (T2) is the fastest all-round 
> solution. Only the inlined custom code can beat it.
> What's more, to use memcpy only one line in gc.d needs to be changed. 
> Both T3 and T4 would need a change in the compiler.

>From more benchmarks I have seen that if you don't use ASM (with prefetching, etc) and/or more clever code, the following is the faster from DataType from ubyte to ulong:

template Range(int stop) {
    static if (stop <= 0)
        alias Tuple!() Range;
    else
        alias Tuple!(Range!(stop-1), stop-1) Range;
}

switch (i) {
    case 0: break;
    case 1: foreach (j; Range!(1)) a[j] = b[j]; break;
    case 2: foreach (j; Range!(2)) a[j] = b[j]; break;
    case 3: foreach (j; Range!(3)) a[j] = b[j]; break;
    case 4: foreach (j; Range!(4)) a[j] = b[j]; break;
    case 5: foreach (j; Range!(5)) a[j] = b[j]; break;
    case 6: foreach (j; Range!(6)) a[j] = b[j]; break;
    case 7: foreach (j; Range!(7)) a[j] = b[j]; break;
    case 8: foreach (j; Range!(8)) a[j] = b[j]; break;
    default: memcpy(a.ptr, b.ptr, a.length * DataType.sizeof);
}

A possible disadvantage may come from the fact that such code requires several instructions, so it increases the traffic in the L1 code cache.

Bye,
bearophile