Array append performance
bearophile
bearophileHUGS at lycos.com
Wed Aug 27 08:03:03 PDT 2008
Lionello Lunesu:
> I think it safe to conclude that memcpy (T2) is the fastest all-round
> solution. Only the inlined custom code can beat it.
> What's more, to use memcpy only one line in gc.d needs to be changed.
> Both T3 and T4 would need a change in the compiler.
From more benchmarks I have seen that, if you don't use ASM (with prefetching, etc.) and/or more clever code, the following is the fastest for every DataType from ubyte to ulong:
template Range(int stop) {
    static if (stop <= 0)
        alias Tuple!() Range;
    else
        alias Tuple!(Range!(stop-1), stop-1) Range;
}
switch (i) {
    case 0: break;
    case 1: foreach (j; Range!(1)) a[j] = b[j]; break;
    case 2: foreach (j; Range!(2)) a[j] = b[j]; break;
    case 3: foreach (j; Range!(3)) a[j] = b[j]; break;
    case 4: foreach (j; Range!(4)) a[j] = b[j]; break;
    case 5: foreach (j; Range!(5)) a[j] = b[j]; break;
    case 6: foreach (j; Range!(6)) a[j] = b[j]; break;
    case 7: foreach (j; Range!(7)) a[j] = b[j]; break;
    case 8: foreach (j; Range!(8)) a[j] = b[j]; break;
    default: memcpy(a.ptr, b.ptr, a.length * DataType.sizeof);
}
A possible disadvantage is that such code expands into many instructions (each case is fully unrolled), so it increases the pressure on the L1 instruction cache.
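For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch of the same dispatch. It makes several assumptions that are not part of the benchmark code: it uses a current D2 compiler (so Tuple is spelled AliasSeq from std.meta), the wrapper name copySmall is hypothetical, the switch is on the array length rather than a separate i, and only four unrolled cases are shown to keep it short.

```d
import core.stdc.string : memcpy;
import std.meta : AliasSeq;

// Compile-time sequence 0, 1, ..., stop-1 (D2 spelling of the Range template).
template Range(int stop)
{
    static if (stop <= 0)
        alias Range = AliasSeq!();
    else
        alias Range = AliasSeq!(Range!(stop - 1), stop - 1);
}

// Hypothetical helper: copies b into a. Short copies are unrolled at
// compile time by iterating over the sequence; longer ones use memcpy.
void copySmall(T)(T[] a, const(T)[] b)
{
    assert(a.length == b.length);
    switch (a.length)
    {
        case 0: break;
        case 1: foreach (j; Range!1) a[j] = b[j]; break;
        case 2: foreach (j; Range!2) a[j] = b[j]; break;
        case 3: foreach (j; Range!3) a[j] = b[j]; break;
        case 4: foreach (j; Range!4) a[j] = b[j]; break;
        default: memcpy(a.ptr, b.ptr, a.length * T.sizeof); break;
    }
}

void main()
{
    // Unrolled path (length 3).
    auto src = [1, 2, 3];
    auto dst = new int[3];
    copySmall(dst, src);
    assert(dst == src);

    // memcpy path (length 10).
    auto srcBig = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    auto dstBig = new int[10];
    copySmall(dstBig, srcBig);
    assert(dstBig == srcBig);
}
```

Each foreach over Range!(n) is expanded by the compiler into n separate assignments with constant indices, which is where both the speed and the extra code size come from.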
Bye,
bearophile