Array append performance
Sean Kelly
sean at invisibleduck.org
Wed Aug 27 08:13:54 PDT 2008
== Quote from bearophile (bearophileHUGS at lycos.com)'s article
> Lionello Lunesu:
> > I think it safe to conclude that memcpy (T2) is the fastest all-round
> > solution. Only the inlined custom code can beat it.
> > What's more, to use memcpy only one line in gc.d needs to be changed.
> > Both T3 and T4 would need a change in the compiler.
> From more benchmarks I have seen that if you don't use ASM (with prefetching, etc) and/or more clever code, the following
is the faster from DataType from ubyte to ulong:
> template Range(int stop) {
> static if (stop <= 0)
> alias Tuple!() Range;
> else
> alias Tuple!(Range!(stop-1), stop-1) Range;
> }
> switch (i) {
> case 0: break;
> case 1: foreach (j; Range!(1)) a[j] = b[j]; break;
> case 2: foreach (j; Range!(2)) a[j] = b[j]; break;
> case 3: foreach (j; Range!(3)) a[j] = b[j]; break;
> case 4: foreach (j; Range!(4)) a[j] = b[j]; break;
> case 5: foreach (j; Range!(5)) a[j] = b[j]; break;
> case 6: foreach (j; Range!(6)) a[j] = b[j]; break;
> case 7: foreach (j; Range!(7)) a[j] = b[j]; break;
> case 8: foreach (j; Range!(8)) a[j] = b[j]; break;
> default: memcpy(a.ptr, b.ptr, a.length * DataType.sizeof);
> }
Why not just change that to this instead:
switch(i) {
case 8: a[7] = b[7];
case 7: a[6] = b[6];
case 6: a[5] = b[5];
case 5: a[4] = b[4];
case 4: a[3] = b[3];
case 3: a[2] = b[2];
case 1: a[0] = b[0];
case 0: break;
default: memcpy( a.ptr, b.ptr, a.length * DataType.sizeof );
}
The Range stuff is cool and all, but not really necessary :-)
Sean
> A possible disadvantage may come from the fact that such code requires several instructions, so it increases the traffic in the
L1 code cache.
> Bye,
> bearophile
More information about the Digitalmars-d
mailing list