Array append performance
Robert Jacques
sandford at jhu.edu
Sun Aug 24 21:47:20 PDT 2008
On Sun, 24 Aug 2008 22:14:36 -0400, bearophile <bearophileHUGS at lycos.com>
wrote:
> And, why the stride? I presume it's useful if you want a 2D matrix.
Yes, stride is a key enabler for integrating 2D/3D/nD matrices with
built-in arrays.
Other pros:
- Fast O(1) reversing of an array. Looping through an array in reverse was
a common enough operation to create an additional keyword and language
feature (foreach_reverse) specifically for it.
- Slicing out a struct element. i.e. slice the red channel out of an image
or the first name from an Employee array.
- Accessing every Nth element. This is a common MATLAB operation and often
used in visualization (i.e. a quick and simple resolution adjustment for a
line plot). Also, some parallel algorithms benefit from interleaving data.
This is a fairly standard multi-threading technique, particularly when
processing data[x] may take longer than processing data[y] (though it's
not the first thing I'd try).
And lastly, SSE works on 16-byte structs, so why not use the extra 4-bytes?
> It may be useful, but maybe it's overkill, so you may want to benchmark
> just (ptr, length, capacity) too.
For a 12-byte struct, either {ptr, length, capacity} or {ptr, length,
stride}, the performance decrease for -release -o -inline was -0.7%, 0.3%
and 0.2% for 1, 2 and 3 arrays respectively and -0%, 8% and 0.8%
respectively for debug.
> Another useful benchmark is to test a program that manages tons of short
> strings too, like this one:
> http://en.wikipedia.org/wiki/D_language#Example_2
Okay, looking just at the main loop (writefln is really slow) the
performance cost of the extra 4-bytes for a capacity field is about 1.7%
(naive cost, i.e no SSE, etc). I didn't actually implement a capacity
field, so this is a worst case.
> You may want to address some of the things superdan has said.
Superdan raised the good point that if the GC is aware of array length at
creation, it might compact the capacity to allocate something else. To the
best of my knowledge (which isn't saying much) both the Phobos and Tango
GC are block base, and so are un-aware of the array length. Also, Sean
Kelly wrote the Tango GC and isn't raising any red flags, which is
probably a good sign it's a valid optimization.
More information about the Digitalmars-d
mailing list