Array append performance

Sun Aug 24 19:00:30 PDT 2008

On Sat, 23 Aug 2008 13:16:26 -0400, Steven Schveighoffer  
<schveiguy at yahoo.com> wrote:
> The problem is that many times I don't append to an array or slice, why
> should I have to accept the cost of an extra 4-8 byte field for every slice
> and array that isn't going to change size (which I would argue is used more
> often)?

I've written some micro benchmarks to test out the difference between an  
8-byte struct (i.e. the current arrays) and a 16-byte struct (ptr, length,  
capacity, stride) with respect to function call performance. I wrote 2  
benchmarks: 1) an actual function call (which calculated the exponential)  
and 2) an optimized struct copy (i.e. simulating pass by value).
In -release -O or debug mode, (1) had performance decreases of -5.1%, 9.8%
and 1.8% for passing 1, 2 and 3 structs respectively. When -inline is
enabled this changes to 0.5%, 2.5% and 5% respectively. For benchmark (2),
I looked at using MMX and SSE2 instructions to accelerate the struct copy.
Using SSE2 instructions provided a 15% performance gain with unaligned
structs and an 84% gain with aligned structs. 16-byte SSE2 copies beat
single 8-byte MMX copies by 9% (both aligned), though dual MMX (i.e. a
16-byte copy) still beat the built-in copy by 61% (again aligned).
(All in -release -O -inline, on a Core2 CPU.)
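
For readers without the attachment, the two layouts being compared can be
sketched roughly as below. This is an illustrative C stand-in, not the
benchmark code from hello.d: the names Slice8, Slice16, sum_lengths8 and
sum_lengths16 are hypothetical, and uint32_t is assumed in place of a 32-bit
pointer so that the 8-byte and 16-byte sizes hold on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Current array layout on a 32-bit target: pointer + length = 8 bytes.
   uint32_t stands in for the 32-bit pointer (an assumption of this sketch). */
typedef struct {
    uint32_t ptr;
    uint32_t length;
} Slice8;

/* Wider layout discussed in the post: ptr, length, capacity, stride = 16 bytes. */
typedef struct {
    uint32_t ptr;
    uint32_t length;
    uint32_t capacity;
    uint32_t stride;
} Slice16;

static_assert(sizeof(Slice8) == 8, "Slice8 should occupy 8 bytes");
static_assert(sizeof(Slice16) == 16, "Slice16 should occupy 16 bytes");

/* Pass by value: each call copies every argument struct, so the wider
   layout moves twice as many bytes per argument. */
uint32_t sum_lengths8(Slice8 a, Slice8 b)
{
    return a.length + b.length;
}

uint32_t sum_lengths16(Slice16 a, Slice16 b)
{
    return a.length + b.length;
}
```

Timing calls like these with 1, 2 and 3 struct arguments is one way to
approximate benchmark (1); the actual benchmark computed an exponential in
the callee.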

So the performance cost, even for very small workloads, looks to be pretty
minor, and with some work performance could actually increase. (I'm
assuming the use of SSE/MMX/AltiVec for memcopy acceleration would be
enabled by a compiler flag on 32-bit.)
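
As a rough idea of what a flag-controlled accelerated copy could look like,
here is a minimal C sketch that copies one 16-byte struct with a single SSE2
load/store pair when the compiler targets SSE2, and falls back to memcpy
otherwise. The names WideSlice and copy_wide_slice are hypothetical, and this
is not the code used for the measurements above; it uses the unaligned
intrinsics, so it corresponds to the slower (unaligned) case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical 16-byte array header: ptr, length, capacity, stride. */
typedef struct {
    uint32_t ptr;
    uint32_t length;
    uint32_t capacity;
    uint32_t stride;
} WideSlice;

static_assert(sizeof(WideSlice) == 16, "WideSlice should occupy 16 bytes");

/* Copy one 16-byte struct. Neither pointer needs 16-byte alignment. */
void copy_wide_slice(WideSlice *dst, const WideSlice *src)
{
#if defined(__SSE2__)
    /* One unaligned 128-bit load followed by one unaligned 128-bit store. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
#else
    /* Portable fallback when SSE2 is not enabled at compile time. */
    memcpy(dst, src, sizeof *dst);
#endif
}
```

If both structs are known to be 16-byte aligned, _mm_load_si128 and
_mm_store_si128 could be used instead, which is the aligned case the larger
gain was reported for.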
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hello.d
Type: application/octet-stream
Size: 5966 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/digitalmars-d/attachments/20080824/e032da3e/attachment.obj>