Array append performance

Mon Aug 25 13:37:21 PDT 2008

On Mon, 25 Aug 2008 22:25:42 +0400, Sean Kelly <sean at invisibleduck.org>  
wrote:

> superdan wrote:
>> Sean Kelly Wrote:
>>
>>> == Quote from superdan (super at dan.org)'s article
>>>> Jerry Quinn Wrote:
>>>>> dsimcha Wrote:
>>>>>
>>>>>> I just ran into this yesterday porting C++ code that used  
>>>>>> push_back() heavily to
>>>>>> D.  Ended up rewriting the inner loops to pre-allocate.  The D code  
>>>>>> is now
>>>>>> actually faster than the C++ code, but the inner loops are less  
>>>>>> readable.  For a
>>>>>> whopping overhead of 4 extra bytes per array (on 32-bit), why not  
>>>>>> include the
>>>>>>
>>>>>> Other than the 4 bytes of extra overhead, does anyone see any  
>>>>>> downside to
>>>>>> including a capacity field?  Also, is there some situation I'm not  
>>>>>> thinking of
>>>>>> where the 4 bytes actually matter more than the speed gains in  
>>>>>> question?
>>>>> For starters, you need 8 bytes on 64 bit systems.  I suspect a more  
>>>>> important reason might be
>>> making slice operations more difficult to implement correctly.
>>>>> That said, I think I still favor it, since I find I need to do  
>>>>> iterative appends to arrays fairly often.
>>> Otherwise, it noticably restricts the places I can use built-in arrays.
>>>> sorry to always be the one who puts the moose on the table. y'all who  
>>>> want to make capacity a field in
>>> the array are off base.
>>>> today each slice has length and data. why. because all a slice is  
>>>> *is* length and data. it's a view of a
>>> portion of memory. the slice don't care where memory came from.  
>>> capacity is part of the memory that
>>> slices come from. it would be shared by several slices. and all of a  
>>> sudden you realize capacity can't be
>>> a field in the slice.
>>>
>>> I'm not arguing in favor of a capacity field, but it could be a field  
>>> in the slice as long as it's the same
>>> value as the length.  For example, it's legal to append to a slice in  
>>> D right now, but doing so always
>>> reallocates because resizing a slice in place would be bad.
>>  then we have a field hanging in there that's only valid sometimes.  
>> that won't quite work. will it be copyable? like when you copy a slice  
>> to another. what happens with the capacity field?
>
> That's the weird bit.  The capacity of any copy of a slice or an array  
> should be the length of the copy, so copying couldn't be done via memcpy  
> on the array ref.  More simply, you could probably just default the  
> capacity to zero, but I don't like the idea of violating the logical  
> invariant for array refs just to make copying / initialization easier.  
> And the weirdness of this suggests to me (and to you as well I presume)  
> that perhaps adding a capacity field isn't proper for built-in arrays.
>
>
> Sean

Both Java and C# strings aren't supposed to be fast on appending, that's  
why they both have StringBuilders (which has a capacity field). The same  
can be applied to other array types, a library ArrayBuilder class is a  
good solution, I think.