[D-runtime] Fw: [phobos] newCapacity function in array appending

Fawzi Mohamed fawzi at gmx.ch
Fri Jul 16 03:49:07 PDT 2010


On 15-Jul-10, at 15:45, Steve Schveighoffer wrote:
>
> ----- Original Message ----
>> From: Sean Kelly <sean at invisibleduck.org>
>> To: D's runtime library developers list <d-runtime at puremagic.com>
>> Sent: Wed, July 14, 2010 4:54:35 PM
>> Subject: Re: [D-runtime] Fw: [phobos] newCapacity function in array appending
>>
>> On Jul 14, 2010, at 6:19 AM, Steve Schveighoffer wrote:
>>>
>>> I assume the newCapacity function is used to avoid constantly
>>> looking for new memory to allocate.  The theory being if you are
>>> appending, you don't want to just allocate exactly what you need,
>>> you want to preallocate a little more.
>>
>> What gets me is that the GC already does this, since it uses
>> blocks that are powers of two in size, and then page-sized chunks
>> (which may again have extra space allocated at the end based on
>> some qualities of the alloc request).  I can see newCapacity
>> allocating 4x extra for an int append versus a byte append, but
>> the other logic has always seemed a bit overthought to me.  But
>> then I never sat down and tested it, so I figured maybe it's
>> actually good in practice.  I didn't notice the screwed-up growth
>> factor based on element size, though.
>
> Any appends of data < page do not use newCapacity, it's only for
> page size and above.  And then, it was only when allocating a brand
> new block.  From the bug report you can see it grew linearly by a
> page each time, until it needed to reallocate, and then the growth
> was ridiculous (200% growth!).
>
> My new checked-in code sort of hovers around 40% growth for larger
> blocks, even when extending in place.  I think the formula may need
> to be tweaked some more, but it's definitely much better than the
> linear growth pattern.  In my test program, it shaves off about
> 1/10th of a second.  The down side of course is that when you are
> appending to a large array, it's going to add 40% back on itself,
> which will be significant even if you are just adding one element.
> I think for very large arrays, we probably want to make the growth
> approach some low percentage (like 1 or 2%).  I am also wondering
> whether we should consider the number of elements being added,
> i.e. if you are only adding 1 or 2 elements, don't grow as much.

No, the thing is that you want even a loop appending single elements
not to have quadratic cost.
Any growth strategy that reallocates after a fixed number of elements
has quadratic cost: growing by n chunks implies n reallocations and
copies of arrays of length 1, 2, 3, ..., n chunks, thus a total cost
of sum(1..n) = n*(n+1)/2 = O(n^2).
A simple strategy that avoids the quadratic cost is geometric growth:
grow by multiplying the size by a constant factor.
If the array is large, the absolute amount of wasted memory can then
be large too, so one might want to reduce it a bit.
That is where growing by O(x/log2(x)) is useful: it reduces the extra
memory, but as the logarithm grows more slowly than any polynomial,
it still avoids the quadratic cost.
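
For comparison, plain geometric growth (here a hypothetical 50%
factor, not what druntime uses) keeps the total copying linear:

import std.stdio;

void main()
{
    size_t capacity = 1;
    size_t copied = 0;

    foreach (n; 1 .. 1_000_001)
    {
        if (n > capacity)
        {
            copied += capacity;            // realloc & copy
            capacity += capacity / 2 + 1;  // grow by ~50%
        }
    }
    // each element is copied only a bounded number of times on
    // average, so the total stays O(n): a few million copies here
    // instead of ~122 million with fixed chunks
    writefln("copied %s elements for 1000000 appends", copied);
}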

One can then tune the combination of the two, for example by adding

	x*(a/(log2(x)+1))

extra space.
For small x (note that if x is in bytes, x >= 1) you add roughly a
factor of a more; for large x this fraction becomes smaller.
One can then change the factor for the extra space to

	a*(m+b)/(log2(x)+b)

Here, the larger b is, the less the reduction for large x kicks in.
m can be adjusted to choose for which value of log2(x) the factor is
exactly a.
That can be useful, for example, if you know that x is at least 32:
then log2(x) is at least 5, and it makes sense to choose m=5 (I
called m minBits, the minimum number of bits in x).
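
In code the tuned factor is just (names a, b and minBits as above;
this is only a sketch of the formula, not the checked-in function):

import std.math : log2;

// factor a*(minBits+b)/(log2(x)+b), for x >= 1: it equals a when
// log2(x) == minBits; a larger b makes the decay for big x gentler
double growthFactor(size_t x, double a, double b, double minBits)
{
    return a * (minBits + b) / (log2(cast(double) x) + b);
}

For example, with a=1.0, b=2.0 and minBits=5 (x at least 32 bytes),
the factor is 1.0 at x=32 and about 0.32 at x=2^20, i.e. roughly 32%
extra space on a 1 MiB array.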

With this knowledge it should be possible to tweak the function a bit
better, if one is so inclined.

Note that it might be good to keep a small and b largish if you know
that you have many small arrays, as each of those will "waste" a
fraction a of extra space; a is thus the wasted percentage for small
arrays, while for large ones it can be less.

Adding little when the array grows little is not a solution, because
it brings back the quadratic cost.
If you know that your array will not grow much, then you should
reserve the space manually... an automatic append cannot both avoid
the quadratic cost and always grow by only a little.
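
For example, with the reserve call that is now in druntime (assuming
I remember the interface right), a loop whose final size is known can
pre-allocate once and append without any reallocation:

void main()
{
    int[] arr;
    arr.reserve(1_000);     // one allocation up front
    foreach (i; 0 .. 1_000)
        arr ~= i;           // extends in place, no copying
}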

Fawzi

>
> -Steve


