[D-runtime] Fw: [phobos] newCapacity function in array appending
Steve Schveighoffer
schveiguy at yahoo.com
Wed Jul 14 06:19:09 PDT 2010
----- Original Message ----
> From: Fawzi Mohamed <fawzi at gmx.ch>
>
>
> > ----- Forwarded Message ----
> >> From: Steve Schveighoffer <schveiguy at yahoo.com>
> >>
> >> There are a couple of issues with the growth rate of arrays in the
> >> array append function.
> >>
> >> First, the newCapacity function is only called when the array must be
> >> reallocated.  I believe this to be an error; I think the newCapacity
> >> function should always be used, even if pages can be added.  This
> >> would use fewer cycles searching for pages to append.
>
> I am not sure I follow; you mean using the new function to determine how
> many pages to really commit?
> Well, indeed you would spare some cycles, but at the cost of using more
> memory; I am not sure it is really a win (the use of pages already
I assume the newCapacity function is used to avoid constantly looking for new
memory to allocate.  The theory is that if you are appending, you don't want
to allocate exactly what you need; you want to preallocate a little more.
Currently newCapacity is used to allocate a fresh block when the current pool
is exhausted; I just don't know why it's not also used to extend the current
block (which is why I'm trying to add that feature).
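(As an illustrative aside, here is a minimal sketch in D of the idea being
proposed: consult the same growth policy whether the current block can be
extended in place or a fresh block has to be allocated.  The growth formula,
page size, and the gcExtend/gcMalloc hooks are invented placeholders, not the
actual druntime code.)

size_t growTarget(size_t requestedBytes)
{
    enum size_t pageSize = 4096;
    // assumption: over-allocate by ~50%, rounded up to whole pages
    auto cap = requestedBytes + requestedBytes / 2;
    return (cap + pageSize - 1) & ~(pageSize - 1);
}

void* reserveForAppend(void* block, size_t neededBytes,
                       size_t delegate(void* p, size_t sz) gcExtend,
                       void* delegate(size_t sz) gcMalloc)
{
    auto target = growTarget(neededBytes);
    // first try to grow the existing block in place to the full target
    if (gcExtend(block, target) >= target)
        return block;
    // otherwise fall back to a fresh allocation of the same target size
    return gcMalloc(target);
}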
One of the issues with the current GC is that getting the size of a memory
block that spans hundreds or thousands of pages requires a linear search for
the end of the block.  This is mitigated by the LRU cache, but once you
exceed the number of cache entries, performance should probably fall off a
cliff (however, it doesn't, for reasons I don't understand).  But when I
extend a block, it must do this same linear search to find the size of the
block before doing an extend.  I'm trying to do that linear search as little
as possible.  All of this could probably be mitigated if the allocated length
were somehow cached per allocated page.
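(Again as an illustrative aside: something along these lines is what such a
per-block size cache could look like.  This is a made-up sketch in D; the
struct, its size, and the round-robin eviction are purely hypothetical and
are not how the real GC or its LRU cache is implemented.)

struct BlockSizeCache
{
    enum cacheSize = 8;
    void*[cacheSize] base;   // base address of a cached block
    size_t[cacheSize] size;  // known size of that block, in bytes
    size_t next;             // next slot to overwrite (round-robin)

    // return true and fill sz if the block starting at p is cached,
    // so the caller can skip the linear page scan
    bool lookup(void* p, out size_t sz)
    {
        foreach (i; 0 .. cacheSize)
        {
            if (base[i] is p)
            {
                sz = size[i];
                return true;
            }
        }
        return false;
    }

    // record (or refresh) the size of the block starting at p
    void update(void* p, size_t sz)
    {
        base[next] = p;
        size[next] = sz;
        next = (next + 1) % cacheSize;
    }
}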
I didn't write newCapacity, so I'm not sure about the theory behind it, or
what the desirable properties for appending are.  But it does make sense to
me that if you are continuously appending, you want to preallocate more than
"just enough to append the new data".  How much more, I have no clue :)  I
assumed the formula in newCapacity was written by someone who knows what they
are doing.  Well, except for using the element size in the percentage
formula.
> >>
> >> Second, the newCapacity function seems to use the size of the element
> >> to determine a percentage growth.  I feel this is an error.  I think
> >> the growth factor should strictly be related to the amount of memory
> >> being requested, which already takes into account the size of the
> >> element.
> >>
> >> At present, this means that arrays of ints grow 4x as fast as arrays
> >> of bytes.  And that's 4x in percentage terms, not 4x the memory used.
> >> The memory used actually grows 16x faster.
> >>
> >> Does everyone agree with this?
>
> Again I am not too sure; I flip-flopped a couple of times implementing
> http://github.com/fawzi/blip/blob/master/blip/util/Grow.d , a version of
> which I had also used in tango.
>
> Basically, if you append to an array, do you want the appending to be
> determined by the number of elements of the array, or by its size in
> memory?
> Code will probably work depending on the number of elements of the array,
> not on its memory usage.
> In the end it is just a factor, but if you have many small elements I feel
> that it is "reasonable" to allocate a smaller amount of extra space.
> On the other hand, as you pointed out, if you think strictly from the
> memory point of view, one can argue that the percentage of extra space
> should depend only on the space used, not on how it is used internally.
>
> Finally I kept the "per element" view, but that was done without any real
> testing.
Well, per length or per size, either way it's a percentage.  The size is just
a linear scaling of the length, so it's not really important whether you use
size or length.  But does it make sense to alter the percentage based on the
element size?  That is, if arrays of bytes should grow by 50% each time a
realloc is needed, should arrays of ints grow by 200%?  It doesn't make sense
to me, yet that is what was in the calculation.
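(To make that concrete, here is a toy D snippet; the 50% base rate and the
formula are invented for illustration and are not the actual newCapacity
calculation.)

import std.stdio;

// growth in bytes for an append, with and without folding the element
// size into the growth percentage
size_t grownBytes(size_t length, size_t elemSize, bool scaleByElemSize)
{
    auto bytes = length * elemSize;
    // assumption: a 50% base growth rate for 1-byte elements
    auto percent = scaleByElemSize ? 50 * elemSize : 50;
    return bytes + bytes * percent / 100;
}

void main()
{
    writeln(grownBytes(1000, 1, true));  // 1500: bytes grow by 50% (500 bytes extra)
    writeln(grownBytes(1000, 4, true));  // 12000: ints grow by 200% (8000 bytes extra, 16x)
    writeln(grownBytes(1000, 4, false)); // 6000: ints with a size-neutral 50% rate
}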
The other factors are seemingly arbitrary, so I'm not sure how to tune those.
-Steve