[D-runtime] Fw: [phobos] newCapacity function in array appending
Steve Schveighoffer
schveiguy at yahoo.com
Wed Jul 14 06:19:09 PDT 2010
----- Original Message ----
> From: Fawzi Mohamed <fawzi at gmx.ch>
>
>
> > ----- Forwarded Message ----
> >> From: Steve Schveighoffer <schveiguy at yahoo.com>
> >>
> >> There are a couple of issues with the growth rate of arrays in the
> >> array append function.
> >>
> >> First, the newCapacity function is only called when the array must be
> >> reallocated.  I believe this to be an error; I think the newCapacity
> >> function should always be used, even if pages can be added.  This
> >> would use fewer cycles searching for pages to append.
>
> I am not sure I follow; you mean using the new function to determine how
> many pages to really commit?
> Well, indeed you would spare some cycles, but at the cost of using more
> memory; I am not sure it is really a win (the use of pages already
I assume the newCapacity function is used to avoid constantly looking for new
memory to allocate.  The theory is that if you are appending, you don't want
to allocate exactly what you need; you want to preallocate a little more.
Currently newCapacity is used to allocate a fresh block when the current pool
is exhausted; I just don't know why it's not also used to extend the current
block (which is why I'm trying to add that feature).
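(As an illustrative aside, here is a minimal sketch in D of the idea being
proposed: consult the same growth policy whether the current block can be
extended in place or a fresh block has to be allocated.  The growth formula,
page size, and the gcExtend/gcMalloc hooks are invented placeholders, not the
actual druntime code.)

size_t growTarget(size_t requestedBytes)
{
    enum size_t pageSize = 4096;
    // assumption: over-allocate by ~50%, rounded up to whole pages
    auto cap = requestedBytes + requestedBytes / 2;
    return (cap + pageSize - 1) & ~(pageSize - 1);
}

void* reserveForAppend(void* block, size_t neededBytes,
                       size_t delegate(void* p, size_t sz) gcExtend,
                       void* delegate(size_t sz) gcMalloc)
{
    auto target = growTarget(neededBytes);
    // first try to grow the existing block in place to the full target
    if (gcExtend(block, target) >= target)
        return block;
    // otherwise fall back to a fresh allocation of the same target size
    return gcMalloc(target);
}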
One of the issues with the current GC is that getting the size of a memory
block that spans hundreds or thousands of pages requires a linear search for
the end of the block.  This is mitigated by the LRU cache, but once you
exceed the number of cache entries, performance should probably fall off a
cliff (however, it doesn't, for reasons I don't understand).  But when I
extend a block, it must do this same linear search to find the size of the
block before doing an extend.  I'm trying to do that linear search as little
as possible.  All of this could probably be mitigated if the allocated length
were somehow cached per allocated page.
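(Again as an illustrative aside: something along these lines is what such a
per-block size cache could look like.  This is a made-up sketch in D; the
struct, its size, and the round-robin eviction are purely hypothetical and
are not how the real GC or its LRU cache is implemented.)

struct BlockSizeCache
{
    enum cacheSize = 8;
    void*[cacheSize] base;   // base address of a cached block
    size_t[cacheSize] size;  // known size of that block, in bytes
    size_t next;             // next slot to overwrite (round-robin)

    // return true and fill sz if the block starting at p is cached,
    // so the caller can skip the linear page scan
    bool lookup(void* p, out size_t sz)
    {
        foreach (i; 0 .. cacheSize)
        {
            if (base[i] is p)
            {
                sz = size[i];
                return true;
            }
        }
        return false;
    }

    // record (or refresh) the size of the block starting at p
    void update(void* p, size_t sz)
    {
        base[next] = p;
        size[next] = sz;
        next = (next + 1) % cacheSize;
    }
}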
I didn't write newCapacity, so I'm not sure about the theory behind it, or
what the desirable properties for appending are.  But it does make sense to
me that if you are continuously appending, you want to preallocate more than
"just enough to append the new data".  How much more, I have no clue :)  I
assumed the formula in newCapacity was written by someone who knows what they
are doing.  Well, except for using the element size in the percentage
formula.
> >>
> >> Second, the newCapacity function seems to use the size of the element
> >> to determine a percentage growth.  I feel this is an error.  I think
> >> the growth factor should strictly be related to the amount of memory
> >> being requested, which already takes into account the size of the
> >> element.
> >>
> >> At present, this means that arrays of ints grow 4x as fast as arrays
> >> of bytes.  And that's 4x in percentage terms, not 4x the memory used.
> >> The memory used actually grows 16x faster.
> >>
> >> Does everyone agree with this?
>
> Again I am not too sure; I flip-flopped a couple of times implementing
> http://github.com/fawzi/blip/blob/master/blip/util/Grow.d , a version of
> which I had also used in tango.
>
> Basically, if you append to an array, do you want the appending to be
> determined by the number of elements of the array, or by its size in
> memory?
> Code will probably work depending on the number of elements of the array,
> not on its memory usage.
> In the end it is just a factor, but if you have many small elements I feel
> that it is "reasonable" to allocate a smaller amount of extra space.
> On the other hand, as you pointed out, if you think strictly from the
> memory point of view, one can argue that the percentage of extra space
> should depend only on the space used, not on how it is used internally.
>
> Finally I kept the "per element" view, but that was done without any real
> testing.
Well, per length or per size, either way it's a percentage.  The size is just
a linear scaling of the length, so it's not really important whether you use
size or length.  But does it make sense to alter the percentage based on the
element size?  That is, if arrays of bytes should grow by 50% each time a
realloc is needed, should arrays of ints grow by 200%?  It doesn't make sense
to me, yet that is what was in the calculation.
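(To make that concrete, here is a toy D snippet; the 50% base rate and the
formula are invented for illustration and are not the actual newCapacity
calculation.)

import std.stdio;

// growth in bytes for an append, with and without folding the element
// size into the growth percentage
size_t grownBytes(size_t length, size_t elemSize, bool scaleByElemSize)
{
    auto bytes = length * elemSize;
    // assumption: a 50% base growth rate for 1-byte elements
    auto percent = scaleByElemSize ? 50 * elemSize : 50;
    return bytes + bytes * percent / 100;
}

void main()
{
    writeln(grownBytes(1000, 1, true));  // 1500: bytes grow by 50% (500 bytes extra)
    writeln(grownBytes(1000, 4, true));  // 12000: ints grow by 200% (8000 bytes extra, 16x)
    writeln(grownBytes(1000, 4, false)); // 6000: ints with a size-neutral 50% rate
}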
The other factors are seemingly arbitrary, so I'm not sure how to tune those.
-Steve