Spikes in array capacity

Steven Schveighoffer schveiguy at yahoo.com
Fri Jul 2 05:34:30 PDT 2010


On Fri, 02 Jul 2010 01:34:26 -0400, Ali Çehreli <acehreli at yahoo.com> wrote:

> BCS wrote:
>
>  >> There is something strange in the array capacity algorithm.
>  >> [...]
>  >> Is that intended?
>
>  > I'm going to take a guess that the gc is (after some point) allocating
>  > into the smallest hole with "enough" capacity.
>
> You may be right. :)
>
>  > If you re run the test
>  > but with something else going on to add some noise, do the spikes  
> move?
>
> Yes, the spikes move.
>
>  > void makeNoise()
>  > {
>  >      static byte[][256] data;
>  >      data[rand() % $] = new byte[rand() % 512];
>  > }
>
> I am not familiar with the dmd internals; but following the code took me  
> to the function newCapacity() in src/druntime/src/rt/lifetime.d. The  
> capacity growth is more complicated than what I've been assuming so far.
>
> And yes, GC is in the picture. Quoting the comments from that file:
>
> /*
>   * Better version by Dave Fladebo:
>   * This uses an inverse logorithmic algorithm to pre-allocate a bit more
>   * space for larger arrays.
>   * - Arrays smaller than PAGESIZE bytes are left as-is, so for the most
>   * common cases, memory allocation is 1 to 1. The small overhead added
>   * doesn't affect small array perf. (it's virtually the same as
>   * current).
>   * - Larger arrays have some space pre-allocated.
>   * - As the arrays grow, the relative pre-allocated space shrinks.
>   * - The logorithmic algorithm allocates relatively more space for
>   * mid-size arrays, making it very fast for medium arrays (for
>   * mid-to-large arrays, this turns out to be quite a bit faster than the
>   * equivalent realloc() code in C, on Linux at least. Small arrays are
>   * just as fast as GCC).
>   * - Perhaps most importantly, overall memory usage and stress on the GC
>   * is decreased significantly for demanding environments.
>   */
>
> So there may not be a bug after all...
>
> Ali

If your original code is all that was being run, I think there is a bug.

I'll tell you why -- the GC provides a way to append pages to an allocated  
block.  Essentially, you can glue consecutive unused pages together to  
make a larger block, thereby growing your block.

But any block smaller than a page is simply reallocated, because sub-page  
allocations live in pools of same-sized blocks, and those cannot be glued  
together.

So when your array is growing and it gets larger than a page, it should  
grow a single page at a time.  But I think (and I stress think, I'm not  
sure) that when a block of pages is deallocated, each page becomes  
independently free.  They do not stay glued together.  So an allocation  
requesting 1 or 2 more pages growing by 2000 pages is suspect.  I still  
have to examine the code, but I think there is a problem.

Back to my original assertion -- to disprove the theory that some other  
"larger hole" is why it gets so big: in one instance your number of  
elements grows from 1,155,070 (4MB) to 3,153,917 (12MB), which is almost a  
200% increase.  For that 12MB hole to exist, something must have allocated  
it before and then deallocated it.  It can't be your loop, because your  
loop has not allocated that much space yet, and I would find it very  
strange for that to happen somewhere in the runtime without you  
specifically allocating it.  If, however, your original post is not the  
entire code, then there may be some possibility of that.
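(The original test code is not quoted in this message; for reference, a  
minimal loop of the kind being discussed -- my reconstruction, not Ali's  
actual code -- would append elements one at a time and report each  
capacity jump:)

```d
import std.stdio;

void main()
{
    int[] a;
    size_t lastCap = 0;
    foreach (i; 0 .. 4_000_000)
    {
        a ~= i;                     // append path, goes through newCapacity()
        if (a.capacity != lastCap)  // report each reallocation/extension
        {
            writefln("length %s -> capacity %s", a.length, a.capacity);
            lastCap = a.capacity;
        }
    }
}
```

With this kind of loop, a well-behaved allocator should show capacity  
growing by roughly a page at a time once past PAGESIZE, which is what  
makes a sudden 4MB-to-12MB jump look wrong.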

BTW, you can take the newCapacity function out of the loop by simply  
setting the length and then writing the last element, i.e.:

a.length += 1;
a[$-1] = i;

newCapacity isn't called when setting the length explicitly, because it's  
not considered to be an append.
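Putting the two growth strategies side by side (a sketch assuming D2 array  
semantics; the exact capacities printed will vary by runtime version):

```d
import std.stdio;

void main()
{
    // Growth via append: each ~= goes through newCapacity() and may
    // reserve extra space beyond what the array currently needs.
    int[] a;
    foreach (i; 0 .. 100_000)
        a ~= i;

    // Growth via explicit length: the array is resized directly and
    // the append pre-allocation logic is not involved.
    int[] b;
    foreach (i; 0 .. 100_000)
    {
        b.length += 1;
        b[$ - 1] = i;
    }

    writefln("append:     length %s, capacity %s", a.length, a.capacity);
    writefln("set length: length %s, capacity %s", b.length, b.capacity);
}
```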

-Steve


More information about the Digitalmars-d mailing list