Spikes in array capacity
Steven Schveighoffer
schveiguy at yahoo.com
Fri Jul 2 05:34:30 PDT 2010
On Fri, 02 Jul 2010 01:34:26 -0400, Ali Çehreli <acehreli at yahoo.com> wrote:
> BCS wrote:
>
> >> There is something strange in the array capacity algorithm.
> >> [...]
> >> Is that intended?
>
> > I'm going to take a guess that the gc is (after some point) allocating
> > into the smallest hole with "enough" capacity.
>
> You may be right. :)
>
> > If you re run the test
> > but with something else going on to add some noise, do the spikes
> move?
>
> Yes, the spikes move.
>
> > void makeNoise()
> > {
> > static byte[][256] data;
> > data[rand() % $] = new byte[rand() % 512];
> > }
>
> I am not familiar with the dmd internals; but following the code took me
> to the function newCapacity() in src/druntime/src/rt/lifetime.d. The
> capacity growth is more complicated than what I've been assuming so far.
>
> And yes, GC is in the picture. Quoting the comments from that file:
>
> /*
> * Better version by Dave Fladebo:
> * This uses an inverse logorithmic algorithm to pre-allocate a bit more
> * space for larger arrays.
> * - Arrays smaller than PAGESIZE bytes are left as-is, so for the most
> * common cases, memory allocation is 1 to 1. The small overhead added
> * doesn't affect small array perf. (it's virtually the same as
> * current).
> * - Larger arrays have some space pre-allocated.
> * - As the arrays grow, the relative pre-allocated space shrinks.
> * - The logorithmic algorithm allocates relatively more space for
> * mid-size arrays, making it very fast for medium arrays (for
> * mid-to-large arrays, this turns out to be quite a bit faster than the
> * equivalent realloc() code in C, on Linux at least. Small arrays are
> * just as fast as GCC).
> * - Perhaps most importantly, overall memory usage and stress on the GC
> * is decreased significantly for demanding environments.
> */
>
> So there may not be a bug after all...
>
> Ali
If your original code is all that was being run, I think there is a bug.
I'll tell you why -- the GC provides a way to append pages to an allocated
block. Essentially, it can glue consecutive unused pages onto the end of a
block, growing it in place.
But anything under a page is simply reallocated, because allocations smaller
than a page live inside a pool of blocks of the same size, and those cannot
be glued together.
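You can watch this from user code, along the lines of your original test. This is just a sketch: it counts how often an append actually relocates the block by comparing the array's pointer before and after (the exact counts depend on the GC state, so don't read too much into the specific numbers):

```d
import std.stdio;

void main()
{
    int[] a;
    void* lastPtr = null;
    size_t moves = 0;
    foreach (i; 0 .. 1_000_000)
    {
        a ~= i;
        if (a.ptr !is lastPtr)
        {
            // the append relocated the block instead of extending in place
            ++moves;
            writefln("moved at length %s, capacity now %s", a.length, a.capacity);
            lastPtr = a.ptr;
        }
    }
    // growth is amortized, so relocations should be rare relative to appends
    assert(moves < 100);
    assert(a.length == 1_000_000);
}
```

Once the block is page-sized, a relocation should only show up when consecutive free pages aren't available; sub-page sizes will relocate on every capacity bump.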
So when your array is growing, and it gets larger than a page, it should
grow a single page at a time. But I think (and I stress think, I'm not
sure) that when a block of pages is deallocated, each page becomes
independently free; they do not stay glued together. So an append that
requests 1 or 2 more pages ending up with a block 2000 pages larger is
suspect.
I still have to examine the code, but I think there is a problem.
Back to my original assertion -- to disprove the theory that some other
"larger hole" is why it gets so big: in one instance your number of
elements grows from 1,155,070 (4MB) to 3,153,917 (12MB), almost a
200% increase. In order for that 12MB hole to exist, something must have
allocated it before and then deallocated it. It can't be your loop,
because your loop has not allocated that much space yet, and I would find
it very strange for something in the runtime to allocate that much without
you specifically asking for it. If, however, your original post is not the
entire code, then that becomes a possibility.
BTW, you can take the newCapacity function out of the loop by simply
setting the length and then writing the last element, i.e.:

    a.length += 1;
    a[$-1] = i;

newCapacity isn't called when setting the length explicitly, because it's
not considered to be an append.
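Spelled out as a complete loop (a minimal sketch; the assert is just there to show the result is the same as appending):

```d
import std.stdio;

void main()
{
    int[] a;
    foreach (i; 0 .. 1000)
    {
        // setting .length directly is not treated as an append,
        // so the newCapacity growth heuristic never runs
        a.length += 1;
        a[$ - 1] = i;
    }
    // same contents as building the array with ~=
    assert(a.length == 1000 && a[0] == 0 && a[$ - 1] == 999);
    writeln("done");
}
```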
-Steve