Spikes in array capacity

Steven Schveighoffer schveiguy at yahoo.com
Fri Jul 2 08:21:13 PDT 2010


On Fri, 02 Jul 2010 10:01:11 -0400, Ali Çehreli <acehreli at yahoo.com> wrote:

> Steven Schveighoffer wrote:
>  > newCapacity isn't called when setting the length explicitly, because
>  > it's not considered to be an append.
>
> You are right.
>
> Looks like I was successful in locating the culprit. :) I took the
> liberty of writing ++a.length and also printed the number of elements
> that are in reserve at each allocation:
>
> import std.stdio;
> import std.array;
>
> void main()
> {
>      int[] a;
>      size_t oldCapacity;
>
>      foreach (i; 0 .. 100_000_000) {
>          ++a.length;
>          a[$-1] = i;
>
>          if (capacity(a) != oldCapacity) {
>              writeln("length: ", a.length,
>                      " capacity: ", capacity(a),
>                      " ratio: ", cast(double)capacity(a) / oldCapacity,
>                      " reserve: ", capacity(a) - a.length);
>              oldCapacity = capacity(a);
>          }
>      }
> }
>
> Now the spikes are gone, and the allocations level off at one per 1024
> elements for an int array:
>
> length: 1 capacity: 3 ratio: inf reserve: 2
> length: 4 capacity: 7 ratio: 2.33333 reserve: 3
> length: 8 capacity: 15 ratio: 2.14286 reserve: 7
> length: 16 capacity: 31 ratio: 2.06667 reserve: 15
> length: 32 capacity: 63 ratio: 2.03226 reserve: 31
> length: 64 capacity: 127 ratio: 2.01587 reserve: 63
> length: 128 capacity: 255 ratio: 2.00787 reserve: 127
> length: 256 capacity: 509 ratio: 1.99608 reserve: 253
> length: 512 capacity: 1021 ratio: 2.00589 reserve: 509
> length: 1022 capacity: 2045 ratio: 2.00294 reserve: 1023
> length: 2046 capacity: 3069 ratio: 1.50073 reserve: 1023
> length: 3070 capacity: 4093 ratio: 1.33366 reserve: 1023
> ...
>
> In the long run, there is one allocation per 1024 elements. The
> allocation rate is constant, but each relocation copies all N existing
> elements, so the total copying work grows quadratically; to keep
> appending amortized constant, I think the reserve should still grow
> with N, but maybe not too much. (I am not sure about this last point
> at all.)
>

4 bytes per int x 1024 elements = 4096 bytes == page size.
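The page-size coupling matters because a fixed one-page reserve makes the
total copying cost quadratic. Here's a quick back-of-the-envelope
comparison (just a sketch I made up to illustrate the math, not the
runtime code; copiesFor and both growth policies are hypothetical):

import std.stdio;

// Count how many element copies a reallocating append performs under a
// given capacity-growth policy (hypothetical helper, not druntime code).
ulong copiesFor(size_t n, size_t function(size_t) grow)
{
    size_t cap = 0;
    ulong copies = 0;
    foreach (size_t len; 1 .. n + 1)
    {
        if (len > cap)
        {
            copies += cap;   // relocation copies everything already stored
            cap = grow(cap);
        }
    }
    return copies;
}

void main()
{
    enum size_t n = 1_000_000;
    // fixed reserve: one page of ints (1024) added per reallocation
    writeln("fixed +1024: ", copiesFor(n, (size_t c) => c + 1024));
    // geometric: capacity doubles, the classic amortized-O(1) policy
    writeln("doubling:    ", copiesFor(n, (size_t c) => c == 0 ? 4 : c * 2));
}

The fixed policy copies on the order of n^2/2048 elements (roughly half a
billion for a million appends), while doubling copies fewer than 2n, which
is why the reserve has to scale with the block size.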

The newCapacity function is supposed to reduce the effect of this  
phenomenon.  That is, when appending, it's assumed you are going to continue  
appending, so to reduce the overhead, the amount added to the memory block  
is supposed to be related to the total number of bytes already allocated.   
It doesn't seem like the function is working very well...  It could have  
been something I did, or it could have always been broken :)  I honestly  
did not examine the code at all; I only ever read the comments,  
assuming it works.
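
For reference, the intent as I understand it looks something like this (a
hypothetical sketch of the policy the comments describe, not the actual
druntime source; suggestCapacity and the 50% factor are my own invention):

import std.algorithm : max;

// Grow in proportion to what is already allocated, so big arrays get a
// big reserve instead of a fixed one-page pad. Purely illustrative.
size_t suggestCapacity(size_t requestedBytes, size_t currentBytes)
{
    enum size_t pageSize = 4096;
    auto proportional = currentBytes + currentBytes / 2;  // ~50% headroom
    return max(requestedBytes, proportional, currentBytes + pageSize);
}

If the real function behaved like that, the reserve in your trace would
keep growing past 1023 elements instead of flattening out.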

Please add these comments to the bug report; they will help when I look at  
it.  I don't have a lot of time to look at it right now, so I'm afraid  
I'll forget what we were doing later :)

-Steve

