Efficiency of using array as stack

Steven Schveighoffer schveiguy at yahoo.com
Mon Mar 25 08:26:01 PDT 2013


If you happened to get that accidental blank send, sorry, I tried to  
cancel it.

On Sat, 23 Mar 2013 19:45:21 -0400, Ivan Kazmenko <gassa at mail.ru> wrote:

> On Saturday, 23 March 2013 at 19:33:06 UTC, Jonathan M Davis wrote:
>> You might want to check out this article where someone ran into similar  
>> issues:
>>
>> https://www.semitwist.com/articles/article/view/don-t-use-arrays-as-stacks
>>
>> And if you haven't read Steven's article on arrays, you should do so:
>>
>> http://dlang.org/d-array-article.html
>
> That was a good read, thank you!  The whole matter became much clearer  
> to me.
>
> But now, I have one more example which confuses me, and the articles  
> don't seem to help right away with it.  In this example, I am moving a  
> "sliding window" of length n; now that we're done with the stack, here  
> is a simple usage pattern of a queue.  What I do is repeatedly (n times  
> in total) add an element to the back of the array and then remove an  
> element from the front of it, keeping the queue at a constant size.  
> Before the start, I reserve c*n entries for my array, where c is a real  
> number between 1 and 2.  I track reallocations by monitoring the address  
> of an element of a which should stay put as long as the array is not  
> reallocated.
>
> -----
> import std.algorithm;
> import std.array;
> import std.range;
> import std.stdio;
>
> void main ()
> {
> 	int n = 2_000_000;
> 	double c = 1.5;
> 	auto a = array (iota (n));
> 	a.reserve (cast (int) (n * c));
> 	auto prev = &a[$ - 1];
> 	int changes = 0;
> 	foreach (i; 0..n)
> 	{
> 		debug {writeln (a.capacity);}
> 		a.length++;
> 		a = a[1..$];
> 		auto next = &a[$ - i - 1];
> 		if (prev != next)
> 		{
> 			changes++;
> 			prev = next;
> 		}
> 	}
> 	writeln (changes);
> }
> -----
>
> Now, if I set c to 2, there are no reallocations: all the work fits  
> within the 2*n pre-allocated elements.  But as I decrease c down to 1,  
> the number of reallocations grows *linearly*, roughly 1 per 2000  
> appends.  So for c = 1, there are 1044 reallocations in total.  And  
> that's still quadratic behavior, albeit divided by a large constant  
> (~2000)!

So here is how the appender tries to add data:

1. If the block is less than a page, memory is tracked as bins of  
like-sized blocks that fit into a page.  Block sizes start at 16 bytes  
and double until they reach half a page.  So if you need something that  
consumes less than 16 bytes, a 16-byte chunk is allocated out of a  
16-byte bin (which is a page that holds nothing but 16-byte chunks).
2. When you reallocate to a larger block size (16 to 32 bytes, for  
instance), the data MUST be moved into another bin, because only 16-byte  
blocks are allowed in the original bin.
3. When you get to PAGE size (4096 bytes) and larger, the appender takes  
a different approach:
    a. If the page following your block is unallocated, it simply  
extends the block into that next page.  This avoids copying the data to  
another block, which arguably would be more expensive than the gain from  
doubling the capacity instead (not sure if this is true, but that's how  
it works).
    b. If not, it will reallocate into a new or existing empty block  
that can hold the entire data, plus some additional space calculated by  
an algorithm that is not quite doubling, but aims to amortize appending  
(look for newCapacity in druntime's lifetime.d to find the function that  
calculates this extra space).
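
If you want to watch that growth policy in action, here is a small  
sketch of my own (not part of your program): it appends ints one at a  
time and prints the capacity whenever it changes, so you can see the  
growth pattern for yourself.  The exact numbers depend on the druntime  
version and the allocator state.

-----
import std.stdio;

void main ()
{
	int[] a;
	size_t lastCap = size_t.max;
	foreach (i; 0 .. 10_000)
	{
		a ~= i;				// append through the runtime, letting it pick the block size
		if (a.capacity != lastCap)	// report whenever the block's capacity changes
		{
			writefln ("length %s -> capacity %s", a.length, a.capacity);
			lastCap = a.capacity;
		}
	}
}
-----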

HOWEVER, setting the length is NOT considered appending by the runtime.  
The runtime takes a different approach when you just set the length -- it  
does NOT add any extra capacity to try to amortize the allocations.  The  
idea is that you usually set the length once, and then use the array  
without appending.  If you are going to append, it is more efficient to  
give the runtime the elements to append rather than zero-initializing  
them first.

You will likely get better performance if you use:

a ~= int.init;

instead of:

a.length++;
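
Applied to your sliding-window test, that is a one-line change --  
something like this (your program with the reserve call dropped and  
a.length++ replaced by an append):

-----
import std.array;
import std.range;
import std.stdio;

void main ()
{
	int n = 2_000_000;
	auto a = array (iota (n));
	auto prev = &a[$ - 1];
	int changes = 0;
	foreach (i; 0..n)
	{
		a ~= int.init;		// append instead of a.length++, so the runtime amortizes growth
		a = a[1..$];		// slide the window forward by one
		auto next = &a[$ - i - 1];
		if (prev != next)
		{
			changes++;
			prev = next;
		}
	}
	writeln (changes);		// number of observed reallocations
}
-----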

> What happens (locally) is, once the array runs out of capacity, it  
> starts growing by blocks of 1024 or 891 (?!) elements in a repeating  
> pattern; one of these reallocates, the other does not.  The textbook  
> growth implementation suggests doubling the array size, but it looks  
> like D attempts something smarter here.  However, in my case, the result  
> is ~8x slower behavior when not pre-allocating.  The larger n is, the  
> slower the version without pre-allocation (c = 1) becomes compared to  
> the version with pre-allocation (c = 2).  And this means that a D array  
> is not an efficient queue out-of-the-box either.  However, as in the  
> case of the stack, it could perhaps be made efficient with a bit of  
> tweaking (some custom-tailored calls to reserve, perhaps?).

A growth of 1024 happens when new pages are tacked onto the end of the  
block (4096 / sizeof(int) = 1024).  A growth of 891 is more interesting.  
I can explain it though :)

You have allocated 2,000,001 ints at the time you increment the length,  
or 8,000,004 bytes.  The page size is 4096, so the smallest block that  
can hold this is 8,003,584 bytes (the block must be a whole number of  
pages).

Measured against the 8,000,000 bytes that your 2,000,000-element window  
itself occupies, that block has 3,584 extra bytes to grow into.  Divide  
by 4 (because capacity is in terms of int elements), and you get... 896.

Hm... so where did those extra 5 ints (896 - 891) go?  The answer is  
weird, but the way the array runtime can 'tack on' extra pages requires  
that we store the capacity at the beginning of the block (see my  
previously referenced array article for more explanation), and we require  
16-byte alignment.  So the capacity field takes 16 extra bytes, even  
though it only uses 4 or 8 of them (depending on the architecture).  But  
wait, that's only 4 ints!  Where's that extra int?

Now, this is REALLY weird.  Because blocks in memory can be sequential,  
a pointer at the very end of a block is considered (by the GC and other  
code) to point at the beginning of the next block.  So we add an extra  
byte of padding at the END of the block to keep a pointer just past the  
array data from accidentally pointing into the next block.  Consider: if  
you had a slice using the full capacity and you took sl[$..$], that  
slice would now point at the NEXT block (which may not even be  
allocated!), and you could wreak some havoc by appending to that block or  
setting its length.  Since ints must be 4-byte aligned, that one padding  
byte makes the last 4 bytes of the block unusable, so you lose one more  
int there.
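
If you want to redo the arithmetic, here is a back-of-the-envelope  
sketch: the 16-byte prefix and the single padding byte are as described  
above, but the constant names and the layout are my own shorthand, not  
the runtime's actual code.

-----
import std.stdio;

void main ()
{
	enum pageSize = 4096;	// GC page size
	enum prefix = 16;	// bytes reserved at the block start for the capacity bookkeeping
	enum pad = 1;		// padding byte at the end of the block

	size_t n = 2_000_000;
	size_t dataSize = (n + 1) * int.sizeof;			// 8_000_004 bytes to relocate
	size_t pages = (dataSize + prefix + pad + pageSize - 1) / pageSize;
	size_t block = pages * pageSize;			// 8_003_584 -- whole pages only
	size_t capacity = (block - prefix - pad) / int.sizeof;	// usable int slots: 2_000_891
	writeln (block);
	writeln (capacity - n);		// 891 -- the growth step observed above
}
-----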

-Steve

