Efficiency of using array as stack
Ivan Kazmenko
gassa at mail.ru
Sat Mar 23 05:10:56 PDT 2013
Hello again!
Today, I had trouble using a D built-in dynamic array as a stack:
it timed out badly. I then tried to reduce the problem to a
minimal example. I am using DMD 2.062.
Now, I have this sample program. It creates an array of length
one million, and then keeps adding an element to the array and
removing it in a loop, also one million times. Such a pattern,
and more complex ones, can easily occur whenever an array is used
as a stack.
What actually happens is constant relocation and copying,
resulting in roughly 1_000_000 * 1_000_000 element moves, which
just looks bad to me.
-----
import core.memory;
import std.array;
import std.range;
import std.stdio;

void main ()
{
    // Compile with -version=NOGC to check whether the collector is to blame.
    version (NOGC) {GC.disable ();}
    int n = 1_000_000;
    auto s = array (iota (n));  // s = [0, 1, ..., n - 1]
    s.reserve (n * 2);          // ask for room to grow without relocation
    foreach (i; 0..n)
    {
        s.length++;  // push one default-initialized element
        debug {writefln ("after ++: %s %s", s.capacity, &s[0]);}
        s.length--;  // pop it right back
        debug {writefln ("after --: %s %s", s.capacity, &s[0]);}
    }
}
-----
Running a debug build, we can see that the address of s[0]
changes after each increase of the length, and that the capacity
drops to zero after each decrease. So the line "s.reserve (n * 2)"
fails to hint the runtime that I want to allocate the array once
and then use it without relocation - and that is exactly what I
would like to be able to do!
I was wondering whether garbage collection has something to do
with the inefficiency, but the answer is no. If we build the
"NOGC" version, the program quickly fills a large portion of
memory (1 GB in my local test) with the copies and stops
responding. If the GC is active, it just constantly swaps between
two memory locations.
Now, in section 4.1.10 of TDPL ("Assigning to .length") it is
stated that "If the array shrinks <...>, D guarantees that the
array is not reallocated", and later, "that guarantee does not
also imply that further expansions of the array will avoid
reallocation", so, formally, everything is by the letter of the
law so far.
However, inserting another "s.reserve (n * 2)" just after
"s.length--" does not help either. And I think the whole thing
does contradict the intent described in TDPL section 4.1.9
("Expanding"). Here it goes: "If there is no slack space left,
... The allocator may find an empty block to the right of the
current block and gobble it into the current block. This
operation is known as coalescing". Now, this quote and its
context give no formal guarantee that the memory allocator works
exactly like that, but they definitely sound like a good thing to
do.
I hope many would agree that reducing the length once does not at
all imply we want to reduce it further. Frankly, I have thought
so far that D dynamic arrays can be used as queue and stack
implementations done "right", i.e., efficiently.
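To be concrete, the kind of stack usage I mean is nothing more
than this (a minimal sketch):
-----
import std.stdio;

void main ()
{
    int [] stack;
    foreach (i; 0..10)
        stack ~= i;             // push
    while (stack.length > 0)
    {
        auto top = stack[$ - 1];
        stack.length--;         // pop: this is where the capacity gets thrown away
        write (top, " ");
    }
    writeln ();
}
-----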
So my two questions are:
1. I would like to have a way to reduce the length of the array
without throwing away the effect of the preceding "s.reserve". Is
that possible in the current D implementation? How?
2. Ideally, I would like a dynamic array in D to act efficiently
as a stack. That means the amortized cost of N stack operations
should be O(N). To achieve this, I would propose "lazy"
reduction of the space reserved for the array. I believe the
implementation already does something like that for expanding
arrays, as shown in the example at
http://dlang.org/arrays.html#resize : when an append finds no
spare capacity, the memory manager roughly doubles the allocated
size of the array. The very same trick could be used for
shrinking: instead of reducing the capacity to zero (i.e.,
guaranteeing only the exact amount currently used), the memory
manager could leave the allocated size equal to, for example,
min (prev, cur * 2), where prev is the allocated size before the
shrinking and cur is the size used after the shrinking.
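To make the proposed rule concrete, here is a tiny sketch (the
function name is mine, purely for illustration) of the shrink
policy I have in mind:
-----
import std.algorithm;
import std.stdio;

// Proposed rule: after shrinking from prev used elements down to cur,
// keep the allocation at min (prev, cur * 2) elements instead of
// dropping all the slack at once.
size_t shrunkCapacity (size_t prev, size_t cur)
{
    return min (prev, cur * 2);
}

void main ()
{
    // Popping one element off a large stack keeps the whole allocation...
    writeln (shrunkCapacity (1_000_001, 1_000_000)); // 1000001
    // ...while a large shrink still releases most of the memory.
    writeln (shrunkCapacity (1_000_000, 10)); // 20
}
-----
With such a rule, a push that immediately follows a pop always
finds slack, so the push/pop pattern in my first example would
cost O(N) overall.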
I suspect that the above suggestion could conflict with some
other array usage patterns, because the array syntax actually
deals with array views (slices), not arrays "in the flesh" (in
memory). One case I can imagine is the following:
-----
import std.array;
import std.range;
import std.stdio;

void main ()
{
    auto a = array (iota (30)); // [0, 1, ..., 29]
    auto b = a[10..$];          // [10, 11, ..., 29]
    b.length -= 10;             // *** what is now b.capacity?
    b.length += 10;             // now b should be [10, 11, ..., 19, 0, 0, ..., 0]
    writeln (b);
}
-----
Now, at the line marked ***, if the memory manager leaves b with
slack for 10 more elements, that slack will point at space still
occupied by the array a, which does not sound right. As I'm not
a D expert (yet? ;) such investigations are insightful), I don't
know right now whether this problem is solvable. Please share
your thoughts on that.
But at least if we don't have any more views into the memory
after b (as in my first example, and generally true when an array
is used as a stack), the memory manager could detect that and
take an optimal decision regarding b.capacity.
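Coming back to question 1: the closest candidate I have come
across so far is assumeSafeAppend (it lives in druntime's object
module), which is supposed to tell the runtime that nothing
aliases the data past the end of the slice, so that appends may
reuse the block in place. I have not yet measured whether it also
preserves the effect of "s.reserve" in my test; here is a minimal
sketch of what I mean:
-----
import std.array;
import std.range;
import std.stdio;

void main ()
{
    int n = 1_000_000;
    auto s = array (iota (n));
    s.reserve (n * 2);
    foreach (i; 0..n)
    {
        s.length++;
        s.length--;
        // Claim that no other slice uses the slack past s.length,
        // so that the next length++ can reuse the same block in place.
        s.assumeSafeAppend ();
    }
    writefln ("final capacity: %s", s.capacity);
}
-----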
Thank you for reading to this point; I confess this was rather
lengthy.
-----
Ivan Kazmenko.