Stack-allocated arrays

Wed Nov 12 11:25:12 PST 2008

Sean Kelly wrote:
> Andrei Alexandrescu wrote:
>> KennyTM~ wrote:
>>> The best solution I can think of, without compiler modification is a 
>>> struct/class that contains a static array member T[1024] and a 
>>> dynamic array member T[] initialized to null; and the code chooses 
>>> which member to use in the constructor. But this always occupies 
>>> 1024*T.sizeof bytes and there will always be a conditional (if) 
>>> sticked to all access methods.
>>
>> This entire discussion gets me thinking - could an alternate stack be 
>> a good (or even better) solution? Consider a global per-thread 
>> "superstack" - a growable region that allocates memory in large chunks 
>> and makes sub-chunks accessible in a strictly LIFO manner. The 
>> primitives of the superstack are as follows:
>>
>> void* SuperStack.allocate(size_t bytes);
>> void SuperStack.free(size_t bytes);
>> size_t SuperStack.slack();
>>
>> The SuperStack's management is a singly-linked list of large blocks.
> 
> It's an implementation detail, but when the owner thread dies it should 
> null the links for all the list nodes, assuming the data contained in 
> those nodes can be shared across threads.

Damn straight it could free() them assuming it has malloc()ated them!

>> With the SuperStack in place, code could look like this:
>>
>> void foo(in size_t s)
>> {
>>     auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow);
>>     scope(exit) SuperStack.free(s);
>>     ... play with a ...
>> }
>>
>> Is there interest for such a thing?
> 
> I suppose I'm wondering what the advantage is of this approach over 
> simply using dynamic memory.

Speed. An allocation/deallocation is most of the time a counter bump and 
a check.

> Is it the fact that allocations are 
> non-locking?

No.

> Pure stack allocations make sense because all such 
> allocations are scoped and guaranteed to be extremely fast.  So I 
> suppose this is simply a parallel local "stack" for the same purpose and 
> the goal is to avoid the assembly magic necessary for an alloca() type 
> solution?  And if this is the case, then I assume that sharing would be 
> illegal (at least on D2)?

It is better than the stack because it can tap into the entire heap, as 
opposed to whatever was reserved for the stack. You can also expand an 
array on the superstack, whereas you can't with alloca. (I forgot to 
provide that primitive.) Yah, and indeed recall that D2 is all built 
around no implicit sharing across threads.

> My initial reaction is that I'm skeptical of any attempt at manual 
> replacement of built-in functionality.  It's been shown, for example, 
> that custom allocators often perform worse than the default allocators.

I seem to recall that. Oh, wait, I gave a talk on that, several times 
:o). Emery Berger studied the problem. It turns out user-defined heaps 
are usually worse off, with one exception: regions. SuperStack is a 
region. That's why I'm counting it could make a difference.

>  That aside, the general approach does seem reasonable.  Though I'd 
> still prefer we just get real stack-based dynamic allocation in D.  It 
> seems quite natural that "scope" should provide this for arrays as a QOI 
> feature, as it does for classes.

I agree. There are a few problems. One is that loops can make things 
tricky, as you need to call alloca only once. The other has to do with 
the way Walter generates code, I forgot the details, but it's fixable if 
some time is invested. And the third is, as I said, stack allocation can 
actually fail long before the system runs out of memory. It has happened 
to me.

Andrei