proposal: capacity variable for dynamic arrays

Mon Jul 10 10:58:29 PDT 2006

Ameer Armaly skrev:
> "Oskar Linde" <oskar.lindeREM at OVEgmail.com> wrote in message 
> news:e8tvq5$1i3d$1 at digitaldaemon.com...
>> Ameer Armaly skrev:
>>> Right now dynamic arrays use length to resize and handle memmory 
>>> allocation. This is a bit limiting as far as the old memmory  reservation 
>>> problem as well as a few other things.  What I propose is adding a new 
>>> variable called capacity, which handles the actual memmory allocation and 
>>> using length for the accepted size of the array.  Here are a few code 
>>> fragments to show how it would work:
>>>
>>> char[] string = new char[1024];
>>> assert(string.length==1024 && string.capacity=1024);
>>> string.length=0; // we've reserved memmory, but haven't freed it
>>> version(break) writefln(string[1..4]); // throws a bounds exception, 
>>> since length is zero
>>> string.capacity = 1300; // traditional resize with a different variable, 
>>> will be set to whatever the true capacity is
>>> string.length = 2048; // would still work, since length is greater than 
>>> capacity so it acts the same way.  However, capacity would be the actual 
>>> amount of memmory reserved by the malloc algorithm, in this case doubling 
>>> the size of the array.
>>> string.capacity = 0; // freed
>>>
>>> In short, this  solves the problem of reserving memmory with little cost, 
>>> while maintaining backwards compatibility.  It could also be used in the 
>>> stream and socket code, where the actual capacity doesn't change but the 
>>> length does, removing the need for a separate size variable, something 
>>> like this:
>>>
>>> char[] buf = new char[1024];
>>> buf.length = 0;
>>> Socket s;
>>> // set up the socket
>>> s.read(buf); // Length is set to the actual data, but capacity is the 
>>> same. This way the array is truely dynamic.
>>>
>>> Thoughts?
>> Where would the capacity value be stored? Today the capacity is (as far as 
>> I understand) stored by the GC in correspondance to the the allocated 
>> chunk of memory.
>>
>> What could be done is having a virtual .capacity property that when set to 
>> a value makes sure the capacity is /at least/ the value set and returns 
>> the actual capacity when read. So for example after doing
>> arr.capacity = 1024;
>> arr.capacity could be something at least 1024, like probably 2047.
>>
> Sounds good to me, especially since that means no redundant variables.  What 
> I see capacity as holding is the maximum amount of elements I can put in 
> this array _without_ any new allocations.

Here is to play with. (You probably need to add 
~/dmd/src/phobos/internal/gc to module search path). No warranties: :)

--- arrcapacity.d
import std.gc;
import internal.gc.gcx; // :)

int capacity(T)(T x) {
         gc_t gc = cast(gc_t)getGCHandle();
         uint gccap = gc.capacity(x);
         if (gccap == 0)
                 return x.length;
         return (gccap-1)/(typeof(x[0]).sizeof);
}

void setCapacity(T)(inout T x, int capacity) {
         int oldLength = x.length;
         x.length = capacity;
         x = x[0..oldLength];
}
---

usage:

if (arr.capacity() < 100)
	arr.setCapacity(200);

It's a shame implicit injected array member functions don't support the 
property syntax: arr.capacity; arr.capacity=200;

NOTE: I have no idea if the above code is even close to correct...
The code would also be nicer if Phobos was altered slightly. Especially 
if _gc.capacity() was exposed by std.gc.

/Oskar