Feature suggestion: in-place append to array

Thu Apr 1 07:34:18 PDT 2010

On Thu, 01 Apr 2010 01:41:02 -0400, Mike S  
<mikes at notarealaddresslololololol.com> wrote:

> Steven Schveighoffer wrote:
>   > What do you mean by nondeterministic?  It's very deterministic, just  
> not
>> always easy to determine ;)  However, given enough context, it's really  
>> easy to determine.
>
> When I say deterministic, I'm referring to determinism from the user's  
> point of view, where the allocation behavior is affected solely by the  
> parameter (the size request, e.g. 10000 objects) and not by some kind of  
>   internal state, hidden context, or arcane black magic. :p

Its abstracted to the GC, but the current GC is well defined.  If you  
request to allocate blocks with length of a power of 2 under a page, you  
will get exactly that length, all the way down to 16 bytes.  If you  
request to allocate a page or greater, you get a contiguous block of  
memory that is a multiple of a page.

With that definition, is the allocator deterministic enough for your needs?

>
>   > The amount of memory given is determined by the GC, and ultimately by
>> the OS.  The currently supported OSes allocate in Page-sized chunks, so  
>> when you allocate any memory from the OS, you are allocating a page  
>> (4k).  Most likely, you may not need a whole page for the data you are  
>> allocating, so the GC gives you more finely sized chunks by breaking up  
>> a page into smaller pieces.  This strategy works well in some cases,  
>> and can be wasteful in others.  The goal is to strike a balance that is  
>> "good enough" for everyday programming, but can be specialized when you  
>> need it.
>
> That's understandable, and it makes sense that the actual memory being  
> allocated would correspond to some chunk size.  It's really just opaque  
> black box behavior that poses a problem; if users are given well-defined  
> guidelines and chunk sizes, that would work just fine.  For instance, a  
> spec like, "reserve a multiple of 512 bytes and that's exactly what you  
> will be given," would allow users to minimize wastefulness and know  
> precisely how much memory they're allocating.

I think in the interest of allowing innovative freedom, such requirements  
should be left up to the GC implementor, not the spec or runtime.  Anyone  
who wants to closely control memory usage should just understand how the  
GC they are using works.

>
>>  If you want to control memory allocation yourself, you can always do  
>> that by allocating page-sized chunks and doing the memory management on  
>> those chunks yourself.  I do something very similar in dcollections to  
>> speed up allocation/destruction.
>>  <snip>
>>  I think D has deterministic allocation, and better ability than C++ to  
>> make custom types that look and act like builtins.  Therefore, you can  
>> make an array type that suits your needs and is almost exactly the same  
>> syntax as a builtin array (except for some things reserved for  
>> builtins, like literals).  Such a thing is certainly possible, even  
>> with using the GC for your allocation.
>
> That parallels what game devs do in C++:  They tend to use custom  
> allocators a lot, and they're likely to follow the same basic strategy  
> in D too, if/when it becomes a suitable replacement.  I'm still just  
> browsing though, and I'm not all that familiar with D.  If you can't  
> actually use the built-in dynamic arrays for this purpose, how difficult  
> would it be to reimplement a contiguously stored dynamic container using  
> custom allocation?  I suppose you'd have to build it from the ground up  
> using a void pointer to a custom allocated block of memory, right?  Do  
> user-defined types in D have any/many performance disadvantages compared  
> to built-ins?

No, you would most likely use templates, not void pointers.  D's template  
system is far advanced past C++, and I used it to implement my custom  
allocators.  It works great.

User-defined types are as high performance as builtins as long as the  
compiler inlines properly.

>
>>  BTW, I made the change to the runtime renaming the function previously  
>> known as setCapacity to reserve.  It won't be a property, even if that  
>> bug is fixed.
>>  -Steve
>
> That's a bit of a downer, since a capacity property would have nice  
> symmetry with the length property.  I suppose there were good reasons  
> though.  Considering the name change, does that mean reserve can only  
> reserve new space, i.e. it can't free any that's already been allocated?

Capacity still exists as a read-only property.  I did like the symmetry,  
but the point was well taken that the act of setting the capacity was not  
exact.  It does mean that reserving space can only grow, not shrink.  In  
fact, the capacity property calls the same runtime function as reserve,  
just passing 0 as the amount requested to get the currently reserved space.

You can't use capacity to free space because that could result in dangling  
pointers.  Freeing space is done through the delete keyword.  We do not  
want to make it easy to accidentally free space.

>   (That makes me wonder:  Out of curiosity, how does the garbage  
> collector know how much space is allocated to a dynamic array or  
> especially to a void pointer?  I suppose it's registered somewhere?)

The GC can figure out what page an interior pointer belongs to, and  
therefore how much memory that block uses.  There is a GC function to get  
the block info of an interior pointer, which returns a struct that  
contains the pointer to the block, the length of the block, and its flags  
(whether it contains pointers or not).  This function is what the array  
append feature uses to determine how much capacity can be used.  I believe  
this lookup is logarithmic in complexity.

-Steve