Feature Request: Linked Arrays

Thu May 1 23:48:27 PDT 2008

Robert Fraser wrote:
> downs wrote:
>> Since one of D's weaknesses is bad speed when appending to dynamic 
>> arrays, I've been thinking about a way to fix this.
>>
>> Linked arrays are a hybrid between flat arrays and linked list: a 
>> single linked list of array chunks.
>>
>> Basically, a linked array (syntax: T[>], new T[50>]) looks like this:
>>
>> struct LinkedArray {
>>   void* start;
>>   size_t my_length;
>>   size_t length();
>>   LinkedArray* nextBlock, lastBlock;
>> }
>>
>> When appending to a LinkedArray, it is first checked if the current 
>> block can be grown to accomodate the new member(s) *without* 
>> reallocating.
>>
>> If it can't, instead of reallocating and copying, a new LinkedArray is 
>> tacked on the end.
>>
>> Because of this, there is never redundancy as with current arrays, and 
>> the load on the GC can be vastly reduced.
>>
>> Appending is also much closer to linear than with current arrays. This 
>> makes Linked Arrays highly suitable for queues and buffers.
>>
>> On the other hand, they're completely incompatible with C libraries, 
>> don't expose a .ptr property (not flat), and because of the 
>> complexities of their layout, all slices have to be constant (value, 
>> not reference).
>>
>> But these disadvantages are, imho, more than made up by the benefits 
>> they'd offer as an alternative to current arrays.
>>
>> Also, they don't need new keywords :)
>>
>> Whaddya think?
>>
>>  --downs
> 
> I'd rather ask the GC to expose a canExpand() method and have it 
> implemented in the library.

I agree.

BTW, to state the obvious:

"Linked arrays" have several dis-advantages when compared to arrays. 
Namely after a while of lots of insertions deletions they become very 
fragmented and performance slows down.   The main thing you have to 
watch out for is traversal. Often I've seen code slow down because 
people write a data structure to speed up insertion, not realizing that 
most of the time was spent in traversal.  By using links the work of the 
pre-fetcher, code optimizer and cache is increased.

Granted, lots of long small arrays are better then a link-list however 
it has nothing on single array traversal.  So really to figure out how 
your data will be used.  How many times will a node be visited verse how 
many times you'll insert something.  As a game programmer, typically 
array nodes have the potential to be visited several times (maybe every 
frame) however things can only be inserted once (and normally at the end 
of the array).

One think what would help, is if you could get the next free aligned 
block in memory from the previous block.  That way the array would be 
slightly more cache friendly.

I'm not saying this sort of data struct doesn't have it's uses, 
sometimes its useful for minimizing Memory Manager fragmentation.  In 
fact if you have a huge array, sometimes the only way to get around 
fragmentation is to spit it up.

-Joel