Array append proposal

Sergey Gromov snake.scaly at gmail.com
Fri Oct 10 09:47:07 PDT 2008


Fri, 10 Oct 2008 11:11:40 -0400,
Steven Schveighoffer wrote:
> If this bit is set, then the array can be appended to in place if its length 
> equals the length stored in the heap (and of course, the memory block 
> contains enough space).  If not, then a new array is allocated, and the data 
> copied, the length stored in the heap is left alone.

Then you'd better keep the memory block size there, too.  It's querying the 
GC for the block size that kills append performance right now.
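
For illustration, here is a sketch of that check in D, assuming a 
hypothetical per-block header;  BlockInfo and canAppendInPlace are made-up 
names, and a real implementation would also have to verify that the slice 
starts at the head of the block:

struct BlockInfo
{
    size_t usedLength;  // the length stored in the heap
    size_t blockSize;   // cached here, so no GC query is needed
}

bool canAppendInPlace(int[] arr, BlockInfo* info, size_t extra)
{
    // in place only if this slice covers all of the used data and
    // the block still has room for the extra elements
    return arr.length == info.usedLength
        && (info.usedLength + extra) * int.sizeof <= info.blockSize;
}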

> * There are some threading issues to look at, but I am nowhere near as 
> versed in this as some of you guys.  I'd guess you can do something similar 
> to Bartosz' thin locks?  Except you would never expand the lock.  The first 
> time you lose contention, you assume you can't append in place, and allocate 
> a new memory block.  The only drawback: if the appender that grabbed the 
> lock decides to allocate a new array anyway, while the second appender 
> could have appended in place, you lose some performance there.  But 
> that's probably a rare corner case.  The bit used for the 'is array 
> head' flag can double as the contention bit.  So something like this is 
> stored in the heap:
> 
> bit [31]: set if nobody is trying to expand the array, unset if it is 
> currently being expanded.
> bit [0:30]: the stored length
> 
> So in order to expand an array, you compare-and-swap the heap-stored word 
> against your struct-local length+atBeginning, swapping in 0.
> *  success: ask the GC whether the block can accommodate the new data;  if 
> so, store the new length back into the heap-stored length, with the MSB 
> set;  if not, allocate a new array, copying the data from the original.
> *  failure: allocate a new array, copying the data from the original.
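
For concreteness, a rough sketch of the quoted protocol;  all names are 
made up, and cas() here is only a non-atomic stand-in for a real atomic 
compare-and-swap (e.g. lock cmpxchg on x86):

const uint EXPAND_BIT = 0x8000_0000;  // bit 31: set while nobody expands

// Non-atomic stand-in; a real implementation must perform this
// compare-and-swap atomically.
bool cas(uint* here, uint ifThis, uint writeThis)
{
    if (*here != ifThis) return false;
    *here = writeThis;
    return true;
}

// Try to grow the array in place to newLen elements.  Returns false if
// the caller must allocate a new array and copy.  The quoted text leaves
// the unlock on the size-check failure implicit;  this sketch restores
// the old value.
bool tryExpand(uint* heapLen, uint myLen, uint newLen, size_t blockSize)
{
    // swap our local length (MSB set) for 0, taking the "lock"
    if (!cas(heapLen, myLen | EXPAND_BIT, 0))
        return false;                      // lost the race: reallocate

    if (newLen * int.sizeof > blockSize)   // block too small
    {
        *heapLen = myLen | EXPAND_BIT;     // release the lock unchanged
        return false;                      // caller reallocates
    }

    *heapLen = newLen | EXPAND_BIT;        // new length stored, MSB set
    return true;
}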

I don't think a primitive type should concern itself with thread safety.  
So you can always keep a plain length in the array's shared length field.

> One final caveat.  Currently there exists code like this for building a 
> large array:
> 
> int[] arr = new int[10000];
> arr.length = 0;
> for(int i = 0; i < 10000; i++)
>    arr ~= i;
> 
> In the new scheme, should the code:
> 
> arr.length = 0;
> 
> cause the array to try to shrink the heap-stored length to 0?  My 
> preference is no, as this is going to be much less efficient than 
> something like:
> 
> int[] arr = new int[10000];
> for(int i = 0; i < 10000; i++)
>    arr[i] = i;

Imagine the code:

int[] arr = new int[10000];
int[] arr2 = arr;
arr.length = 0;
for(int i = 0; i < 10000; i++)
   arr ~= i;

arr2 is overwritten and you can do nothing about that.  A downsize should 
therefore convert arr into an ordinary slice, so this sort of optimization 
stops working.  Well, it was a hack anyway, but then you need a legitimate 
way of pre-allocating an array while leaving its length zero; see the 
sketch below.
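
A hypothetical shape for such a primitive (preallocate is a made-up 
name):

int[] preallocate(size_t capacity)
{
    auto buf = new int[capacity];
    // under the proposal, the heap-stored length would be set to 0
    // here, so that later ~= can extend in place without reallocating
    return buf[0 .. 0];  // empty slice at the head of the block
}

The builder loop above then becomes:

auto arr = preallocate(10000);
for (int i = 0; i < 10000; i++)
    arr ~= i;  // would extend in place under the proposal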


