D array expansion and non-deterministic re-allocation

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue Nov 24 09:44:40 PST 2009


Bill Baxter wrote:
> On Tue, Nov 24, 2009 at 8:26 AM, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>> Steven Schveighoffer wrote:
>>> On Tue, 24 Nov 2009 11:01:10 -0500, Andrei Alexandrescu
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>>> Steven Schveighoffer wrote:
>>>> [snip]
>>>>> Andrei has mentioned that he thinks we can store the allocated length in
>>>>> the GC block, which I think would also work.  You also wouldn't need an MRU
>>>>> cache in that case, but he says it's in *addition* to the MRU cache, so I'm
>>>>> not sure what he means.
>>>> [snip]
>>>>
>>>> Reaching the GC block is relatively expensive, so the MRU still helps. In
>>>> essence it's like this. When appending:
>>>>
>>>> a) look up the cache; if the array is there, you have an O(1) amortized
>>>> append that's really fast
>>>>
>>>> b) if it's not in the cache, talk to the GC; still an O(1) amortized
>>>> append, just not as fast
>>>>
>>>> Both help provide an important performance guarantee. I was a bit
>>>> worried about guaranteeing "O(1) amortized append for up to 8 arrays at a
>>>> time."
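To illustrate the two-level lookup (a sketch only: capacityCache and
capacityOf are made-up names, and an associative array stands in for the
real fixed-size MRU structure):

import core.memory : GC;

// Module-level variables are thread-local in D2, so using this cache
// requires no locking.
size_t[void*] capacityCache;

size_t capacityOf(void* p)
{
    // a) fast path: the block's capacity is already in the per-thread cache
    if (auto cached = p in capacityCache)
        return *cached;

    // b) slow path: ask the GC for the size of the block and remember it
    //    for subsequent appends
    auto cap = GC.sizeOf(p);
    capacityCache[p] = cap;
    return cap;
}
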
>>> Have you considered the performance impact on allocating non-array types?
>>>  That is, do you intend for all allocations to store the allocated length,
>>> even class or struct allocations that will likely never be appended to?  Or will
>>> there be a "non-appendable" flag?
>>>
>>> Also, the part without the MRU cache was my original proposal from last
>>> year; I had some ideas on how the length could be stored.  For example, in a
>>> page of blocks of up to 128 bytes, you only need 8 bits to store the length
>>> (alas, you cannot get by with 4 bits for 16-byte blocks, because you need to
>>> cover all 17 values from 0 through 16).  This reduces the overhead for those
>>> blocks.  For 256-byte to 1-page blocks, 16 bits is acceptable; for multi-page
>>> blocks, the cost of storing a 32-bit integer is negligible.
>> Having access to the requested length is important at larger lengths, so we
>> could probably get away with not storing it at all for allocations of up to
>> 128 bytes.
>>
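To spell out the arithmetic: a block of capacity B must be able to record
every length from 0 through B, i.e. B + 1 distinct values, hence
ceil(log2(B + 1)) bits.  A throwaway D function (illustrative only)
reproduces the figures above:

import std.stdio;

// Bits needed to represent any length in 0 .. capacity inclusive,
// i.e. capacity + 1 distinct values: ceil(log2(capacity + 1)).
uint bitsNeeded(size_t capacity)
{
    uint bits = 0;
    for (size_t values = capacity + 1; values > 1; values = (values + 1) / 2)
        ++bits;
    return bits;
}

void main()
{
    writeln(bitsNeeded(16));   // 5: one more than 4 bits can offer
    writeln(bitsNeeded(128));  // 8: a single byte covers blocks up to 128 bytes
    writeln(bitsNeeded(4096)); // 13: fits the 16-bit field for 1-page blocks
}
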
>>> It is true that looking up the MRU cache will not involve dissecting the
>>> address of the block to find its container block, but you will still need a
>>> lock, or are you planning on doing something clever?  I think the locking
>>> will be the bottleneck, and if it isn't the same as the global lock, you add
>>> the cost of both locks when you actually need to append.
>> The cache is a thread-local map from pointers to size_t. Using it does not
>> require any locking, I think.
> 
> Wait, what happens if you're sharing an array between two different threads?

The cache does not work with shared arrays, and threads cannot share 
non-shared arrays.
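
For context, here is roughly what that type rule means in practice (a
hypothetical snippet built on std.concurrency's message passing, not code
from the runtime): a mutable, non-shared slice is rejected at compile time
when you try to hand it to another thread, so a per-thread cache can never
observe a concurrent append to the same array.

import std.concurrency : spawn, send, Tid;

void childThread()
{
    // a real worker would pull messages with std.concurrency.receive
}

void main()
{
    int[] localArr = [1, 2, 3];   // thread-local: appends may use the cache
    shared(int)[] sharedArr;      // shareable: appends bypass the cache

    Tid tid = spawn(&childThread);

    // tid.send(localArr);        // rejected at compile time: a mutable,
    //                            // non-shared slice cannot cross threads
    tid.send(sharedArr);          // fine: shared data may cross threads
}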


Andrei


