GC.sizeOf(array.ptr)

Tue Sep 30 09:01:54 PDT 2014

On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven 
Schveighoffer wrote:
> On 9/30/14 10:24 AM, Dicebot wrote:
>> On Tuesday, 30 September 2014 at 14:01:17 UTC, Steven 
>> Schveighoffer wrote:
>>>> Assertion passes with D1/Tango runtime but fails with current D2
>>>> runtime. This happens because `result.ptr` is not actually a pointer
>>>> returned by gc_qalloc from array reallocation, but interior pointer 16
>>>> bytes from the start of that block. Druntime stores some metadata
>>>> (length/capacity I presume) in the very beginning.
>>>
>>> This is accurate, it stores the "used" size of the array. But it's
>>> only the case for arrays, not general GC.malloc blocks.
>>>
>>> Alternative is to use result.capacity, which essentially looks up the
>>> same thing (and should be more accurate). But it doesn't cover the
>>> same inputs.
>>
>> Why is it stored in the beginning and not in the end of the block (like
>> capacity)? I'd like to explore options of removing interior pointer
>> completely before proceeding with adding more special cases to GC
>> functions.
>
> First, it is the capacity. It's just that the capacity lives at 
> the beginning of larger blocks.
>
> The reason is due to the ability to extend pages.
>
> With smaller blocks (2048 bytes or less), the page is divided 
> into equal portions, and those can NEVER be extended. Any 
> attempt to extend results in a realloc into another block. 
> Putting the capacity at the end makes sense for 2 reasons: 1. 1 
> byte is already reserved to prevent cross-block pointers, 2. It 
> doesn't cause alignment issues. We can't very well offset a 16 
> byte block by 16 bytes. But importantly, the capacity field 
> does not move.
>
> However, for page and above size (4096+ bytes), the original 
> (D1 and early D2) runtime would attempt to extend into the next 
> page, without moving the data. Thus we save the copy of data 
> into a new block, and just set some bits and we're done.

Ah, that must be what confused me: I originally looked at the small-block 
offset calculation and blindly assumed the same logic for other 
sizes. Sorry, my fault!
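
For anyone who wants to check the layout on their own runtime, here is a 
minimal sketch. It uses only `GC.addrOf` and `GC.sizeOf` from `core.memory`; 
the helper names `offsetInBlock` and `blockSizeOf` are made up for this 
example. The 16-byte offset for page-sized blocks is what is described in 
this thread, so treat the actual numbers as runtime-dependent rather than 
guaranteed.

```d
import core.memory : GC;

/// Distance in bytes from the start of the GC block to the array's first
/// element. Returns 0 if the memory is not GC-managed or if `arr.ptr`
/// already is the block's base address. (Hypothetical helper.)
size_t offsetInBlock(const(void)[] arr)
{
    // GC.addrOf maps an interior pointer back to the block's base address,
    // or returns null for memory the GC does not own.
    const(void)* base = GC.addrOf(arr.ptr);
    if (base is null)
        return 0;
    return cast(size_t)(cast(const(ubyte)*) arr.ptr - cast(const(ubyte)*) base);
}

/// Size of the GC block holding `arr`, looked up through the base address.
/// Calling GC.sizeOf(arr.ptr) directly yields 0 whenever `arr.ptr` is an
/// interior pointer, which is the failure discussed in this thread.
/// (Hypothetical helper.)
size_t blockSizeOf(const(void)[] arr)
{
    return GC.sizeOf(GC.addrOf(arr.ptr));
}
```

Calling `offsetInBlock(new ubyte[](64))` and `offsetInBlock(new ubyte[](8192))` 
and comparing the two results should show whether the small-block and 
large-block layouts differ on a given druntime.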

> But this poses a problem for when the capacity field is stored 
> at the end -- especially since we are caching the block info. 
> The block info can change with a call to GC.extend (whereas for a 
> fixed-size block, the block info CANNOT change). Depending on 
> what "version" of the block info you have, the "end" can be 
> different, and you may end up corrupting data. This is 
> especially important for shared or immutable array blocks, 
> where multiple threads could be appending at the same time.
>
> So I made the call to put it at the beginning of the block, 
> which obviously doesn't change, and offset everything by 16 
> bytes to maintain alignment.
>
> It may very well be that we can put it at the end of the block 
> instead, and you can probably do so without much effort in the 
> runtime (everything uses CTFE functions to calculate padding 
> and location of the capacity). It has been such a long time 
> since I did that, I'm not very sure of all the reasons not to 
> do it. A look through the mailing list archives might be useful.

I think it should be possible. That way the actual block size would 
simply be considered a bit smaller, and extending would happen before the 
reserved space is hit. But of course I have only a very vague 
knowledge of druntime, acquired while porting cdgc, so I may need to 
think about it a bit more and probably chat with Leandro too :)
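
To make the concern about cached block info concrete, here is a toy model 
of the two placements. It is not druntime code: `fieldSize`, `offsetAtEnd`, 
`offsetAtStart` and `viewsAgree` are hypothetical names, and the only 
number taken from the thread is the 4096-byte page size. It just shows 
that a reader holding a stale block size computes a different field 
location when the field lives at the end, while a field at the start is 
found at the same place regardless.

```d
enum size_t pageSize = 4096;            // page size mentioned in the thread
enum size_t fieldSize = size_t.sizeof;  // assumed size of the capacity field

/// Field stored at the end: its offset depends on the block size.
size_t offsetAtEnd(size_t blockSize)
{
    return blockSize - fieldSize;
}

/// Field stored at the start: its offset is independent of the block size.
size_t offsetAtStart(size_t blockSize)
{
    return 0;
}

/// True if a reader with a stale (pre-extend) block size and a reader with
/// the fresh (post-extend) size locate the field at the same offset.
bool viewsAgree(alias offsetOf)(size_t staleSize, size_t freshSize)
{
    return offsetOf(staleSize) == offsetOf(freshSize);
}
```

With a block extended from one page to two, `viewsAgree!offsetAtEnd(pageSize, 2 * pageSize)` 
is false and `viewsAgree!offsetAtStart(pageSize, 2 * pageSize)` is true, 
so any end-of-block scheme would need some way to keep stale views from 
being used.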

I have created a Bugzilla issue for now: 
https://issues.dlang.org/show_bug.cgi?id=13558