GC.sizeOf(array.ptr)
Dicebot via Digitalmars-d
digitalmars-d at puremagic.com
Tue Sep 30 09:01:54 PDT 2014
On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven Schveighoffer wrote:
> On 9/30/14 10:24 AM, Dicebot wrote:
>> On Tuesday, 30 September 2014 at 14:01:17 UTC, Steven Schveighoffer wrote:
>>>> Assertion passes with D1/Tango runtime but fails with current D2
>>>> runtime. This happens because `result.ptr` is not actually a pointer
>>>> returned by gc_qalloc from array reallocation, but interior pointer
>>>> 16 bytes from the start of that block. Druntime stores some metadata
>>>> (length/capacity I presume) in the very beginning.
>>>
>>> This is accurate, it stores the "used" size of the array. But it's
>>> only the case for arrays, not general GC.malloc blocks.
>>>
>>> Alternative is to use result.capacity, which essentially looks up the
>>> same thing (and should be more accurate). But it doesn't cover the
>>> same inputs.
>>
>> Why is it stored in the beginning and not in the end of the block
>> (like capacity)? I'd like to explore options of removing interior
>> pointer completely before proceeding with adding more special cases
>> to GC functions.
>
> First, it is the capacity. It's just that the capacity lives at
> the beginning of larger blocks.
>
> The reason is due to the ability to extend pages.
>
> With smaller blocks (2048 bytes or less), the page is divided
> into equal portions, and those can NEVER be extended. Any
> attempt to extend results in a realloc into another block.
> Putting the capacity at the end makes sense for 2 reasons: 1. 1
> byte is already reserved to prevent cross-block pointers, 2. It
> doesn't cause alignment issues. We can't very well offset a 16
> byte block by 16 bytes. But importantly, the capacity field
> does not move.
>
> However, for page and above size (4096+ bytes), the original
> (D1 and early D2) runtime would attempt to extend into the next
> page, without moving the data. Thus we save the copy of data
> into a new block, and just set some bits and we're done.

Ah, that must be what confused me - I originally looked at the small
block offset calculation and blindly assumed the same logic applied to
the other sizes. Sorry, my fault!
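
To spell out the two layouts as I now understand them from this thread,
here is a rough sketch. All names in it (pageSize, largePrefix,
ArrayBlockLayout, layoutFor) are made up for illustration and are not
druntime identifiers; the numbers 4096 and 16 are the ones quoted in
this thread, not values read out of the runtime.

```d
import std.stdio : writeln;

// Assumed values, taken from the discussion rather than from druntime.
enum size_t pageSize    = 4096; // blocks of this size or more can be extended
enum size_t largePrefix = 16;   // array data offset in such blocks

// Hypothetical description of where array data and the "used" size
// field live inside a GC block.
struct ArrayBlockLayout
{
    size_t dataOffset;   // offset of the first array element in the block
    bool   fieldAtStart; // true: field at block start, false: at block end
}

// Sketch of the rule: extendable (page-sized and larger) blocks keep the
// field at the start and shift the data; fixed-size blocks keep the field
// at the end, so the data starts at the block's base address.
ArrayBlockLayout layoutFor(size_t blockSize) pure nothrow @nogc @safe
{
    if (blockSize >= pageSize)
        return ArrayBlockLayout(largePrefix, true);
    return ArrayBlockLayout(0, false);
}

void main()
{
    writeln(layoutFor(64));   // small block: no interior pointer
    writeln(layoutFor(2048)); // largest small block mentioned above
    writeln(layoutFor(8192)); // large block: data starts 16 bytes in
}
```

So only arrays that live in page-sized or larger blocks would end up
with `ptr` pointing into the interior of the block.
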
> But this poses a problem for when the capacity field is stored
> at the end -- especially since we are caching the block info.
> The block info can change with a call to GC.extend (whereas for a
> fixed-size block, the block info CANNOT change). Depending on
> what "version" of the block info you have, the "end" can be
> different, and you may end up corrupting data. This is
> especially important for shared or immutable array blocks,
> where multiple threads could be appending at the same time.
>
> So I made the call to put it at the beginning of the block,
> which obviously doesn't change, and offset everything by 16
> bytes to maintain alignment.
>
> It may very well be that we can put it at the end of the block
> instead, and you can probably do so without much effort in the
> runtime (everything uses CTFE functions to calculate padding
> and location of the capacity). It has been such a long time
> since I did that, I'm not very sure of all the reasons not to
> do it. A look through the mailing list archives might be useful.

I think it should be possible. That way the actual block size will
simply be considered a bit smaller, and extending will happen before
the reserved space is hit. But of course I have only a very vague
knowledge of druntime, acquired while porting cdgc, so I may need to
think about it a bit more and probably chat with Leandro too :)

I have created a bugzilla issue for now:
https://issues.dlang.org/show_bug.cgi?id=13558
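
For anyone who wants to see the interior pointer for themselves, a
minimal sketch along these lines should do. It only uses GC.addrOf,
GC.sizeOf and the array .capacity property; prefixOffset is a helper
made up for this post, not a druntime function. I am deliberately not
quoting expected numbers, since they depend on the runtime version.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper (not part of druntime): distance in bytes between
/// the start of the GC block holding `arr` and its first element.
/// Returns 0 if `arr.ptr` is the block base or is not GC-allocated.
size_t prefixOffset(T)(const(T)[] arr)
{
    const(void)* base = GC.addrOf(arr.ptr);
    if (base is null)
        return 0;
    return cast(size_t)(cast(const(ubyte)*) arr.ptr - cast(const(ubyte)*) base);
}

void main()
{
    // Big enough that it should land in a page-sized (or larger) block.
    auto result = new ubyte[](8192);

    // GC.addrOf maps an interior pointer back to the block's base address.
    void* base = GC.addrOf(result.ptr);

    writefln("offset into block:     %s", prefixOffset(result));
    // GC.sizeOf is documented to return 0 for interior pointers.
    writefln("GC.sizeOf(result.ptr): %s", GC.sizeOf(result.ptr));
    writefln("GC.sizeOf(base):       %s", GC.sizeOf(base));
    writefln("result.capacity:       %s", result.capacity);
}
```

Going through GC.addrOf first (or using result.capacity, as Steven
suggested) avoids depending on where the runtime puts its metadata.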