Request for comments: std.d.lexer

Fri Feb 1 13:10:33 PST 2013

On 2/1/2013 3:22 AM, Dmitry Olshansky wrote:
> 01-Feb-2013 15:05, Walter Bright wrote:
>> On 1/30/2013 8:44 AM, Dmitry Olshansky wrote:
>>> In allocation scheme I proposed that ID could be a 32bit offset into
>>> the unique
>>> identifiers chunk.
>>
>> That only works if you know in advance the max size the chunk can ever
>> be and preallocate it. Otherwise, you have no guarantee that the next
>> allocated chunk will be within 32 bits of address of the previous chunks.
>>
>
> Well, I assumed it's exactly one reallocatable block. Then each token holds an
> offset, which stays valid even if the block is reallocated.
>
> Or rather, the allocator just RESERVEs virtual address space for it (say 1G) and
> COMMITs it page by page as it needs to grow. Once lexing is done, shrink the
> reserved region to the size actually used to free up address space (e.g. if we
> are on 32 bits).
>
> As for the 32-bit limit, it gives a 4 GB maximum for the cumulative length of all
> unique identifier names, which is more than enough by any standard. I haven't
> seen a 4 GB codebase, let alone 4 GB of identifiers, even if we count every
> repetition separately.

Your technique can work, provided the number of identifiers isn't so large
that memory fragmentation prevents reallocating the buffer to a larger size.
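
For illustration only, here is a rough sketch of the offset scheme being discussed: every unique identifier is stored once in a single growable buffer, and a token keeps a 32-bit offset into that buffer instead of a pointer. It is in Python rather than D, and `IdentifierPool`, `intern`, `lookup`, and the length-prefixed record layout are all hypothetical names and choices made for this example, not anything from the proposed std.d.lexer. The reserve/commit part is not modelled; a `bytearray` stands in for the reallocatable block.

```python
import struct


class IdentifierPool:
    """Hypothetical pool: all unique identifier names live in one buffer."""

    MAX_SIZE = 2**32  # offsets must fit in an unsigned 32-bit integer

    def __init__(self):
        self._buf = bytearray()  # may be reallocated internally as it grows
        self._index = {}         # name -> offset, so each name is stored once

    def intern(self, name: str) -> int:
        """Return the offset of `name`, appending it to the buffer if unseen."""
        offset = self._index.get(name)
        if offset is not None:
            return offset
        data = name.encode("utf-8")
        offset = len(self._buf)
        if offset + 4 + len(data) > self.MAX_SIZE:
            raise OverflowError("identifier pool exceeds 32-bit offset range")
        # Assumed layout: 4-byte little-endian length, then the name bytes.
        self._buf += struct.pack("<I", len(data)) + data
        self._index[name] = offset
        return offset

    def lookup(self, offset: int) -> str:
        """Recover the identifier text from an offset returned by intern()."""
        (length,) = struct.unpack_from("<I", self._buf, offset)
        start = offset + 4
        return self._buf[start:start + length].decode("utf-8")


pool = IdentifierPool()
foo_id = pool.intern("foo")
bar_id = pool.intern("bar")
# A repeated identifier maps to the same offset, so tokens compare by integer.
assert pool.intern("foo") == foo_id
assert pool.lookup(bar_id) == "bar"
```

Because tokens hold offsets rather than addresses, they remain valid when the buffer moves during growth. The sketch does not address the fragmentation concern above, which is about whether the larger block can be obtained at all.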