std.d.lexer requirements

Thu Aug 2 11:08:23 PDT 2012

On 8/2/2012 4:52 AM, deadalnix wrote:
> On 02/08/2012 09:30, Walter Bright wrote:
>> On 8/1/2012 11:49 PM, Jacob Carlborg wrote:
>>> On 2012-08-02 02:10, Walter Bright wrote:
>>>
>>>> 1. It should accept as input an input range of UTF8. I feel it is a
>>>> mistake to templatize it for UTF16 and UTF32. Anyone desiring to feed it
>>>> UTF16 should use an 'adapter' range to convert the input to UTF8. (This
>>>> is what component programming is all about.)
>>>
>>> I'm no expert on ranges, but won't that prevent slicing? Slicing is one of
>>> the main reasons why the Tango XML parser is so amazingly fast.
>>>
>>
>> You don't want to use slicing on the lexer. The reason is that your
>> slices will be spread all over memory, as source files can be huge, and
>> all that memory will be retained and never released. What you want is a
>> compact representation after lexing. Compactness also helps a lot with
>> memory caching.
>>
>
> Tokens are not kept in memory. You usually consume them for other processing
> and throw them away.
>
> It isn't an issue.

The tokens are not kept, correct. But the identifier strings and the string
literals are kept, and if they are slices into the input buffer, then the whole
buffer stays reachable and everything I said applies.
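
To make the point concrete, here is a minimal sketch of the difference. It is
written in Python rather than D purely for brevity, with memoryview standing in
for a D slice: both refer to the original buffer instead of copying out of it,
so the buffer cannot be released while any of them is alive. The function
names, the hard-coded spans, and the dict used as an identifier table are all
hypothetical and only illustrate the idea; they are not taken from any actual
lexer.

```python
def identifiers_as_slices(source, spans):
    """Return identifiers as zero-copy views into the input buffer."""
    view = memoryview(source)
    return [view[start:end] for start, end in spans]


def identifiers_as_copies(source, spans, table):
    """Return identifiers as small interned copies, independent of the buffer."""
    result = []
    for start, end in spans:
        text = bytes(source[start:end])
        # setdefault stores each distinct identifier once and reuses it afterwards.
        result.append(table.setdefault(text, text))
    return result


# Stand-in for a large source file: one short statement plus 1 MiB of padding.
source = bytearray(b"int foo = foo + bar;" + b" " * (1 << 20))
spans = [(4, 7), (10, 13), (16, 19)]  # positions of foo, foo, bar

slices = identifiers_as_slices(source, spans)
table = {}
copies = identifiers_as_copies(source, spans, table)

# Every slice keeps the entire buffer reachable, however short the slice is.
retained_by_slices = len(slices[0].obj)
# The interned copies need only the bytes of the distinct identifiers.
retained_by_copies = sum(len(name) for name in table)

print(retained_by_slices, retained_by_copies)
```

Three short identifiers held as slices pin the whole megabyte of input, while
the interned copies take a handful of bytes and the input can be dropped as
soon as lexing is done. That is the compact representation I was referring to.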