std.d.lexer : voting thread
ilya-stromberg
ilya-stromberg-2009 at yandex.ru
Wed Oct 9 01:27:51 PDT 2013
On Wednesday, 9 October 2013 at 07:49:55 UTC, Andrei Alexandrescu
wrote:
> On 10/8/13 11:11 PM, ilya-stromberg wrote:
>> On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei
>> Alexandrescu wrote:
>>> To put my money where my mouth is, I have a proof-of-concept
>>> tokenizer for C++ in working state.
>>>
>>> http://dpaste.dzfl.pl/d07dd46d
>>
>> Why do you use "\0" as the end-of-stream token?
>>
>> /**
>>  * All token types include regular and reservedTokens, plus the null
>>  * token ("") and the end-of-stream token ("\0").
>>  */
>>
>> We can have a situation where "\0" is a valid token, for example
>> in binary formats. Is it possible to indicate end-of-stream another
>> way, maybe via the "empty" property of a range-based API?
>
> I'm glad you asked. It's simply a decision by convention. I
> know no C++ source can contain a "\0", so I append it to the
> input and use it as a sentinel.
>
> A general lexer should take the EOF symbol as a parameter.
>
> One more thing: the trie matcher knows a priori (statically)
> what the maximum lookahead is - it's the maximum length of all
> symbols. That can be used to pre-fill the input buffer such
> that there's never an out-of-bounds access, even with input
> ranges.
>
>
> Andrei
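
To check my understanding of the lookahead point, here is a minimal sketch. It is not taken from the linked tokenizer: the symbol list and the names maxSymbolLength and padInput are hypothetical, and it works on a string rather than a general input range. The longest symbol is computed at compile time, and the input is padded with that many sentinels so a fixed-size window never runs past the end.

```d
// Hypothetical symbol table; the real tokenizer's list is not shown here.
enum string[] symbols = ["+", "++", "+=", "<<", "<<=", "->*"];

// Length of the longest symbol; usable at compile time via CTFE.
size_t maxSymbolLength(const string[] syms) pure nothrow @safe
{
    size_t result = 0;
    foreach (s; syms)
        if (s.length > result)
            result = s.length;
    return result;
}

// Known statically, as the trie matcher would know it.
enum size_t maxLookahead = maxSymbolLength(symbols);
static assert(maxLookahead == 3);

// Append maxLookahead sentinels (assumption: the sentinel never occurs in
// valid input) so that input[i .. i + maxLookahead] is always in bounds.
string padInput(string input, char sentinel = '\0') pure @safe
{
    auto buf = new char[input.length + maxLookahead];
    buf[0 .. input.length] = input[];
    buf[input.length .. $] = sentinel;
    return buf.idup;
}

void main()
{
    immutable input = "a+=b";
    immutable padded = padInput(input);
    // Every window that starts inside the original input is fully readable.
    foreach (i; 0 .. input.length)
        assert(padded[i .. i + maxLookahead].length == maxLookahead);
}
```

For a true input range, the same idea would presumably mean keeping a small buffer of maxLookahead elements filled ahead of the current position.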
So it would be interesting to see a new, improved API, because we need
a truly generic lexer. I don't think that would be very difficult.
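
To illustrate what I mean by the "empty" property, here is a rough sketch under my own assumptions. Token, ByteLexer and byteLexer are hypothetical names and are not part of the proposed std.d.lexer or of Andrei's tokenizer. End of stream is reported by the range itself, so a zero byte stays an ordinary token.

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;

// Hypothetical token type, for illustration only.
struct Token
{
    enum Kind { zero, other }
    Kind kind;
    ubyte value;
}

// An input range of tokens built on top of any input range of bytes.
struct ByteLexer(R)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R source;

    this(R source) { this.source = source; }

    // End of stream is signalled here, not by a reserved token value.
    @property bool empty() { return source.empty; }

    @property Token front()
    {
        immutable ubyte b = source.front;
        return Token(b == 0 ? Token.Kind.zero : Token.Kind.other, b);
    }

    void popFront() { source.popFront(); }
}

// Convenience function so the range type is inferred.
auto byteLexer(R)(R source) { return ByteLexer!R(source); }

void main()
{
    ubyte[] data = [1, 0, 2];
    size_t count;
    foreach (tok; byteLexer(data))
        ++count;
    // The zero byte in the middle did not terminate the stream.
    assert(count == 3);
}
```

A sentinel could still be used internally as an optimization, with the EOF symbol passed as a parameter as suggested above.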