std.d.lexer : voting thread

ilya-stromberg ilya-stromberg-2009 at yandex.ru
Wed Oct 9 01:27:51 PDT 2013


On Wednesday, 9 October 2013 at 07:49:55 UTC, Andrei Alexandrescu 
wrote:
> On 10/8/13 11:11 PM, ilya-stromberg wrote:
>> On Tuesday, 8 October 2013 at 00:16:45 UTC, Andrei 
>> Alexandrescu wrote:
>>> To put my money where my mouth is, I have a proof-of-concept 
>>> tokenizer
>>> for C++ in working state.
>>>
>>> http://dpaste.dzfl.pl/d07dd46d
>>
>> Why do you use "\0" as end-of-stream token:
>>
>>   /**
>>    * All token types include regular and reservedTokens, plus the null
>>    * token ("") and the end-of-stream token ("\0").
>>    */
>>
>> We can have situation when the "\0" is a valid token, for example for
>> binary formats. Is it possible to indicate end-of-stream another way,
>> maybe via "empty" property for range-based API?
>
> I'm glad you asked. It's simply a decision by convention. I 
> know no C++ source can contain a "\0", so I append it to the 
> input and use it as a sentinel.
>
> A general lexer should take the EOF symbol as a parameter.
>
> One more thing: the trie matcher knows a priori (statically) 
> what the maximum lookahead is - it's the maximum of all 
> symbols. That can be used to pre-fill the input buffer such 
> that there's never an out-of-bounds access, even with input 
> ranges.
>
>
> Andrei

So it would be interesting to see an improved API that takes the 
end-of-stream symbol as a parameter, because we need a truly generic 
lexer. I don't think that would be difficult.
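
To make the idea concrete, here is a rough sketch of what I mean. It is 
written in Python rather than D just to keep it short, and it is not 
taken from Andrei's code: the name "tokenize" and its parameters are my 
own assumptions. The caller passes the sentinel, the input is padded by 
the longest symbol length so lookahead always stays inside the buffer, 
and end-of-stream is reported by the generator running out (roughly 
what "empty" would be for a range) instead of by a special token.

```python
def tokenize(data, symbols, sentinel):
    """Yield symbols from `data` by longest match (illustrative sketch).

    `symbols` is the set of fixed tokens. `sentinel` is a one-element
    end-of-input marker chosen by the caller; it must not occur in the
    input or inside any symbol. Works for both str and bytes.
    """
    if sentinel in data or any(sentinel in s for s in symbols):
        raise ValueError("sentinel must not occur in the input or in a symbol")

    # The maximum lookahead is the length of the longest symbol,
    # which is known before any input is read.
    max_len = max(len(s) for s in symbols)

    # Pad once, so every lookahead window below lies inside the buffer.
    # (Python slices would not fault anyway; the padding mirrors what an
    # implementation without bounds checks would rely on.)
    buf = data + sentinel * max_len

    pos = 0
    while buf[pos:pos + 1] != sentinel:
        # Try the longest candidate first.
        for n in range(max_len, 0, -1):
            if buf[pos:pos + n] in symbols:
                yield buf[pos:pos + n]
                pos += n
                break
        else:
            raise ValueError(f"no symbol matches at offset {pos}")


# Text input, with "\0" as the sentinel, as in the C++ tokenizer.
print(list(tokenize("<<=<", {"<", "<<", "<<="}, "\0")))
# ['<<=', '<']

# Binary input where "\0" is a valid token, so another sentinel is passed.
print(list(tokenize(b"\x00\x01\x00", {b"\x00", b"\x00\x01"}, b"\xff")))
# [b'\x00\x01', b'\x00']
```

The second call is the case I was asking about: "\0" is ordinary data 
there, and nothing in the lexer itself has to change.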

