Looking for champion - std.lang.d.lex

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Oct 22 14:48:09 PDT 2010


On 10/22/10 16:28 CDT, Sean Kelly wrote:
> Andrei Alexandrescu Wrote:
>>
>> I have in mind the entire implementation of a simple design, but never
>> had the time to execute on it. The tokenizer would work like this:
>>
>> alias Lexer!(
>>       "+", "PLUS",
>>       "-", "MINUS",
>>       "+=", "PLUS_EQ",
>>       ...
>>       "if", "IF",
>>       "else", "ELSE"
>>       ...
>> ) DLexer;
>>
>> Such a declaration generates numeric values DLexer.PLUS etc. and
>> generates efficient code that extracts a stream of tokens from a
>> stream of text. Each token in the token stream carries its ID and
>> its text.
>
> What about, say, floating-point literals?  It seems like the first element of a pair might have to be a regex pattern.


Yah, with regard to such regular patterns (strings, comments, numbers, 
identifiers) there are at least two possibilities that I see:

1. Go the full route of allowing regexen in the definition. This is very 
hard because you need to generate an efficient (N|D)FA during compilation.

2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the 
compile-time table matches, just call onUnrecognizedString(). In 
conjunction with a few simple specialized functions, that makes it very 
simple to define arbitrarily complex lexers where the bulk of the work 
(and the most tedious part) is done by the D compiler.
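
To make option 2 concrete, here is a minimal sketch of how it might
look. It is an illustration under stated assumptions, not the design
itself: the fallthrough routine is passed as an extra alias parameter,
the Token struct, the reserved ID OTHER, and the helper names makeIds
and scanWordOrNumber are all hypothetical, and the matcher is a naive
longest-match loop rather than the efficient generated code the
proposal aims for. It also assumes a compiler with static foreach and
the "alias X = ..." syntax.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: builds "enum PLUS = 1; enum MINUS = 2; ..."
// from alternating (text, name) pairs. Evaluated at compile time.
string makeIds(spec...)()
{
    string code;
    static foreach (i; 0 .. spec.length / 2)
        code ~= "enum " ~ spec[2 * i + 1] ~ " = " ~ to!string(i + 1) ~ ";\n";
    return code;
}

// Sketch of the lexer template. Passing the fallthrough routine as an
// alias parameter is an assumption made for this example.
struct Lexer(alias onUnrecognizedString, spec...)
{
    static assert(spec.length % 2 == 0, "expected (text, name) pairs");

    enum OTHER = 0;            // ID for tokens from the fallthrough routine
    mixin(makeIds!spec());     // generates PLUS, MINUS, ...

    struct Token { int id; string text; }

    static Token[] tokenize(string input)
    {
        Token[] result;
        while (input.length)
        {
            if (isWhite(input[0])) { input = input[1 .. $]; continue; }

            // Naive longest match over the compile-time table; the loop
            // is unrolled by the compiler, one test per table entry.
            int bestId = OTHER;
            size_t bestLen = 0;
            static foreach (i; 0 .. spec.length / 2)
            {
                if (spec[2 * i].length > bestLen
                    && input.length >= spec[2 * i].length
                    && input[0 .. spec[2 * i].length] == spec[2 * i])
                {
                    bestId = cast(int)(i + 1);
                    bestLen = spec[2 * i].length;
                }
            }

            // A keyword followed by more word characters ("iffy") is
            // not a keyword; let the fallthrough routine handle it.
            if (bestLen && isAlpha(input[0]) && bestLen < input.length
                && isAlphaNum(input[bestLen]))
                bestLen = 0;

            if (bestLen == 0)
            {
                // Nothing in the table matched: the user routine returns
                // how many characters it consumed.
                bestId = OTHER;
                bestLen = onUnrecognizedString(input);
                if (bestLen == 0) bestLen = 1; // always make progress
            }
            result ~= Token(bestId, input[0 .. bestLen]);
            input = input[bestLen .. $];
        }
        return result;
    }
}

// Hypothetical fallthrough routine: consumes an identifier or a simple
// number, otherwise a single character. Expects a non-empty string.
size_t scanWordOrNumber(string s)
{
    size_t n = 0;
    if (isAlpha(s[0]) || s[0] == '_')
        while (n < s.length && (isAlphaNum(s[n]) || s[n] == '_')) ++n;
    else if (isDigit(s[0]))
        while (n < s.length && (isDigit(s[n]) || s[n] == '.')) ++n;
    return n ? n : 1;
}

alias DLexer = Lexer!(scanWordOrNumber,
    "+", "PLUS",
    "-", "MINUS",
    "+=", "PLUS_EQ",
    "if", "IF",
    "else", "ELSE");

void main()
{
    // Print each token's numeric ID next to its text.
    foreach (t; DLexer.tokenize("if x += 4.2 else iffy"))
        writeln(t.id, " ", t.text);
}
```

Here the table handles the fixed tokens, and everything regular
(identifiers, numbers) goes through the fallthrough routine, which
tags its tokens with OTHER. A real lexer would presumably let that
routine return a more specific ID as well.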


Andrei


