Looking for champion - std.lang.d.lex

Tomek Sowiński just at ask.me
Fri Oct 22 14:32:01 PDT 2010


On 22-10-2010 at 21:48:49, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:

> On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
>> Interesting idea. Here's another: D will soon need bindings for CORBA,
>> Thrift, etc., so lexers will have to be written all over again to grok
>> interface files. Perhaps a generic tokenizer that can be parametrized
>> with a lexical grammar would bring more ROI. I have a hunch D's templates
>> are strong enough to pull this off without any source code generation
>> à la JavaCC. The books I read on compilers say tokenization is a solved
>> problem, so the theory part on what a good abstraction should be is
>> done. What do you think?
>
> Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer  
> generator.
>
> I have in mind the entire implementation of a simple design, but never  
> had the time to execute on it. The tokenizer would work like this:
>
> alias Lexer!(
>      "+", "PLUS",
>      "-", "MINUS",
>      "+=", "PLUS_EQ",
>      ...
>      "if", "IF",
>      "else", "ELSE"
>      ...
> ) DLexer;

Yes. One remark: native language constructs scale better for a grammar:

enum TokenDef : string {
     Digit = "[0-9]",
     Letter = "[a-zA-Z_]",
     Identifier = Letter~"("~Letter~"|"~Digit~")*",
     ...
     Plus = "+",
     Minus = "-",
     PlusEq = "+=",
     ...
     If = "if",
     Else = "else",
     ...
}
alias Lexer!TokenDef DLexer;
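
To make the enum idea a bit more concrete, here is a minimal sketch of what such a Lexer template could look like. It is only an illustration under heavy assumptions: the token definitions are plain literals (no regex support, so no Digit/Letter/Identifier), whitespace is not skipped, and the names Token, Tok and next are hypothetical, not a proposed API. It walks the enum members at compile time with std.traits.EnumMembers and picks the longest literal that matches the front of the input.

```d
import std.traits : EnumMembers;

// Hypothetical token definitions; literal spellings only in this sketch.
enum TokenDef : string
{
    Plus   = "+",
    Minus  = "-",
    PlusEq = "+=",
    If     = "if",
    Else   = "else",
}

// A token carries its ID (the enum member) and the matched text.
struct Token(Def)
{
    Def id;
    string text;
}

// Lexer parametrized by a string-based enum of token spellings.
struct Lexer(Def) if (is(Def == enum) && is(Def : string))
{
    alias Tok = Token!Def;

    // Tries every enum member against the front of `input` and keeps the
    // longest match. On success, fills `tok`, advances `input`, returns true.
    static bool next(ref string input, out Tok tok)
    {
        size_t best = 0;
        foreach (member; EnumMembers!Def) // loop is unrolled at compile time
        {
            string lit = member;
            if (lit.length > best && input.length >= lit.length
                && input[0 .. lit.length] == lit)
            {
                best = lit.length;
                tok.id = member;
            }
        }
        if (best == 0)
            return false; // nothing matched (or input is empty)
        tok.text = input[0 .. best];
        input = input[best .. $];
        return true;
    }
}

alias DLexer = Lexer!TokenDef;
```

With this, lexing "+=if-" would yield PlusEq, If, Minus in turn; longest match is what keeps "+=" from being split into "+" and an error. A real generator would presumably build a state machine at compile time instead of trying each literal in sequence.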

BTW, there's a related bug:
http://d.puremagic.com/issues/show_bug.cgi?id=2950

> Such a declaration generates numeric values DLexer.PLUS etc. and
> generates efficient code that extracts a stream of tokens from a
> stream of text. Each token in the token stream has the ID and the text.

All good ideas.

> Comments, strings etc. can be handled in one of several ways but that's  
> a longer discussion.

The discussion's started anyhow. So what're the options?

-- 
Tomek

