Looking for champion - std.lang.d.lex
Tomek Sowiński
just at ask.me
Fri Oct 22 14:32:01 PDT 2010
On 22-10-2010 at 21:48:49 Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
>> Interesting idea. Here's another: D will soon need bindings for CORBA,
>> Thrift, etc., so lexers will have to be written over and over to grok
>> interface files. Perhaps a generic tokenizer which can be parametrized
>> with a lexical grammar would bring more ROI; I have a hunch D's templates
>> are strong enough to pull this off without any source code generation
>> à la JavaCC. The books I read on compilers say tokenization is a solved
>> problem, so the theory on what a good abstraction should be is
>> done. What do you think?
>
> Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
> generator.
>
> I have in mind the entire implementation of a simple design, but never
> had the time to execute on it. The tokenizer would work like this:
>
> alias Lexer!(
> "+", "PLUS",
> "-", "MINUS",
> "+=", "PLUS_EQ",
> ...
> "if", "IF",
> "else", "ELSE"
> ...
> ) DLexer;
Yes. One remark: native language constructs scale better for defining a grammar:

enum TokenDef : string {
    Digit = "[0-9]",
    Letter = "[a-zA-Z_]",
    Identifier = Letter ~ "(" ~ Letter ~ "|" ~ Digit ~ ")*",
    ...
    Plus = "+",
    Minus = "-",
    PlusEq = "+=",
    ...
    If = "if",
    Else = "else",
    ...
}

alias Lexer!TokenDef DLexer;
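For the literal tokens, a toy sketch of the interface such a Lexer template could generate might look like this (names are hypothetical and it only matches fixed strings, no regex; a real generator would compile the patterns into a DFA rather than loop over them):

import std.algorithm : startsWith;

struct Token {
    size_t id;    // index into the pattern list, e.g. ToyLexer.Plus
    string text;  // the slice of the input that matched
}

template Lexer(patterns...) {
    Token[] tokenize(string input) {
        Token[] result;
        while (input.length) {
            // Longest match wins, so "+=" beats "+".
            size_t bestId, bestLen;
            foreach (i, p; patterns) {   // unrolled at compile time
                if (p.length > bestLen && input.startsWith(p)) {
                    bestId  = i;
                    bestLen = p.length;
                }
            }
            if (!bestLen) {              // skip characters we don't know
                input = input[1 .. $];
                continue;
            }
            result ~= Token(bestId, input[0 .. bestLen]);
            input = input[bestLen .. $];
        }
        return result;
    }
}

alias Lexer!("+", "-", "+=", "if", "else") ToyLexer;

The point is only the shape of the thing: the declaration site lists patterns once, and the generated tokenize returns (ID, text) pairs over slices of the input, with no source code generation step.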
BTW, there's a related bug:
http://d.puremagic.com/issues/show_bug.cgi?id=2950
> Such a declaration generates numeric values DLexer.PLUS etc. and
> generates efficient code that extracts a stream of tokens from a
> stream of text. Each token in the token stream has the ID and the text.
All good ideas.
> Comments, strings etc. can be handled in one of several ways but that's
> a longer discussion.
The discussion has started anyhow. So what are the options?
--
Tomek