Looking for champion - std.lang.d.lex
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Oct 22 12:48:49 PDT 2010
On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
> On 22-10-2010 at 00:01:21, Walter Bright <newshound2 at digitalmars.com>
> wrote:
>
>> As we all know, tool support is important for D's success. Making
>> tools easier to build will help with that.
>>
>> To that end, I think we need a lexer for the standard library -
>> std.lang.d.lex. It would be helpful in writing color syntax
>> highlighting filters, pretty printers, repl, doc generators, static
>> analyzers, and even D compilers.
>>
>> It should:
>>
>> 1. support a range interface for its input, and a range interface for
>> its output
>> 2. optionally not generate lexical errors, but just try to recover and
>> continue
>> 3. optionally return comments and ddoc comments as tokens
>> 4. the tokens should be a value type, not a reference type
>> 5. generally follow along with the C++ one so that they can be
>> maintained in tandem
>>
>> It can also serve as the basis for creating a javascript
>> implementation that can be embedded into web pages for syntax
>> highlighting, and eventually an std.lang.d.parse.
>>
>> Anyone want to own this?
>
> Interesting idea. Here's another: D will soon need bindings for CORBA,
> Thrift, etc., so lexers will have to be written over and over to grok
> interface files. Perhaps a generic tokenizer that can be parametrized
> with a lexical grammar would bring more ROI; I have a hunch D's
> templates are strong enough to pull this off without source code
> generation à la JavaCC. The books I've read on compilers say
> tokenization is a solved problem, so the theory on what a good
> abstraction should be is settled. What do you think?
Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
generator.
I have in mind the entire implementation of a simple design, but never
had the time to execute on it. The tokenizer would work like this:
alias Lexer!(
    "+", "PLUS",
    "-", "MINUS",
    "+=", "PLUS_EQ",
    ...
    "if", "IF",
    "else", "ELSE"
    ...
) DLexer;
Such a declaration generates numeric token IDs (DLexer.PLUS etc.) and
efficient code that extracts a stream of tokens from a stream of text.
Each token in the output stream carries its ID and its text. Comments,
strings, etc. can be handled in one of several ways, but that's a longer
discussion.
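To make the idea concrete: the matcher such a Lexer!(...) instantiation
would generate could behave roughly like the sketch below. It's written
in Python purely for illustration (not D, and not any proposed
std.lang.d.lex API); the names tokenize and TABLE are hypothetical. It
does a greedy longest match against the operator/keyword table, with
identifiers falling out as their own token kind and unknown characters
reported as ERROR tokens rather than aborting, per Walter's point 2.

```python
# Illustrative sketch only: a table-driven tokenizer of the kind a
# Lexer!(...) template might generate. All names are hypothetical.

# Token table, analogous to the ("+", "PLUS", ...) pairs above.
TABLE = {
    "+": "PLUS",
    "-": "MINUS",
    "+=": "PLUS_EQ",
    "if": "IF",
    "else": "ELSE",
}

def tokenize(text):
    """Yield (id, text) pairs -- each token carries its ID and its text."""
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
            continue
        # Identifiers and keywords: scan the whole word, then look it up.
        if text[i].isalpha() or text[i] == "_":
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            yield (TABLE.get(word, "IDENT"), word)
            i = j
            continue
        # Operators: greedy longest match against the table.
        for length in range(min(2, n - i), 0, -1):  # longest operator here is 2 chars
            op = text[i:i + length]
            if op in TABLE:
                yield (TABLE[op], op)
                i += length
                break
        else:
            # Recover and continue instead of raising a lexical error.
            yield ("ERROR", text[i])
            i += 1
```

For example, tokenize("if x += y") yields ("IF", "if"), ("IDENT", "x"),
("PLUS_EQ", "+="), ("IDENT", "y"). In the real thing the table would be
baked in at compile time and the matcher would be a generated trie or
switch rather than a dictionary probe, but the observable token stream
is the same.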
The undertaking is doable but nontrivial.
Andrei
More information about the Digitalmars-d
mailing list