Looking for champion - std.lang.d.lex

Sat Oct 23 12:27:06 PDT 2010

On 10/23/10 13:41 CDT, Walter Bright wrote:
> Andrei Alexandrescu wrote:
>> During compilation, such non-tokens are recognized as code by the
>> lexer generator and called appropriately. A comprehensive library of
>> such routines rounds out a useful lexing library.
>
> I agree, a set of "canned" and heavily optimized lexing functions for
> common things like identifiers, numbers, comments, etc., would make a
> lexing library much more practical.
>
> Those will work great for inventing DSLs, but for existing languages,
> the trouble is that the different languages have subtle variations on
> how they handle them. For example, D's numeric literals allow embedded
> underscores. Go doesn't overflow on numeric literals. Javascript has
> some wacky rules to distinguish a comment from a regex. Some languages
> allow \uNNNN letters in identifiers.
>
> So while a general purpose lexing library will be very useful, for
> lexing D code (and Java, Javascript, etc.) a custom one will probably be
> much more practical.
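To make the "canned routines" idea concrete, here is a minimal sketch in Python (used purely for illustration, not D). The names `lex_identifier`, `lex_decimal` and `lex_line_comment` are hypothetical and are not part of any existing library. Each routine takes the source and a start index and returns the end index of the token, or `None`. The `allow_underscores` parameter shows one way a canned routine could absorb a variation like D's embedded underscores in numeric literals.

```python
import string
from typing import Optional

# Hypothetical "canned" lexing routines, sketched in Python for illustration.
# Each takes the source text and a start index and returns the index just
# past the recognized token, or None if no such token starts at `pos`.

IDENT_START = set(string.ascii_letters + "_")
IDENT_REST = IDENT_START | set(string.digits)


def lex_identifier(src: str, pos: int) -> Optional[int]:
    """ASCII identifier: a letter or '_' followed by letters, digits or '_'."""
    if pos >= len(src) or src[pos] not in IDENT_START:
        return None
    end = pos + 1
    while end < len(src) and src[end] in IDENT_REST:
        end += 1
    return end


def lex_decimal(src: str, pos: int, allow_underscores: bool = False) -> Optional[int]:
    """Decimal integer literal. When allow_underscores is True, '_' is
    accepted after the first digit (e.g. D-style 1_000_000)."""
    if pos >= len(src) or src[pos] not in string.digits:
        return None
    end = pos + 1
    while end < len(src) and (
        src[end] in string.digits or (allow_underscores and src[end] == "_")
    ):
        end += 1
    return end


def lex_line_comment(src: str, pos: int, opener: str = "//") -> Optional[int]:
    """Line comment running from `opener` up to (not including) the newline."""
    if not src.startswith(opener, pos):
        return None
    newline = src.find("\n", pos)
    return len(src) if newline == -1 else newline
```

For example, `lex_decimal("1_000_000;", 0, allow_underscores=True)` returns 9, while the default setting stops at index 1. This sketch does not attempt the harder cases mentioned above, such as Javascript's comment/regex ambiguity.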

I don't see these two as being in tension. "General" need not entail
"unsuitable for subtle particularities". It is more difficult, but not
impossible. Again, what I have in mind is a general parser that takes
care of 90% of the drudgework and gives enough hooks to do the remaining
10%, all as efficient as hand-written code.
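The 90%/10% split can be sketched as a small driver that combines built-in rules with user-supplied hooks. This is a minimal Python illustration under stated assumptions, not a proposed std.lang.d.lex API. `Lexer`, `regex_rule`, `add_hook` and `d_nested_comment` are hypothetical names. Regex-based rules stand in for the canned routines. A hook handles D's nesting `/+ ... +/` comments, which a regular expression alone cannot match. The driver assumes longest-match semantics, with hooks tried first so that they win ties.

```python
import re
from typing import Callable, Iterator, List, Optional, Tuple

# A matcher returns the end index of a token starting at `pos`, or None.
Matcher = Callable[[str, int], Optional[int]]


def regex_rule(pattern: str) -> Matcher:
    """Wrap a regular expression as a matcher (the 'canned' 90% case)."""
    compiled = re.compile(pattern)

    def match(src: str, pos: int) -> Optional[int]:
        m = compiled.match(src, pos)
        return m.end() if m else None

    return match


class Lexer:
    """Hypothetical general lexer: built-in rules plus user hooks.
    Hooks are tried before rules; the longest match wins, and on a tie the
    earlier candidate (so a hook over a built-in rule) is kept."""

    def __init__(self, rules: List[Tuple[str, Matcher]]) -> None:
        self.rules = list(rules)
        self.hooks: List[Tuple[str, Matcher]] = []

    def add_hook(self, kind: str, matcher: Matcher) -> None:
        self.hooks.append((kind, matcher))

    def tokens(self, src: str) -> Iterator[Tuple[str, str]]:
        pos = 0
        while pos < len(src):
            if src[pos].isspace():  # whitespace is skipped, not tokenized
                pos += 1
                continue
            best: Optional[Tuple[str, int]] = None
            for kind, match in self.hooks + self.rules:
                end = match(src, pos)
                if end is not None and end > pos and (best is None or end > best[1]):
                    best = (kind, end)
            if best is None:
                raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
            kind, end = best
            yield kind, src[pos:end]
            pos = end


def d_nested_comment(src: str, pos: int) -> Optional[int]:
    """Hook for D's nesting /+ ... +/ comments, which a regular
    expression cannot match because they nest."""
    if not src.startswith("/+", pos):
        return None
    depth, i = 0, pos
    while i < len(src):
        if src.startswith("/+", i):
            depth, i = depth + 1, i + 2
        elif src.startswith("+/", i):
            depth, i = depth - 1, i + 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated /+ comment")


lexer = Lexer([
    ("ident", regex_rule(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("number", regex_rule(r"[0-9][0-9_]*")),  # D-style underscores
    ("op", regex_rule(r"[-+*/=;]")),
])
lexer.add_hook("comment", d_nested_comment)

toks = list(lexer.tokens("x = 1_000 /+ a /+ b +/ c +/ + y;"))
```

Here `toks` holds `x`, `=`, `1_000`, the whole nested comment as a single token, then `+`, `y` and `;`. Trying every rule at every position keeps the sketch short. A lexer aiming for hand-written speed would more likely dispatch on the first character or compile the rules into a single automaton.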

Andrei