Looking for champion - std.lang.d.lex

Sat Oct 23 15:17:02 PDT 2010

"Andrei Alexandrescu" <SeeWebsiteForEmail at erdani.org> wrote in message 
news:i9vlep$8ao$1 at digitalmars.com...
> On 10/23/10 16:39 CDT, Nick Sabalausky wrote:
>> "Andrei Alexandrescu"<SeeWebsiteForEmail at erdani.org>  wrote in message
>> news:i9v8vq$2gvh$1 at digitalmars.com...
>> What's wrong with regexes? That's pretty typical for lexers.
>
> I mentioned that using regexes is possible but would make it much more 
> difficult to generate good quality lexers.

I see. Maybe a lexer 2.0 thing.

>
> Besides, regexen are IMHO quite awkward at expressing certain things that 
> can be easily parsed by hand, such as comments

//[^\n]*\n

/\*([^*]|\*[^/])*\*/

Pretty simple as far as regexes go, and I'm far from a regex expert. Plus 
there's nothing stopping the use of a vastly improved regex syntax like GOLD 
uses ( 
http://www.devincook.com/goldparser/doc/grammars/define-terminals.htm ). In 
that, the two regexes above would look like:

{LineCommentChar} = {Printable} - {LF}
LineComment = '//' {LineCommentChar}* {LF}

{BlockCommentChar} = {Printable} - [*]
{BlockCommentCharNoSlash} = {BlockCommentChar} - [/]
BlockComment = '/*' ({BlockCommentChar} | '*' {BlockCommentCharNoSlash})* '*/'

And further syntactical improvement is easy to imagine, such as in-line 
character set creation.
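
For anyone who wants to try the two comment patterns, here is a minimal sketch using Python's standard re module. The block pattern follows the {BlockCommentChar} definition, i.e. the repeated character class excludes '*'. The names LINE_COMMENT, BLOCK_COMMENT, BLOCK_COMMENT_STRICT and first_comment are made up for illustration, and the "strict" variant is an addition that is not part of the post: it is one common way to also accept comments that end in a run of stars such as '**/', which the simpler pattern rejects.

```python
import re

# Line comment: '//' up to and including the newline.
LINE_COMMENT = re.compile(r"//[^\n]*\n")

# Block comment mirroring the GOLD sets: any non-'*' character,
# or a '*' followed by a character that is not '/'.
BLOCK_COMMENT = re.compile(r"/\*([^*]|\*[^/])*\*/")

# Assumption (not from the post): a stricter variant that also
# handles comments closed by several stars, e.g. '/* a **/'.
BLOCK_COMMENT_STRICT = re.compile(r"/\*([^*]|\*+[^*/])*\*+/")


def first_comment(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(repr(first_comment(LINE_COMMENT, "x = 1; // note\ny = 2;\n")))
    print(repr(first_comment(BLOCK_COMMENT, "/* a */ x /* b */")))
    print(repr(first_comment(BLOCK_COMMENT, "/* a **/ x")))
    print(repr(first_comment(BLOCK_COMMENT_STRICT, "/* a **/ x")))
```

Note that the line-comment pattern requires a trailing newline, so a '//' comment on the last line of a file without one would not match.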

> or recursive comments.
>

Granted, although I think there is precedent for regex engines that can 
handle matched nested pairs just fine.
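
To make the nested case concrete, here is a rough sketch of the hand-written approach, in Python, for D-style nesting comments delimited by '/+' and '+/'. Python's standard re module has no recursion construct, so this uses a plain depth counter instead of a regex. The function name skip_nesting_comment and its return convention are hypothetical, chosen only for illustration.

```python
def skip_nesting_comment(src, start=0):
    """Return the index just past the nesting comment beginning at start.

    Returns -1 if src does not begin a '/+' comment at start, or if the
    comment is never closed.
    """
    if not src.startswith("/+", start):
        return -1
    depth = 0
    i = start
    n = len(src)
    while i < n:
        if src.startswith("/+", i):
            depth += 1          # entering one more nesting level
            i += 2
        elif src.startswith("+/", i):
            depth -= 1          # leaving one nesting level
            i += 2
            if depth == 0:
                return i        # outermost comment is closed
        else:
            i += 1
    return -1                   # ran out of input: unterminated comment


if __name__ == "__main__":
    text = "/+ a /+ b +/ c +/ x"
    end = skip_nesting_comment(text)
    print(repr(text[:end]))
```

The counter is the part a classical regular expression cannot express; engines that do handle nested pairs rely on extensions such as recursive sub-patterns.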