Let's stop parser Hell

Tue Jul 31 22:39:45 PDT 2012

On Wednesday, August 01, 2012 07:26:07 Philippe Sigaud wrote:
> On Wed, Aug 1, 2012 at 12:58 AM, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> > On Wednesday, August 01, 2012 00:54:56 Timon Gehr wrote:
> >> Ddoc is typically not required. By default it should be treated as
> >> whitespace. If it is required, one token seems reasonable: The
> >> post-processing of the doc comment is best done as a separate step.
> > 
> > That was how I was intending to deal with ddoc. It's just a nested block
> > comment token. The whole comment string is there, so the ddoc processor
> > can use that to do whatever it does. ddoc isn't part of lexing really.
> > It's a separate thing.
> 
> OK. Same for standard comment and doc comments?

From the TokenType enum declaration:

    blockComment,         /// $(D /* */)
    lineComment,          /// $(D // )
    nestingBlockComment,  /// $(D /+ +/)

There are then functions which operate on Tokens to give you information about
them. Among them is isDdocComment, which returns true if the Token's type is a
comment and that comment is a ddoc comment (i.e. it starts with /**, ///, or
/++ rather than /*, //, or /+). So, anything that wants to process ddoc
comments can lex them out and process them, and if it wants to know which
symbols a ddoc comment applies to, it can look at the tokens that follow
(though a full-on parser would be required to do that correctly).
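To make that concrete, here is a rough sketch of what such a check could look
like. It is not the actual implementation: the Token struct, its str field,
and the extra identifier member are stand-ins assumed for illustration, and
only the three comment members come from the enum above. The length check
assumes that the empty comments /**/ and /++/ should not count as ddoc.

```d
import std.algorithm.searching : startsWith;

// Hypothetical stand-ins for the lexer's types. Only the three comment
// members are taken from the enum quoted above.
enum TokenType { blockComment, lineComment, nestingBlockComment, identifier }

struct Token
{
    TokenType type;
    string str; // assumed: full source text of the token, delimiters included
}

// Returns true if tok is a comment token whose text opens like a ddoc comment.
bool isDdocComment(Token tok)
{
    switch (tok.type)
    {
        case TokenType.lineComment:
            return tok.str.startsWith("///");
        case TokenType.blockComment:
            // Longer than 4 so that the empty comment /**/ is not counted.
            return tok.str.length > 4 && tok.str.startsWith("/**");
        case TokenType.nestingBlockComment:
            // Same reasoning for the empty nesting comment /++/.
            return tok.str.length > 4 && tok.str.startsWith("/++");
        default:
            return false;
    }
}

unittest
{
    assert(isDdocComment(Token(TokenType.blockComment, "/** Docs. */")));
    assert(isDdocComment(Token(TokenType.lineComment, "/// Docs.")));
    assert(!isDdocComment(Token(TokenType.blockComment, "/* plain */")));
    assert(!isDdocComment(Token(TokenType.identifier, "foo")));
}
```

Anything that decides which symbol the comment belongs to would sit on top of
this and look at the following tokens.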

> I was wondering how to get the code possibly inside a ---- / ----
> block (I never dealt with something like documentation or syntax
> highlighting), but your solution makes it easy:
> 
> Token(TokenType.DocComment, "/** ... */"), Token(TokenType.Module,
> "module"), ...
> 
> A small specialised parser can then extract text, Ddoc macros and
> code blocks from inside the comment. Finding and stripping '----' is
> easy and then the lexer can be locally reinvoked on the slice
> containing the example code.

Yes. The lexer isn't concerned with where the text comes from, and it isn't 
concerned with lexing comments beyond putting them in a token. But that should 
be powerful enough to lex the examples if you've already extracted them.
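As a sketch of that extraction step (not part of the lexer, and the names
isCodeDelimiter and extractCodeBlocks are made up for illustration), something
like the following would pull the example code out of a comment string. It
assumes a delimiter is a line made up only of three or more hyphens, and that
the comment lines carry no leading * or + margin that would need stripping
first.

```d
import std.algorithm.searching : all;
import std.array : join;
import std.string : lineSplitter, strip;

/// True if the line consists only of three or more hyphens,
/// ignoring surrounding whitespace.
bool isCodeDelimiter(const(char)[] line)
{
    auto s = line.strip;
    return s.length >= 3 && s.all!(c => c == '-');
}

/// Collects the text between pairs of delimiter lines in a comment string.
/// A block that is opened but never closed is dropped.
string[] extractCodeBlocks(string comment)
{
    string[] blocks;
    string[] current;
    bool inCode = false;

    foreach (line; comment.lineSplitter)
    {
        if (isCodeDelimiter(line))
        {
            if (inCode)
            {
                // Closing delimiter: store the lines gathered so far.
                blocks ~= current.join("\n");
                current = null;
            }
            inCode = !inCode;
        }
        else if (inCode)
        {
            current ~= line;
        }
    }
    return blocks;
}

unittest
{
    auto comment = "/**\nAdds two numbers.\n\nExample:\n----\n"
        ~ "auto x = add(1, 2);\nassert(x == 3);\n----\n*/";
    auto blocks = extractCodeBlocks(comment);
    assert(blocks.length == 1);
    assert(blocks[0] == "auto x = add(1, 2);\nassert(x == 3);");
}
```

Each returned string can then be handed to the lexer like any other source
text.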

- Jonathan M Davis