So... let's document dmd

Tue Apr 5 19:59:50 PDT 2016

On Tuesday, 5 April 2016 at 21:37:09 UTC, Walter Bright wrote:
> On 4/5/2016 6:47 AM, Basile B. wrote:
>> Also lexing numbers doesn't need to be as accurate as the
>> front-end of the compiler (especially if the HL doesn't have a
>> token type for the
>> illegal "lexeme").
>
> That is an interesting design point. If I was doing a 
> highlighter, I'd highlight in red tokens that the compiler 
> would reject, meaning I'd do the accurate number lexing.
>
> Lexing numbers correctly is not trivial, but since the compiler 
> lexer's implementation can be cut/pasted, it is trivial in 
> practice.

Even when the most naive lexer sees a number and consumes characters
until a blank, a symbol or an operator, it's clear that this can be
done:

http://i.imgur.com/ehjps04.png
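Below is a rough sketch of that "naive scan, then validate" idea. It is not the compiler's lexer. The validation regexes only cover a simplified, hypothetical subset of D's numeric-literal grammar (no hex floats, no imaginary suffixes, octal rules ignored), and the helpers `scan_number` and `highlight_numbers` are made up for illustration. Each number-looking word is returned with a flag, and a highlighter could paint the `False` ones in red.

```python
import re

# Hypothetical, SIMPLIFIED subset of D's integer literals (not the full spec):
# hex / binary / decimal with '_' separators, optional u/U/L/uL/UL/Lu/LU suffix.
# Leading-zero (octal) restrictions are ignored.
_INT = re.compile(
    r"(?:0[xX][0-9A-Fa-f_]*[0-9A-Fa-f][0-9A-Fa-f_]*"
    r"|0[bB][01_]*[01][01_]*"
    r"|[0-9][0-9_]*)"
    r"(?:[uU]L?|L[uU]?)?"
)
# Simplified decimal floats: digits '.' digits [exponent], or digits exponent,
# with an optional f/F/L suffix. Hex floats and 'i' suffixes are omitted.
_FLOAT = re.compile(
    r"(?:[0-9][0-9_]*\.[0-9][0-9_]*(?:[eE][+-]?[0-9]+)?"
    r"|[0-9][0-9_]*[eE][+-]?[0-9]+)"
    r"[fFL]?"
)

def scan_number(src: str, start: int) -> int:
    """Naive scan from a digit until a blank, symbol or operator.
    Letters, digits and '_' are consumed; '.' only when a digit follows
    (so '0..2' stops at the slice operator); a sign only right after a
    decimal exponent marker."""
    i = start
    while i < len(src):
        c = src[i]
        if c.isalnum() or c == "_":
            i += 1
        elif c == "." and i + 1 < len(src) and src[i + 1] in "0123456789":
            i += 1
        elif (c in "+-" and src[i - 1] in "eE"
              and not src[start:i].lower().startswith("0x")):
            i += 1
        else:
            break
    return i

def highlight_numbers(src: str) -> list[tuple[str, bool]]:
    """Return (lexeme, is_valid) for every number-looking word in src."""
    out = []
    i = 0
    while i < len(src):
        c = src[i]
        # A digit starts a number only if it is not part of an identifier.
        starts_number = c in "0123456789" and (
            i == 0 or not (src[i - 1].isalnum() or src[i - 1] == "_"))
        if starts_number:
            j = scan_number(src, i)
            word = src[i:j]
            ok = bool(_INT.fullmatch(word) or _FLOAT.fullmatch(word))
            out.append((word, ok))
            i = j
        else:
            i += 1
    return out
```

For example, `highlight_numbers("int a = 0x1G; float b = 1.5f;")` flags `0x1G` as invalid and `1.5f` as valid. A real highlighter would swap the regexes for the compiler's own number-lexing logic, which, as quoted above, can be cut and pasted.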

Actually, numbers are the only part of the D lexer where errors can
be detected.
There are no possible syntax errors otherwise.

But one thing I forgot to say in my previous post is that lexing
can be "multi-pass". The D front-end does everything in a single
pass; for example, it directly detects tokPlusPlus or tokXorEqu.
A multi-pass lexer, however, can work in 3 sub-phases:
1/ split words
2/ detect token families in the words: identifier, keyword,
operator, etc.
3/ specialize tokens: tokOp.data == "++" -> tokPlusPlus
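The three sub-phases could be sketched roughly as follows. This is a toy illustration, not how dmd works. The keyword and operator tables are tiny hypothetical subsets, and apart from tokPlusPlus and tokXorEqu, which come from the text above, the token-kind names are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical phase-3 table: operator text -> specialized token kind.
OP_KINDS = {
    "++": "tokPlusPlus", "+=": "tokAddAssign", "^=": "tokXorEqu",
    "==": "tokEqual", "+": "tokAdd", "^": "tokXor", "=": "tokAssign",
    ";": "tokSemicolon", "(": "tokLeftParen", ")": "tokRightParen",
}
# Longest operators first, so phase 1 does maximal munch ("+++" -> "++", "+").
OPERATORS = sorted(OP_KINDS, key=len, reverse=True)
KEYWORDS = {"int", "auto", "if", "return"}  # tiny hypothetical subset

@dataclass
class Token:
    family: str     # "identifier", "keyword", "number", "operator", "unknown"
    data: str       # the raw word from phase 1
    kind: str = ""  # filled in by phase 3

def split_words(src: str) -> list[str]:
    """Phase 1: split the source into raw words, skipping whitespace."""
    words, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isalnum() or c == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            words.append(src[i:j])
            i = j
        else:
            # Longest known operator at i; otherwise a single unknown char.
            op = next((o for o in OPERATORS if src.startswith(o, i)), c)
            words.append(op)
            i += len(op)
    return words

def classify(words: list[str]) -> list[Token]:
    """Phase 2: detect the token family of each word."""
    toks = []
    for w in words:
        if w[0].isdigit():
            family = "number"
        elif w[0].isalpha() or w[0] == "_":
            family = "keyword" if w in KEYWORDS else "identifier"
        elif w in OP_KINDS:
            family = "operator"
        else:
            family = "unknown"
        toks.append(Token(family, w))
    return toks

def specialize(toks: list[Token]) -> list[Token]:
    """Phase 3: turn generic operator tokens into specific kinds."""
    for t in toks:
        t.kind = OP_KINDS[t.data] if t.family == "operator" else t.family
    return toks

def lex(src: str) -> list[Token]:
    return specialize(classify(split_words(src)))
```

With this sketch, `lex("x ^= y++;")` yields the kinds identifier, tokXorEqu, identifier, tokPlusPlus, tokSemicolon. Note that the maximal-munch decision for operator runs has to live somewhere. Here it is placed in phase 1 so that phase 3 only does a table lookup.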