So... let's document dmd

Tue Apr 5 06:47:55 PDT 2016

On Tuesday, 5 April 2016 at 08:46:30 UTC, Walter Bright wrote:
> On 1/16/2016 7:13 AM, H. S. Teoh via Digitalmars-d wrote:
>> I disagree. I think having the dmd itself (lexer, parser, 
>> etc.) as a
>> library (with the dmd executable merely being the default 
>> frontend) will
>> do D a lot of good.
>>
>> For one thing, IDE's will no longer need to reinvent a D 
>> parser for the
>> purposes of syntax highlighting;
>
> On the other hand, using lexer.d and parse.d as a guide to 
> build your own is a trivial undertaking. The Boost license is 
> designed so this can be done without worrying about making a 
> derived work.
>
> I looked into doing syntax highlighting for my editor, 
> MicroEmacs. It turns out it is not so easy to just use a 
> compiler lexer/parser for it. For one thing, the one used in 
> the compiler is optimized for speed in a forward pass through 
> the text.
>
> But a syntax highlighter in a text editor is different. Suppose 
> I change a character in the middle of a line. All the 
> highlighting from that point forward may change. And to figure 
> out what that change is, the parser/lexer has to start over 
> from the beginning of the file! (Think string literals, nested 
> comments, quoted string literals, etc.) This would make editing 
> slow.

This is how the CE highlighter works. The lexer used for 
highlighting processes the text line by line. For each line, 
information about the previous line is available (nested comment 
count, region kind such as quoted string, raw quoted string, asm, 
etc.).
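
To make the line-by-line idea concrete, here is a rough sketch in 
Python. It is not CE's actual code: LineState, lex_line and 
rehighlight are names made up for illustration, and only nested 
/+ +/ comments and double-quoted strings are tracked. The early 
stop in rehighlight (stop as soon as a line ends in the same state 
as before the edit) is my assumption about how such a scheme 
avoids re-lexing the whole file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineState:
    """State carried from the end of one line to the start of the next."""
    comment_depth: int = 0   # nesting depth of /+ ... +/ comments
    in_string: bool = False  # inside a "..." string that spans lines


def lex_line(line, state):
    """Lex one line; return (spans, end_state), spans = (start, end, kind)."""
    depth, in_string = state.comment_depth, state.in_string
    kinds = []  # one kind per character of the line
    i = 0
    while i < len(line):
        two = line[i:i + 2]
        if depth > 0:
            if two == "/+":
                depth += 1
            elif two == "+/":
                depth -= 1
            step = 2 if two in ("/+", "+/") else 1
            kinds += ["comment"] * step
            i += step
        elif in_string:
            if line[i] == "\\" and i + 1 < len(line):
                kinds += ["string"] * 2  # escaped character, skip it
                i += 2
            else:
                if line[i] == '"':
                    in_string = False
                kinds.append("string")
                i += 1
        elif two == "/+":
            depth = 1
            kinds += ["comment"] * 2
            i += 2
        elif line[i] == '"':
            in_string = True
            kinds.append("string")
            i += 1
        else:
            kinds.append("code")
            i += 1
    spans = []
    for pos, kind in enumerate(kinds):  # merge runs of the same kind
        if spans and spans[-1][2] == kind:
            spans[-1] = (spans[-1][0], pos + 1, kind)
        else:
            spans.append((pos, pos + 1, kind))
    return spans, LineState(depth, in_string)


def rehighlight(lines, end_states, first):
    """Re-lex from line `first`; return how many lines were re-lexed."""
    state = end_states[first - 1] if first > 0 else LineState()
    relexed = 0
    for idx in range(first, len(lines)):
        _, new_state = lex_line(lines[idx], state)
        relexed += 1
        cached = idx < len(end_states)
        unchanged = cached and end_states[idx] == new_state
        if cached:
            end_states[idx] = new_state
        else:
            end_states.append(new_state)
        state = new_state
        if unchanged:
            break  # following lines start from the same state as before
    return relexed
```

With one cached end state per line, an edit in the middle of a 
line usually costs a single call to lex_line. Only an edit that 
opens or closes a comment or string ripples further down.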

Keyword detection can also use separate dictionaries, up to 3 
(one for the keywords, another for special keywords like 
__FILE__, a third for asm opcodes). Operators don't each need a 
special token (tokxorequ, tokplusplus, etc.); there's just one. 
Lexing numbers also doesn't need to be as accurate as in the 
compiler front end (especially if the highlighter doesn't have a 
token type for an illegal "lexeme").
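
A small Python sketch of that classification, under assumptions: 
the three sets below are tiny hand-picked samples, not the real 
dictionaries, and classify_word, tokenize and TOKEN_RE are 
hypothetical names. Every run of punctuation gets the single kind 
"operator", and the number pattern is deliberately loose.

```python
import re

# Illustrative samples only; a real highlighter would list far more.
KEYWORDS = frozenset({"if", "else", "while", "return", "struct", "import"})
SPECIAL_KEYWORDS = frozenset({"__FILE__", "__LINE__", "__MODULE__"})
ASM_OPCODES = frozenset({"mov", "add", "sub", "push", "pop", "ret"})

TOKEN_RE = re.compile(r"""
    (?P<space>\s+)
  | (?P<number>\d[\w.]*)       # loose: accepts 0x1F, 1_000, 3.14f and junk
  | (?P<word>[^\W\d]\w*)       # identifier-like word
  | (?P<operator>[^\w\s]+)     # any run of punctuation, one kind for all
""", re.VERBOSE)


def classify_word(word, in_asm=False):
    """Look the word up in up to three dictionaries."""
    if in_asm and word in ASM_OPCODES:
        return "asm_opcode"
    if word in SPECIAL_KEYWORDS:
        return "special_keyword"
    if word in KEYWORDS:
        return "keyword"
    return "identifier"


def tokenize(line, in_asm=False):
    """Return (start, end, kind) spans; the token text is not stored."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        kind = match.lastgroup
        if kind == "space":
            continue
        if kind == "word":
            kind = classify_word(match.group(), in_asm)
        tokens.append((match.start(), match.end(), kind))
    return tokens
```

The asm dictionary is only consulted when the caller says the 
line is inside an asm region, so "mov" outside asm is an ordinary 
identifier.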

Another big difference is that the lexer used by a highlighter 
doesn't store the identifier associated with a token.

Using the front end (or even libdparse) would require 
modularisation. But lexing is not hard, so I don't think it's 
worth it. In the end it would only be used by 4 or 5 programs.