DMD 1.021 and 2.004 releases

Kirk McDonald kirklin.mcdonald at gmail.com
Mon Sep 10 16:18:39 PDT 2007


Walter Bright wrote:
> Kirk McDonald wrote:
> 
>> Walter Bright wrote:
>>
>>> The more unusual feature is the token delimited strings.
>>
>>
>> Which, since there's no nesting going on, are actually very easy to 
>> match. The Pygments lexer matches them with the following regex:
>>
>> q"([a-zA-Z_]\w*)\n.*?\n\1"
> 
> 
> I meant the:
> 
>     q{ these must be valid D tokens { and brackets nest } /* ignore this 
> } */ };
> 

Those are also fairly easy. The Pygments lexer only highlights the 
opening q{ and the closing } as part of the string; the tokens inside 
the string are highlighted normally.

Since this lexer is the one used by Dsource, I've thrown together a wiki 
page showing it off:

http://www.dsource.org/projects/dsource/wiki/DelimitedStringHighlighting

A note about this lexer: it uses a combination of regular expressions, a 
state machine, and a stack. When a regex matches, you usually just 
specify that the matching text should be highlighted as such-and-such a 
token. In some cases, though, you want to push a particular state onto 
the stack, which swaps in a different set of regexes until that state 
pops itself off the stack.

Also, it is of course written in Python, so the code below is Python code.

For instance, the rule for the "heredoc" strings, which I mentioned 
previously, looks like this:

         (r'q"([a-zA-Z_]\w*)\n.*?\n\1"', String),

That is, it takes the chunk of text matched by that regex and 
highlights it as a string.
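To see the regex in action outside of Pygments, here is a small standalone sketch using Python's re module. The re.S flag is added here so that ".*?" can span newlines in this isolated test (Pygments compiles its rules with its own flags); the EOS delimiter and the sample string are made up for illustration:

```python
import re

# The heredoc rule's regex: q", a captured identifier, a newline,
# a lazy body, then a newline followed by the same identifier and ".
heredoc = re.compile(r'q"([a-zA-Z_]\w*)\n.*?\n\1"', re.S)

source = 'q"EOS\nany text at all\ncan go here\nEOS"'
m = heredoc.match(source)
print(m.group(1))  # prints the delimiter identifier: EOS
```

The backreference \1 is what makes the closing delimiter match the opening one, with no nesting to worry about.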

The entry point for token strings is the following rule:

         (r'q{', String, 'token_string'),

Or: Highlight the token "q{" as a string, then push the 'token_string' 
state onto the stack. (This third argument is optional, and most of the 
rules do not have it.) The 'token_string' state looks like this:

         'token_string': [
             (r'{', Punctuation, 'token_string_nest'),
             (r'}', String, '#pop'),
             include('root'),
         ],
         'token_string_nest': [
             (r'{', Punctuation, '#push'),
             (r'}', Punctuation, '#pop'),
             include('root'),
         ],

include('root') tells it to include the contents of the 'root' state 
(which is the state the D lexer starts out in, and which contains all 
of the regular tokens). '#push' means push the current state onto the 
stack again, and '#pop' means pop it off the stack. By putting the 
rules for '{' and '}' before the include('root'), we override their 
default behavior, which is just to be highlighted as punctuation.

These two nearly-identical states are needed because we only want to 
highlight '}' as a string when it is the last one in the token string. 
When '}' is closing a nested brace, we want to highlight it as regular 
punctuation, and pop off of the stack.
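As a toy re-creation of that two-state scheme (not the real Pygments machinery; the function name and token labels here are just for illustration), you can track the state stack by hand and give '}' the String token only when it closes the outermost level:

```python
def classify_braces(text):
    """Return (lexeme, token) pairs for the braces in a q{...} literal."""
    assert text.startswith('q{')
    out = [('q{', 'String')]
    stack = ['token_string']                 # entered on seeing q{
    for ch in text[2:]:
        if ch == '{':
            stack.append('token_string_nest')    # the '#push' rule
            out.append(('{', 'Punctuation'))
        elif ch == '}':
            state = stack.pop()                  # the '#pop' rule
            out.append(('}', 'String' if state == 'token_string'
                        else 'Punctuation'))
            if not stack:
                break                            # token string finished
    return out

print(classify_braces('q{ foo { bar } baz }'))
```

For q{ foo { bar } baz }, the inner brace pair comes out as Punctuation and only the final } as String, which is exactly the behavior the two states exist to produce.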

Even if the above is gibberish to you, I still assert that it's quite 
straightforward, and indeed is very much like how the nesting /+ +/ 
comments were already highlighted. (Albeit without the include('root') 
call, and with only one extra state.)

All of this is built on the Pygments lexer framework. All I had to do 
was define the big list of regexes, and the occasional extra state (as 
I've outlined above).

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org



More information about the Digitalmars-d-announce mailing list