DMD 1.021 and 2.004 releases
Kirk McDonald
kirklin.mcdonald at gmail.com
Mon Sep 10 16:18:39 PDT 2007
Walter Bright wrote:
> Kirk McDonald wrote:
>
>> Walter Bright wrote:
>>
>>> The more unusual feature is the token delimited strings.
>>
>>
>> Which, since there's no nesting going on, are actually very easy to
>> match. The Pygments lexer matches them with the following regex:
>>
>> q"([a-zA-Z_]\w*)\n.*?\n\1"
>
>
> I meant the:
>
> q{ these must be valid D tokens { and brackets nest } /* ignore this
> } */ };
>
Those are also fairly easy. The Pygments lexer only highlights the
opening q{ and the closing }. The tokens inside of the string are
highlighted normally.
Since this lexer is the one used by Dsource, I've thrown together a wiki
page showing it off:
http://www.dsource.org/projects/dsource/wiki/DelimitedStringHighlighting
A note about this lexer: It uses a combination of regular expressions, a
state machine, and a stack. When a regex matches, you usually just
specify that the matching text should be highlighted as such-and-such a
token. In some cases, though, you want to push a particular state onto
the stack, which will then swap in a different set of regexes, until
such time as this new state pops itself off the stack.
Also, it is of course written in Python, so the code below is Python code.
For instance, the rule for the "heredoc" strings, which I mentioned
previously, looks like this:
    (r'q"([a-zA-Z_]\w*)\n.*?\n\1"', String),
That is, it takes the chunk of text matched by that regex, and
highlights it as a string.
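To see what that regex actually captures, here is a minimal standalone sketch using only Python's re module. The names HEREDOC, find_heredocs and sample are made up for this illustration, and passing re.DOTALL is my assumption so that '.' can cross the newlines in the string body; it says nothing about how the lexer itself sets its flags.

```python
import re

# Same pattern as the rule above. re.DOTALL is an assumption of this
# sketch: it lets '.' match the newlines inside the string body.
HEREDOC = re.compile(r'q"([a-zA-Z_]\w*)\n.*?\n\1"', re.DOTALL)


def find_heredocs(source):
    """Return the full text of every identifier-delimited string in source."""
    return [m.group(0) for m in HEREDOC.finditer(source)]


# A D snippet whose string body spans two lines and contains quotes.
sample = 'auto s = q"EOS\nfirst line\nsecond "quoted" line\nEOS";\n'

for text in find_heredocs(sample):
    print(repr(text))
```

The backreference \1 is what ties the closing delimiter to the opening identifier, so the embedded double quotes in the sample do not end the match early, and a string whose closing identifier differs from the opening one is not matched at all.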
The entry point for token strings is the following rule:
    (r'q{', String, 'token_string'),
Or: Highlight the token "q{" as a string, then push the 'token_string'
state onto the stack. (This third argument is optional, and most of the
rules do not have it.) The 'token_string' state looks like this:
    'token_string': [
        (r'{', Punctuation, 'token_string_nest'),
        (r'}', String, '#pop'),
        include('root'),
    ],
    'token_string_nest': [
        (r'{', Punctuation, '#push'),
        (r'}', Punctuation, '#pop'),
        include('root'),
    ],
include('root') tells it to include the contents of the 'root' state,
which is the state the D lexer starts out in and which holds all of the
regular tokens. '#push' means to push the current state onto the stack
again, and '#pop' means to pop the top state off of the stack. By putting
the rules for '{' and '}' before the included 'root' rules, we override
their default behavior, which is just to be highlighted as punctuation.
These two nearly-identical states are needed because we only want to
highlight '}' as a string when it is the last one in the token string.
When '}' is closing a nested brace, we want to highlight it as regular
punctuation, and pop off of the stack.
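If you want to watch the stack at work without installing anything, the following is a rough sketch of the same idea in plain Python. It is not the Pygments engine: the tokenize and rules_for functions, the string token names, and the drastically cut-down 'root' state are all stand-ins I made up for illustration. Only the two token string states mirror the rules above (with the braces escaped in the regexes).

```python
import re

# Token types are plain strings in this sketch.
STRING, PUNCT, NAME, TEXT = 'String', 'Punctuation', 'Name', 'Text'


class include(str):
    """Marker meaning 'splice in the rules of the named state'."""


STATES = {
    # A heavily reduced stand-in for the real 'root' state.
    'root': [
        (r'q\{', STRING, 'token_string'),
        (r'[A-Za-z_]\w*', NAME),
        (r'[{}();=]', PUNCT),
        (r'\s+', TEXT),
    ],
    'token_string': [
        (r'\{', PUNCT, 'token_string_nest'),
        (r'\}', STRING, '#pop'),
        include('root'),
    ],
    'token_string_nest': [
        (r'\{', PUNCT, '#push'),
        (r'\}', PUNCT, '#pop'),
        include('root'),
    ],
}


def rules_for(state):
    """Expand include markers into a flat sequence of rules."""
    for rule in STATES[state]:
        if isinstance(rule, include):
            yield from rules_for(rule)
        else:
            yield rule


def tokenize(text):
    """Yield (token_type, value) pairs, driven by a stack of states."""
    stack = ['root']
    pos = 0
    while pos < len(text):
        for pattern, token, *action in rules_for(stack[-1]):
            match = re.compile(pattern).match(text, pos)
            if match:
                yield token, match.group()
                pos = match.end()
                if action == ['#pop']:
                    stack.pop()
                elif action == ['#push']:
                    stack.append(stack[-1])
                elif action:
                    stack.append(action[0])
                break
        else:
            # No rule matched: emit one character as text and move on.
            yield TEXT, text[pos]
            pos += 1


for token, value in tokenize('q{ a { b } c }'):
    if value.strip():
        print(token, repr(value))
```

Running it on 'q{ a { b } c }' reports the opening 'q{' and the final '}' as String, while the inner '{' and '}' come out as Punctuation, which is exactly the distinction the two states exist to make.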
Even if the above is gibberish to you, I still assert that it's quite
straightforward, and indeed is very much like how the nesting /+ +/
comments were already highlighted. (Albeit without the include('root')
call, and only one extra state.)
All of this is built on the Pygments lexer framework. All I had to do
was define the big list of regexes, and the occasional extra state (as
I've outlined above).
--
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org