Notepad++

Fri Aug 14 09:56:44 PDT 2009

Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:

> Sergey Gromov wrote:
>> Well I think it's hard to create a regular expression engine flexible
>> enough to allow arbitrary highlighting.
> 
> I can't see how it can be at all complicated to find the beginning and 
> end of a C string or character literal.
> 
> This (Posix?) regexp
> 
> "(\\.|[^\\"])*"
> 
> works as I try (though not in the tiny subset of Posix regexps that N++ 
> understands).  But that's an aside - you don't need regexps at all to 
> get it working at this basic level, only a rudimentary concept of escape 
> sequences.
> 
>> I think the best such engine
>> I've seen was Colorer by Igor Russkih, and even there I wasn't able to
>> express D's WYSIWYG or delimited strings.  You need a real programming
>> language for that.
> 
> For WYSIWYG strings, all that's needed is a generic highlighter that 
> supports:
> - the aforementioned string escapes
> - multiple types of string literals distinguished by whether they 
> support string escapes, and not just delimiters
> 
> TextPad's syntax highlighting engine manages 2/3 of this without any 
> regexps (or anything to that effect).  That said, I've just found that 
> it can do a little bit of what remains: I can make it do `...` but not 
> r"..." at the expense of distinguishing string and character literals.
> 
> But token-delimited strings are indeed more complex to deal with.  (How 
> many people do we have putting them to practical use at the moment, for 
> that matter?)

Well, you can write a regexp to handle a simple C string.  That is, if
your regexp is matched against the whole file, which is usually not the
case.  Otherwise you'll have trouble with a C string like this:

"foo\
bar"

or a D string like this:

"foo
bar"
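
To make the whole-file versus line-by-line point concrete, here is a
minimal sketch in Python (not what any of the editors mentioned actually
do).  It uses the regexp quoted above, with DOTALL switched on so that a
backslash followed by a newline counts as an escape.  The function names
are made up for the illustration.

```python
import re

# The quoted regexp for a C-style string; DOTALL lets "\\." consume
# a backslash followed by a newline (a C line continuation).
C_STRING = re.compile(r'"(?:\\.|[^\\"])*"', re.DOTALL)


def strings_whole_file(text):
    """Match against the whole buffer at once."""
    return [m.group(0) for m in C_STRING.finditer(text)]


def strings_per_line(text):
    """Match each line separately, as a line-based highlighter would."""
    found = []
    for line in text.splitlines():
        found.extend(m.group(0) for m in C_STRING.finditer(line))
    return found


c_source = 'puts("foo\\\nbar");'   # C string with a backslash-newline
d_source = 'writeln("foo\nbar");'  # D string containing a raw newline

for source in (c_source, d_source):
    print(strings_whole_file(source), strings_per_line(source))
```

In both cases the whole-file match finds the string and the per-line
match finds nothing, because neither line holds both quotes.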

Then you want to highlight string escapes and probably format
specifiers.  So you need not simple regexps but hierarchies of them,
and you also need to know where the *internals* of the string start
and end.
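
A two-level hierarchy could look roughly like the sketch below: an outer
regexp finds the string and captures its interior, and an inner regexp
runs only over that interior.  The printf-style format pattern is a
deliberately simplified assumption, and the names are invented for the
example.

```python
import re

# Outer level: a whole string literal, with the interior as group 1.
STRING = re.compile(r'"((?:\\.|[^\\"])*)"', re.DOTALL)

# Inner level: things worth highlighting inside the string.
# The format-specifier pattern is a simplified assumption.
INNER = re.compile(
    r'(?P<escape>\\.)|(?P<format>%[-+ #0]*\d*(?:\.\d+)?[a-zA-Z])',
    re.DOTALL,
)


def highlight(text):
    """Return (kind, start, end) spans, with offsets into text."""
    spans = []
    for s in STRING.finditer(text):
        spans.append(("string", s.start(), s.end()))
        body_start = s.start(1)  # where the internals begin
        for m in INNER.finditer(s.group(1)):
            spans.append(
                (m.lastgroup, body_start + m.start(), body_start + m.end())
            )
    return spans


print(highlight('printf("a\\n%d");'))
```

The inner matches are only meaningful because the outer match says
where the interior starts, which is the bookkeeping mentioned above.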

Then you have r"foo" which can probably be handled with regexps.

Then you have q"/foo/" where "/" can be anything.  That can still be
handled by extended regexps, even though they won't be regular
expressions in the scientific sense.
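
By "extended" I mean something like a backreference.  A rough Python
sketch, assuming the delimiter is a single character that is not a word
character, whitespace, or an opening bracket (the function name is
hypothetical):

```python
import re

# Group 1 captures the delimiter; \1 demands the same character again
# right before the closing quote.  Backreferences are what take this
# outside regular expressions in the formal sense.
DELIMITED = re.compile(r'q"([^\w\s(\[<{])(.*?)\1"', re.DOTALL)


def delimited_strings(text):
    """Return (delimiter, body) for each q"X...X" string found."""
    return [(m.group(1), m.group(2)) for m in DELIMITED.finditer(text)]


print(delimited_strings('a = q"/foo/"; b = q"|x/y|";'))
```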

Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
Regexps cannot translate an opening paren into its closing counterpart
while matching, so you must create a separate regexp for each possible
pair of parens.
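
In practice that means generating one pattern per pair, something like
this sketch.  It only pairs the outermost delimiters and makes no
attempt to check that brackets inside the body are balanced; the names
are made up.

```python
import re

# One (open, close) entry per bracket pair; a backreference cannot turn
# "(" into ")", so each pair gets its own pattern.
PAIRS = [("(", ")"), ("[", "]"), ("<", ">"), ("{", "}")]

PATTERNS = [
    re.compile('q"' + re.escape(o) + '(.*?)' + re.escape(c) + '"', re.DOTALL)
    for o, c in PAIRS
]


def bracket_strings(text):
    """Return (start offset, body) for each q"(...)"-style string."""
    found = []
    for pattern in PATTERNS:
        found.extend((m.start(), m.group(1)) for m in pattern.finditer(text))
    return sorted(found)


sample = 'a = q"{foo}"; b = q"(bar)"; c = q"[x]"; d = q"<y>";'
print([body for _, body in bracket_strings(sample)])
```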

And of course q"BLAH
whatever BLAH here
BLAH", well, probably nice for help texts.
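
For a single isolated case like this one a backreference still copes,
as the sketch below shows, assuming the identifier is followed directly
by a newline and the closing identifier starts its own line.  It says
nothing about doing this incrementally inside an editor.

```python
import re

# Group 1 is the heredoc identifier; the body ends at the first line
# that starts with the same identifier followed by a quote.
HEREDOC = re.compile(r'q"([A-Za-z_]\w*)\n(.*?)\n\1"', re.DOTALL)


def heredoc_bodies(text):
    """Return (identifier, body) for each identifier-delimited string."""
    return [(m.group(1), m.group(2)) for m in HEREDOC.finditer(text)]


sample = 'help = q"BLAH\nwhatever BLAH here\nBLAH";'
print(heredoc_bodies(sample))
```

Note that the BLAH in the middle of the body does not end the string,
since it is not at the start of a line followed by a quote.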

And these are only strings.  Try to write a regexp that treats .__15 as
number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
number(2), operator(..), number(3).
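
With a hand-written scanner and a bit of lookahead the same three cases
are straightforward.  The sketch below follows only the tokenization
rules stated above; it is not checked against the D specification, it
ignores exponents, suffixes and hex, and the token names are invented.

```python
def tokenize(src):
    """Split src into (kind, text) tokens using explicit lookahead."""
    tokens = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1
        elif c == ".":
            if src.startswith("..", i):
                tokens.append(("operator", ".."))
                i += 2
                continue
            # Look past any underscores: a digit means this is a number.
            j = i + 1
            while j < n and src[j] == "_":
                j += 1
            if j < n and src[j].isdigit():
                while j < n and (src[j].isdigit() or src[j] == "_"):
                    j += 1
                tokens.append(("number", src[i:j]))
                i = j
            else:
                tokens.append(("operator", "."))
                i += 1
        elif c.isdigit():
            j = i
            while j < n and (src[j].isdigit() or src[j] == "_"):
                j += 1
            # Take a fraction only if the dot is not the start of "..".
            if (j + 1 < n and src[j] == "." and src[j + 1].isdigit()):
                j += 1
                while j < n and (src[j].isdigit() or src[j] == "_"):
                    j += 1
            tokens.append(("number", src[i:j]))
            i = j
        elif c.isalpha() or c == "_":
            j = i
            while j < n and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("ident", src[i:j]))
            i = j
        else:
            tokens.append(("operator", c))
            i += 1
    return tokens


for text in (".__15", ".__foo", "2..3"):
    print(text, tokenize(text))
```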

> Scintilla's definition of a plugin is confusing - normally plugins are 
> things that can be dynamically loaded at runtime, rather than having to 
> compile them in.  If only....

I'm not sure they call them "plugins".  They're lexer modules, designed
so that the lexer is relatively easy to extend.