Notepad++

Mon Aug 17 01:37:47 PDT 2009

Sergey Gromov wrote:
> Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:
> 
>> Sergey Gromov wrote:
>>> Well I think it's hard to create a regular expression engine flexible
>>> enough to allow arbitrary highlighting.
>> I can't see how it can be at all complicated to find the beginning and 
>> end of a C string or character literal.
>>
>> This (Posix?) regexp
>>
>> "(\\.|[^\\"])*"
>>
>> works as I try (though not in the tiny subset of Posix regexps that N++ 
>> understands).  But that's an aside - you don't need regexps at all to 
>> get it working at this basic level, only a rudimentary concept of escape 
>> sequences.
>>
>>> I think the best such engine
>>> I've seen was Colorer by Igor Russkih, and even there I wasn't able to
>>> express D's WYSIWYG or delimited strings.  You need a real programming
>>> language for that.
>> For WYSIWYG strings, all that's needed is a generic highlighter that 
>> supports:
>> - the aforementioned string escapes
>> - multiple types of string literals distinguished by whether they 
>> support string escapes, and not just delimiters
>>
>> TextPad's syntax highlighting engine manages 2/3 of this without any 
>> regexps (or anything to that effect).  That said, I've just found that 
>> it can do a little bit of what remains: I can make it do `...` but not 
>> r"..." at the expense of distinguishing string and character literals.
>>
>> But token-delimited strings are indeed more complex to deal with.  (How 
>> many people do we have putting them to practical use at the moment, for 
>> that matter?)
> 
> Well, you can write a regexp to handle a simple C string.  That is, if
> your regexp is matched against the whole file, which is usually not the
> case.  Otherwise you'll have trouble with a C string:
> 
> "foo\
> bar"
> 
> or a D string:
> 
> "foo
> bar"
> 
> Then you want to highlight string escapes and probably format
> specifiers.  Therefore you need not simple regexps but hierarchies of
> them, and also you need to know where *internals* of the string start
> and end.
> 
> Then you have r"foo" which probably can be handled with regexps.
> 
> Then you have q"/foo/" where "/" can be anything.  Still can be handled
> by extended regexps, even though they won't be regular expressions in
> the scientific sense.
> 
> Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
> Regexps cannot translate while substituting, so you must create regexps
> for all possible parens.

Remember that the whole point of q{} strings was that they should NOT be 
highlighted as strings!
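
For anyone who wants to try the cases discussed above, here is a rough
sketch in Python (not anything N++, TextPad or Colorer actually does).
C_STRING is the regexp quoted earlier, applied to the whole text so that
the multi-line cases match.  delimited_end and PAIRS are hypothetical
names made up for this example; the function handles q"/foo/" and the
nesting q"(foo)" forms with a small lookup table instead of one regexp
per bracket pair.  Identifier (heredoc) delimiters are not handled.

```python
import re

# The regexp from the thread: a double-quoted literal with backslash escapes.
# DOTALL lets "\\." also cover a backslash followed by a newline.
C_STRING = re.compile(r'"(\\.|[^\\"])*"', re.DOTALL)

# Closing partner for each nesting delimiter; any other character closes itself.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}


def delimited_end(text, start):
    """Return the index just past a q"X...X" literal at `start`, or -1."""
    if not text.startswith('q"', start) or start + 2 >= len(text):
        return -1
    opener = text[start + 2]
    closer = PAIRS.get(opener, opener)
    depth = 1
    i = start + 3
    while i < len(text):
        ch = text[i]
        if ch == opener and opener != closer:
            depth += 1          # nested opening bracket
        elif ch == closer:
            depth -= 1
            if depth == 0:
                # the closing delimiter must be followed by the final quote
                return i + 2 if text.startswith('"', i + 1) else -1
        i += 1
    return -1                   # unterminated literal
```

The point of the table is the one made in the quoted text: a plain regexp
cannot map "{" to "}" by itself, so either you enumerate the pairs or you
write a few lines of real code.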