Notepad++
Don
nospam at nospam.com
Mon Aug 17 01:37:47 PDT 2009
Sergey Gromov wrote:
> Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:
>
>> Sergey Gromov wrote:
>>> Well I think it's hard to create a regular expression engine flexible
>>> enough to allow arbitrary highlighting.
>> I can't see how it can be at all complicated to find the beginning and
>> end of a C string or character literal.
>>
>> This (POSIX?) regexp
>>
>> "(\\.|[^\\"])*"
>>
>> works when I try it (though not in the tiny subset of POSIX regexps that N++
>> understands). But that's an aside - you don't need regexps at all to
>> get it working at this basic level, only a rudimentary concept of escape
>> sequences.
>>
>>> I think the best such engine
>>> I've seen was Colorer by Igor Russkih, and even there I wasn't able to
>>> express D's WYSIWYG or delimited strings. You need a real programming
>>> language for that.
>> For WYSIWYG strings, all that's needed is a generic highlighter that
>> supports:
>> - the aforementioned string escapes
>> - multiple types of string literals distinguished by whether they
>> support string escapes, and not just delimiters
>>
>> TextPad's syntax highlighting engine manages 2/3 of this without any
>> regexps (or anything to that effect). That said, I've just found that
>> it can do a little bit of what remains: I can make it do `...` but not
>> r"..." at the expense of distinguishing string and character literals.
>>
>> But token-delimited strings are indeed more complex to deal with. (How
>> many people do we have putting them to practical use at the moment, for
>> that matter?)
>
> Well, you can write a regexp to handle a simple C string. That is, if
> your regexp is matched against the whole file, which is usually not the
> case. Otherwise you'll have trouble with a C string like:
>
> "foo\
> bar"
>
> or a D string like:
>
> "foo
> bar"
>
> Then you want to highlight string escapes and probably format
> specifiers. Therefore you need not simple regexps but hierarchies of
> them, and also you need to know where *internals* of the string start
> and end.
>
> Then you have r"foo" which probably can be handled with regexps.
>
> Then you have q"/foo/" where "/" can be anything. That can still be
> handled by extended regexps, even though they won't be regular
> expressions in the scientific sense.
>
> Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
> Regexps cannot translate while substituting, so you must create regexps
> for all possible parens.
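
To illustrate the point about paired delimiters, here is a rough sketch in
Python (not in any editor's highlighting language). The names PAIRS and
scan_delimited are hypothetical. A hand-written scanner can look up the
closing character in a table, which is the "translation" step a regexp
cannot do. The sketch assumes single-character delimiters only and ignores
the identifier (heredoc) form of delimited strings.

```python
# Hypothetical helper for illustration; not part of any editor's API.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}

def scan_delimited(src, start=0):
    """Return the index just past a q"X...X" literal beginning at `start`,
    or -1 if no complete literal starts there."""
    if not src.startswith('q"', start) or start + 2 >= len(src):
        return -1
    opener = src[start + 2]
    # Bracket openers map to their partner; any other delimiter closes itself.
    closer = PAIRS.get(opener, opener)
    depth = 1
    i = start + 3
    while i < len(src):
        ch = src[i]
        if opener != closer and ch == opener:
            depth += 1          # bracket delimiters may nest
        elif ch == closer:
            depth -= 1
            if depth == 0:
                # The closing delimiter must be followed by the final quote.
                return i + 2 if src.startswith('"', i + 1) else -1
        i += 1
    return -1                   # ran off the end: unterminated literal
```

With this, scan_delimited('q"/foo/"') and scan_delimited('q"{foo}"') both
return 8, using the same code path for every delimiter.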
Remember that the whole point of q{} strings was that they should NOT be
highlighted as strings!
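
A minimal sketch of what that could mean for a highlighter, in Python with
a hypothetical function name (token_string_span): find where the q{...}
body starts and ends by counting braces, then keep colouring the body as
ordinary code instead of as a string. It assumes there are no unbalanced
braces hidden inside string literals or comments in the body; a real lexer
would tokenize the contents rather than count characters.

```python
def token_string_span(src, start=0):
    """Return (body_start, body_end) for a q{...} token string at `start`,
    or None if there is no complete token string there."""
    if not src.startswith("q{", start):
        return None
    depth = 1                   # the opening brace of q{ is already counted
    i = start + 2
    while i < len(src):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return (start + 2, i)   # body excludes the outer braces
        i += 1
    return None                 # unterminated token string
```

The returned span would be handed back to the normal code highlighter, so
only the q{ and the final } need any special treatment.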