Notepad++

Sergey Gromov snake.scaly at gmail.com
Sat Aug 15 18:39:05 PDT 2009


Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:

> Sergey Gromov wrote:
>> 
>> "foo
>> bar"
> 
> So there is a problem if the highlighter works by matching regexps on a 
> line-by-line basis.  But matching regexps over a whole file is no harder 
> in principle than matching line-by-line and, when the maximal munch 
> principle is never called to action, it can't be much less efficient. 
> (The only bit of C or D strings that relies on maximal munch is octal 
> escapes.)

Highlighting the whole file every time a character is typed is slow.
Scintilla doesn't do that.  It provides the lexer with a range of
changed lines.  The lexer is then free to choose a larger range if it
cannot deduce context from the initial range.  I tried to ignore this
range and re-highlight the whole file in my lexer.  The performance was
unacceptable.
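
A minimal Python sketch of the range-based approach, for illustration
only. It is not Scintilla's API: the names lex_line, relex and
end_states are hypothetical, the only construct tracked is a
double-quoted string with backslash escapes, and the cache is assumed
to hold one entry per line (insertions and deletions of whole lines
are not handled). Lexing starts from the cached state at the end of
the previous line and continues past the changed range only while the
cached end-of-line state keeps changing.

```python
NORMAL, IN_STRING = 0, 1


def lex_line(line, state):
    """Return the lexer state at the end of `line`, given the state at its start."""
    i = 0
    while i < len(line):
        ch = line[i]
        if state == IN_STRING:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                state = NORMAL
        elif ch == '"':
            state = IN_STRING
        i += 1
    return state


def relex(lines, end_states, first, last):
    """Re-lex the changed lines first..last (inclusive).

    end_states[i] caches the state at the end of line i and is updated
    in place. Lexing runs past `last` for as long as the cached state
    differs from the new one. Returns the number of lines lexed.
    """
    state = end_states[first - 1] if first > 0 else NORMAL
    lexed = 0
    n = first
    while n < len(lines):
        state = lex_line(lines[n], state)
        lexed += 1
        unchanged = end_states[n] == state
        end_states[n] = state
        if n >= last and unchanged:
            break  # later lines cannot be affected by this edit
        n += 1
    return lexed
```

An edit that leaves the end-of-line state alone costs one line of
lexing. An edit that closes or opens a multi-line string forces
everything below it to be lexed again, which is the case where the
range has to grow.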

>> Then you want to highlight string escapes and probably format
>> specifiers.  Therefore you need not simple regexps but hierarchies of
>> them, and also you need to know where *internals* of the string start
>> and end.
> 
> Let's just concentrate for the moment on the simple process of finding 
> the beginning and end of a string.  Here's a snippet of a TextPad syntax 
> file:
> 
> StringsSpanLines = Yes
> StringStart = "
> StringEnd = "
> StringEsc = \
> 
> A possible snippet of lexer code to handle this (which FAIK might be 
> [...]

Sure, TextPad uses a dozen simple hacks specific to lexing
programming languages.  They're ad-hoc and they're limited to exactly
what the TextPad authors thought was important.

Regexps are a different approach.  They are more generic but are
limited, too, because they're slow and don't nest naturally.  Slow means
they must try to re-color as few lines as possible.  Not nestable means
you need to invent some framework around regexps, which is another sort
of description language.  If you implement the former naively and ignore
the latter you'll get what presumably N++ has: not a very powerful
system.

It's actually trivial* to implement a lexer for Scintilla which would
work exactly as TextPad does, including use of the same configuration
files.

* That is, if you know exactly how TextPad works.
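
As a rough illustration, here is a Python sketch of a string scanner
driven by the four options from the quoted syntax-file snippet. The
option names come from that snippet; their exact semantics in TextPad
are assumed here, not verified, and the function names
parse_syntax_options and find_strings are hypothetical.

```python
def parse_syntax_options(text):
    """Parse 'Key = Value' lines into a dict; lines without '=' are ignored."""
    opts = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            opts[key.strip()] = value.strip()
    return opts


def find_strings(source, opts):
    """Return (start, end) offsets of string literals; end is exclusive."""
    start = opts["StringStart"]
    end = opts["StringEnd"]
    esc = opts.get("StringEsc", "")
    span_lines = opts.get("StringsSpanLines", "No") == "Yes"  # assumed Yes/No

    spans = []
    i = 0
    while i < len(source):
        if not source.startswith(start, i):
            i += 1
            continue
        begin = i
        i += len(start)
        while i < len(source):
            if esc and source.startswith(esc, i):
                i += len(esc) + 1  # skip the escape and the escaped character
            elif source.startswith(end, i):
                i += len(end)
                break
            elif source[i] == "\n" and not span_lines:
                break  # unterminated string ends at the line break
            else:
                i += 1
        i = min(i, len(source))  # an escape at the very end may overshoot
        spans.append((begin, i))
    return spans
```

This only finds where strings begin and end. It says nothing about
escapes or format specifiers inside them, which is where a flat set of
options runs out.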

>> And these are only strings.  Try to write regexp which treats .__15 as
>> number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
>> number(2), operator(..), number(3).
> <snip>
> 
> We'd need many regexps to handle all possible cases, but a possible set 
> to cover these cases and a few others (listed in a possible order of 
> priority) is:
> 
> \._*[0-9][0-9_]*
> ([1-9][0-9]*)(\.\.)
> [0-9]+\.[0-9]*
> [1-9][0-9]*
> \.\.
> \.
> [a-zA-Z_][a-zA-Z0-9_]*

Basically yes, but they're going to be much more complex.  3Lu...5 is
also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
still, regexps don't nest.  Don't you want to highlight DDoc sections
and macros?
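
For reference, the prioritized list quoted above can be tried out with
a short Python sketch. The patterns are copied from the quoted message;
the token names and the tokenize function are hypothetical, and
characters that match no rule are simply skipped. At each position the
first rule that matches wins, and a rule with capture groups emits one
token per group.

```python
import re

# (pattern, token kinds); a pattern with groups yields one token per group.
RULES = [
    (re.compile(r"\._*[0-9][0-9_]*"), ("number",)),
    (re.compile(r"([1-9][0-9]*)(\.\.)"), ("number", "operator")),
    (re.compile(r"[0-9]+\.[0-9]*"), ("number",)),
    (re.compile(r"[1-9][0-9]*"), ("number",)),
    (re.compile(r"\.\."), ("operator",)),
    (re.compile(r"\."), ("operator",)),
    (re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*"), ("ident",)),
]


def tokenize(text):
    """Return a list of (kind, lexeme) pairs using first-match priority."""
    tokens = []
    pos = 0
    while pos < len(text):
        for pattern, kinds in RULES:
            m = pattern.match(text, pos)
            if m:
                # No capture groups: the whole match is a single token.
                parts = m.groups() or (m.group(0),)
                tokens.extend(zip(kinds, parts))
                pos = m.end()
                break
        else:
            pos += 1  # whitespace or a character no rule covers
    return tokens
```

With these rules .__15, .__foo and 2..3 come out as described, but
3Lu...5 is split into number(3), ident(Lu), operator(..), number(.5),
so the integer suffix is lost. Handling suffixes, hex floats and the
like means adding to and reordering the list.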


