Notepad++

Mon Aug 17 13:23:56 PDT 2009

Sergey Gromov wrote:
> Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:
> 
>> Sergey Gromov wrote:
>>> "foo
>>> bar"
>> So there is a problem if the highlighter works by matching regexps on a 
>> line-by-line basis.  But matching regexps over a whole file is no harder 
>> in principle than matching line-by-line and, when the maximal munch 
>> principle is never called to action, it can't be much less efficient. 
>> (The only bit of C or D strings that relies on maximal munch is octal 
>> escapes.)
> 
> Highlighting the whole file every time a character is typed is slow.
> Scintilla doesn't do that.  It provides the lexer with a range of
> changed lines.  The lexer is then free to choose a larger range if it
> cannot deduce context from the initial range.  I tried to ignore this
> range and re-highlight the whole file in my lexer.  The performance was
> unacceptable.

Of course.  I suppose now that the right strategy is line-by-line with 
some preservation of state between lines:

- Keep a note of the state at the beginning of each line
- When something is changed, re-highlight those lines that have changed
- Carry on re-highlighting until the state is back in sync with what was 
there before.  If this means going way beyond the visible area of the 
file, stop there and mark the start state of the remaining lines as 
unknown (so that they get another go when/if they are later scrolled 
into view).
- If a range of lines that has just come into view begins in an unknown 
state, it's up to the particular lexer module to start from the first 
visible line or backtrack as far as it likes to get some context.
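
To make the steps above concrete, here is a rough sketch in Python (chosen only for illustration; it is not Scintilla's API and not how any real lexer module is written).  All the names (lex_line, initial_states, rehighlight) are made up for the example.  It assumes a toy language whose only multi-line construct is a non-nesting /* */ comment, and that an edit changes the text of lines without inserting or deleting any.  The "unknown state beyond the visible area" refinement is left out.

```python
NORMAL, COMMENT = "normal", "comment"

def lex_line(state, line):
    """Return the lexer state at the end of `line`, given the state at its start.

    Toy lexer: only tracks whether we are inside a /* ... */ comment.
    A real lexer would also assign styles to the characters here.
    """
    i = 0
    while i < len(line):
        if state == NORMAL and line.startswith("/*", i):
            state, i = COMMENT, i + 2
        elif state == COMMENT and line.startswith("*/", i):
            state, i = NORMAL, i + 2
        else:
            i += 1
    return state

def initial_states(lines):
    """states[n] is the state at the beginning of line n.

    There is one extra trailing entry for the state at end of file.
    """
    states = [NORMAL]
    for line in lines:
        states.append(lex_line(states[-1], line))
    return states

def rehighlight(lines, states, first, last):
    """Re-lex after lines first..last have been edited in place.

    Keeps going past `last` until the newly computed state matches the
    state previously recorded for the next line (i.e. back in sync).
    Returns the list of line numbers that were re-lexed.
    """
    done = []
    n = first
    while n < len(lines):
        end = lex_line(states[n], lines[n])
        done.append(n)
        in_sync = (end == states[n + 1])
        states[n + 1] = end
        n += 1
        if n > last and in_sync:
            break
    return done
```

With the four lines "int a;", "int b;", "int c; */ int d;", "int e;", changing the first to "int a; /*" makes rehighlight re-lex lines 0 to 2 and then stop, because the state at the start of line 3 is unchanged.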

Is this anything like how Scintilla works?

<snip>
> It's actually trivial* to implement a lexer for Scintilla which would
> work exactly as TextPad does, including use of the same configuration
> files.
> 
> * That is, if you know exactly how TextPad works.

It would also be straightforward to improve TextPad's scheme to support 
an arbitrary number of string/comment types.  How about this as an 
all-in-one replacement for TP's comment and string syntax directives?

[DelimitedToken1]
Start = /**
End = */
Type = DocComment
SpanLines = Yes
Nest = No

[DelimitedToken2]
Start = /*!
End = */
Type = DocComment
SpanLines = Yes
Nest = No

[DelimitedToken3]
Start = /*
End = */
Type = Comment
SpanLines = Yes
Nest = No

[DelimitedToken4]
Start = /+
End = +/
Type = Comment
SpanLines = Yes
Nest = Yes

[DelimitedToken5]
Start = //
Type = Comment
SpanLines = No
Nest = No

[DelimitedToken6]
Start = r"
End = "
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken7]
Start = `
End = `
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken8]
Start = "
End = "
Esc = \
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken9]
Start = '
End = '
Esc = \
Type = Char
SpanLines = No
Nest = No

There, we have all of D1 covered now, and not a regexp in sight.
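
For what it's worth, a scanner driven by such a table can be quite small.  The following is only a sketch under my own assumptions, written in Python for illustration: the Delim class, the D1_TOKENS list and the scan/find_end functions are hypothetical names, not part of TextPad or Scintilla.  It assumes entries are tried in the order listed (so /** and /*! must come before /*, and r" before "), that an entry with no End runs to the end of the line, and that Esc makes the scanner skip the following character.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Delim:
    start: str
    end: Optional[str]          # None: token runs to end of line
    type: str
    esc: Optional[str] = None   # escape prefix; the next char is skipped
    span_lines: bool = False
    nest: bool = False

# Mirrors DelimitedToken1..9 above, in the same order.
D1_TOKENS = [
    Delim("/**", "*/", "DocComment", span_lines=True),
    Delim("/*!", "*/", "DocComment", span_lines=True),
    Delim("/*", "*/", "Comment", span_lines=True),
    Delim("/+", "+/", "Comment", span_lines=True, nest=True),
    Delim("//", None, "Comment"),
    Delim('r"', '"', "String", span_lines=True),
    Delim("`", "`", "String", span_lines=True),
    Delim('"', '"', "String", esc="\\", span_lines=True),
    Delim("'", "'", "Char", esc="\\"),
]

def find_end(text, j, tok):
    """Return the index just past the end of token `tok`, whose body starts at j."""
    depth = 1
    while j < len(text):
        if text[j] == "\n" and not tok.span_lines:
            return j                      # token may not cross a line break
        if tok.esc and text.startswith(tok.esc, j):
            j += len(tok.esc) + 1         # skip the escaped character
        elif tok.end and text.startswith(tok.end, j):
            j += len(tok.end)
            depth -= 1
            if depth == 0:
                return j
        elif tok.nest and text.startswith(tok.start, j):
            j += len(tok.start)
            depth += 1
        else:
            j += 1
    return min(j, len(text))              # unterminated: runs to end of text

def scan(text, table=D1_TOKENS):
    """Return (type, lexeme) pairs for every delimited token found in `text`."""
    out = []
    i = 0
    while i < len(text):
        tok = next((t for t in table if text.startswith(t.start, i)), None)
        if tok is None:
            i += 1                        # ordinary text: not our concern here
            continue
        j = find_end(text, i + len(tok.start), tok)
        out.append((tok.type, text[i:j]))
        i = j
    return out
```

For example, scan("/+ a /+ b +/ c +/ x") yields a single Comment token covering the whole nested comment, which is the case a plain regexp cannot handle.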

<snip>
> Basically yes, but they're going to be much more complex.  3Lu...5 is
> also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
> still, regexps don't nest.  Don't you want to highlight DDoc sections
> and macros?

That would be nice as well, as would being able to do things with 
Doxygen comments.  But let's not try to run before we can walk.

Stewart.