Proposal for SentinelInputRange

Artur Skawina art.08.09 at gmail.com
Thu Feb 28 06:08:09 PST 2013


On 02/28/13 02:11, Walter Bright wrote:
> A SentinelInputRange is an InputRange with the following additions:
> 
> 1. a compile time property 'sentinel' that is the terminating value of the range
> 2. empty is defined as: empty = (front == sentinel)
> 3. it is not necessary for empty to be called before front
> 
> A C style 0-terminated string is an example of a SentinelInputRange.
> 
> The additions to std.range would be:
> 
> 1. isSentinelInputRange(T) which returns true if T is a SentinelInputRange
> 2. a unittest
> 3. documentation of this
> 
> An addition to std.string would be a function that takes a char* and returns a SentinelInputRange.
> 
> Motivation:
> 1. easy conversion of C strings to ranges
> 2. necessary for a fast implementation of a lexer

LookAheadRange, that not only defines 'sentinel', but also 'lookahead' which
defines how far one can safely look ahead. When 'lookahead==0' it becomes
equivalent to your SentinelInputRange, where 'front' and '[0]' is always
valid and can contain the sentinel. When 'lookahead==N', accessing '[N]' is
always valid.

Having said that, I've used this approach in a D lexer, and it does not really
matter in practice - avoiding the length (or '\0' sentinel) check makes a
<~1ms difference when lexing "datetime.d" sized objects (1.5Mbytes+, 460k+ tokens).
Which is practically irrelevant both in an IDE context and a compiler context
- other processing will be be orders of magnitude more expensive. An IDE doesn't
need to re-lex the whole file after every key press and 1ms won't make any
difference for a compiler run.

"Necessary for a fast implementation of a lexer" is an exaggeration, but yes,
it is an obvious optimization that has a measurable impact. Before implementing
it I was expecting a larger improvement too; and was somewhat surprised when the
resulting gain was in the noise (~1ms +/- ~3ms). /Relatively/ it is significant
(10% perf gain is never insignificant), but the /absolute/ perf difference isn't
that large.
Which of course isn't an argument against (SentinelInputLookAhead)Range; a std
API like that would be a good thing.

The advantage of SentinelInputRange/LookAheadRange is that it, unlike an array,
can be lazy. Which starts to matter when you're dealing with /token/ streams.

artur


More information about the Digitalmars-d mailing list