Proposal for SentinelInputRange

Walter Bright newshound2 at digitalmars.com
Thu Feb 28 02:54:24 PST 2013


On 2/27/2013 11:55 PM, Jonathan M Davis wrote:
>> Again, please see how lexer.c works. I assure you, there is no double
>> copying going on, nor is there a double test for the terminating 0.
>
> I know what the lexer does, and remember that it _doesn't_ operate on ranges,
> and there are subtle differences between being able to just use char* and
> trying to handle generic ranges.

Hence the need to invent SentinelInputRange.


> Given how a lexer works (and I have been working on a lexer off and on
> recently), the only real difference is that you'd just use a couple of static
> ifs like
>
> static if(!isSomeString!R)
> {
>      if(range.empty)
>          break; //or whatever you do at the end
> }
>
> static if(isSomeString!R)
> {
>      case 0:
>          break; //or whatever you do at the end
> }

This would occur in so many places that it cries out for a new type.


> So, in the case of a lexer, I don't see sentinel ranges as buying us much. You
> end up having to wrap most any range that you pass to the lexer or whatever
> (including strings so that they'll pass isSentinelRange), you lose out on any
> optimizations of any functions that you call which special-case strings
> (though there probably wouldn't be many of those in a lexer), and all you
> avoid is a couple of static ifs.

And NO, THE SOURCE FILE INPUT IS NEITHER WRAPPED NOR DOUBLE COPIED. Here's how 
it's done:

https://github.com/D-Programming-Language/dmd/blob/master/src/root/root.c

See lines 1012 and 1038.

> The idea of sentinels certainly isn't useless, but anything caring about that
> sort of speed is likely to just use strings or arrays, and those can trivially
> be special cased to avoid unnecessary empty checks and to add the check for
> the sentinel, making the whole sentinel range idea an unnecessary complication
> IMHO.

You can't do efficient lookahead without sentinels, either. Lexers are sensitive 
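to every instruction executed per character read. Without sentinels, every character read needs an end-of-input test in addition to the test on the character itself, which means double the number of instructions per source character.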

InputRanges are an abject failure if "anyone caring about speed" is not going to 
use them. And yes, I care very much about the D lexing speed.
