Proposal for SentinelInputRange

Jonathan M Davis jmdavisProg at gmx.com
Wed Feb 27 23:55:11 PST 2013


On Wednesday, February 27, 2013 23:33:09 Walter Bright wrote:
> On 2/27/2013 9:28 PM, Jonathan M Davis wrote:
> > But you have to deal with D strings, not C strings if you're dealing with
> > ranges. char* isn't a range. So, unless you're talking about wrapping a
> > char* in a range, char* isn't going to work. And simply appending 0 to
> > the end of a D string isn't enough, because isSentinelInputRange would
> > fail, because std.array.empty doesn't match it. So, you need a wrapper
> > even if it's only to pass the template constraint. That being the case,
> > regardless of whether you're dealing with char* or string, you need a
> > wrapper.
> 
> Again, please see how lexer.c works. I assure you, there is no double
> copying going on, nor is there a double test for the terminating 0.

I know what the lexer does, and remember that it _doesn't_ operate on ranges, 
and there are subtle differences between being able to just use char* and 
trying to handle generic ranges.

And no, the lexer doesn't have a double test. Where you're going to be stuck
with a double test is with most any range which isn't a string. Such ranges
won't have sentinel values, and there will be no way to add them (since you
really can't append to a range), so they'll end up being wrapped in a
SentinelRange which will have to check on each popFront whether the wrapped
range is now empty so that front can return the sentinel value. And most any
range which _was_ designed to have a sentinel value would have to manage its
own contents (because otherwise, it would just be back to wrapping a range and
having to check empty), which likely means that it'll just be a thin wrapper
around a string or array anyway.
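To make that extra check concrete, here's a rough D sketch of such a wrapper. The SentinelRange name comes from this thread, but the design below is only a hypothetical illustration (it's not a Phobos type): the caller picks the sentinel, and once the wrapped range runs out, front keeps returning it. The test in popFront is exactly the per-element empty check that a sentinel was supposed to get rid of.

```d
import std.range.primitives;

// Hypothetical wrapper (not part of Phobos): gives any input range a
// sentinel. Once the wrapped range is exhausted, front returns `sentinel`.
struct SentinelRange(R)
    if (isInputRange!R)
{
    private R range;
    private ElementType!R sentinel;
    private ElementType!R current;

    this(R range, ElementType!R sentinel)
    {
        this.range = range;
        this.sentinel = sentinel;
        current = this.range.empty ? sentinel : this.range.front;
    }

    // Never empty: the end of input is signalled by front == sentinel.
    enum bool empty = false;

    @property ElementType!R front() { return current; }

    void popFront()
    {
        if (range.empty)
            return; // already parked on the sentinel
        range.popFront();
        // The extra per-element test: is the wrapped range now empty?
        current = range.empty ? sentinel : range.front;
    }
}

// Phobos treats it as infinite because empty is a compile-time false.
static assert(isInfinite!(SentinelRange!(int[])));
```

So a lexer consuming this only tests for the sentinel, but the wrapper still pays for an empty check on every element underneath.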

Strings will still need to be wrapped, because they won't pass
isSentinelInputRange otherwise, but they won't get any extra checks, because
the wrapper can just check for a 0 at the end and append it if it's not there.
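A minimal sketch of that string wrapper, assuming a hypothetical SentinelString type (not a Phobos type) with 0 as the sentinel. The constructor does the one-time check for a trailing 0, and after that front and popFront do no emptiness test at all. Note that when the 0 is missing, appending it allocates a new copy of the string. lengthToSentinel is likewise just a made-up stand-in for a lexer loop.

```d
import std.range.primitives : isInfinite, isInputRange;

// Hypothetical wrapper (not part of Phobos): a string whose end is marked
// by a trailing '\0', so front and popFront need no emptiness test.
struct SentinelString
{
    private string str; // invariant: always ends with '\0'

    this(string s)
    {
        // The one-time check: append the 0 only if it isn't already there
        // (appending allocates a new copy of the string).
        str = (s.length && s[$ - 1] == '\0') ? s : s ~ '\0';
    }

    // The end of input is signalled by front == '\0', not by empty.
    enum bool empty = false;

    @property char front() const { return str[0]; }

    // Precondition: front != '\0' (never pop past the sentinel).
    void popFront() { str = str[1 .. $]; }
}

static assert(isInputRange!SentinelString && isInfinite!SentinelString);

// Hypothetical lexer-style loop: a single test per character, on the sentinel.
size_t lengthToSentinel(SentinelString r)
{
    size_t n;
    while (r.front != '\0')
    {
        ++n;
        r.popFront();
    }
    return n;
}
```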

> > So, why not just special case strings or arrays in the few situations
> > where something like this is needed, especially when it would be so easy
> > to do?
> 
> Sentinels structure the code differently.

Given how a lexer works (and I have been working on a lexer off and on
recently), the only real difference is that you'd use a couple of static
ifs, like:

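// At the top of the main loop, before reading range.front:
static if(!isSomeString!R)
{
    if(range.empty)
        break; //or whatever you do at the end
}

// Inside the switch on the current character (strings end with a 0):
static if(isSomeString!R)
{
    case 0:
        break; //or whatever you do at the end
}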
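Put together, the two static ifs might sit in a loop like the one below. This is only a sketch: countSpaces is a hypothetical stand-in for a lexer, and it assumes string input already ends with a 0 sentinel while any other input range simply ends when it's empty. Also note that in Phobos, front on a narrow string auto-decodes to dchar, so a lexer that really cares about speed would probably index the string directly instead.

```d
import std.range.primitives;
import std.traits : isSomeString;

// Hypothetical lexer-style loop (not from the post): counts spaces.
// Assumption for this sketch: string input already ends with a 0 sentinel;
// any other input range simply ends when it is empty.
size_t countSpaces(R)(R range)
    if (isInputRange!R)
{
    size_t spaces;
    loop: while (true)
    {
        static if (!isSomeString!R)
        {
            if (range.empty)
                break; // end of a non-string range
        }

        switch (range.front)
        {
            static if (isSomeString!R)
            {
                case 0:
                    break loop; // the sentinel ends a string
            }
            case ' ':
                ++spaces;
                break;
            default:
                break;
        }
        range.popFront();
    }
    return spaces;
}
```

Strings get no empty check at all, and other ranges get exactly one per element, with no wrapper type involved.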

So, in the case of a lexer, I don't see sentinel ranges buying us much. You
end up having to wrap most any range that you pass to the lexer or whatever
(including strings, so that they'll pass isSentinelInputRange), you lose out on
any optimizations in the functions that you call which special-case strings
(though there probably wouldn't be many of those in a lexer), and all you
avoid is a couple of static ifs.

The idea of sentinels certainly isn't useless, but anything that cares about
that sort of speed is likely to just use strings or arrays, and those can
trivially be special-cased to avoid unnecessary empty checks and to add the
check for the sentinel, making the whole sentinel range idea an unnecessary
complication IMHO.

- Jonathan M Davis

