Proposal for SentinelInputRange

Jonathan M Davis jmdavisProg at gmx.com
Thu Feb 28 10:52:39 PST 2013


On Thursday, February 28, 2013 08:52:45 Walter Bright wrote:
> I've given you two examples (lexer and regexp) where you are certainly not
> stuck with that, and those two cases matter.
> 
> > Pure input ranges fail utterly as you can't save them, so you get _zero_
> > lookahead [...]

My point is that in almost all cases, what'll end up happening with a sentinel 
range which is something like this:

struct SRange(R)
    if(is(Unqual!(ElementType!R) == char))
{
    enum char sentinel = 0;

    @property char front() { return _front; }
    @property bool empty() { return _front == sentinel; }
    void popFront()
    {
        _range.popFront();
        if(_range.empty)
            _front = sentinel;
    }

    this(R range)
    {
        if(_range.empty)
            _front = sentinel;
        else
            _front = _rang.front;
    }

private:
    char _front;
    R _range;
}

Notice that it has to check for empty every time that the front is popped, and 
it can't avoid that, because it's wrapping another range which is not a 
sentinel range. The only time that it can avoid that is if it's managing its 
own memory internally via an array or pointer or whatnot. And if it's just 
going to be a string or an array, I'd argue for simply special casing strings 
or arrays to skip unnecessary empty checks.

Also, with sentinel ranges, you'll be forced to wrap strings because of that 
sentinel marker that's required for isSentinelRange, meaning that you'll get 
something like

struct SRange
{
    enum char sentinel = 0;

    @property char front() { return _front; }
    @property bool empty() { return _front == sentinel; }
    void popFront() { _str = _str[1 .. $]; }

    this(string str)
    {
        _str = str;

        if(_str.empty)
            _str = [sentinel];
        else if(_str[$ - 1] != sentinel)
            _str ~= sentinel;
    }

private:
    string _str;
}

And while the compiler will hopefully optimize away all of the extra overhead 
of the wrapper type, any functions which would be able to optimize what 
they're doing for a string or array will not special case for this wrapper 
type, because they probably won't even know about it. A prime case where that 
might be an issue would be with std.utf.decode or std.utf.decodeFront which do 
some special casing for strings and which any unicode-aware lexer would have 
to use at least some of the time (even if the vast majority of the time, it 
can just check ASCII stuff).

So, for the most part, the only time that you'll get any performance boost out 
of this is with strings or arrays, and you'll take a performance hit if you're 
using functions which special case strings or arrays but not the wrapper 
sentinel type. And yet special-casing strings and arrays will give you exactly 
the same performance boost without this sentinel range concept.

- Jonathan M Davis


More information about the Digitalmars-d mailing list