Proposal for SentinelInputRange
Jonathan M Davis
jmdavisProg at gmx.com
Thu Feb 28 10:52:39 PST 2013
On Thursday, February 28, 2013 08:52:45 Walter Bright wrote:
> I've given you two examples (lexer and regexp) where you are certainly not
> stuck with that, and those two cases matter.
>
> > Pure input ranges fail utterly as you can't save them, so you get _zero_
> > lookahead [...]
My point is that in almost all cases, what'll end up happening with a sentinel
range which is something like this:
struct SRange(R)
if(is(Unqual!(ElementType!R) == char))
{
enum char sentinel = 0;
@property char front() { return _front; }
@property bool empty() { return _front == sentinel; }
void popFront()
{
_range.popFront();
if(_range.empty)
_front = sentinel;
}
this(R range)
{
if(_range.empty)
_front = sentinel;
else
_front = _rang.front;
}
private:
char _front;
R _range;
}
Notice that it has to check for empty every time that the front is popped, and
it can't avoid that, because it's wrapping another range which is not a
sentinel range. The only time that it can avoid that is if it's managing its
own memory internally via an array or pointer or whatnot. And if it's just
going to be a string or an array, I'd argue for simply special casing strings
or arrays to skip unnecessary empty checks.
Also, with sentinel ranges, you'll be forced to wrap strings because of that
sentinel marker that's required for isSentinelRange, meaning that you'll get
something like
struct SRange
{
enum char sentinel = 0;
@property char front() { return _front; }
@property bool empty() { return _front == sentinel; }
void popFront() { _str = _str[1 .. $]; }
this(string str)
{
_str = str;
if(_str.empty)
_str = [sentinel];
else if(_str[$ - 1] != sentinel)
_str ~= sentinel;
}
private:
string _str;
}
And while the compiler will hopefully optimize away all of the extra overhead
of the wrapper type, any functions which would be able to optimize what
they're doing for a string or array will not special case for this wrapper
type, because they probably won't even know about it. A prime case where that
might be an issue would be with std.utf.decode or std.utf.decodeFront which do
some special casing for strings and which any unicode-aware lexer would have
to use at least some of the time (even if the vast majority of the time, it
can just check ASCII stuff).
So, for the most part, the only time that you'll get any performance boost out
of this is with strings or arrays, and you'll take a performance hit if you're
using functions which special case strings or arrays but not the wrapper
sentinel type. And yet special-casing strings and arrays will give you exactly
the same performance boost without this sentinel range concept.
- Jonathan M Davis
More information about the Digitalmars-d
mailing list