Proposal for SentinelInputRange

Jonathan M Davis jmdavisProg at gmx.com
Wed Feb 27 20:47:29 PST 2013


On Thursday, February 28, 2013 05:24:57 deadalnix wrote:
> Can you explain what it does buy us ? Is it faster to check for
> the sentinel value that to check for emptiness ?
> 
> Is it really handy in generic code ? For instance, if I use a
> switch, the actual sentinel value may be conflicting with other
> element in the switch. How does this kind of problem are solved ?
> Do they are problems in the first place ?

In the case of a lexer, it allows you to skip checking for empty most of the 
time. You have something like

switch(front)
{
    case '+': ...
    case '>': ...
     ...
    case 0: //do whatever you do at the end
}

So, as far as the switch statement goes, you don't have to worry about whether 
the range is empty or not. The case for 0 takes care of it and will only get 
checked if it actually happens, whereas without the sentinel value, you have 
to check empty every time you pop front.

Now, the only real benefit that I see for this allowing you to make a string 
zero-terminated (which in the case of a lexer would probably mean copying the 
entire file into a new string which has zero on the end). In general, you'll be 
forced to wrap a range in a sentinel range to get this behavior, which means 
that you're _still_ checking empty all the time, because it has to keep 
checking whether it's supposed to make front 0 now. And that probably means 
that it'll be slightly _more_ expensive to do this for anything other than a 
string. That being the case, it might be better to just special case strings 
rather than come up with this whole new range idea.

I'm also not at all covinced that this is generally useful. It may be that 
it's a great idea for lexers, but what other use cases are there? I don't know 
that there isn't one, but I can't think of any. And the _only_ time that it's 
of any real benefit is definitely going to be when ranges are built with this 
idea in the first place rather than wrapping them, which almost guarantees that 
you're just dealing with arrays, which you can always just special case if you 
need this level of efficiency.

Also, I'd point out that even for strings, doing something like this means 
wrapping them, because their empty isn't defined in a manner which works with 
isSentinelRange. Unlike most other cases, the wrapper wouldn't have to keep 
checking for empty (it could just guarantee that the string it was given ended 
with 0 and then say that it's length was one less than the underlying string), 
but you're still dealing with a wrapper and relying on the compiler to 
optimize that cost away. Also, that totally screws with passing the range to 
any functions which special case strings. Those functions would now also have 
to special case the wrapper type if you wanted the extra efficiency that you get 
from the function understanding that it's a string rather than a range of code 
units or code points (depending on which the sentinel range is claiming it is 
- probably code units in the case of the lexer).

So, I'm inclined to believe that we'd be better off just special casing strings 
in any algorithms that can take advantage of this sort of thing than we would 
be creating this sentinel range idea.

- Jonathan M Davis


More information about the Digitalmars-d mailing list