Dscanner - It exists

Thu Aug 2 00:43:15 PDT 2012

On Thursday, August 02, 2012 08:51:26 Jacob Carlborg wrote:
> On 2012-08-02 08:26, Jonathan M Davis wrote:
> > It's really not all that hard to special case for strings, especially when
> > you're operating primarily on code units. And I think that the lexer
> > should be flexible enough to be usable with ranges other than strings.
> > We're trying to make most stuff in Phobos range-based, not string-based
> > or array-based.
> Ok. I just don't think it's worth giving up some performance or make the
> design overly complicated just to make a range interface. But if ranges
> doesn't cause these problems I'm happy.

A range-based function operating on strings without special-casing them often 
_will_ harm performance. But if you special-case them for strings, then you 
can avoid that performance penalty - especially if you can avoid having to 
decode any characters.
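
To illustrate the kind of special-casing I mean, here is a minimal sketch. countSpaces is a made-up helper for this example, not something from the lexer, and it assumes that only an ASCII character needs to be matched, which is what makes skipping the decoding safe:

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.traits : isNarrowString, Unqual;

// Hypothetical helper (not part of the lexer): counts ASCII spaces.
size_t countSpaces(R)(R range)
    if (isInputRange!R && is(Unqual!(ElementType!R) == dchar))
{
    size_t n = 0;
    static if (isNarrowString!R)
    {
        // Fast path: walk the code units directly. An ASCII space is a
        // single code unit in UTF-8 and UTF-16 and never occurs inside a
        // multi-unit sequence, so nothing has to be decoded.
        foreach (unit; range)
            if (unit == ' ')
                ++n;
    }
    else
    {
        // Generic path: front and popFront yield whole code points.
        for (; !range.empty; range.popFront())
            if (range.front == ' ')
                ++n;
    }
    return n;
}
```

The caller writes countSpaces(input) either way; only the body differs between narrow strings and other ranges of dchar.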

The result is that using range-based functions on strings is generally correct 
without the function writer (or the caller) having to worry about encodings 
and the like, but if they want to eke out all of the performance that they 
can, they need to go to the extra effort of special-casing the function for 
strings. Like much of D, it favors correctness/safety but allows you to get 
full performance if you work at it a bit harder.

In the case of the lexer, it's really not all that bad - especially since 
string mixins allow me to give the operation that I need (e.g. get the first 
code unit) in the correct way for that particular range type without worrying 
about the details.

For instance, I have this function which I use to generate a mixin any time 
that I want to get the first code unit:

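import std.range;   // isForwardRange, ElementType, ElementEncodingType
import std.traits;  // Unqual, isNarrowString

// Returns code (as a string, for use with mixin) which declares `first`
// as the first code unit of `range`.
string declareFirst(R)()
    if(isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    static if(isNarrowString!R)
        // Narrow strings: index directly so that nothing gets decoded.
        return "Unqual!(ElementEncodingType!R) first = range[0];";
    else
        // Other ranges of dchar: each element is already one code unit.
        return "dchar first = range.front;";
}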

So, every line using it becomes

mixin(declareFirst!R());

which really isn't any worse than

char c = str[0];

except that it works with more than just strings. Yes, it's more effort to get 
the lexer working with all ranges of dchar, but I don't think that it's all 
that much worse, and the result is much more flexible.
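
As a rough sketch of how the mixin sits inside a range-generic function: startsWithSlash below is a hypothetical caller invented for this example (it is not from the lexer), and it assumes that the range is non-empty. declareFirst is repeated only so that the snippet compiles on its own.

```d
import std.range.primitives : ElementEncodingType, ElementType, front,
                              isForwardRange;
import std.traits : isNarrowString, Unqual;

// Same function as in the post, repeated to keep the example self-contained.
string declareFirst(R)()
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    static if (isNarrowString!R)
        return "Unqual!(ElementEncodingType!R) first = range[0];";
    else
        return "dchar first = range.front;";
}

// Hypothetical caller: true if the input begins with '/'.
// Assumes that range is not empty.
bool startsWithSlash(R)(R range)
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    // Declares `first` in this scope: a char or wchar for narrow strings
    // (no decoding), a dchar for every other range.
    mixin(declareFirst!R());
    return first == '/';
}
```

The mixin string is compiled in the scope where it is mixed in, so it relies on the names R and range being used there, just as in the lexer.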

- Jonathan M Davis