Proposal: takeFront and takeBack

Jonathan M Davis jmdavisProg at gmx.com
Wed Jul 4 16:06:45 PDT 2012


Another option would be to create a wrapper range for strings to be used when 
they already have to be wrapped in another range. Functions which want to make 
string processing more efficient can already specialize on strings (and often 
do), so they can deal with the consumeFront issue directly if they need to 
(though their code might be a bit more concise with it in some cases). The 
problem is when a string is wrapped in another range (e.g the result of map or 
filter), you can't reasonably specialize on them anymore, and you're forced to 
deal with the inefficiency of using both front and popFront.

So, if we created a wrapper range whose front caches the index of the next 
code unit for a subsequent call to popFront, and functions such as filter and 
map wrapped strings in that before wrapping them in their own range types, 
then the wrapped calls to popFront will be more efficient (at least in the case 
when front was called - the call to popFront would be slightly less efficient in 
the cases where front wasn't used). So, we could have something like this:

struct StringCache(C)
    if(is(Unqual!C == char) || is(Unqual!C == wchar))
{
public:

    this(C[] str)
    {
        _str = str;
    }

    @property bool empty() { return _str.empty; }

    @property dchar front()
    {
        import std.utf;
        _nextIndex = 0;
        return decode(_str, _nextIndex);
    }

    void popFront()
    {
        if(_nextIndex)
        {
            _str = _str[_nextIndex .. $];
            _nextIndex = 0;
        }
        else
            _str.popFront();
    }

private:

    C[] _str;
    size_t _nextIndex = 0;
}

auto stringCache(C)(C[] str)
    if(isSomeChar!C)
{
    static if(is(Unqual!C == dchar))
        return str;
    else
        return StringCache!C(str);
}

Obviously, it would need the range functions for forward range and 
bidirectional range added, but that gives the basic idea. And of course a name 
other than StringCache could be used (it's not great, but it's the best that I 
could come up with in the few minutes that I thought about it and the name 
isn't really the main issue here - the design is).

And the performance in comparison to consumeFront is interesting. Without any 
optimizations turned on (not even -release), consumeFront takes about 72 - 79% 
as long as front and popFront separately, whereas StringCache takes about 107 
- 120% as long as front and popFront (so it's actually slower without any 
optimizations, unlike consumeFront). But with all optimizations on (-release -
O -inline), consumeFront takes about 66 - 80% as long, and StringCache takes 
69 - 75% as long. So, with optimizations, StringCache is roughly the same as 
consumeFront. The times are rough and the various runs of the same test vary a 
bit, so clearly stuff like the CPU cache and context switching are having an 
impact, making exact comparisons difficult, but I think that StringCache is fast 
enough to be comparable to consumeFront in speed, and it's a less invasive 
change to start using StringCache than it would be to use hasConsume and 
consumeFront. And actually, if stringCache were changed to operate on _any_ 
range and then just wrap arrays of char[] and wchar[], then it would be even 
_less_ invasive, because it wouldn't even require special casing on the part 
of the caller. For instance, filter's return statement could go from

return FilteredRange(rs);

to

return FilteredRange(stringCache(rs));

and suddenly you get the optimization for anything using filter (though such 
usage definitely seems to indicate that a better name than stringCache should 
be found).

Clearly we need to look at improving the performance of front and popFront as 
much as possible (and the StringCache implementation would benefit from those 
as well), but given the disgreements over consumeFront, it would seem that 
StringCache would be a better solution (certainly, it would directly impact 
less code), and unless front and popFront can be optimized to the point that 
StringCache adds no real measurable benefit (which I doubt will happen), I 
think that it would be worthwhile to add StringCache and have functions like 
filter and map start using it.

I guess that the first thing that we need to do though is look at what can be 
done to improve the performance of front and popFront - the first thing 
probably being optimizations relating to the fact that they're always 
operating from index 0.

Still, what do you folks think of this StringCache idea? Is it more acceptable 
than consumeFront, or do some of you still think that it's not worth having?

- Jonathan M Davis


More information about the Digitalmars-d mailing list