Proposal: takeFront and takeBack

Tue Jul 3 09:37:04 PDT 2012

This seems like it probably merits a bit of discussion, so I'm bringing it up 
here rather than simply opening a pull request.

At present, for some ranges (ranges with variable-length elements, such as
narrow strings in particular), calling front incurs a cost which popFront at
least partially duplicates: front has to decode the code point, and popFront
then has to work out again how many code units that code point occupied. So,
the range primitives are inherently inefficient in that they force you to
incur that extra cost as you iterate over the range. Ideally, there would be a
way to get front and pop it off at the same time so that you incur the cost
only once. The alternative would be to have front cache its result in a way
that lets popFront skip the extra work, but that wouldn't work with strings,
since for them the range primitives are free functions. So, I'm proposing
takeFront and takeBack:

https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b
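
To make the duplicated cost concrete, here is a small sketch of what front and
popFront each have to work out for a UTF-8 string. It assumes a current Phobos
(the std.range.primitives module and std.utf.stride); the string used is just
an illustration.

```d
import std.range.primitives : front, popFront;
import std.utf : decode, stride;

void main()
{
    string s = "\u00E9a";   // the first code point takes two UTF-8 code units

    // front has to decode the leading code point...
    size_t index = 0;
    assert(decode(s, index) == s.front);
    assert(index == 2);     // ...and decoding already reveals its length.

    // popFront has to find that same length again before it can slice.
    assert(stride(s, 0) == 2);
    auto t = s;
    t.popFront();
    assert(t == s[2 .. $]);
}
```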

For most ranges, takeFront does this:

auto takeFront(R)(ref R range)
    if(isInputRange!R && !isNarrowString!R)
{
    assert(!range.empty);
    auto retval = range.front;
    range.popFront();
    return retval;
}

So, it's pretty much the same cost as using front and popFront separately 
(whether it costs more or less probably depends on the exact code and 
optimizations, but it should be comparable). But for narrow strings, it looks
like this:

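auto takeFront(R)(ref R range)
    if(isNarrowString!R)
{
    import std.utf : decode;
    assert(!range.empty);
    // decode returns the first code point and advances index past it,
    // so the length of the sequence only has to be computed once.
    size_t index = 0;
    auto retval = decode(range, index);
    // Drop the code units that were just decoded.
    range = range[index .. $];
    return retval;
}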
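
Put together as a standalone program, the two overloads can be tried out like
this. It is only a sketch: the imports assume a current Phobos layout
(std.range.primitives, std.traits), and the values in main are made up for
illustration.

```d
import std.range.primitives : empty, front, popFront, isInputRange;
import std.traits : isNarrowString;
import std.utf : decode;

// Generic fallback: same work as front followed by popFront.
auto takeFront(R)(ref R range)
    if (isInputRange!R && !isNarrowString!R)
{
    assert(!range.empty);
    auto retval = range.front;
    range.popFront();
    return retval;
}

// Narrow strings: decode once, then slice off the consumed code units.
auto takeFront(R)(ref R range)
    if (isNarrowString!R)
{
    assert(!range.empty);
    size_t index = 0;
    auto retval = decode(range, index);
    range = range[index .. $];
    return retval;
}

void main()
{
    string s = "h\u00E9llo";           // second code point is two code units
    assert(s.takeFront() == 'h');
    assert(s.takeFront() == '\u00E9');
    assert(s == "llo");

    int[] a = [1, 2, 3];               // non-string range uses the fallback
    assert(a.takeFront() == 1);
    assert(a == [2, 3]);
}
```

Note that takeFront takes the range by ref, so it has to be called on an
lvalue; the caller's range is what gets advanced.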

So, for strings, it'll be more efficient to use takeFront than to call front
and popFront separately. The idea then is that any user-defined range which
can implement takeFront more efficiently than the default will define it as a
member function. Range-based functions then call takeFront with member-call
syntax - e.g. range.takeFront(). If the user-defined range implements its own
version, that one is used, and the caller gains the extra efficiency. If it
doesn't, the free function is picked up via UFCS, and the caller incurs the
same cost as calling front and popFront separately. So, it's invisible to
range-based functions whether a range actually implements takeFront itself.
takeBack does the same thing as takeFront, but with back and popBack for
bidirectional ranges.
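
As a sketch of how that dispatch would play out, consider the following.
Countdown, its usedMember flag, and sum are all hypothetical names invented for
this example, and only the non-string fallback is shown to keep it short.

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Fallback free function (non-string case only, for brevity).
auto takeFront(R)(ref R range)
    if (isInputRange!R)
{
    assert(!range.empty);
    auto retval = range.front;
    range.popFront();
    return retval;
}

// Hypothetical user-defined range which supplies its own takeFront.
struct Countdown
{
    int n;
    bool usedMember;    // records which path ran; illustration only

    @property bool empty() const { return n == 0; }
    @property int front() const { return n; }
    void popFront() { --n; }

    // Member version: found before UFCS considers the free function.
    int takeFront()
    {
        usedMember = true;
        return n--;
    }
}

// Range-based code just calls range.takeFront() either way.
int sum(R)(R range)
{
    int total = 0;
    while (!range.empty)
        total += range.takeFront();
    return total;
}

void main()
{
    auto c = Countdown(3);
    assert(c.takeFront() == 3);
    assert(c.usedMember);       // the member function was chosen
    assert(sum(c) == 3);        // remaining elements: 2 + 1

    int[] a = [1, 2, 3];
    assert(sum(a) == 6);        // arrays go through the free function
}
```

This works because UFCS only looks for a free function when the type has no
member of that name, so a range's own takeFront always wins.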

I _think_ that this is a fairly clean solution to the problem, but someone 
else might be able to point out why this is a bad idea, or they might have a 
better idea. And this will have a definite impact on how ranges are normally 
used if we add this, so I'm bringing it up here for discussion. Opinions? 
Thoughts? Insights?

Oh, and if we go with this, ideally, the compiler would be updated to use 
takeFront for foreach instead of front and popFront if a range implements it 
(but still do it the current way if it doesn't). So, if typeof(range) 
implements takeFront,

foreach(e; range) {...}

would then become

for(auto _range = range; !_range.empty;)
{
    auto e = _range.takeFront();
    ...
}

instead of

for(auto _range = range; !_range.empty; _range.popFront())
{
    auto e = _range.front();
    ...
}

but that's an optimization which could be added later.

- Jonathan M Davis