Proposal: takeFront and takeBack
Jonathan M Davis
jmdavisProg at gmx.com
Tue Jul 3 09:37:04 PDT 2012
This seems like it probably merits a bit of discussion, so I'm bringing it up
here rather than simply opening a pull request.
At present, for some ranges (variably-lengthed ranges such as strings in
particular), calling front incurs a cost which popFront at least partially
duplicates. So, the range primitives are inherently inefficient in that they
force you to incur that extra cost as you iterate over the range. Ideally,
there would be a way to get front and pop it off at the same time so that you
incur the cost only once (either that or have front cache its result in a way
that lets it avoid the extra cost in popFront when popFront is called - though
that wouldn't work with strings, since for them, the range primitives are free
functions). So, I'm proposing takeFront and takeBack:
https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b
For most ranges, takeFront does this:
auto takeFront(R)(ref R range)
if(isInputRange!R && !isNarrowString!R)
{
assert(!range.empty);
auto retval = range.front;
range.popFront();
return retval;
}
So, it's pretty much the same cost as using front and popFront separately
(whether it costs more or less probably depends on the exact code and
optimizations, but it should be comparable). But for strings, it looks like
this
auto takeFront(R)(ref R range)
if(isNarrowString!R)
{
import std.utf;
assert(!range.empty);
size_t index = 0;
auto retval = decode(range, index);
range = range[index .. $];
return retval;
}
So, for strings, it'll be more efficient to use takeFront than calling front and
popFront separately. The idea then is that any user-defined range which can
implement takeFront more efficiently than the default will define it. Then range-
based functions use takeFront - e.g. range.takeFront() - and if the user-
defined range implements a more efficient version, that one is used and they gain
the extra efficiency, or if they don't, then the free function is used with
UFCS, and they incur the same cost that they'd incur calling front and
popFront separately. So, it's invisible to range-based functions whether a
range actually implements takeFront itself. takeBack does the same thing as
takeFront but it does it with back and popBack for bidirectional ranges.
I _think_ that this is a fairly clean solution to the problem, but someone
else might be able to point out why this is a bad idea, or they might have a
better idea. And this will have a definite impact on how ranges are normally
used if we add this, so I'm bringing it up here for discussion. Opinions?
Thoughts? Insights?
Oh, and if we go with this, ideally, the compiler would be updated to use
takeFront for foreach instead of front and popFront if a range implements it
(but still do it the current way if it doesn't). So, if typeof(range)
implements takeFront,
foreach(e; range) {...}
would then become
for(auto _range = range; !_range.empty;)
{
auto e = _range.takeFront();
...
}
instead of
for(auto _range = range; !_range.empty; _range.popFront())
{
auto e = _range.front();
...
}
but that's an optimization which could be added later.
- Jonathan M Davis
More information about the Digitalmars-d
mailing list