Proposal: takeFront and takeBack
Jonathan M Davis
jmdavisProg at gmx.com
Wed Jul 4 16:06:45 PDT 2012
Another option would be to create a wrapper range for strings to be used when
they already have to be wrapped in another range. Functions which want to make
string processing more efficient can already specialize on strings (and often
do), so they can deal with the consumeFront issue directly if they need to
(though their code might be a bit more concise with it in some cases). The
problem is when a string is wrapped in another range (e.g the result of map or
filter), you can't reasonably specialize on them anymore, and you're forced to
deal with the inefficiency of using both front and popFront.
So, if we created a wrapper range whose front caches the index of the next
code unit for a subsequent call to popFront, and functions such as filter and
map wrapped strings in that before wrapping them in their own range types,
then the wrapped calls to popFront will be more efficient (at least in the case
when front was called - the call to popFront would be slightly less efficient in
the cases where front wasn't used). So, we could have something like this:
struct StringCache(C)
if(is(Unqual!C == char) || is(Unqual!C == wchar))
{
public:
this(C[] str)
{
_str = str;
}
@property bool empty() { return _str.empty; }
@property dchar front()
{
import std.utf;
_nextIndex = 0;
return decode(_str, _nextIndex);
}
void popFront()
{
if(_nextIndex)
{
_str = _str[_nextIndex .. $];
_nextIndex = 0;
}
else
_str.popFront();
}
private:
C[] _str;
size_t _nextIndex = 0;
}
auto stringCache(C)(C[] str)
if(isSomeChar!C)
{
static if(is(Unqual!C == dchar))
return str;
else
return StringCache!C(str);
}
Obviously, it would need the range functions for forward range and
bidirectional range added, but that gives the basic idea. And of course a name
other than StringCache could be used (it's not great, but it's the best that I
could come up with in the few minutes that I thought about it and the name
isn't really the main issue here - the design is).
And the performance in comparison to consumeFront is interesting. Without any
optimizations turned on (not even -release), consumeFront takes about 72 - 79%
as long as front and popFront separately, whereas StringCache takes about 107
- 120% as long as front and popFront (so it's actually slower without any
optimizations, unlike consumeFront). But with all optimizations on (-release -
O -inline), consumeFront takes about 66 - 80% as long, and StringCache takes
69 - 75% as long. So, with optimizations, StringCache is roughly the same as
consumeFront. The times are rough and the various runs of the same test vary a
bit, so clearly stuff like the CPU cache and context switching are having an
impact, making exact comparisons difficult, but I think that StringCache is fast
enough to be comparable to consumeFront in speed, and it's a less invasive
change to start using StringCache than it would be to use hasConsume and
consumeFront. And actually, if stringCache were changed to operate on _any_
range and then just wrap arrays of char[] and wchar[], then it would be even
_less_ invasive, because it wouldn't even require special casing on the part
of the caller. For instance, filter's return statement could go from
return FilteredRange(rs);
to
return FilteredRange(stringCache(rs));
and suddenly you get the optimization for anything using filter (though such
usage definitely seems to indicate that a better name than stringCache should
be found).
Clearly we need to look at improving the performance of front and popFront as
much as possible (and the StringCache implementation would benefit from those
as well), but given the disgreements over consumeFront, it would seem that
StringCache would be a better solution (certainly, it would directly impact
less code), and unless front and popFront can be optimized to the point that
StringCache adds no real measurable benefit (which I doubt will happen), I
think that it would be worthwhile to add StringCache and have functions like
filter and map start using it.
I guess that the first thing that we need to do though is look at what can be
done to improve the performance of front and popFront - the first thing
probably being optimizations relating to the fact that they're always
operating from index 0.
Still, what do you folks think of this StringCache idea? Is it more acceptable
than consumeFront, or do some of you still think that it's not worth having?
- Jonathan M Davis
More information about the Digitalmars-d
mailing list