Proposal: takeFront and takeBack
Jonathan M Davis
jmdavisProg at gmx.com
Wed Jul 4 12:19:15 PDT 2012
On Wednesday, July 04, 2012 11:17:53 Andrei Alexandrescu wrote:
> On 7/4/12 8:33 AM, deadalnix wrote:
> > Le 04/07/2012 13:04, Roman D. Boiko a écrit :
> >> On Wednesday, 4 July 2012 at 10:54:41 UTC, deadalnix wrote:
> >>> I do agree with that. However, the current proposal has the same
> >>> drawback, but on the code that uses the range.
> >>>
> >>> Both solutions are messy IMO.
> >>
> >> What is error-prone in current client code?
> >>
> >> If a particular algorithm wants to take advantage of a potential
> >> speed-up, it may decide to check for hasConsume!Range and call
> >> consumeFront instead of front + popFront. Nobody is obliged to do so.
> >>
> >> The only problem I can see is potential code duplication, which is a
> >> lower priority in performance-critical parts of code, IMO.
> >
> > You have to maintain 2 versions of your algorithm. This is more work to
> > test and to maintain, and it is likely to introduce more bugs.
> >
> > More code == more bugs. More NPATH == more bugs, and code that is harder
> > to test and to maintain.
> >
> > In addition, in practice this will result in less generic code, because
> > someone may not need the version of the algorithm without the consume
> > primitives in their own code and so may not write it at all.
>
> I agree. Can't get behind this proposal.
>
> First, it solves the problem of double decoding for UTF-8 and UTF-16
> strings. Before that, we need to estimate how bad the problem is and how
> we can optimize front/popFront() to alleviate it. For all I can tell, in
> ASCII-heavy strings we're looking at optimizing away ONE redundant
> comparison in a sequence that involves a bounds check, two word
> assignments, and another comparison. Off the top of my head I don't
> think the gains are going to be impressive.
>
> Looking at the current implementation in std.array:
>
> void popFront(A)(ref A a)
>     if (isNarrowString!A && isMutable!A && !isStaticArray!A)
> {
>     assert(a.length, "Attempting to popFront() past the end of an array of "
>            ~ typeof(a[0]).stringof);
>     a = a[std.utf.stride(a, 0) .. $];
> }
>
> So here we have a bounds check, a call to stride, and two word
> assignments. The implementation of stride itself is:
>
> uint stride(S)(in S str, size_t index) @safe pure
>     if (is(S : const(char[])))
> {
>     immutable c = str[index];
>     if (c < 0x80)
>         return 1;
>     else
>         return strideImpl(c, index);
> }
>
> Here, even if we assume stride is inlined, we have one redundant bounds
> check that we can avoid, and redundant operations with "index" which is
> always 0.
>
> In theory all of these redundant operations can be done away with
> optimization techniques, but probably some aren't. So we should look
> first at optimizing this flow heavily before looking at an API addition
> that has disadvantages in other dimensions.
The problem is that there are inherent costs in calling both front and
popFront which can be avoided if they're done as a single operation. front
must call decode (or code that does its equivalent), and popFront must call
stride (or code that does its equivalent). There are duplicate calculations
there that cannot be avoided as long as they're separate operations.
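To make the duplicated work concrete, here is a minimal sketch of the kind of fused primitive being proposed. The name consumeFront comes from this thread, but the body is purely illustrative, not the actual proposed implementation; it relies on the fact that std.utf.decode advances its index argument past the code point it decodes, so one call does what front plus popFront would otherwise each pay for separately:

```d
import std.utf : decode;

// Hypothetical fused primitive (illustrative only): decode the first
// code point and advance the slice in a single step, so the lead byte
// is examined exactly once instead of once in front and again in popFront.
dchar consumeFront(ref string s)
{
    assert(s.length, "Attempting to consumeFront() an empty string");
    size_t index = 0;
    immutable c = decode(s, index); // decode advances index past the code point
    s = s[index .. $];              // no separate call to stride needed
    return c;
}

void main()
{
    string s = "héllo";
    assert(consumeFront(s) == 'h');
    assert(consumeFront(s) == 'é');
    assert(s == "llo");
}
```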
Now, if you want to argue that, provided we can get that cost down low enough,
adding consumeFront is not worth the extra burden on the range API and on the
developers using it, then okay. That's a valid argument, and looking to further optimize
front and popFront is beneficial regardless.
But at present, I'm seeing a performance improvement of approximately 70-80%
in iterating over strings with consumeFront rather than front and popFront
(depending on the compiler flags and strings used). And odds are, if you're
optimizing front and popFront, then you'll likely get the same optimizations
in consumeFront. Depending on what you're doing while iterating over the
string, those gains will be minimized by the cost of the code being looped
over (just like the performance gains in optimizing an iterator can be buried
in the cost of the associated loop body), but there's a definite gain there in
using consumeFront.
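A harness along these lines reproduces the comparison. This is a re-sketch, not the benchmark I actually ran; the module paths are from current Phobos, and consumeFront is the proposed primitive written inline for the test:

```d
import std.array : replicate;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;
import std.utf : decode;

// The proposed fused primitive, written inline for this comparison.
dchar consumeFront(ref string s)
{
    size_t index = 0;
    immutable c = decode(s, index);
    s = s[index .. $];
    return c;
}

void main()
{
    auto text = "mixed ASCII and non-ASCII: héllø wörld. ".replicate(100_000);

    size_t n1, n2;

    auto sw = StopWatch(AutoStart.yes);
    for (auto s = text; !s.empty; s.popFront())
        n1 += s.front < 0x80;   // front decodes; popFront re-examines the lead byte
    immutable t1 = sw.peek();

    sw.reset();
    for (auto s = text; s.length; )
        n2 += consumeFront(s) < 0x80; // one decode, one advance
    immutable t2 = sw.peek();

    assert(n1 == n2); // both loops must see the same code points
    writeln("front+popFront: ", t1, "  consumeFront: ", t2);
}
```

The observed gap varies with the compiler flags and the ASCII/non-ASCII mix of the input, which is why I gave a range rather than a single number.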
There's also the consideration of other ranges which could benefit from this
proposal. Optimizations to popFront for strings, stride, decode, etc. won't
help them. Now, they may be able to do their own, similar optimizations, but
they could also benefit from consumeFront. However, the big question there is
how many such ranges exist. In Phobos at least, I believe that most of the
ranges which have to do calculations for front end up doing those calculations
in popFront and don't need to redo them in front, because they save front -
which is something that strings can't do (it's one place where a wrapper
struct for strings like Steven keeps bringing up could actually be a
performance gain). So, this may ultimately only be of benefit for strings, and
strings are what I'm personally most worried about anyway (strings and ranges
wrapped around strings). So, I think that the main consideration here is how
this affects the speed of operating on strings (and more importantly, ranges
which wrap strings, since to some extent, code which operates directly on
strings can already specialize on them and do what consumeFront does).
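For reference, the caching pattern I mean looks roughly like this. It is an illustrative wrapper, not Phobos code, and only loosely based on the wrapper-struct idea mentioned above: all decoding happens in the constructor and in popFront, the decoded code point is cached, and front becomes a cheap field read, which is exactly what a raw string slice cannot do:

```d
import std.utf : decode;

// Illustrative string wrapper: front is computed in popFront and cached,
// so calling front repeatedly costs nothing beyond a field read.
struct DecodedString
{
    private string _rest;
    private dchar _front;
    private bool _empty;

    this(string s)
    {
        _rest = s;
        _empty = s.length == 0;
        if (!_empty)
            advance();
    }

    @property bool empty() const { return _empty; }
    @property dchar front() const { return _front; }

    void popFront()
    {
        if (_rest.length == 0)
            _empty = true;
        else
            advance();
    }

    private void advance()
    {
        size_t index = 0;
        _front = decode(_rest, index); // pay the decode cost once, here
        _rest = _rest[index .. $];
    }
}

void main()
{
    dchar[] result;
    foreach (c; DecodedString("héllo"))
        result ~= c;
    assert(result == "héllo"d);
}
```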
So, if we want to see how much we can optimize front and popFront for
strings before we look at adding consumeFront, that's fine. But there _will_ be
an additional cost to calling both front and popFront. That cannot be avoided,
as it's intrinsic to what front and popFront are tasked with doing. It's just a
question of how small we can make it.
- Jonathan M Davis
More information about the Digitalmars-d
mailing list