Proposal: takeFront and takeBack

Wed Jul 4 12:19:15 PDT 2012

On Wednesday, July 04, 2012 11:17:53 Andrei Alexandrescu wrote:
> On 7/4/12 8:33 AM, deadalnix wrote:
> > Le 04/07/2012 13:04, Roman D. Boiko a écrit :
> >> On Wednesday, 4 July 2012 at 10:54:41 UTC, deadalnix wrote:
> >>> I do agree with that. However, the current proposal have the same
> >>> drawback but on the code that use the range.
> >>> 
> >>> Both solutions are messy IMO.
> >> 
> >> What is error-prone in current client code?
> >> 
> >> If particular algorithm wants to take advantage of a potential speed-up,
> >> it may decide to check whether hasConsume!Range, and call consumeFront
> >> instead of front + popFront. Nobody is obliged to do so.
> >> 
> >> The only problem I can see is potential code duplication, which is lower
> >> priority in performance-critical part of code, IMO.
> > 
> > You have to maintain 2 version of you algorithm. This is more work to
> > test, to maintain, and it is likely to introduce more bugs.
> > 
> > More code == more bugs. More NPATH = more bugs and harder to test and to
> > maintain.
> > 
> > In addition, this will result in practice in less generic code, because
> > one may not need the version of the algorithm without consume primitives
> > in his/her own code and not code it at all.
> 
> I agree. Can't get behind this proposal.
> 
> First, it solves the problem of double decoding for UTF-8 and UTF-16
> strings. Before that, we need to estimate how bad the problem is and how
> we can optimize front/popFront() to alleviate it. For all I can tell, in
> ASCII-heavy strings we're looking at optimizing away ONE redundant
> comparison in a sequence that involves a bounds check, two word
> assignments, and another comparison. Off the top of my head I don't
> think the gains are going to be impressive.
> 
> Looking at the current implementation in std.array:
> 
> void popFront(A)(ref A a)
> if (isNarrowString!A && isMutable!A && !isStaticArray!A)
> {
>      assert(a.length, "Attempting to popFront() past the end of an array
> of "
>              ~ typeof(a[0]).stringof);
>      a = a[std.utf.stride(a, 0) .. $];
> }
> 
> So here we have a bounds check, a call to stride, and two word
> assignments. The implementation of stride itself is:
> 
> uint stride(S)(in S str, size_t index) @safe pure
>      if (is(S : const(char[])))
> {
>      immutable c = str[index];
>      if (c < 0x80)
>          return 1;
>      else
>          return strideImpl(c, index);
>   }
> 
> Here, even if we assume stride is inlined, we have one redundant bounds
> check that we can avoid, and redundant operations with "index" which is
> always 0.
> 
> In theory all of these redundant operations can be done away with
> optimization techniques, but probably some aren't. So we should look
> first at optimizing this flow heavily before looking at an API addition
> that has disadvantages in other dimensions.

The problem is that there are inherent costs in calling both front and 
popFront which can be avoided if they're done as a single operation. front 
must call decode (or code that does its equivalent), and popFront must call 
stride (or code that does its equivalent). There are duplicate calculations 
there that cannot be avoided as long as they're separate operations.

Now, if want to argue that if we can get that cost down low enough adding 
consumeFront is not worth the extra burden on the range API and the developers 
using it, then okay. That's a valid argument, and looking to further optimize 
front and popFront is beneficial regardless.

But at present, I'm seeing a performance improvement of approximately 70 - 80% 
in iterating over strings with consumeFront rather than front and popFront 
(depending on the compiler flags and strings used). And odds are, if you're 
optimizing front and popFront, then you'll likely get the same optimizations 
in consumeFront. Depending on what you're doing while iterating over the 
string, those gains will be minimized by the cost of the code being looped 
over (just like the performance gains in optimizing an iterator can be buried 
in the cost of the associated loop body), but there's a definite gain there in 
using consumeFront.

There's also the consideration of other ranges which could benefit from this 
proposal. Optimizations to popFront for strings, stride, decode, etc. won't 
help them. Now, they may be able to do their own, similar optimizations, but 
they could also benefit from consumeFront. However, the big question there is 
how many such ranges exist. In Phobos at least, I believe that most of the 
ranges which have to do calculations for front end up doing those calculations 
in popFront and don't need to redo them in front, because they save front - 
which is something that strings can't do (it's one place where a wrapper 
struct for strings like Steven keeps bringing up could actually be a 
performance gain). So, this may ultimately only be of benefit for strings, and 
strings are what I'm personally most worried about anyway (strings and ranges 
wrapped around strings). So, I think that the main consideration here is how 
this affects the speed of operating on strings (and more importantly, ranges 
which wrap strings, since to some extent, code which operates directly on 
strings can already specialize on them and do what consumeFront does).

So, if we want us to see how much we can optimize front and popFront for 
strings before we look at adding consumeFront, that's fine. But there _will_ be 
additional cost to calling both front and popFront. That cannot be avoided as 
its intrinsic to what front and popFront tasked with doing. It's just a 
question of how small we can make it.

- Jonathan M Davis