Proposal: takeFront and takeBack

Jonathan M Davis jmdavisProg at gmx.com
Wed Jul 4 22:26:46 PDT 2012


On Thursday, July 05, 2012 00:04:03 Andrei Alexandrescu wrote:
> On 7/4/12 10:11 PM, Jonathan M Davis wrote:
> > On Wednesday, July 04, 2012 21:33:31 Andrei Alexandrescu wrote:
> >> Great. Could you please post some code so we play with it? Thanks.
> > 
> > Okay. You can use this:
> [snip]
> 
> Thanks. I made the following change to popFront:
> 
> @trusted void popFront(A)(ref A a)
> if (isNarrowString!A && isMutable!A && !isStaticArray!A)
> {
>      assert(a.length, "Attempting to popFront() past the end of an array
> of "
>              ~ typeof(a[0]).stringof);
>      immutable c = a[0];
>      if (c < 0x80)
>      {
>          a = a.ptr[1 .. a.length];
>      }
>      else
>      {
>          import core.bitop;
>          immutable msbs = 7 - bsr(~c);
>          if ((msbs >= 2) & (msbs <= 6))
>          {
>              a = a[msbs .. $];
>          }
>          else
>          {
>              //throw new UTFException("Invalid UTF-8 sequence", 0);
>          }
>      }
> }
> 
> For some reason, uncommenting the throwing code makes the function
> significantly slower. That seems to be an issue with the compiler
> because putting the throw in a function seems to restore speed.
> 
> With the above, I get on a Mac:
> 
> ascii 126.61%: old [682 ms, 479 μs, and 3 hnsecs], new [864 ms, 102 μs,
> and 1 hnsec]
> uni   86.76%:  old [1 sec, 888 ms, 17 μs, and 8 hnsecs], new [1 sec, 638
> ms, 76 μs, and 3 hnsecs]
> 
> So the ascii string handling became actually 27% faster whereas the uni
> string handling is 13% slower.
> 
> It might be argued that checking for validity is not the metier of
> popFront; only if you do try to use stuff (e.g. by calling front) should
> one see exceptions. If popFront sees incorrect characters, it should
> just skip them one at a time. Following that argument, the
> implementation may be:
> 
> @trusted void popFront(A)(ref A a)
> if (isNarrowString!A && isMutable!A && !isStaticArray!A)
> {
>      assert(a.length, "Attempting to popFront() past the end of an array
> of "
>              ~ typeof(a[0]).stringof);
>      immutable c = a[0];
>      if (c < 0x80)
>      {
>          a = a.ptr[1 .. a.length];
>      }
>      else
>      {
>          import core.bitop;
>          auto msbs = 7 - bsr(~c);
>          if ((msbs < 2) | (msbs > 6))
>          {
>              msbs = 1;
>          }
>          a = a[msbs .. $];
>      }
> }
> 
> With this code I get:
> 
> ascii 115.39%: old [744 ms, 103 μs, and 6 hnsecs], new [858 ms, 628 μs,
> and 4 hnsecs]
> uni   96.78%:  old [1 sec, 877 ms, and 461 μs], new [1 sec, 817 ms, 14
> μs, and 3 hnsecs]

Hmm. With the version without bounds checking compared against consumeFront, 
I'm getting downright bizarre results. _Without_ optimizations, I'm seeing 
around 85% for both ASCII and unicode, but with full optimizations, it's 
actually at about 140% for ASCII and 102% for unicode (with -release and -
release -O being around 90% and 80% respectively). So, this code seems _very_ 
sensitive to optimizations and whatever's going on with the CPU in terms of 
caching and context switching.

My StringCache proposal is slightly faster than consumeFront, but the 
situation is essentially the same. It's around 80 - 90% of front and you're 
improved popFront without -inline, but with -inline, it's at about 101 - 140%.

So, this is just weird. It looks like the improved popFront makes it so that 
it's very uncertain as to whether it or consumeFront/StringCache will be 
faster. consumeFront and StringCache both use decode rather than popFront, so 
improvements to decode might make them win, but the front/popFront combo would 
get the same improvements (that, and looking at decode, it looks pretty 
streamlined already).

I guess that the various solutions are all close enough that it becomes 
completely a question of the optimizer as to which is going to be faster, much 
as conceptually, consumeFront and StringCache should win (at least for unicode 
strings). That being the case, we _definitely_ don't want to go with 
consumeFront (it's far too invasive), and not even the relatively uninvasive 
StringCache seems like a good idea. It might improve the code, and it might 
make it worse.

I wonder how the results look on gdc when using the improved popFront, given 
how surprising they were with consumeFront.

In any case, I guess that this shows that what we can get with popFront is so 
close to what consumeFront or StringCache would do that we might as well not 
bother with them, which is a _big_ surpise to me. It does pay to benchmark 
code though.

- Jonathan M Davis


More information about the Digitalmars-d mailing list