Proposal: takeFront and takeBack
Jonathan M Davis
jmdavisProg at gmx.com
Wed Jul 4 22:26:46 PDT 2012
On Thursday, July 05, 2012 00:04:03 Andrei Alexandrescu wrote:
> On 7/4/12 10:11 PM, Jonathan M Davis wrote:
> > On Wednesday, July 04, 2012 21:33:31 Andrei Alexandrescu wrote:
> >> Great. Could you please post some code so we play with it? Thanks.
> >
> > Okay. You can use this:
> [snip]
>
> Thanks. I made the following change to popFront:
>
> @trusted void popFront(A)(ref A a)
> if (isNarrowString!A && isMutable!A && !isStaticArray!A)
> {
> assert(a.length, "Attempting to popFront() past the end of an array
> of "
> ~ typeof(a[0]).stringof);
> immutable c = a[0];
> if (c < 0x80)
> {
> a = a.ptr[1 .. a.length];
> }
> else
> {
> import core.bitop;
> immutable msbs = 7 - bsr(~c);
> if ((msbs >= 2) & (msbs <= 6))
> {
> a = a[msbs .. $];
> }
> else
> {
> //throw new UTFException("Invalid UTF-8 sequence", 0);
> }
> }
> }
>
> For some reason, uncommenting the throwing code makes the function
> significantly slower. That seems to be an issue with the compiler
> because putting the throw in a function seems to restore speed.
>
> With the above, I get on a Mac:
>
> ascii 126.61%: old [682 ms, 479 μs, and 3 hnsecs], new [864 ms, 102 μs,
> and 1 hnsec]
> uni 86.76%: old [1 sec, 888 ms, 17 μs, and 8 hnsecs], new [1 sec, 638
> ms, 76 μs, and 3 hnsecs]
>
> So the ascii string handling became actually 27% faster whereas the uni
> string handling is 13% slower.
>
> It might be argued that checking for validity is not the metier of
> popFront; only if you do try to use stuff (e.g. by calling front) should
> one see exceptions. If popFront sees incorrect characters, it should
> just skip them one at a time. Following that argument, the
> implementation may be:
>
> @trusted void popFront(A)(ref A a)
> if (isNarrowString!A && isMutable!A && !isStaticArray!A)
> {
> assert(a.length, "Attempting to popFront() past the end of an array
> of "
> ~ typeof(a[0]).stringof);
> immutable c = a[0];
> if (c < 0x80)
> {
> a = a.ptr[1 .. a.length];
> }
> else
> {
> import core.bitop;
> auto msbs = 7 - bsr(~c);
> if ((msbs < 2) | (msbs > 6))
> {
> msbs = 1;
> }
> a = a[msbs .. $];
> }
> }
>
> With this code I get:
>
> ascii 115.39%: old [744 ms, 103 μs, and 6 hnsecs], new [858 ms, 628 μs,
> and 4 hnsecs]
> uni 96.78%: old [1 sec, 877 ms, and 461 μs], new [1 sec, 817 ms, 14
> μs, and 3 hnsecs]
Hmm. With the version without bounds checking compared against consumeFront,
I'm getting downright bizarre results. _Without_ optimizations, I'm seeing
around 85% for both ASCII and unicode, but with full optimizations, it's
actually at about 140% for ASCII and 102% for unicode (with -release and -
release -O being around 90% and 80% respectively). So, this code seems _very_
sensitive to optimizations and whatever's going on with the CPU in terms of
caching and context switching.
My StringCache proposal is slightly faster than consumeFront, but the
situation is essentially the same. It's around 80 - 90% of front and you're
improved popFront without -inline, but with -inline, it's at about 101 - 140%.
So, this is just weird. It looks like the improved popFront makes it so that
it's very uncertain as to whether it or consumeFront/StringCache will be
faster. consumeFront and StringCache both use decode rather than popFront, so
improvements to decode might make them win, but the front/popFront combo would
get the same improvements (that, and looking at decode, it looks pretty
streamlined already).
I guess that the various solutions are all close enough that it becomes
completely a question of the optimizer as to which is going to be faster, much
as conceptually, consumeFront and StringCache should win (at least for unicode
strings). That being the case, we _definitely_ don't want to go with
consumeFront (it's far too invasive), and not even the relatively uninvasive
StringCache seems like a good idea. It might improve the code, and it might
make it worse.
I wonder how the results look on gdc when using the improved popFront, given
how surprising they were with consumeFront.
In any case, I guess that this shows that what we can get with popFront is so
close to what consumeFront or StringCache would do that we might as well not
bother with them, which is a _big_ surpise to me. It does pay to benchmark
code though.
- Jonathan M Davis
More information about the Digitalmars-d
mailing list