Optimization fun

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Thu Nov 6 16:14:01 PST 2014


On Thu, Nov 06, 2014 at 03:15:04PM -0800, Walter Bright via Digitalmars-d wrote:
> On 11/6/2014 2:58 PM, H. S. Teoh via Digitalmars-d wrote:
> >2) Auto-decoding is evil: the solver currently uses a naïve char[]
> >representation for the puzzle board, and uses countUntil()
> >extensively to locate things on the board. The original code had:
> >
> >	auto offset = currentBoard.countUntil(ch);
> >
> >which, of course, triggers autodecoding (even though there is
> >actually no reason to do so: ch is a char, and therefore countUntil
> >could've used strchr instead). Simply changing that to:
> >
> >	auto offset = currentBoard.representation.countUntil(ch);
> >
> >gave an additional 20%-30% performance boost.
> 
> I've argued this point for over a year, apparently without convincing
> anyone :-(

I dunno, my impression is that some people agree with it, but consider
that the cost of changing it now a little too prohibitive due to the
extensive code breakage it will cause. But the primary obstacle is that
Andrei is not convinced, so he's going to veto any change in this
direction regardless. You two will have to tussle it out to resolve
this. ;-)


> BTW, use .byChar. instead of .representation. because the former
> leaves it typed as 'char' and the latter converts it to 'ubyte'.

I was going to use .byChar, but since I'm doing performance testing on
gdc, and (my version of) gdc hasn't caught up to the latest Phobos yet,
I have to settle with .representation for now. For my purposes it's
harmless, since I'm not actually doing anything with the ubyte
representation once countUntil is done; the return value is used to
index the original char[].


> >3) However, countUntil remains at the top of the profiler's list of
> >hotspots. DMD's optimizer was rather disappointing here, so I looked
> >at gdc's output instead. Digging into the disassembly, I saw that it
> >was using a foreach loop over the char[], but for some reason even
> >gdc -O3 failed to simplify that loop significantly (I'm not sure why
> >-- maybe what threw off the optimizer is the array indexing + length
> >check operation, where in C one would normally jump bump the pointer
> >instead?). Eventually, I tried writing a manual loop:
> >
> >             auto ptr = current.ptr;
> >             auto c = current.length;
> >             while (c > 0 && *ptr != m.id)
> >                 ptr++;
> >             cert.headOffset = ptr - current.ptr;
> >
> >This pushed gdc far enough in the right direction that it finally
> >produced a 3-instruction inner loop. Strangely enough, the assembly
> >had no check for (c > 0)... could it be because there is an assert
> >following the loop claiming that that will never happen, therefore
> >gdc elided the check? I thought Walter hasn't gotten around to
> >implementing optimizations based on assert yet??  Anyway, with this
> >optimization, I managed to shave off another 3%-5% of the total
> >running time.
> 
> Walter hasn't gotten around to implementing optimizations in GDC,
> that's fer sure! :-)

LOL... I wasn't sure whether said optimizations were going to be in the
front end, or in the backend only. It might not be a bad idea to put it
in the front end where applicable, then all compilers will benefit from
it.


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher


More information about the Digitalmars-d mailing list