New UTF-8 stride function

Kiith-Sa kiithsacmp at gmail.com
Sun May 26 14:04:47 PDT 2013


Regarding the slower Linux x64 case:
I recommend running it in an infinite loop and watching it in perf top.

(If you're on Ubuntu or a derivative, or maybe Debian, just type
"perf top"; it will tell you which package to install. Once it's
installed, run "perf top" again while the benchmark is running.)

You'll get a precise, real-time, line-level profile (like "top", but
for functions), with the ability to drill down to the ASM.

With some command-line options (google "linux perf"), you can also
look at cache misses, branch mispredictions, and so on. Compare that
with the original version and you might find out why yours is slower.

(I don't have time to test anything right now.)


On Sunday, 26 May 2013 at 20:49:36 UTC, Dmitry Olshansky wrote:
> If anything came out of the UTF-8 discussion, it's that I
> decided to dust off my experimental implementation of a UTF-8
> stride function. Just for fun.
>
> The key difference from std is in handling the non-ASCII case.
> I'm replacing the bsr intrinsic with what I call an "in-register
> lookup table" (neat stuff that is a piece of cake, thanks to CTFE).
>
> See unittest/benchmark here:
> https://gist.github.com/blackwhale/5653927
>
> I'm running tests against wiki title dumps.
>
> For me the results are mixed, but in 2 of 3 _builds_ my version
> consistently wins (sometimes by as much as 50%).
>
> 1. win32 dmd build with -release -O -inline -noboundscheck
> 	my version is consistently faster (results below)
> 2. dmd on Linux x64 with the same switches
> 	my version is consistently slower
> 3. LDC on Linux x64 with ldc2 -O3 -d-noboundscheck
> 	my version is again faster, by a large margin
>
> It's the kind of thing that is tremendously hard to measure
> accurately, since it depends on the workload and architecture,
> and the time spent is very small. So don't take my word for it;
> I'm almost certain that something is amiss (compiler switches
> and whatnot).
>
> Thus I encourage curious folks to measure/analyze it and report 
> back (don't forget to include your processor model).
>
> The unbeatable advantage, however, is that my version doesn't
> require a bsr/lzcnt instruction :) BTW, do ARM/PowerPC have any
> analog of it?
>
> Test files I used:
> https://github.com/blackwhale/gsoc-bench-2012/blob/master/arwiki-latest-all-titles-in-ns0
> https://github.com/blackwhale/gsoc-bench-2012/blob/master/dewiki-latest-all-titles-in-ns0
> https://github.com/blackwhale/gsoc-bench-2012/blob/master/ruwiki-latest-all-titles-in-ns0
>
> Some dumps of my test runs
>
> win32 runs (time taken in usec):
>
> fast_stride ruwiki-latest-all-titles-in-ns0
> stride 313756
> myStride 229650
> myStride 235091
> stride 312563
>
> fast_stride enwiki-latest-all-titles-in-ns0
> stride 346577
> myStride 279915
> myStride 278684
> stride 348902
>
> fast_stride enwiki-latest-all-titles-in-ns0
> stride 345866
> myStride 280902
> myStride 279780
> stride 345653
>
> fast_stride arwiki-latest-all-titles-in-ns0
> stride 46548
> myStride 33840
> myStride 34959
> stride 46342
>
> fast_stride dewiki-latest-all-titles-in-ns0
> stride 79715
> myStride 64719
> myStride 64672
> stride 79848
>
> dmd Linux x64 runs (time taken in usec):
>
> ./fast_stride enwiki-latest-all-titles-in-ns0
> stride 377258
> myStride 630367
> myStride 633262
> stride 378523
>
> ./fast_stride arwiki-latest-all-titles-in-ns0
> stride 33924
> myStride 38807
> myStride 47708
> stride 40160
>
> ./fast_stride arwiki-latest-all-titles-in-ns0
> stride 35110
> myStride 39750
> myStride 49942
> stride 33597
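For readers who don't want to open the gist, here is a minimal C sketch of what an "in-register lookup table" for the stride of a UTF-8 lead byte can look like, next to a count-leading-ones version in the spirit of the bsr approach. It is not the code from the gist: the nibble-indexed layout, the constant 0x4322000011111111 and all function names are assumptions made for illustration. The table packs 16 four-bit entries into one 64-bit integer, so the lookup is just a shift and a mask, with no table in memory and no bit-scan instruction. In D the constant could be generated with CTFE; here build_stride_table() only shows how it is derived. stride_clz relies on the GCC/Clang builtin __builtin_clz, which typically compiles to bsr/lzcnt on x86; for what it's worth, ARM (since ARMv5) has CLZ and PowerPC has cntlzw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed table: entry i (4 bits, at bit 4*i) holds the UTF-8
 * sequence length for a lead byte whose high nibble is i; 0 marks a
 * continuation byte (0x80..0xBF), which cannot start a sequence. */
#define STRIDE_TABLE UINT64_C(0x4322000011111111)

/* Derives the same constant from the UTF-8 lead-byte rules. In D this
 * could run at compile time via CTFE; in C it is ordinary runtime code,
 * shown only to explain where the literal comes from. */
uint64_t build_stride_table(void)
{
    uint64_t table = 0;
    for (unsigned nibble = 0; nibble < 16; nibble++) {
        uint64_t len;
        if (nibble < 0x8)
            len = 1;            /* 0xxxxxxx: ASCII */
        else if (nibble < 0xC)
            len = 0;            /* 10xxxxxx: continuation byte */
        else if (nibble < 0xE)
            len = 2;            /* 110xxxxx */
        else if (nibble == 0xE)
            len = 3;            /* 1110xxxx */
        else
            len = 4;            /* 1111xxxx, includes invalid 0xF8..0xFF */
        table |= len << (nibble * 4);
    }
    return table;
}

/* Table lookup: one shift and one mask, no bit-scan instruction. */
unsigned stride_lut(uint8_t lead)
{
    return (unsigned)((STRIDE_TABLE >> ((lead >> 4) * 4)) & 0xF);
}

/* Bit-scan variant for comparison: the stride is the number of leading
 * one bits of the lead byte. Inverting (lead << 24) turns those ones into
 * leading zeros, and the low 24 bits become ones, so the argument to
 * __builtin_clz (GCC/Clang only) is never zero. */
unsigned stride_clz(uint8_t lead)
{
    unsigned ones = (unsigned)__builtin_clz(~((uint32_t)lead << 24));
    if (ones == 0)
        return 1;               /* ASCII */
    if (ones == 1 || ones > 4)
        return 0;               /* continuation byte or 0xF8..0xFF */
    return ones;
}

/* Counts code points by hopping from lead byte to lead byte. Continuation
 * bytes and truncation are not validated; a byte with stride 0 is counted
 * as one unit and skipped so the loop always advances. */
size_t count_code_points(const uint8_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0, count = 0;
    while (i < len) {
        unsigned n = stride_lut(s[i]);
        i += n ? n : 1;
        count++;
    }
    return count;
}
```

For example, count_code_points((const uint8_t *)"h\xC3\xA9llo \xE2\x82\xAC", 10) returns 7 (h, é, l, l, o, space, €). Indexing by the high nibble alone cannot tell 0xF0..0xF7 apart from the invalid 0xF8..0xFF, so this layout still needs validation elsewhere; the two variants agree on every byte below 0xF8.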


