New UTF-8 stride function

Dmitry Olshansky dmitry.olsh at gmail.com
Sun May 26 13:49:33 PDT 2013


If anything came out of the UTF-8 discussion, it's that I decided to dust 
off my experimental implementation of a UTF-8 stride function. 
Just for fun.

The key difference from std is in handling the non-ASCII case: I'm 
replacing the bsr intrinsic with what I call an "in-register lookup 
table" (neat stuff that is a piece of cake to build, thanks to CTFE).
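
To make the idea concrete, here is a minimal sketch in C (rather than D) of one way such an in-register table can work. It is a guess at the general technique, not the code from the gist, and the names `lut_stride`, `STRIDE_LUT` and `LUT_ENTRY` are made up. For a non-ASCII lead byte, bits 6..3 select one of 16 four-bit entries packed into a single 64-bit constant, so the stride comes from one shift and one mask. In D the constant would be computed by CTFE; here a constant-expression macro plays that role.

```c
#include <assert.h>
#include <stdint.h>

/* One 4-bit entry per group of 8 non-ASCII byte values:
   index i = (c >> 3) & 0xF covers bytes 0x80 + 8*i .. 0x80 + 8*i + 7. */
#define LUT_ENTRY(i, v) ((uint64_t)(v) << (4 * (i)))

/* Entries 0..7 (0x80..0xBF, continuation bytes) and 15 (0xF8..0xFF)
   stay 0, which this sketch uses to mean "not a valid lead byte". */
static const uint64_t STRIDE_LUT =
    LUT_ENTRY(8, 2) | LUT_ENTRY(9, 2) | LUT_ENTRY(10, 2) | LUT_ENTRY(11, 2) /* 0xC0..0xDF */
  | LUT_ENTRY(12, 3) | LUT_ENTRY(13, 3)                                    /* 0xE0..0xEF */
  | LUT_ENTRY(14, 4);                                                      /* 0xF0..0xF7 */

/* 16 entries x 4 bits must fit in one 64-bit register. */
static_assert(sizeof(STRIDE_LUT) * 8 >= 16 * 4, "lookup table must fit in a uint64_t");

/* Length of the UTF-8 sequence starting with lead byte c, or 0 if c
   cannot start a sequence. No bsr/lzcnt: just a shift and a mask. */
static unsigned lut_stride(unsigned char c)
{
    if (c < 0x80)
        return 1;                          /* ASCII fast path */
    unsigned idx = (c >> 3) & 0xF;         /* which group of 8 byte values */
    return (unsigned)((STRIDE_LUT >> (idx * 4)) & 0xF);
}
```

For example, "é" is encoded as C3 A9, so `lut_stride(0xC3)` is 2, and "€" is E2 82 AC, giving 3. The sketch only computes a length. It does not reject overlong-only leads such as 0xC0.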

See unittest/benchmark here:
https://gist.github.com/blackwhale/5653927

I'm running tests against Wikipedia page-title dumps.

For me the results are mixed, but in 2 of 3 _builds_ my version 
consistently wins (sometimes by as much as 50%).

1. win32, dmd with -release -O -inline -noboundscheck
	my version is consistently faster (results below)
2. Linux x64, dmd with the same switches
	my version is consistently slower (results below)
3. Linux x64, LDC with ldc2 -O3 -d-noboundscheck
	my version is again faster, by a large margin

It's the kind of thing that is tremendously hard to measure accurately, 
since it depends on the workload and the architecture, and the time 
spent is very small. So don't take my word for it; I'm almost certain 
that something is amiss (compiler switches and whatnot).

Thus I encourage curious folks to measure/analyze it and report back 
(don't forget to include your processor model).
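
For anyone taking up the invitation to measure, here is a rough C sketch of a timing harness. It is hypothetical and is not the benchmark from the gist. It walks a buffer lead byte to lead byte, the way a stride function is used when decoding, and reports the fastest of several runs in microseconds. It assumes a POSIX system for `clock_gettime`. The names `loop_stride`, `count_code_points` and `best_of_usec`, and the best-of-N policy, are all illustrative assumptions. To compare implementations, swap in the stride functions you want to test.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stride for this harness: 1 for ASCII, otherwise the number
   of leading 1 bits of the lead byte (0 = invalid). Plain loop, no intrinsics. */
static size_t loop_stride(unsigned char c)
{
    size_t n = 0;
    while (c & 0x80) { ++n; c = (unsigned char)(c << 1); }
    return n == 0 ? 1 : (n == 1 || n > 4) ? 0 : n;
}

typedef size_t (*stride_fn)(unsigned char);

/* Walk the buffer lead byte to lead byte; returns the code point count,
   or (size_t)-1 on an invalid lead byte or a truncated final sequence. */
static size_t count_code_points(const unsigned char *s, size_t len, stride_fn stride)
{
    size_t i = 0, count = 0;
    while (i < len) {
        size_t step = stride(s[i]);
        if (step == 0 || step > len - i)
            return (size_t)-1;
        i += step;
        ++count;
    }
    return count;
}

/* Monotonic clock reading in microseconds. */
static long long now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Run `runs` passes and keep the fastest, which is less sensitive to
   scheduler and cache noise than a single measurement. */
static long long best_of_usec(const unsigned char *s, size_t len, stride_fn stride,
                              int runs, size_t *out_count)
{
    assert(runs > 0 && out_count != NULL);
    long long best = -1;
    for (int r = 0; r < runs; ++r) {
        long long t0 = now_usec();
        *out_count = count_code_points(s, len, stride);
        long long dt = now_usec() - t0;
        if (best < 0 || dt < best)
            best = dt;
    }
    return best;
}
```

Reporting the minimum of several runs is one common way to damp noise. The run order in the dumps in this post (stride, myStride, myStride, stride) is another way to spread drift evenly across both versions.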

The unbeatable advantage, however, is that my version doesn't require a 
bsr/lzcnt instruction :) BTW, do ARM/PowerPC have any analog of it?
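
For comparison, here is a sketch of the bit-scan approach that the table replaces, again in C. It is my own illustration, not the Phobos source, and the name `clz_stride` is hypothetical. The sequence length equals the number of leading 1 bits in the lead byte, which is the number of leading zeros of its complement. Computing that needs a count-leading-zeros instruction, which is exactly the dependency the lookup table avoids. `__builtin_clz` is a GCC/Clang builtin.

```c
#include <assert.h>

/* The shift below places the byte in the top 8 bits of a 32-bit unsigned. */
static_assert(sizeof(unsigned) == 4, "expects 32-bit unsigned int");

/* Bit-scan stride: leading 1s of c == leading 0s of ~c.
   Returns 1 for ASCII and 0 for bytes that cannot start a sequence. */
static unsigned clz_stride(unsigned char c)
{
    if (c < 0x80)
        return 1;                                  /* ASCII fast path */
    unsigned inv = (unsigned)(unsigned char)~c;    /* 0 only for c == 0xFF */
    if (inv == 0)
        return 0;                                  /* clz(0) is undefined */
    /* GCC/Clang builtin; typically bsr, or lzcnt when enabled, on x86. */
    unsigned ones = (unsigned)__builtin_clz(inv << 24);
    return (ones >= 2 && ones <= 4) ? ones : 0;
}
```

The explicit check for 0xFF is needed because `__builtin_clz(0)` is undefined. The table version has no such special case.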

Test files I used:
https://github.com/blackwhale/gsoc-bench-2012/blob/master/arwiki-latest-all-titles-in-ns0
https://github.com/blackwhale/gsoc-bench-2012/blob/master/dewiki-latest-all-titles-in-ns0
https://github.com/blackwhale/gsoc-bench-2012/blob/master/ruwiki-latest-all-titles-in-ns0

Some dumps of my test runs:

win32 runs (time taken in usec):

fast_stride ruwiki-latest-all-titles-in-ns0
stride 313756
myStride 229650
myStride 235091
stride 312563

fast_stride enwiki-latest-all-titles-in-ns0
stride 346577
myStride 279915
myStride 278684
stride 348902

fast_stride enwiki-latest-all-titles-in-ns0
stride 345866
myStride 280902
myStride 279780
stride 345653

fast_stride arwiki-latest-all-titles-in-ns0
stride 46548
myStride 33840
myStride 34959
stride 46342

fast_stride dewiki-latest-all-titles-in-ns0
stride 79715
myStride 64719
myStride 64672
stride 79848

dmd Linux x64 runs (time taken in usec):

./fast_stride enwiki-latest-all-titles-in-ns0
stride 377258
myStride 630367
myStride 633262
stride 378523

./fast_stride arwiki-latest-all-titles-in-ns0
stride 33924
myStride 38807
myStride 47708
stride 40160

./fast_stride arwiki-latest-all-titles-in-ns0
stride 35110
myStride 39750
myStride 49942
stride 33597


-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list