char array weirdness

Wed Mar 30 15:49:24 PDT 2016

On 30.03.2016 19:30, Jack Stouffer wrote:
> Just to drive this point home, I made a very simple benchmark. Iterating
> over code points when you don't need to is 100x slower than iterating
> over code units.
[...]
> enum testCount = 1_000_000;
> enum var = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
> Praesent justo ante, vehicula in felis vitae, finibus tincidunt dolor.
> Fusce sagittis.";
>
> void test()
> {
>      auto a = var.array;
> }
>
> void test2()
> {
>      auto a = var.byCodeUnit.array;
> }
>
> void test3()
> {
>      auto a = var.byGrapheme.array;
> }
[...]
> $ ldc2 -O3 -release -boundscheck=off test.d
> $ ./test
> auto-decoding            1 μs
> byCodeUnit        0 hnsecs
> byGrapheme        11 μs

When byCodeUnit takes no time at all, isn't 1µs infinitely many times 
slower, instead of 100 times? And I think auto-decoding's 1µs is so low 
that noise is going to mess with any ratios you make.

byCodeUnit taking no time at all suggests that it's been optimized away 
completely. To avoid that, don't hardcode the test data, and print some 
output that depends on the calculations actually being done. There was a 
little thread about this recently:
http://forum.dlang.org/post/sdmdwyhfgmbppfflkljz@forum.dlang.org

I think creating arrays from the ranges is relatively costly and noisy, 
and it's not of interest when you want to compare iteration speed.
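
Something along these lines is what I have in mind. It's only a rough sketch, not a careful benchmark: the file argument, the names countElements and bench, and the rounds value are all made up for illustration, and it assumes a Phobos recent enough to have std.datetime.stopwatch. The text is read at run time, nothing is copied into an array, and the printed count depends on the loops actually running.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : readText;
import std.range.primitives : empty, popFront;
import std.stdio : writefln, writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Walk the range with empty/popFront and count the steps.
// No array is built; front is never read, so this measures
// stepping only, not the cost of producing each element.
size_t countElements(R)(R r)
{
    size_t n = 0;
    for (; !r.empty; r.popFront())
        ++n;
    return n;
}

// Hypothetical helper: time `rounds` passes over r and print the
// total count, so the result depends on the work being done.
void bench(R)(string label, R r)
{
    enum rounds = 1_000; // arbitrary; tune for your input size
    size_t total = 0;
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. rounds)
        total += countElements(r); // r is passed by value, so each pass starts over
    sw.stop();
    writefln("%-14s %s (elements counted: %s)", label, sw.peek, total);
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: bench <text file>");
        return;
    }

    // Test data comes from a file named at run time, not from an enum.
    string text = readText(args[1]);

    bench("auto-decoding", text);            // popFront on a string steps by code point
    bench("byCodeUnit", text.byCodeUnit);    // steps by UTF-8 code unit
    bench("byGrapheme", text.byGrapheme);    // steps by grapheme cluster
}
```

An optimizer may still be clever about the repeated passes, so the numbers need a sanity check against the printed counts and against a few different input sizes.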