char array weirdness

Wed Mar 30 10:30:36 PDT 2016

On Wednesday, 30 March 2016 at 05:16:04 UTC, H. S. Teoh wrote:
> If we didn't have autodecoding, would be a simple matter of 
> searching for sentinel substrings.  This also indicates that 
> most of the work done by autodecoding is unnecessary -- it's 
> wasted work since most of the string data is treated opaquely 
> anyway.

Just to drive this point home, I made a very simple benchmark. 
Iterating over code points when you don't need to is about 100x 
slower than iterating over code units.

import std.datetime;
import std.stdio;
import std.array;
import std.utf;
import std.uni;

enum testCount = 1_000_000;
enum var = "Lorem ipsum dolor sit amet, consectetur adipiscing "
     ~ "elit. Praesent justo ante, vehicula in felis vitae, finibus "
     ~ "tincidunt dolor. Fusce sagittis.";

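// Auto-decoding: the string is iterated as a range of code points.
void test()
{
     auto a = var.array;
}

// No decoding: the string is iterated as raw UTF-8 code units.
void test2()
{
     auto a = var.byCodeUnit.array;
}

// Grapheme segmentation: code points are grouped into graphemes.
void test3()
{
     auto a = var.byGrapheme.array;
}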

void main()
{
     import std.conv : to;
     auto r = benchmark!(test, test2, test3)(testCount);
     auto result = to!Duration(r[0] / testCount);
     auto result2 = to!Duration(r[1] / testCount);
     auto result3 = to!Duration(r[2] / testCount);

     writeln("auto-decoding", "\t\t", result);
     writeln("byCodeUnit", "\t\t", result2);
     writeln("byGrapheme", "\t\t", result3);
}

$ ldc2 -O3 -release -boundscheck=off test.d
$ ./test
auto-decoding		1 μs
byCodeUnit		0 hnsecs
byGrapheme		11 μs
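
For anyone unsure what the three ranges actually iterate over, here is a minimal sketch. It is separate from the benchmark above, and the helper names (codeUnitCount, codePointCount, graphemeCount) are made up for illustration. It just counts the elements each view yields for a short non-ASCII string:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical helpers, named only for this illustration.
size_t codeUnitCount(string s)  { return s.byCodeUnit.walkLength; }
size_t codePointCount(string s) { return s.walkLength; } // auto-decodes to dchar
size_t graphemeCount(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // "noël" spelled with a combining diaeresis (U+0308) after the 'e'.
    string s = "noe\u0308l";

    assert(codeUnitCount(s) == 6);  // U+0308 takes two UTF-8 code units
    assert(codePointCount(s) == 5); // n, o, e, U+0308, l
    assert(graphemeCount(s) == 4);  // e + U+0308 form one grapheme
}
```

For pure ASCII input like the Lorem ipsum text in the benchmark, all three counts are the same, so the extra decoding work there buys nothing.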