Table lookups - this is pretty definitive

John Colvin john.loughran.colvin at gmail.com
Tue Apr 1 12:23:09 PDT 2014


On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
> Try this benchmark comparing various classification schemes:
> ---------------------------------
> import core.stdc.stdlib;
> import core.stdc.string;
>
> import std.algorithm;
> import std.array;
> import std.ascii;
> import std.datetime;
> import std.range;
> import std.stdio;
> import std.traits;
>
> bool isIdentifierChar0(ubyte c)
> {
>     return isAlphaNum(c) || c == '_' || c == '$';
> }
>
> bool isIdentifierChar1(ubyte c)
> {
>     return ((c >= '0' || c == '$') &&
>             (c <= '9' || c >= 'A')  &&
>             (c <= 'Z' || c >= 'a' || c == '_') &&
>             (c <= 'z'));
> }
>
> immutable bool[256] tab2;
> shared static this()   // immutable globals must be initialized in a shared static ctor
> {
>     for (size_t u = 0; u < 0x100; ++u)
>     {
>         tab2[u] = isIdentifierChar0(cast(ubyte)u);
>     }
> }
>
> bool isIdentifierChar2(ubyte c)
> {
>     return tab2[c];
> }
>
> /*********************************************/
>
> int f0()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar0(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f1()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar1(cast(ubyte)u);
>     }
>     return x;
> }
>
> int f2()
> {
>     int x;
>     for (uint u = 0; u < 0x100; ++u)
>     {
>         x += isIdentifierChar2(cast(ubyte)u);
>     }
>     return x;
> }
>
> void main()
> {
>     auto r = benchmark!(f0, f1, f2)(10_000);
>     writefln("Milliseconds %s %s %s", r[0].msecs, r[1].msecs, r[2].msecs);
> }

It's not really possible to tell anything from that benchmark, 
especially with fancy modern optimisers and branch predictors.

1) ldc and gdc optimise away some and all of the tests, 
respectively.
2) Once isIdentifierCharX is inlined, the compiler can 
determine the results beforehand, because every input is a 
compile-time constant (the loop just sweeps 0..255).
3) Even without 1) and 2), the branch outcomes are 
unrealistically (very!) correlated, leading to a 
branch-prediction bonanza. I'm sure you know how significant 
that can be on modern CPUs. It is also very friendly to 
prefetching of the lookup table.*

Perhaps run the same benchmark, but working on realistic 
data read from a file.


*In my experience, the cost of using lookup tables often only 
appears in real code, where cache pressure and predictability 
become worse. This is especially true of lazy, range-based 
code: if several layers of a range each use a different 
lookup table, you can quickly end up ruining cache 
performance compared to an eager approach.
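For reference, one common way to blunt that cost is to fold 
several classification tables into a single 256-entry table 
of bit flags, so every query shares the same small cache 
footprint instead of each layer dragging in its own table. 
A rough C sketch; the flag names and layout here are 
invented for illustration:

```c
#include <ctype.h>

/* Hypothetical flag bits: one shared 256-entry table answers
 * several classification questions at once. */
enum {
    F_IDENT_START = 1 << 0,  /* can begin an identifier     */
    F_IDENT       = 1 << 1,  /* can continue an identifier  */
    F_DIGIT       = 1 << 2,
    F_SPACE       = 1 << 3,
};

static unsigned char flags[256];

static void init_flags(void)
{
    for (int c = 0; c < 256; ++c) {
        if (isalpha(c) || c == '_' || c == '$')
            flags[c] |= F_IDENT_START | F_IDENT;
        if (isdigit(c))
            flags[c] |= F_DIGIT | F_IDENT;
        if (isspace(c))
            flags[c] |= F_SPACE;
    }
}

/* Each query is one load plus a mask, and every query hits
 * the same 256-byte table, easing cache pressure when many
 * layers classify the same stream. */
static int is_ident_char(unsigned char c) { return flags[c] & F_IDENT; }
static int is_digit_char(unsigned char c) { return flags[c] & F_DIGIT; }
static int is_space_char(unsigned char c) { return flags[c] & F_SPACE; }
```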


More information about the Digitalmars-d mailing list