Table lookups - this is pretty definitive

Wed Apr 2 04:49:27 PDT 2014

On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
> Try this benchmark comparing various classification schemes:
>
> bool isIdentifierChar1(ubyte c)
> {
>     return ((c >= '0' || c == '$') &&
>             (c <= '9' || c >= 'A')  &&
>             (c <= 'Z' || c >= 'a' || c == '_') &&
>             (c <= 'z'));
> }

I'd like to point out that this is quite a complicated function to 
begin with, so the result doesn't generalize to all the isXXX 
functions for ASCII, for which the tests would be much simpler.

In any case, on my win32 machine, I can go from 810 msecs to 
500 msecs using this function instead:

bool isIdentifierChar1(ubyte c)
{
     return c <= 'z' && (
             'a' <= c ||
             ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
             c == '$');
}
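
Since the reordering is easy to get subtly wrong, here is a minimal sketch of a check that the two versions agree on every possible ubyte. The names isIdentifierCharOrig, isIdentifierCharReordered and countMismatches are made up for this comparison (both functions are called isIdentifierChar1 above), so treat it as an illustration rather than part of the benchmark.

```d
import std.stdio : writeln;

// Original scheme from the quoted post (renamed here; hypothetical name).
bool isIdentifierCharOrig(ubyte c)
{
    return ((c >= '0' || c == '$') &&
            (c <= '9' || c >= 'A')  &&
            (c <= 'Z' || c >= 'a' || c == '_') &&
            (c <= 'z'));
}

// Reordered version from this post (renamed here; hypothetical name).
bool isIdentifierCharReordered(ubyte c)
{
    return c <= 'z' && (
            'a' <= c ||
            ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
            c == '$');
}

// Count the byte values on which the two versions disagree.
int countMismatches()
{
    int mismatches = 0;
    foreach (i; 0 .. 256)
    {
        immutable c = cast(ubyte) i;
        if (isIdentifierCharOrig(c) != isIdentifierCharReordered(c))
            ++mismatches;
    }
    return mismatches;
}

void main()
{
    writeln("mismatches: ", countMismatches());
}
```

If the two functions are equivalent, the mismatch count is zero over the whole 0 .. 256 range.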

That said, I'm exploiting the fact that 50% of your bench is for 
chars of 0x80 and above. If I loop only over the actual ASCII you 
find in text (0x20 - 0x80), then those numbers "only" go from 
320 msecs to 300 msecs. Only slightly better, but still a win.
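
To make the range dependence concrete, here is a rough sketch of how one might time the same classifier over 0x20 - 0x80 versus the full byte range. This is not the original benchmark harness: countIdentifierChars, the repetition count, and the use of std.datetime.stopwatch are my own assumptions, and the timings will differ per machine and compiler flags.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Reordered classifier from this post.
bool isIdentifierChar(ubyte c)
{
    return c <= 'z' && (
            'a' <= c ||
            ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
            c == '$');
}

// Classify every value in [lo, hi) `reps` times and return the hit count.
// Returning the count keeps the optimizer from discarding the loop.
// (Hypothetical helper, not part of the original benchmark.)
size_t countIdentifierChars(uint lo, uint hi, uint reps)
{
    size_t hits = 0;
    foreach (r; 0 .. reps)
        foreach (i; lo .. hi)
            hits += isIdentifierChar(cast(ubyte) i);
    return hits;
}

void main()
{
    enum reps = 1_000_000; // assumed repetition count

    auto sw = StopWatch(AutoStart.yes);
    immutable textHits = countIdentifierChars(0x20, 0x80, reps);
    immutable textMsecs = sw.peek.total!"msecs";

    sw.reset(); // keeps running, elapsed time restarts from zero
    immutable allHits = countIdentifierChars(0x00, 0x100, reps);
    immutable allMsecs = sw.peek.total!"msecs";

    writefln("0x20 - 0x80 : %s hits, %s msecs", textHits, textMsecs);
    writefln("0x00 - 0x100: %s hits, %s msecs", allHits, allMsecs);
}
```

Both loops find the same number of identifier characters per pass; the full-range loop just adds inputs that the first comparison rejects, which is where the reordered version gets its biggest advantage.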

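*BUT*, if your functions were to accept any arbitrary code point, 
this version would absolutely murder the original, since almost 
every input would be rejected by the very first c <= 'z' test.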
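
As a sketch of what the arbitrary code point case could look like, assuming the parameter is simply widened to dchar: the comparison chain works unchanged, while a 256-entry table needs an extra range check before indexing. The names isIdentifierCharWide, identifierTable and isIdentifierCharTable are hypothetical, and the table variant is only my guess at how a lookup would be adapted, not code from the benchmark.

```d
import std.stdio : writeln;

// Same comparison chain as above, with the parameter widened to dchar.
// Any code point above 'z' is rejected by the first comparison.
bool isIdentifierCharWide(dchar c)
{
    return c <= 'z' && (
            'a' <= c ||
            ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
            c == '$');
}

// Build a 256-entry lookup table at compile time (hypothetical helper).
bool[256] buildTable()
{
    bool[256] t;
    foreach (i; 0 .. 256)
        t[i] = isIdentifierCharWide(cast(dchar) i);
    return t;
}

immutable bool[256] identifierTable = buildTable();

// Table-based counterpart: needs a range check before the lookup.
bool isIdentifierCharTable(dchar c)
{
    return c < identifierTable.length && identifierTable[c];
}

void main()
{
    foreach (dchar c; "a_$9 é\U0001F600")
        writeln(cast(uint) c, ": ", isIdentifierCharWide(c),
                " ", isIdentifierCharTable(c));
}
```

Whether the comparison chain or the range-checked table wins on real code point input would have to be measured; the sketch only shows that both stay correct once the input is no longer limited to a ubyte.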