Table lookups - this is pretty definitive
monarch_dodra
monarchdodra at gmail.com
Wed Apr 2 04:49:27 PDT 2014
On Tuesday, 1 April 2014 at 18:35:50 UTC, Walter Bright wrote:
> Try this benchmark comparing various classification schemes:
>
> bool isIdentifierChar1(ubyte c)
> {
>     return ((c >= '0' || c == '$') &&
>             (c <= '9' || c >= 'A') &&
>             (c <= 'Z' || c >= 'a' || c == '_') &&
>             (c <= 'z'));
> }
I'd like to point out that this is quite a complicated function to
begin with, so the result doesn't generalize to all the ASCII isXXX
functions, whose tests would be much simpler.
In any case, on my win32 machine, I can go from 810 ms to 500 ms
by using this function instead:
bool isIdentifierChar1(ubyte c)
{
    return c <= 'z' && (
        'a' <= c ||
        ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
        c == '$');
}
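
Since the rewrite only reorders the comparisons, it should accept exactly the same set of bytes as the quoted original. A minimal sketch of how that could be checked exhaustively over all 256 ubyte values follows; the names isIdentifierCharOrig, isIdentifierCharReordered and agreeOnAllBytes are made up for this illustration and are not part of the benchmark.

```d
import std.stdio : writeln;

// The original formulation, as quoted from Walter's benchmark.
bool isIdentifierCharOrig(ubyte c)
{
    return ((c >= '0' || c == '$') &&
            (c <= '9' || c >= 'A') &&
            (c <= 'Z' || c >= 'a' || c == '_') &&
            (c <= 'z'));
}

// The reordered version: reject everything above 'z' first,
// then test the lowercase range before the remaining cases.
bool isIdentifierCharReordered(ubyte c)
{
    return c <= 'z' && (
        'a' <= c ||
        ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
        c == '$');
}

// Hypothetical helper: true if both functions agree on every ubyte value.
bool agreeOnAllBytes()
{
    foreach (i; 0 .. 256)
    {
        immutable c = cast(ubyte) i;
        if (isIdentifierCharOrig(c) != isIdentifierCharReordered(c))
            return false;
    }
    return true;
}

void main()
{
    writeln(agreeOnAllBytes() ? "both versions agree" : "mismatch found");
}
```

The input domain is only 256 values, so an exhaustive comparison is cheap and removes any doubt about the boolean rearrangement.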
That said, I'm abusing the fact that 50% of your benchmark is spent
on chars of 0x80 and above, which the first comparison (c <= 'z')
rejects immediately. If I loop only over the ASCII you actually find
in text (0x20 to 0x80), the numbers "only" go from 320 ms to 300 ms.
Only slightly better, but still a win.
*BUT*, if your functions were to accept any arbitrary codepoint,
it would absolutely murder.
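
To make the input-range effect concrete, here is a rough sketch of a timing harness that runs the reordered function over the full byte range and over 0x20 to 0x80 only. It is not Walter's benchmark: countMatches, the repetition count and the use of std.datetime.stopwatch are assumptions for illustration, and timings will vary with compiler and flags.

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

// The reordered classifier from this post.
bool isIdentifierChar(ubyte c)
{
    return c <= 'z' && (
        'a' <= c ||
        ('0' <= c && (c <= '9' || c == '_' || ('A' <= c && c <= 'Z'))) ||
        c == '$');
}

// Hypothetical helper: counts identifier characters in the half-open
// range [lo, hi), repeating the scan `reps` times. Returning the count
// keeps the calls from being optimized away entirely.
size_t countMatches(int lo, int hi, size_t reps)
{
    size_t hits = 0;
    foreach (r; 0 .. reps)
        foreach (i; lo .. hi)
            if (isIdentifierChar(cast(ubyte) i))
                ++hits;
    return hits;
}

void main()
{
    enum size_t reps = 100_000; // arbitrary; tune for your machine

    auto sw = StopWatch(AutoStart.yes);
    immutable fullHits = countMatches(0x00, 0x100, reps);
    immutable fullMs = sw.peek.total!"msecs";

    sw.reset(); // the stopwatch keeps running after a reset
    immutable asciiHits = countMatches(0x20, 0x80, reps);
    immutable asciiMs = sw.peek.total!"msecs";

    writefln("0x00-0xFF: %s hits, %s ms", fullHits, fullMs);
    writefln("0x20-0x7F: %s hits, %s ms", asciiHits, asciiMs);
}
```

Both ranges contain the same 64 identifier characters per pass, so any difference in time per input comes from how quickly the rejected bytes are dismissed.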