"foreach(i, dchar c; s)" vs "decode"
monarch_dodra
monarchdodra at gmail.com
Sun Nov 25 13:37:24 PST 2012
I spent *all* week benchmarking a string processing function. And
now, at the end of the week, I can safely say that the compiler's
"foreach" is slower than a loop built on Phobos' std.utf.decode.
Basically, given a
----
foreach(i, dchar c; s)
{codeCodeCode;}
----
loop, I replaced it with:
----
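{
    // Requires: import std.utf : decode;
    size_t i;                 // start index of the current code point
    size_t j;                 // index of the next code point; advanced by decode
    immutable k = s.length;
    dchar c;
    for ( ; i < k ; i = j )
    {
        c = decode(s, j);     // decodes at j and moves j past the code point
        codeCodeCode;
    }
}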
----
And my algorithms instantly gained a 10-25% performance
improvement(!). I benched using varied sources of data, in
particular, both ASCII-only strings and Unicode-heavy
text.
Unicode-heavy text gains more, but raw ASCII text *also* gains
:/
This holds true for both UTF-8 and UTF-16.
UTF-32 is different, because foreach has the "unfair" advantage
of not validating the code points...
I got these results on the 2.061 alpha release, with Phobos built
in release mode, both with and without -inline.
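For anyone who wants to try the comparison themselves, here is a minimal sketch of how the two loops could be timed side by side. It is not the benchmark used for the numbers above: the function names (sumForeach, sumDecode), the test strings and the iteration count are all made up for illustration, the loop body is a trivial stand-in for codeCodeCode, and it uses std.datetime.stopwatch from current Phobos rather than what 2.061 shipped with.

```d
import std.array : replicate;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;
import std.utf : decode;

// Hypothetical workload: fold every index and code point into a sum,
// so that both loops must produce the same, observable result.
ulong sumForeach(string s)
{
    ulong total;
    foreach (i, dchar c; s)   // compiler-generated decoding
        total += c + i;
    return total;
}

ulong sumDecode(string s)
{
    ulong total;
    size_t i;                 // start index of the current code point
    size_t j;                 // advanced by decode
    immutable k = s.length;
    for ( ; i < k ; i = j )
    {
        immutable dchar c = decode(s, j);
        total += c + i;
    }
    return total;
}

void main()
{
    // Illustrative inputs only: one ASCII-only, one with multi-byte sequences.
    string[] inputs = [
        "hello world ".replicate(10_000),
        "h\u00E9llo w\u00F6rld \u2192 \u2211 ".replicate(10_000),
    ];
    enum rounds = 200;        // arbitrary iteration count

    foreach (text; inputs)
    {
        // Both variants must agree before timing means anything.
        assert(sumForeach(text) == sumDecode(text));

        ulong sink;           // keeps the results live
        auto sw = StopWatch(AutoStart.yes);
        foreach (_; 0 .. rounds)
            sink += sumForeach(text);
        immutable tForeach = sw.peek();

        sw.reset();           // restarts timing from zero while still running
        foreach (_; 0 .. rounds)
            sink += sumDecode(text);
        immutable tDecode = sw.peek();

        writefln("%s bytes: foreach %s, decode %s (sink %s)",
                 text.length, tForeach, tDecode, sink);
    }
}
```

Timings from a sketch like this will depend heavily on the compiler version, the flags (-release, -inline, -O) and how Phobos itself was built, so treat any single run as indicative at best.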
So if any of the compiler guys are reading this... I have no idea
how the unicode foreach is actually implemented, but there
*should* be substantial gains to be had...