"foreach(i, dchar c; s)" vs "decode"
Dmitry Olshansky
dmitry.olsh at gmail.com
Tue Nov 27 01:14:55 PST 2012
11/26/2012 1:37 AM, monarch_dodra wrote:
> I spent *all* week benchmarking a string processing function. And now,
> at the end of the week, I can safely say that the compiler's "foreach"
> is slower than a phobos decode based while loop.
>
It was inevitable that one day the UTF decoding implementation in Phobos
would outmatch the one buried in the compiler/runtime. The latter hasn't
been scrutinized nearly as much as decode in std.utf.
> Basically, given a
> ----
> foreach(i, dchar c; s)
> {codeCodeCode;}
> ----
> loop, I replaced it with:
> ----
> {
>     size_t i;
>     size_t j;
>     immutable k = s.length;
>     dchar c;
>     for ( ; i < k ; i = j )
>     {
>         c = decode(s, j);
>         codeCodeCode;
>     }
> }
> ----
>
> And my algorithms instantly gained a 10-25% performance improvement(!).
> I benched using varied sources of data, in particular, both ASCII only
> strings, as well as unicode heavy text.
Nothing better than a dump of the Arabic wiki? ;)
>
> Unicode has better gains, but raw ASCII text *also* has gains :/
> This holds true for both UTF-8 and UTF-16.
>
> UTF-32 is different, because foreach has the "unfair" advantage of not
> validating the code points...
>
> I got these results on the 2.061 alpha release, with Phobos built in
> release mode, both with and without -inline.
Don't forget -O and -noboundscheck: some code is @safe and thus keeps
its bounds checks unless you turn them off explicitly.
>
> So if any of the compiler guys are reading this... I have no idea how
> the unicode foreach is actually implemented, but there *should* be
> substantial gains to be had...
And how can the compiler-generated loop be better? Fundamentally it has
the same amount of knowledge as the "user-space" code.
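
For anyone who wants to repeat the comparison, here is a minimal sketch
of the two loops side by side. It assumes a recent compiler (it uses
std.datetime.stopwatch, which did not exist in 2.061), and the function
names, the checksum body, and the sample text are made up for
illustration. They are not the code that was actually benchmarked.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;
import std.utf : decode;

// Walk the string with the compiler-generated decoding foreach.
// The loop body is just a stand-in for "codeCodeCode".
ulong sumForeach(string s)
{
    ulong total;
    foreach (i, dchar c; s)
        total += c + i;
    return total;
}

// Same traversal, decoding by hand with std.utf.decode.
ulong sumDecode(string s)
{
    ulong total;
    size_t j;                   // decode advances j past each code point
    immutable k = s.length;
    for (size_t i; i < k; i = j)
    {
        immutable dchar c = decode(s, j);
        total += c + i;
    }
    return total;
}

void main()
{
    // Hypothetical input: mixed ASCII and Cyrillic, repeated many times.
    string s;
    foreach (n; 0 .. 100_000)
        s ~= "hello, мир! ";

    // Both loops must see the same (index, code point) pairs.
    assert(sumForeach(s) == sumDecode(s));

    auto sw = StopWatch(AutoStart.yes);
    auto a = sumForeach(s);
    auto tForeach = sw.peek();

    sw.reset();
    auto b = sumDecode(s);
    auto tDecode = sw.peek();

    writefln("foreach: %s (checksum %s)", tForeach, a);
    writefln("decode:  %s (checksum %s)", tDecode, b);
}
```

Build it with something like "dmd -O -release -inline -noboundscheck"
(newer compilers spell the last flag -boundscheck=off). A single timed
pass like this is only a rough indication; for real numbers, repeat the
runs and vary the input.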
--
Dmitry Olshansky