Compiler benchmarks for an alternative to std.uni.asLowerCase.
Jon D via Digitalmars-d
digitalmars-d at puremagic.com
Sun May 8 16:38:31 PDT 2016
I did a performance study on speeding up case conversion in
std.uni.asLowerCase. Specifics for asLowerCase have been added to
issue https://issues.dlang.org/show_bug.cgi?id=11229. I'm publishing
here because some of the more general observations may be of wider
interest.
Background - Case conversion can generally be sped up by checking
whether a character is ASCII before invoking a full Unicode case
conversion. The single-character std.uni.toLower does this
optimization, but std.uni.asLowerCase does not. asLowerCase does
a lazy conversion of a range. For the test, I created a
replacement for asLowerCase, mapAsLowerCase, which uses map and
toLower. In essence, `str.map!(x => x.toLower)` or
`str.byDchar.map!(x => x.toLower)`.
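A minimal D sketch of the idea, assuming single code point lowercase mappings are acceptable (std.uni.toLower(dchar) always returns exactly one code point). The helper asciiFirstToLower is hypothetical; it only spells out the kind of ASCII check the single-character toLower performs. This mapAsLowerCase is an illustration, not the code at the dpaste link below.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : map;
import std.stdio : writeln;
import std.uni : asLowerCase, toLower;
import std.utf : byDchar;

// Hypothetical helper: handle 'A'..'Z' with simple arithmetic and only
// fall back to the Unicode table lookup for code points >= 0x80.
dchar asciiFirstToLower(dchar c)
{
    if (c < 0x80)
        return (c >= 'A' && c <= 'Z') ? cast(dchar)(c + ('a' - 'A')) : c;
    return toLower(c); // std.uni.toLower(dchar): single code point mapping
}

// Hypothetical mapAsLowerCase: a lazy range that decodes the input to
// dchar and lowercases one code point at a time.
auto mapAsLowerCase(R)(R r)
{
    return r.byDchar.map!(c => asciiFirstToLower(c));
}

void main()
{
    foreach (s; ["Hello, World", "Hyvää PÄIVÄÄ", "Straße ÜBER", "日本語テキスト"])
        writeln(mapAsLowerCase(s), " matches asLowerCase: ",
                equal(mapAsLowerCase(s), s.asLowerCase));
}
```

Keeping the result lazy (a map over byDchar) preserves the range semantics of asLowerCase, so it can be dropped in wherever the original is consumed as a range.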
Testing was with DMD 2.071 and LDC 1.0.0-beta1 (Phobos 2.070)
on OS X. Compiler settings were `-release -O -boundscheck=off`.
DMD was tested with and without `-inline`. LDC turns on inlining
(`-enable-inlining=1`) by default with -O, but DMD does not. The
texts were in Japanese, Chinese, Finnish, English, German, and
Spanish. Timing was done both including and excluding decoding
from UTF-8 to dchar.
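A rough sketch of timing both ways, assuming std.datetime.stopwatch.benchmark from current Phobos (newer than the releases tested here). Decoding is excluded by converting the UTF-8 input to a dstring once, outside the timed code. The sample text and the mapAsLowerCase and consume helpers are placeholders, not the original timing program.

```d
import std.algorithm.iteration : map;
import std.array : replicate;
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;
import std.uni : asLowerCase, toLower;

// Hypothetical stand-in for mapAsLowerCase. Over a string, map
// auto-decodes UTF-8 to dchar; over a dstring, no decoding is needed.
auto mapAsLowerCase(R)(R r)
{
    return r.map!(c => toLower(c));
}

// Walk the whole lazy range so the lowercasing work is actually done;
// the returned checksum keeps the loop from being optimized away.
size_t consume(R)(R r)
{
    size_t sum;
    foreach (c; r)
        sum += cast(size_t) c;
    return sum;
}

void main()
{
    // Placeholder text; the post used Project Gutenberg books instead.
    string utf8 = "Hyvää päivää! Guten Tag. ¡Hola! こんにちは。".replicate(10_000);
    dstring decoded = utf8.to!dstring; // decode once, outside the timed code

    size_t sink;
    void lowerIncl() { sink += consume(utf8.asLowerCase); }
    void mapIncl()   { sink += consume(utf8.mapAsLowerCase); }
    void lowerExcl() { sink += consume(decoded.asLowerCase); }
    void mapExcl()   { sink += consume(decoded.mapAsLowerCase); }

    auto t = benchmark!(lowerIncl, mapIncl, lowerExcl, mapExcl)(10);
    writefln("including decoding: asLowerCase %s, map %s", t[0], t[1]);
    writefln("excluding decoding: asLowerCase %s, map %s", t[2], t[3]);
    writefln("(checksum %s)", sink);
}
```

Per the observations in this post, results from a harness like this can shift considerably with the compiler and flags (for DMD, with or without -inline).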
Gain is the reduction in run time of mapAsLowerCase relative to
asLowerCase; the speedup factor is in parentheses.
Performance delta including decoding to dchar:
| Language group  | Pct ASCII | LDC gain   | DMD -inline gain | DMD no-inline gain |
|-----------------+-----------+------------+------------------+--------------------|
| Latin           | 95-99%    | 64% (2.7x) | 93% (14x)        | 48% (1.9x)         |
| Asian (Jpn/Chn) | 2.4-3.7%  | 36% (1.6x) | 80% (5x)         | -1%                |
Performance delta excluding decoding to dchar:
| Language group  | Pct ASCII | LDC gain   | DMD -inline gain | DMD no-inline gain |
|-----------------+-----------+------------+------------------+--------------------|
| Latin           | 95-99%    | 60% (2.5x) | 95% (20x)        | 60% (2.5x)         |
| Asian (Jpn/Chn) | 2.4-3.7%  | 50% (2x)   | 95% (20x)        | -2%                |
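The gain and speedup figures in the tables above appear to be related by speedup = 1 / (1 - gain), where gain is the fractional reduction in run time. A small D sketch of that arithmetic (the function names are illustrative); since the percentages are rounded, recomputed factors can differ slightly from the tabulated ones.

```d
import std.stdio : writefln;

// Gain: fractional reduction in run time of the new code relative to
// the baseline, e.g. 0.64 for "64%".
double gain(double baselineTime, double newTime)
{
    return 1.0 - newTime / baselineTime;
}

// Speedup implied by a gain: baseline / new = 1 / (1 - gain).
double speedup(double g)
{
    return 1.0 / (1.0 - g);
}

void main()
{
    // Gain percentages taken from the tables above.
    foreach (pct; [64, 93, 48, 36, 80, 60, 95, 50])
        writefln("%2d%% gain -> %.1fx", pct, speedup(pct / 100.0));
}
```

The conversion is nonlinear: going from a 90% to a 95% gain doubles the speedup (10x to 20x), which is why the high-gain DMD -inline columns show such large factors.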
Observations:
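* mapAsLowerCase was faster than asLowerCase in every configuration
except DMD without -inline on the Asian texts, where it was 1-2%
slower. That it was faster for Asian texts, which are mostly
non-ASCII, suggests the improvement involved more than just the
ASCII check optimization.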
* Performance varied widely between compilers and, for DMD, with
whether the -inline flag was included. The performance delta
between asLowerCase and the mapAsLowerCase replacement was very
dependent on these choices, as was the delta between including
and excluding auto-decoding.
* DMD improvement by using -inline: 30% for asLowerCase (1.5x),
90% for mapAsLowerCase (10x).
* DMD (-inline) vs LDC: For asLowerCase, LDC was 65-85% faster.
For mapAsLowerCase, DMD was 10-40% faster. The map implementation
changed in Phobos 2.071 (LDC used 2.070), so the two were not
equivalent, but it's still interesting that DMD beat LDC in this
case.
Thoughts:
* The large variances between compiler settings call for extra
diligence when performance tuning at the source-code level,
especially for code intended for multiple compilers.
* Perhaps DMD -O should also turn on -inline. This would present
a better performance picture to new users. It's also helpful when
the different compilers agree on the rough meaning of compiler
switches.
* Auto-decoding is an oft-discussed concern. It doesn't show up
in the tables above, but the data I looked at suggests the
cost/penalty may vary quite a bit depending on usage context and
compiler/settings. I wasn't studying this aspect explicitly. It may
be worth its own analysis.
Other details:
* Code for mapAsLowerCase and the timing program is at:
https://dpaste.dzfl.pl/a0e2fa1c71fd
* Texts used for timing were books in several languages from the
Project Gutenberg site (http://www.gutenberg.org/), with
boilerplate text removed.
--Jon