[Issue 11229] std.string.toLower is slow

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Sun May 8 13:26:17 PDT 2016


https://issues.dlang.org/show_bug.cgi?id=11229

--- Comment #7 from Jon Degenhardt <jrdemail2000-dlang at yahoo.com> ---
Similar to the previous comment, I tried an alternate implementation for
std.uni.asLowerCase using map with std.uni.toLower (the single-character
version). The single-character toLower already has the ASCII check
optimization.
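
For readers unfamiliar with the idea, an ASCII check optimization can be
sketched roughly as below. This is only an illustration and not the Phobos
source; the name toLowerAsciiFirst is hypothetical. Code points below 0x80
are handled with plain arithmetic, and everything else falls through to
std.uni.toLower.

```d
import std.uni : toLower;

// Hypothetical illustration of an ASCII fast path; NOT the Phobos implementation.
dchar toLowerAsciiFirst(dchar c)
{
    if (c < 0x80)
    {
        // ASCII: only 'A'..'Z' change, and by a fixed offset.
        return (c >= 'A' && c <= 'Z') ? cast(dchar)(c + ('a' - 'A')) : c;
    }
    // Non-ASCII: defer to the full Unicode lookup.
    return toLower(c);
}
```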

Timing with LDC improved in all cases: 2.5-3x for Latin-script languages and
1.5-2x for Japanese and Chinese. For DMD the improvements were 5x-20x,
depending on whether the text was Latin or Asian and whether decoding to dchar
was included in the timing.

Program used to do timing is here: https://dpaste.dzfl.pl/a0e2fa1c71fd

Texts used for timing were books in several languages found on the Project
Gutenberg site (http://www.gutenberg.org/), with the boilerplate text removed.
The Latin-script languages tested were English, Finnish, German, and Spanish.

Timing was done on OSX; DMD 2.071 (-release -O -boundscheck=off -inline); LDC
1.0.0-beta1 (Phobos 2.070.2; -release -O -boundscheck=off).

That the Japanese and Chinese texts showed improvement suggests that map +
toLower was faster than std.uni.asLowerCase even apart from the ASCII
optimization. These texts had only about 3% ASCII characters, so the
optimization was rarely used.

The replacement for asLowerCase I used is:

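// Imports needed by the template constraint.
import std.range.primitives : ElementEncodingType, isInputRange;
import std.traits : isConvertibleToString, isSomeChar;

auto mapAsLowerCase(Range)(Range str)
    if (isInputRange!Range && isSomeChar!(ElementEncodingType!Range) &&
        !isConvertibleToString!Range)
{
    static if (ElementEncodingType!Range.sizeof < dchar.sizeof)
    {
        // char/wchar input: decode to dchar first, then recurse.
        import std.utf : byDchar;
        return str.byDchar.mapAsLowerCase;
    }
    else
    {
        // dchar input: lazily lowercase one code point at a time.
        import std.algorithm : map;
        import std.uni : toLower;

        return str.map!(x => x.toLower);
    }
}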
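
A minimal usage sketch, in case it helps anyone try this out. It repeats
mapAsLowerCase so that it compiles on its own; the helper lowerViaMap is
hypothetical and only there to turn the lazy range into a string. The check
against std.uni.asLowerCase uses ASCII-only input, so it says nothing about
inputs where full case mapping might differ.

```d
import std.range.primitives : ElementEncodingType, isInputRange;
import std.traits : isConvertibleToString, isSomeChar;

// mapAsLowerCase as posted above, repeated so the example is self-contained.
auto mapAsLowerCase(Range)(Range str)
    if (isInputRange!Range && isSomeChar!(ElementEncodingType!Range) &&
        !isConvertibleToString!Range)
{
    static if (ElementEncodingType!Range.sizeof < dchar.sizeof)
    {
        import std.utf : byDchar;
        return str.byDchar.mapAsLowerCase;
    }
    else
    {
        import std.algorithm : map;
        import std.uni : toLower;

        return str.map!(x => x.toLower);
    }
}

// Hypothetical helper: materialize the lazy dchar range as a UTF-8 string.
string lowerViaMap(string s)
{
    import std.array : array;
    import std.conv : to;

    return s.mapAsLowerCase.array.to!string;
}

void main()
{
    import std.algorithm : equal;
    import std.stdio : writeln;
    import std.uni : asLowerCase;

    writeln(lowerViaMap("Hello WORLD"));

    // Both approaches should yield the same code points for this input.
    assert("Hello WORLD".mapAsLowerCase.equal("Hello WORLD".asLowerCase));
}
```

The result is a lazy range of dchar, so callers that need a string have to
collect it themselves, as lowerViaMap does here.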
