[Issue 11229] std.string.toLower is slow

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Mon May 9 15:25:05 PDT 2016


https://issues.dlang.org/show_bug.cgi?id=11229

--- Comment #9 from Jon Degenhardt <jrdemail2000-dlang at yahoo.com> ---
(In reply to Jack Stouffer from comment #8)
> (In reply to Jon Degenhardt from comment #7)
> > auto mapAsLowerCase(Range)(Range str)
> >     if (isInputRange!Range && isSomeChar!(ElementEncodingType!Range) &&
> >         !isConvertibleToString!Range)
> > {
> >     static if (ElementEncodingType!Range.sizeof < dchar.sizeof)
> >     {
> >         import std.utf : byDchar;
> >         return str.byDchar.mapAsLowerCase;
> >     }
> >     else
> >     {
> >         import std.algorithm : map;
> >         import std.uni : toLower;
> >         
> >         return str.map!(x => x.toLower);
> >     }
> > }
> 
> I attempted to replace asLowerCase with this, but it's blocked by
> https://issues.dlang.org/show_bug.cgi?id=16005

The version I wrote was reasonable for performance analysis, but it is not
fully consistent functionally with the current version of asLowerCase. The
current version of asLowerCase does "full case folding", which means the
character length may expand. The version I wrote (and the single-character
version of toLower it calls) does "simple case folding", where each character
maps to exactly one character, so the length never expands. An example is
"Latin Capital Letter I With Dot Above" (İ, U+0130). In simple case folding,
it becomes "Latin Small Letter I" (the ASCII lower case letter i, U+0069). In
full case folding, it becomes the character sequence [0069 0307] (lower case
i followed by "Combining Dot Above").
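
To make the difference concrete, here is a minimal sketch that prints the code
points each approach produces for a single input character. The helper names
simpleLower and fullLower are hypothetical and exist only for this
illustration; they wrap std.uni.toLower(dchar) and std.uni.asLowerCase
respectively. No expected output is shown for U+0130, since it depends on the
Phobos version's tables; per the description above, it should be one code
point for the simple mapping and two for the full mapping.

```d
import std.array : array;
import std.stdio : write, writef, writeln;
import std.uni : asLowerCase, toLower;

/// Hypothetical helper: the single-character mapping, always one code point.
dchar simpleLower(dchar c)
{
    return toLower(c);
}

/// Hypothetical helper: the range-based mapping, which may yield more than
/// one code point for a single input code point.
dchar[] fullLower(dchar c)
{
    dchar[] input = [c];
    return input.asLowerCase.array;
}

/// Print a label followed by each code point in U+XXXX form.
void show(string label, const(dchar)[] codePoints)
{
    write(label, ": ");
    foreach (cp; codePoints)
        writef("U+%04X ", cast(uint) cp);
    writeln();
}

void main()
{
    dchar c = 0x0130; // Latin Capital Letter I With Dot Above
    show("simple", [simpleLower(c)]);
    show("full  ", fullLower(c));
}
```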

Both case folding approaches have valid uses. In retrospect, what my analysis
did not do is separate the cost of the ASCII check from the cost of full case
folding.

One reasonable question is whether the ASCII check can be incorporated into
std.uni.toCaser. If so, that would preserve the current full case folding
functionality. There still might be value in having a simple case folding
version as well, but that could be treated as a separate topic.
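
As a rough illustration of the idea, the sketch below applies an ASCII fast
path in front of the full mapping. It is an assumption-laden stand-in: it is
eager and works on whole strings, whereas toCaser is a lazy range, and the
function name lowerWithAsciiFastPath is hypothetical and not part of Phobos.
It only shows that the full mapping can be kept as the fallback for any input
that contains non-ASCII code units.

```d
import std.exception : assumeUnique;
static import std.ascii;
static import std.uni;

/// Hypothetical helper, not part of Phobos.
/// Pure-ASCII input takes a byte-wise path; anything else falls back to
/// std.uni.toLower, so the result for non-ASCII input is unchanged.
string lowerWithAsciiFastPath(string s)
{
    // Iterating with `char` visits UTF-8 code units without decoding.
    foreach (char c; s)
    {
        if (c >= 0x80)
            return std.uni.toLower(s); // non-ASCII found: full mapping path
    }

    // Every code unit is ASCII, so a one-to-one byte mapping is sufficient.
    auto buf = new char[s.length];
    foreach (i, char c; s)
        buf[i] = cast(char) std.ascii.toLower(c);
    return assumeUnique(buf);
}

void main()
{
    assert(lowerWithAsciiFastPath("Hello, World") == "hello, world");

    // Non-ASCII input must match the library's own result exactly.
    string mixed = "\u00C0B";
    assert(lowerWithAsciiFastPath(mixed) == std.uni.toLower(mixed));
}
```

Whether a check like this pays off inside toCaser would still need measuring,
since a lazy range has to make the decision per element rather than once per
string.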
