[Issue 1235] New: std.string.tolower() fails on certain utf8 characters
    d-bugmail at puremagic.com 
       
    Tue May 15 17:08:33 PDT 2007
    
    
  
http://d.puremagic.com/issues/show_bug.cgi?id=1235
           Summary: std.string.tolower() fails on certain utf8 characters
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla at digitalmars.com
        ReportedBy: d at chqrlie.org
The following program:
import std.string;
int main(char[][] args)
{
    printf("tolower(\"\\u0130e\") -> \"%.*s\"\n", tolower("\u0130e"));
    return 0;
}
produces incorrect output:
tolower("\u0130e") -> "i e"
The bug comes from erroneous code in phobos/std/string.d, line 843:
                    if (r.length != i + j)
                        r = r[0 .. i + j];
Turkish dotted capital I (U+0130) is correctly converted to ASCII i (U+0069).
But the converted character does not use the same number of bytes as the
original character: U+0130 takes two bytes in UTF-8, while i takes one.  The
code above is therefore incorrect.  As far as I understand the implementation,
it could be removed completely.
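The byte-count mismatch can be checked in isolation.  This is only a minimal
sketch, written against a current D compiler rather than the 2007 Phobos, and
it does not call tolower() at all; utf8Length is a hypothetical helper
introduced for illustration.

```d
import std.stdio : writefln;

// Hypothetical helper (not part of Phobos): number of UTF-8 code units
// in a string.  For a char array, .length counts bytes, not characters.
size_t utf8Length(string s)
{
    return s.length;
}

void main()
{
    string original = "\u0130e"; // U+0130 followed by ASCII 'e'
    string lowered  = "ie";      // the result the report expects

    // U+0130 encodes as two bytes in UTF-8 and 'i' as one, so the
    // lowered string is one byte shorter than its input.
    writefln("original: %s bytes", utf8Length(original)); // 3
    writefln("lowered:  %s bytes", utf8Length(lowered));  // 2
}
```

Any length bookkeeping that assumes input and output occupy the same number of
bytes would be off by one for this input.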
A similar issue is present in toupper(), with the additional twist that
conversion to uppercase should not be special-cased for the ASCII subset in the
Turkish locale.
Additionally, the non-ASCII code path is triggered by if (c >= 0x7F), where it
should be if (c > 0x7F).
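The two comparisons differ only for 0x7F (DEL), which is still an ASCII
character.  A small sketch of the difference, assuming a current D compiler;
both predicates are hypothetical names that merely mirror the two conditions
quoted above:

```d
import std.stdio : writefln;

// Hypothetical predicate mirroring the condition as quoted in the report.
bool nonAsciiAsWritten(uint c)
{
    return c >= 0x7F;
}

// Hypothetical predicate mirroring the condition the report proposes.
bool nonAsciiAsProposed(uint c)
{
    return c > 0x7F;
}

void main()
{
    // 0x7E and 0x7F are ASCII; 0x80 is the first non-ASCII value.
    foreach (uint c; [0x7E, 0x7F, 0x80])
    {
        writefln("0x%02X: as written %s, as proposed %s",
                 c, nonAsciiAsWritten(c), nonAsciiAsProposed(c));
    }
}
```

Only the line for 0x7F shows the two predicates disagreeing.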