D on next-gen consoles and for game development
    Dmitry Olshansky 
    dmitry.olsh at gmail.com
       
    Fri May 24 08:36:10 PDT 2013
    
    
  
24-May-2013 18:38, Manu wrote:
> On 24 May 2013 19:49, Jacob Carlborg <doob at me.com>
> wrote:
>
>     On 2013-05-23 23:42, Joseph Rushton Wakeling wrote:
>
>         I'm also in agreement with Manu.  There may well already be bugs
>         for some of
>         them -- e.g. there is one for toUpperInPlace which he referred
>         to, and the
>         source of the allocation is clear and is even responsible for
>         other bugs:
>         http://d.puremagic.com/issues/show_bug.cgi?id=9629
>
>
>     toUpper/lower cannot be made in place if it should handle all
>     Unicode. Some characters will change their length when converted
>     to/from uppercase. Examples of these are the German double S and
>     some Turkish I.
>
>
> ß and SS are both actually 2 bytes, so it works in UTF-8 at least! ;)
Okay, here you go - a UTF-8 table of cased sin :)
Upper-case : lower-case code point - UTF-8 length in bytes (upper : lower)
0x01e9e : 0x000df - 3 : 2
0x0023a : 0x02c65 - 2 : 3
0x0023e : 0x02c66 - 2 : 3
0x02c7e : 0x0023f - 3 : 2
0x02c7f : 0x00240 - 3 : 2
0x02c6f : 0x00250 - 3 : 2
0x02c6d : 0x00251 - 3 : 2
0x02c70 : 0x00252 - 3 : 2
0x0a78d : 0x00265 - 3 : 2
0x0a7aa : 0x00266 - 3 : 2
0x02c62 : 0x0026b - 3 : 2
0x02c6e : 0x00271 - 3 : 2
0x02c64 : 0x0027d - 3 : 2
0x01e9e : 0x000df - 3 : 2
0x02c62 : 0x0026b - 3 : 2
0x02c64 : 0x0027d - 3 : 2
0x0023a : 0x02c65 - 2 : 3
0x0023e : 0x02c66 - 2 : 3
0x02c6d : 0x00251 - 3 : 2
0x02c6e : 0x00271 - 3 : 2
0x02c6f : 0x00250 - 3 : 2
0x02c70 : 0x00252 - 3 : 2
0x02c7e : 0x0023f - 3 : 2
0x02c7f : 0x00240 - 3 : 2
0x0a78d : 0x00265 - 3 : 2
0x0a7aa : 0x00266 - 3 : 2
And this is only with 1:1 mapping.
Generated by:
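void main(){
     import std.uni, std.utf, std.stdio;
     char[4] buf; // scratch buffer; only the encoded length matters
     foreach(dchar ch; unicode.Cased_Letter.byCodepoint){
         dchar upper = toUpper(ch);
         dchar lower = toLower(ch);
         // encode returns the number of UTF-8 code units written
         size_t uLen = encode(buf, upper);
         size_t lLen = encode(buf, lower);
         // report only pairs whose UTF-8 lengths differ
         if(uLen != lLen)
             writefln("0x%05x : 0x%05x - %d : %d", upper, lower, uLen, lLen);
     }
}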
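
A quick way to sanity-check a single row of the table by hand is sketched below. It assumes a hypothetical helper, utf8Length (not part of Phobos), that simply wraps std.utf.codeLength, and it checks the U+023A / U+2C65 pair from the table next to the ß / SS case mentioned above:

```d
import std.utf : codeLength;

// Hypothetical helper for illustration: number of UTF-8 code units
// (bytes) needed to encode a single code point.
size_t utf8Length(dchar c)
{
    return codeLength!char(c);
}

void main()
{
    // The quoted example: U+00DF (sharp s) and "SS" both take 2 bytes.
    assert(utf8Length('\u00DF') == 2);
    assert("SS".length == 2);

    // One row of the table: the upper-case form U+023A takes 2 bytes,
    // the lower-case form U+2C65 takes 3, so a case change alters the length.
    assert(utf8Length('\u023A') == 2);
    assert(utf8Length('\u2C65') == 3);
}
```

So the ß case happens to keep its length in UTF-8, while the pairs listed in the table do not.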
-- 
Dmitry Olshansky
    
    