Working with utf

Thu Jun 14 07:42:15 PDT 2007

On Thu, 14 Jun 2007 15:48:50 +0200, Frits van Bommel wrote:

> Derek Parnell wrote:
>> On Thu, 14 Jun 2007 15:13:35 +0200, Simen Haugen wrote:
>> 
>>> "Derek Parnell" <derek at psych.ward> wrote in message 
>>> news:n1j2izm4a0x5.413sc7jjzk2x.dlg at 40tude.net...
>>>> Convert to utf32 (dchar[]) then do your stuff and convert back to latin-1
>>>> when you're done. Each dchar[] element is a single character.
>>> You're kidding me, right? Then I only have to convert to utf-32 when reading 
>>> a file, and back to latin-1 when writing. That's great! (except I have to 
>>> modify a lot of char[] to dchar[])
>> 
>> dchar[] Y;
>> char[]  Z;
>> 
>>  Y = std.utf.toUTF32(Z);
> 
> Except his input is encoded as Latin-1, not UTF-8. 

I read the OP as saying he was already converting Latin-1 to UTF-8 and was
now concerned about converting UTF-8 to UTF-32, hence the toUTF32()
hint. 

> Conversion is still 
> trivial though:
> ---
> auto latin1 = cast(ubyte[]) std.file.read("some_latin-1_file.txt");
> dchar[] utf = new dchar[](latin1.length);
> for(size_t i = 0; i < latin1.length; i++) {
>      utf[i] = latin1[i];
> }
> ---
> and the other way around.
> (The first 256 code points of Unicode are identical to Latin-1)

I was not aware of that. So if one needs to convert from Latin-1 to UTF-8
...

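import std.utf;

// Latin-1 byte values equal the first 256 Unicode code points,
// so each byte widens directly to a dchar.
dchar[] Latin1toUTF32(ubyte[] pLatin1Text)
{
    dchar[] utf;

    utf.length = pLatin1Text.length;
    foreach (i, b; pLatin1Text)
        utf[i] = b;
    return utf;
}

// Go through UTF-32 and let std.utf do the UTF-8 encoding.
char[] Latin1toUTF8(ubyte[] pLatin1Text)
{
    return std.utf.toUTF8(Latin1toUTF32(pLatin1Text));
}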

import std.stdio;

void main()
{
    ubyte[] td;

    td.length = 256;
    for (int i = 0; i < 256; i++)
       td[i] = cast(ubyte) i;  // explicit narrowing from int to ubyte

    // On Windows, set the code page to 65001 
    // and the font to Lucida Console.
    // eg. C:\> chcp 65001
    //     Active code page: 65001
    std.stdio.writefln("%s", Latin1toUTF8(td));
}
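
Going "the other way around" needs one extra decision, because a dchar can
hold code points that Latin-1 cannot represent. Here is a minimal sketch,
assuming a current D compiler and assuming that anything above U+00FF should
simply become '?'. The name utf32ToLatin1 is made up for this example; it is
not a Phobos function, and the '?' substitution is an arbitrary choice.

```d
import std.stdio;

// Hypothetical helper (not part of Phobos): narrow UTF-32 text to Latin-1.
// Code points up to U+00FF map to the byte with the same value.
// Anything higher has no Latin-1 equivalent and is replaced with '?'
// (an assumption for this sketch; throwing an exception is another option).
ubyte[] utf32ToLatin1(dchar[] text)
{
    auto latin1 = new ubyte[](text.length);
    foreach (i, c; text)
        latin1[i] = (c <= 0xFF) ? cast(ubyte) c : cast(ubyte) '?';
    return latin1;
}

void main()
{
    // 'e' with acute (U+00E9) fits in Latin-1; the euro sign (U+20AC) does not.
    dchar[] utf = "caf\u00E9\u20AC"d.dup;
    ubyte[] bytes = utf32ToLatin1(utf);

    ubyte[] expected = [0x63, 0x61, 0x66, 0xE9, 0x3F];
    assert(bytes == expected);

    writefln("%s", bytes);
}
```

The resulting ubyte[] can be written to a file as-is. If silently losing
characters is not acceptable, check for c > 0xFF before converting instead.
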
-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell