Working with utf
Derek Parnell
derek at psych.ward
Thu Jun 14 07:42:15 PDT 2007
On Thu, 14 Jun 2007 15:48:50 +0200, Frits van Bommel wrote:
> Derek Parnell wrote:
>> On Thu, 14 Jun 2007 15:13:35 +0200, Simen Haugen wrote:
>>
>>> "Derek Parnell" <derek at psych.ward> wrote in message
>>> news:n1j2izm4a0x5.413sc7jjzk2x.dlg at 40tude.net...
>>>> Convert to utf32 (dchar[]) then do your stuff and convert back to latin-1
>>>> when you're done. Each dchar[] element is a single character.
>>> You're kidding me, right? Then I only have to convert to utf-32 when reading
>>> a file, and back to latin-1 when writing. Thats great! (except I have to
>>> modify a lot of char[] to dchar[])
>>
>> dchar[] Y;
>> char[] Z;
>>
>> Y = std.utf.toUTF32(Z);
>
> Except his input is encoded as Latin-1, not UTF-8.
I read the OP as saying he was already converting Latin-1 to utf8 and was
now concerned about converting utf8 to utf32, thus I gave that toUTF32()
hint.
> Conversion is still
> trivial though:
> ---
> auto latin1 = cast(ubyte[]) std.file.read("some_latin-1_file.txt");
> dchar[] utf = new dchar[](latin1.length);
> for(size_t i = 0; i < latin1.length; i++) {
> utf[i] = latin1[i];
> }
> ---
> and the other way around.
> (The first 256 code points of Unicode are identical to Latin-1)
I was not aware of that. So if one needs to convert from Latin-1 to utf8
...
import std.utf;

// Each Latin-1 byte value is also its Unicode code point,
// so widening byte-by-byte is a complete conversion.
dchar[] Latin1toUTF32(ubyte[] pLatin1Text)
{
    dchar[] utf;
    utf.length = pLatin1Text.length;
    foreach(i, b; pLatin1Text)
        utf[i] = b;
    return utf;
}

// Go via UTF-32 and let std.utf do the UTF-8 encoding.
char[] Latin1toUTF8(ubyte[] pLatin1Text)
{
    return std.utf.toUTF8(Latin1toUTF32(pLatin1Text));
}

import std.stdio;
void main()
{
    // Test data: every possible Latin-1 byte value, 0 to 255.
    ubyte[] td;
    td.length = 256;
    for (int i = 0; i < 256; i++)
        td[i] = cast(ubyte) i;

    // On windows, set the code page to 65001
    // and the font to Lucida Console.
    // eg. C:\> chcp 65001
    //     Active code page: 65001
    std.stdio.writefln("%s", Latin1toUTF8(td));
}
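
For "the other way around" that Frits mentions, something like the
following might do. This is only a rough sketch: the name UTF32toLatin1
is my own invention, and substituting '?' for anything above U+00FF is
an assumption, since Latin-1 simply has no slot for those characters.
You may prefer to throw an exception instead.

```d
// Hypothetical helper, not part of Phobos: narrow UTF-32 to Latin-1.
ubyte[] UTF32toLatin1(dchar[] pText)
{
    ubyte[] latin1;
    latin1.length = pText.length;
    foreach (i, c; pText)
    {
        // Code points 0..0xFF map straight onto Latin-1 bytes.
        // Anything higher cannot be represented; '?' is an assumed
        // placeholder, chosen only for this sketch.
        latin1[i] = (c <= 0xFF) ? cast(ubyte) c : cast(ubyte) '?';
    }
    return latin1;
}
```

Feeding it the output of Latin1toUTF32() should give back the original
bytes, because every value that function produces is below 0x100.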
--
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell