Working with utf
Frits van Bommel
fvbommel at REMwOVExCAPSs.nl
Thu Jun 14 07:32:44 PDT 2007
Simen Haugen wrote:
> I tested this now, and it works like a charm. This means I can finally get
> rid of all my conversions between utf8 and latin1! (together with all these
> hidden bugs)
>
> Thanks a lot for all your help.
>
> "Frits van Bommel" <fvbommel at REMwOVExCAPSs.nl> wrote in message
> news:f4rh01$lkt$2 at digitalmars.com...
>> Except his input is encoded as Latin-1, not UTF-8. Conversion is still
>> trivial though:
>> ---
>> auto latin1 = cast(ubyte[]) std.file.read("some_latin-1_file.txt");
>> dchar[] utf = new dchar[](latin1.length);
>> for(size_t i = 0; i < latin1.length; i++) {
>> utf[i] = latin1[i];
>> }
>> ---
>> and the other way around.
>> (The first 256 code points of Unicode are identical to Latin-1)
If you only ever need to represent Latin-1 (but need string functions,
not just array functions), wchar[] will also work, and it takes only
half the memory of dchar[].
If you don't need string functions, of course, you can just keep it as
ubyte[]s the whole time.
(By "string functions" I mean stuff like case conversions, console
output and so on. In particular, note that slicing and indexing work on
all arrays, not just strings.)
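Here is a minimal sketch of the wchar[] approach, including the "other way around" conversion. It is written in modern D2 with Phobos (`std.exception.enforce`), so it may not compile unchanged on the 2007-era D1 compiler this thread used. The function names `latin1ToUtf16` and `utf16ToLatin1` are hypothetical, not Phobos APIs. The code relies only on the point made above: every Latin-1 byte is a Unicode code point below U+0100, so each one fits in a single wchar, and converting back needs nothing more than a range check.

```d
import std.exception : enforce;
import std.stdio : writeln;

/// Hypothetical helper: widen Latin-1 bytes to UTF-16. Latin-1 byte values
/// equal their Unicode code points, and every code point below 0x10000 fits
/// in one wchar, so this is a one-to-one copy.
wchar[] latin1ToUtf16(const(ubyte)[] latin1)
{
    auto result = new wchar[](latin1.length);
    foreach (i, b; latin1)
        result[i] = b; // implicit ubyte -> wchar widening
    return result;
}

/// Hypothetical helper: narrow UTF-16 back to Latin-1 bytes. Throws if a
/// code unit is above 0xFF, i.e. a character Latin-1 cannot represent.
ubyte[] utf16ToLatin1(const(wchar)[] utf16)
{
    auto result = new ubyte[](utf16.length);
    foreach (i, c; utf16) // iterates code units, no decoding
    {
        enforce(c <= 0xFF, "character not representable in Latin-1");
        result[i] = cast(ubyte) c;
    }
    return result;
}

void main()
{
    // "café" in Latin-1: 'é' is the single byte 0xE9.
    ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    wchar[] text = latin1ToUtf16(latin1);
    assert(text == "café"w);
    assert(utf16ToLatin1(text) == latin1);

    // Memory used by the same text as wchar[] versus dchar[].
    writeln(text.length * wchar.sizeof, " bytes as wchar[] vs ",
            text.length * dchar.sizeof, " bytes as dchar[]");
}
```

The range check is why the sketch uses an explicit loop rather than a plain cast. After string functions have run, the text may contain characters outside Latin-1. For example, upper-casing 'ÿ' (U+00FF) gives 'Ÿ' (U+0178). Silently truncating such characters would bring back the kind of hidden bug the conversion was meant to remove.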
More information about the Digitalmars-d mailing list