Working with UTF

Thu Jun 14 07:32:44 PDT 2007

Simen Haugen wrote:
> I tested this now, and it works like a charm. This means I can finally get 
> rid of all my conversions between utf8 and latin1! (together with all these 
> hidden bugs)
> 
> Thanks a lot for all your help.
> 
> "Frits van Bommel" <fvbommel at REMwOVExCAPSs.nl> wrote in message 
> news:f4rh01$lkt$2 at digitalmars.com...
>> Except his input is encoded as Latin-1, not UTF-8. Conversion is still 
>> trivial though:
>> ---
>> // Needs `import std.file;` at module scope.
>> auto latin1 = cast(ubyte[]) std.file.read("some_latin-1_file.txt");
>> dchar[] utf = new dchar[](latin1.length);
>> for(size_t i = 0; i < latin1.length; i++) {
>>     utf[i] = latin1[i];
>> }
>> ---
>> and the other way around.
>> (The first 256 code points of Unicode are identical to Latin-1) 

If you only ever need to represent Latin-1 (but need string functions, 
not just array functions), wchar[] will also work and takes only half 
the memory of dchar[], since every Latin-1 code point fits in a single 
UTF-16 code unit.
If you don't need string functions, of course, you can just keep the 
data as ubyte[] the whole time.
(By "string functions" I mean things like case conversion, console 
output and so on. In particular, note that slicing and indexing work on 
all arrays, not just strings.)
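
For illustration, here is a rough sketch of the wchar[] variant in both 
directions. The helper names latin1ToUtf16 and utf16ToLatin1 are made up 
for this example and are not part of Phobos. The input is assumed to be 
raw Latin-1 bytes, e.g. obtained with cast(ubyte[]) std.file.read(...) 
as in the quoted code.

```d
// Hypothetical helper: widen Latin-1 bytes to UTF-16 code units.
// Each Latin-1 byte has the same value as its Unicode code point
// (U+0000..U+00FF), and each of those fits in one wchar, so no
// surrogate pairs are ever needed.
wchar[] latin1ToUtf16(const(ubyte)[] latin1)
{
    auto utf = new wchar[](latin1.length);
    foreach (i, b; latin1)
        utf[i] = cast(wchar) b;
    return utf;
}

// Hypothetical helper for the reverse direction. It assumes the text
// contains only code points up to U+00FF and asserts otherwise.
ubyte[] utf16ToLatin1(const(wchar)[] utf)
{
    auto latin1 = new ubyte[](utf.length);
    foreach (i, c; utf)
    {
        assert(c <= 0xFF, "character not representable in Latin-1");
        latin1[i] = cast(ubyte) c;
    }
    return latin1;
}
```

Because the mapping is one element in, one element out, index i in the 
ubyte[] and index i in the wchar[] always refer to the same character, 
so slices taken from either array line up.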