latin-1 encoding

Fri Jan 12 08:24:41 PST 2007

Simen Haugen wrote:
> "Johan Granberg" wrote:
>> What are you trying to do? It would be helpful to know if you want to
>> read files in latin-1 or if you want your whole program to use it
>> internally.
> 
> Reading and writing files. 

Now I'm no expert in character encodings, but isn't Latin-1 just the 
first 256 codepoints (or whatever they're called) of Unicode, packed 
into a single byte per character?

If so, it should be pretty trivial to convert latin-1 characters to 
Unicode, either to wchar[]/dchar[] by direct one-to-one assignment (no 
multibyte sequences possible) or to char[] by using std.utf.encode, like 
this:

-----
// warning: incomplete, untested code
import std.utf;

ubyte[] data_lat1;

// ... fill data_lat1 array

char[] data_utf8;    // perhaps preallocate this to a reasonable length

foreach(c; data_lat1) {
     // each Latin-1 byte value is also its Unicode code point
     std.utf.encode(data_utf8, cast(dchar) c);
}
-----
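
For the dchar[] route (direct one-to-one assignment), something along these lines should do. It is only a sketch, and latin1ToDchars is a name I made up for illustration, not anything from Phobos:

```d
// Hypothetical helper (not part of Phobos): widen Latin-1 bytes to dchar.
dchar[] latin1ToDchars(ubyte[] data_lat1)
{
    auto result = new dchar[data_lat1.length];
    foreach (i, b; data_lat1) {
        // A Latin-1 byte value is the same number as its Unicode code point,
        // so no table lookup or multibyte handling is needed.
        result[i] = cast(dchar) b;
    }
    return result;
}
```

The same loop should work for wchar[] with cast(wchar), since every value below 0x100 fits in a single UTF-16 code unit.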

And UTF to Latin-1 should be pretty easy too:
-----
// again: incomplete, untested code
import std.utf;

char[] data_utf;    // wchar[] and dchar[] should work as well

ubyte[] data_lat1;  // again, preallocate a reasonable array if you want

size_t i = 0;
while(i < data_utf.length) {
     dchar c = std.utf.decode(data_utf, i);    // advances i
     assert(c < 0x100);      // make sure it fits (asserts are dropped in -release builds)
     data_lat1 ~= cast(ubyte) c;    // explicit narrowing from dchar to ubyte
}
-----
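
If the input might contain characters outside Latin-1, the assert above will simply halt the program (or do nothing in a release build). A possible alternative, again only a sketch, is to substitute a placeholder. The function name utf8ToLatin1Lossy and the choice of '?' as the replacement are my own assumptions:

```d
import std.utf;

// Hypothetical helper: convert UTF-8 to Latin-1, writing '?' for any
// character above U+00FF instead of asserting.
ubyte[] utf8ToLatin1Lossy(char[] data_utf)
{
    ubyte[] data_lat1;
    size_t i = 0;
    while (i < data_utf.length) {
        dchar c = std.utf.decode(data_utf, i);    // advances i
        if (c < 0x100)
            data_lat1 ~= cast(ubyte) c;           // fits in one Latin-1 byte
        else
            data_lat1 ~= cast(ubyte) '?';         // not representable
    }
    return data_lat1;
}
```

Note that std.utf.decode still throws on malformed UTF-8, so invalid input is a separate case from valid text that merely does not fit in Latin-1.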

I should note that by 'preallocate' I mean '"new" an array and set the 
length to 0'.
Setting the length to 0 is important since otherwise your output will 
get appended to the end of a default-initialized array, which isn't what 
you want ;)