latin-1 encoding
Frits van Bommel
fvbommel at REMwOVExCAPSs.nl
Fri Jan 12 08:24:41 PST 2007
Simen Haugen wrote:
> "Johan Granberg" wrote:
>> What are you trying to do? It would be helpful to know if you want to
>> read files in latin-1 or if you want your whole program to use it
>> internally.
>
> Reading and writing files.
Now I'm no expert in character encodings, but isn't Latin-1 just the
first 256 codepoints (or whatever they're called) of Unicode, packed
into a single byte per character?
If so, it should be pretty trivial to convert latin-1 characters to
Unicode, either to wchar[]/dchar[] by direct one-to-one assignment (no
multibyte sequences possible) or to char[] by using std.utf.encode, like
this:
-----
// warning: incomplete, untested code
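import std.utf;

ubyte[] data_lat1;
// ... fill data_lat1 array
char[] data_utf8; // perhaps preallocate this to a reasonable length
foreach (c; data_lat1) {
    // each Latin-1 byte is its own code point; encode appends it as UTF-8
    std.utf.encode(data_utf8, cast(dchar) c);
}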
-----
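The "direct one-to-one assignment" to dchar[] mentioned above could be sketched roughly as follows. This assumes a current D compiler; the function name latin1ToUtf32 is made up for illustration and is not part of Phobos.

```d
// Hypothetical helper (not a library function): widen each Latin-1 byte
// to a full Unicode code point. No multibyte handling is needed because
// the byte value is the code point.
dchar[] latin1ToUtf32(const(ubyte)[] latin1)
{
    auto result = new dchar[latin1.length]; // exactly one dchar per input byte
    foreach (i, b; latin1)
        result[i] = cast(dchar) b;
    return result;
}
```

The same loop should work for wchar[] as well, since every value below 0x100 fits in a single UTF-16 code unit.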
And UTF to Latin-1 should be pretty easy too:
-----
// again: incomplete, untested code
import std.utf;

char[] data_utf; // wchar[] and dchar[] should work as well
ubyte[] data_lat1; // again, preallocate a reasonable array if you want
size_t i = 0;
while (i < data_utf.length) {
    dchar c = std.utf.decode(data_utf, i); // advances i
    assert(c < 0x100); // make sure it fits (asserts are dropped with -release)
    data_lat1 ~= cast(ubyte) c; // cast makes the narrowing to one byte explicit
}
-----
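Since an assert disappears in release builds, a variant that throws on characters outside Latin-1 may be safer when the input comes from a file. The following is only a sketch, assuming a current D compiler; utf8ToLatin1 is a made-up name, and throwing a plain Exception is just one possible way to report the error.

```d
// Hypothetical helper (not a library function): convert UTF-8 text to
// Latin-1 bytes, throwing if a character does not fit in one byte.
ubyte[] utf8ToLatin1(const(char)[] text)
{
    ubyte[] result;
    result.reserve(text.length); // output is never longer than the input
    foreach (dchar c; text)      // foreach decodes the UTF-8 into code points
    {
        if (c > 0xFF)
            throw new Exception("character not representable in Latin-1");
        result ~= cast(ubyte) c;
    }
    return result;
}
```

Iterating with `foreach (dchar c; text)` lets the language do the decoding, so the manual index handling around std.utf.decode is not needed here.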
I should note that by 'preallocate' I mean '"new" an array and set the
length to 0'.
Setting the length to 0 is important since otherwise your output will
get appended to the end of a default-initialized array, which isn't what
you want ;)