ASCII to UTF8 Conversion - is this right?
Pragma
ericanderton at yahoo.removeme.com
Mon Dec 18 09:19:29 PST 2006
Here's something that came up recently. As some of you may already
know, I've been doing some work with forum data.
I wanted to move some old forum data, which was stored in ASCII, over to
UTF-8 via D. The problem is that some of the data has characters in the
0x80-0xFF range, which causes UTF-BOM detection to fail.
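To make the failure concrete, here is a minimal sketch of why those bytes are a problem. It assumes a current D2 compiler with Phobos' std.utf; isValidUtf8 is a made-up helper name for illustration, not something from my actual code.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: reports whether a buffer is well-formed UTF-8.
// std.utf.validate throws UTFException on the first malformed sequence.
bool isValidUtf8(const(char)[] text)
{
    try
    {
        validate(text);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    // Plain 7-bit ASCII is already valid UTF-8.
    assert(isValidUtf8("cafe"));

    // A lone byte in the 0x80-0xFF range is not: 0xE9 looks like the start
    // of a three-byte sequence, but no continuation bytes follow it.
    immutable ubyte[] raw = [0x63, 0x61, 0x66, 0xE9];
    assert(!isValidUtf8(cast(const(char)[]) raw));
}
```

So anything that expects well-formed UTF-8 will choke on the raw data as it stands.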
So I rolled the following function to 'transcode' these characters:
char[] ASCII2UTF8(char[] value){
    char[] result;
    for(uint i=0; i<value.length; i++){
        char ch = value[i];
        if(ch < 0x80){
            // 7-bit ASCII passes through unchanged
            result ~= ch;
        }
        else{
            // 0x80-0xFF becomes a two-byte sequence: 110000xx 10xxxxxx
            result ~= 0xC0 | (ch >> 6);
            result ~= 0x80 | (ch & 0x3F);
            debug writefln("converted: %0.2X to %0.2X %0.2X",ch, result[$-2], result[$-1]);
        }
    }
    return result;
}
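As a sanity check on the bit arithmetic, here is a rough sketch of the same mapping written against a current D2 compiler. The name latin1ToUtf8 is made up, and the whole thing rests on the assumption that the source bytes are ISO-8859-1 (Latin-1), where the byte value equals the Unicode code point. That assumption is exactly what I'm unsure about.

```d
// Hypothetical D2 variant of the function above. Assumes the input is
// ISO-8859-1 (Latin-1), so each byte value is also its Unicode code point.
string latin1ToUtf8(const(ubyte)[] input)
{
    char[] result;
    result.reserve(input.length);
    foreach (b; input)
    {
        if (b < 0x80)
        {
            // 7-bit ASCII passes through unchanged.
            result ~= cast(char) b;
        }
        else
        {
            // Code points 0x80-0xFF need two bytes: 110000xx 10xxxxxx.
            result ~= cast(char)(0xC0 | (b >> 6));
            result ~= cast(char)(0x80 | (b & 0x3F));
        }
    }
    return result.idup;
}

unittest
{
    // 0xE9 is 'e' with an acute accent in Latin-1 (U+00E9).
    // 0xE9 >> 6 == 3, so the lead byte is 0xC3;
    // 0xE9 & 0x3F == 0x29, so the continuation byte is 0xA9.
    immutable ubyte[] raw = [0x63, 0x61, 0x66, 0xE9];
    assert(latin1ToUtf8(raw) == "caf\u00E9");
}
```

If the data was actually written in some other 8-bit code page, the byte values would not line up with Unicode code points and this mapping would give the wrong characters.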
So my question is: this conversion follows a literal interpretation of
the UTF-8 spec, but is it the correct way to treat these characters?
Should I be taking user locale into account? Are high-ASCII chars
considered to be universal?
--
- EricAnderton at yahoo