Read non-UTF8 file
Nrgyzer
nrgyzer at gmail.com
Tue Feb 22 08:56:41 PST 2011
== Excerpt from Stewart Gordon (smjg_1998 at yahoo.com)'s article
> What compiler version/platform are you using? I had to fix some errors before it would
> compile on mine (1.066/2.051 Windows).
> On 19/02/2011 13:42, Nrgyzer wrote:
> <snip>
> > Now... and with writefln("%s", cast(ubyte[]) convertToUTF8(f.readLine())); I get the following:
> >
> > [195, 131, 164]
> > [195, 131, 182]
> > [195, 131, 188]
> It took a while for me to make sense of what's going on!
> The expressions (0xC0 | (ch >> 6)) and (0x80 | (ch & 0x3F)) both have type int. It
> appears that, in D2, if you append an int to a string then it treats the int as a Unicode
> codepoint and automagically converts it to UTF-8. But why is it doing it on the first
> byte and not the second? This looks like a bug.
> Casting each UTF-8 byte value to a char:
>     if (ch < 0x80) {
>         result ~= cast(char) ch;
>     } else {
>         result ~= cast(char) (0xC0 | (ch >> 6));
>         result ~= cast(char) (0x80 | (ch & 0x3F));
>     }
> gives the expected output:
> [195, 164]
> [195, 182]
> [195, 188]
> HTH
> Stewart.
I was puzzled too, because the same code worked in D1 without any problems. Anyway... thanks :)
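
For reference, here is a minimal, self-contained sketch of the fix Stewart describes. The function name latin1ToUTF8 is hypothetical, and the sketch assumes the input file is ISO-8859-1 (Latin-1), which is consistent with the byte values quoted above but is not stated in the thread. The two-byte scheme only covers code points below 0x800, which is enough for single-byte Latin-1 input. The last lines of main show where the stray 131 probably came from: appending a dchar to a string encodes the code point as UTF-8.

```d
import std.stdio;

// Hypothetical helper: converts bytes assumed to be ISO-8859-1 to UTF-8.
string latin1ToUTF8(const(ubyte)[] input)
{
    char[] result;
    foreach (ubyte ch; input)
    {
        if (ch < 0x80)
        {
            // ASCII range: one byte, unchanged.
            result ~= cast(char) ch;
        }
        else
        {
            // 0x80..0xFF: two UTF-8 bytes. The explicit casts make sure
            // each value is appended as a raw byte, not as a code point.
            result ~= cast(char) (0xC0 | (ch >> 6));
            result ~= cast(char) (0x80 | (ch & 0x3F));
        }
    }
    return result.idup;
}

void main()
{
    // ä, ö, ü in Latin-1 (assumed input encoding).
    ubyte[] data = [0xE4, 0xF6, 0xFC];
    foreach (b; data)
        writeln(cast(const(ubyte)[]) latin1ToUTF8([b]));

    // Appending a dchar encodes the code point U+00C3 as two bytes.
    string s;
    s ~= cast(dchar) 0xC3;
    writeln(cast(const(ubyte)[]) s); // [195, 131]
}
```

With the casts in place, the three loop iterations print [195, 164], [195, 182] and [195, 188], matching the expected output quoted above.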