Read non-UTF8 file

Nrgyzer nrgyzer at gmail.com
Sun Feb 20 06:53:31 PST 2011


== Quote from spir (denis.spir at gmail.com)'s article
> On 02/19/2011 02:42 PM, Nrgyzer wrote:
> > == Quote from Stewart Gordon (smjg_1998 at yahoo.com)'s article
> >> On 13/02/2011 21:49, Nrgyzer wrote:
> >> <snip>
> >>> It compiles and works as long as the returned char-array/string of
> >>> f.readLine() doesn't contain non-UTF8 character(s). If it contains
> >>> such chars, writeln() doesn't write anything to the console. Is
> >>> there any chance to read such files?
> >> Please post sample input that shows the problem, and the output
> >> generated by replacing the writeln call with
> >>       writefln("%s", cast(ubyte[]) convertToUTF8(f.readLine()));
> >> so that we can see what it is actually reading in.
> >> Stewart.
> >
> > My file contains the following:
> >
> > �
> > �
> > �
> >
> > Now... and with
> > writefln("%s", cast(ubyte[]) convertToUTF8(f.readLine()));
> > I get the following:
> >
> > [195, 131, 164]
> > [195, 131, 182]
> > [195, 131, 188]
> At first sight, I find your input strange. Actually, it looks like
> UTF-8 (195 is common when representing converted Latin text). But
> having 3 times (195, 131), which is the code for 'Ã', is weird.
> What is your source text, what is its encoding, and where does it
> come from?
> Why don't you /start/ and tell us about that?
> Denis

It seems that my input characters aren't displayed correctly above. The
file contains the following bytes:

0xE4 (or 228), 0xF6 (or 246) and 0xFC (or 252)

I used Notepad to create the file and saved it with ANSI encoding. The
file is for testing purposes only.
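
For reference, here is a minimal sketch of how those three bytes could be
turned into valid UTF-8. It assumes that Notepad's "ANSI" means a Western
single-byte code page (Windows-1252), which agrees with ISO-8859-1 for
0xE4, 0xF6 and 0xFC, so Latin1String from std.encoding is used. The bytes
are hard-coded here instead of being read from the test file, and this is
not the convertToUTF8 function discussed earlier in the thread.

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;

void main()
{
    // The three bytes from the test file: 0xE4, 0xF6, 0xFC.
    immutable(ubyte)[] raw = [0xE4, 0xF6, 0xFC];

    // Assumption: the bytes are ISO-8859-1 (Latin-1) text, so the array
    // is reinterpreted as a Latin1String without changing its contents.
    auto latin1 = cast(Latin1String) raw;

    // transcode converts between the encodings known to std.encoding;
    // the target encoding is taken from the type of the second argument.
    string utf8;
    transcode(latin1, utf8);

    // Each character now occupies two bytes in UTF-8.
    writeln(cast(const(ubyte)[]) utf8); // [195, 164, 195, 182, 195, 188]
    writeln(utf8);
}
```

If the bytes come from a file, something like cast(ubyte[])
std.file.read(name) should give the raw array to start from, as long as
the whole file is known to be in that one encoding.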

