Reading ASCII file with some codes above 127 (exten ascii)
H. S. Teoh
hsteoh at quickfur.ath.cx
Wed May 23 12:27:17 PDT 2012
On Wed, May 23, 2012 at 09:09:27PM +0200, Paul wrote:
> On Wednesday, 23 May 2012 at 19:01:53 UTC, Graham Fawcett wrote:
[...]
> >So I think what you're trying to do is
> >
> >1. read a Latin-1 file, into unicode (internally in D)
> >2. do splitLines(), etc., generating some result
> >3. Convert the result back to latin-1, and output it.
> >
> >Is that right?
> >Graham
>
> Exactly.
The safest way is probably to read the file as binary data (i.e. ubyte[]),
convert it to UTF-8, process it, and finally convert the result back to
Latin-1 (again as raw bytes) and write it out.
D assumes that char[] and string hold UTF-8. Bytes above 127 in a Latin-1
file are generally not valid UTF-8 on their own, so if you read such a file
as char[], anything that decodes the text may reject or corrupt the data.
Best to use ubyte[] for reading/writing, and convert to/from UTF-8
internally for processing.
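Something along these lines should work as a rough sketch, assuming
std.encoding's Latin1String and transcode do the conversions. The helper
names (latin1ToUtf8, utf8ToLatin1, convertFile) are made up for
illustration, and the "processing" step is only a placeholder:

```d
import std.array : join;
import std.encoding : Latin1String, transcode;
import std.file : read, write;
import std.string : splitLines;

// Hypothetical helper: reinterpret raw bytes as Latin-1 and transcode to UTF-8.
string latin1ToUtf8(immutable(ubyte)[] raw)
{
    auto latin1 = cast(Latin1String) raw; // same bytes, viewed as Latin-1 text
    string utf8;
    transcode(latin1, utf8);
    return utf8;
}

// Hypothetical helper: transcode UTF-8 text back to Latin-1.
Latin1String utf8ToLatin1(string text)
{
    Latin1String latin1;
    transcode(text, latin1);
    return latin1;
}

// Hypothetical driver: read Latin-1 bytes, process as UTF-8, write Latin-1.
void convertFile(string inPath, string outPath)
{
    // std.file.read returns void[]; treat it as raw bytes, not as char[].
    auto raw = cast(immutable(ubyte)[]) read(inPath);
    string text = latin1ToUtf8(raw);

    string[] lines = text.splitLines();
    // ... process the lines here (placeholder: left unchanged) ...
    string result = lines.join("\n");

    // Write the Latin-1 bytes back out unchanged.
    write(outPath, utf8ToLatin1(result));
}
```

You would call convertFile("in.txt", "out.txt") from main. Note that
anything the processing step adds that is outside Latin-1 cannot be
represented on the way back, so it is worth checking for that if it can
happen in your data.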
T
--
Doubt is a self-fulfilling prophecy.