Reading ASCII file with some codes above 127 (extended ASCII)

H. S. Teoh hsteoh at quickfur.ath.cx
Wed May 23 12:27:17 PDT 2012


On Wed, May 23, 2012 at 09:09:27PM +0200, Paul wrote:
> On Wednesday, 23 May 2012 at 19:01:53 UTC, Graham Fawcett wrote:
[...]
> >So I think what you're trying to do is
> >
> >1. read a Latin-1 file, into unicode (internally in D)
> >2. do splitLines(), etc., generating some result
> >3. Convert the result back to latin-1, and output it.
> >
> >Is that right?
> >Graham
> 
> Exactly.

The safest way is probably to read the file as binary data (i.e.
byte[]), convert it to UTF-8, process it, and finally convert the
result back to Latin-1 (in binary form) and output it.
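
A minimal sketch of that read/convert/process/convert/write pipeline,
assuming Phobos' std.encoding module (transcode and Latin1String) does
the conversions. The helper names latin1ToUtf8 and utf8ToLatin1 are made
up for illustration, the paths come from the command line, and the
"processing" step is only a placeholder that splits and rejoins lines:

```d
import std.array : join;
import std.encoding : Latin1String, transcode;
import std.file : read, write;
import std.stdio : writeln;
import std.string : splitLines;

/// Decode raw Latin-1 bytes into a UTF-8 string (hypothetical helper).
string latin1ToUtf8(const(ubyte)[] raw)
{
    string utf8;
    // Reinterpret the bytes as Latin-1 code units, then transcode.
    transcode(cast(Latin1String) raw.idup, utf8);
    return utf8;
}

/// Encode a UTF-8 string as raw Latin-1 bytes (hypothetical helper).
immutable(ubyte)[] utf8ToLatin1(string text)
{
    Latin1String latin1;
    transcode(text, latin1);
    return cast(immutable(ubyte)[]) latin1;
}

void main(string[] args)
{
    if (args.length < 3)
    {
        writeln("usage: ", args[0], " <latin1-input> <latin1-output>");
        return;
    }

    // 1. Read the file as raw bytes; no decoding happens here.
    auto raw = cast(const(ubyte)[]) read(args[1]);

    // 2. Convert to UTF-8 so the usual string functions are safe to use.
    string text = latin1ToUtf8(raw);

    // 3. Process. Placeholder: split into lines and join them again.
    //    Note that this normalizes every line ending to "\n".
    string[] lines = text.splitLines();
    string result = lines.join("\n") ~ "\n";

    // 4. Convert back to Latin-1 bytes and write them out unchanged.
    write(args[2], utf8ToLatin1(result));
}
```

If the processing step can introduce characters that have no Latin-1
equivalent, the conversion back cannot be lossless, so it is worth
checking the output in that case.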

D assumes Unicode internally: a char[] is treated as UTF-8. Bytes above
127 in a Latin-1 file generally do not form valid UTF-8, so if you read
such a file as char[], you may be running into implicit UTF decoding
that throws or corrupts the data. Best use byte[] (or ubyte[]) for
reading/writing, and do conversions to/from UTF-8 internally for
processing.
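
To see why treating Latin-1 bytes as char[] goes wrong, here is a small
sketch that checks the same word in both encodings with std.utf.validate.
The helper name isValidUtf8 and the sample bytes are only illustrative:

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Returns true if the bytes form well-formed UTF-8 (hypothetical helper).
bool isValidUtf8(const(ubyte)[] raw)
{
    try
    {
        // validate throws UTFException on malformed input.
        validate(cast(const(char)[]) raw);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "café" in Latin-1: the e-acute is the single byte 0xE9.
    const(ubyte)[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    // "café" in UTF-8: the e-acute is the two bytes 0xC3 0xA9.
    const(ubyte)[] utf8 = [0x63, 0x61, 0x66, 0xC3, 0xA9];

    writeln("Latin-1 bytes valid as UTF-8: ", isValidUtf8(latin1));
    writeln("UTF-8 bytes valid as UTF-8:   ", isValidUtf8(utf8));
}
```

In UTF-8, 0xE9 announces a three-byte sequence, so a lone 0xE9 at the end
of the data is malformed. That is the kind of input that trips up string
functions expecting UTF-8.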


T

-- 
Doubt is a self-fulfilling prophecy.

