Reading ASCII file with some codes above 127 (exten ascii)

Graham Fawcett fawcett at uwindsor.ca
Wed May 23 11:04:55 PDT 2012


On Wednesday, 23 May 2012 at 15:48:20 UTC, Paul wrote:
> On Monday, 14 May 2012 at 12:58:20 UTC, Graham Fawcett wrote:
>> On Sunday, 13 May 2012 at 21:03:45 UTC, Paul wrote:
>>> I am reading a file that has a few extended ASCII codes (e.g. 
>>> degree symbol). Depending on how I read the file in and what 
>>> I do with it, the error shows up at different points.  I'm 
>>> pretty sure it all boils down to these extended ASCII 
>>> codes.
>>>
>>> Can I just tell dmd that I'm reading a Latin1 or ISO 8859-1 
>>> file?
>>> I've messed with the std.encoding module but really can't 
>>> figure out what I need to do.
>>>
>>> There must be a simple solution to this.
>>
>> This seems to work:
>>
>>
>> import std.stdio, std.file, std.encoding;
>>
>> void main()
>> {
>>    auto latin = cast(Latin1String) read("/tmp/hi.8859");
>>    string s;
>>    transcode(latin, s);
>>    writeln(s);
>> }
>>
>>
>> Graham
>
> I thought I was in good shape with your above suggestion.  It 
> does help me read and process text.  But when I go to print it 
> out I have problems.
>
> Here is my input file:
> °F
>
> Here is my code:
> import std.stdio;
> import std.string;
> import std.file;
> import std.encoding;
>
> // Main function
> void main(){
>     auto fout = File("out.txt","w");
>     auto latinS = cast(Latin1String) read("in.txt");
>     string uniS;
>     transcode(latinS, uniS);
>     foreach(line; uniS.splitLines()){
>        transcode(line, latinS);
>        fout.writeln(line);
>        fout.writeln(latinS);
>     }
> }
>
> Here is the output:
> °F
> [cast(immutable(Latin1Char))176, cast(immutable(Latin1Char))70]
>
> If I print the Unicode string I get an extra weird character.  
> If I print the Unicode string retranslated to Latin1, I get 
> weird pseudo-code.
> Can you help?

I tried the program and it seemed to work for me.

What program are you using to read "out.txt"? Are you sure it 
supports UTF-8, and knows to open the file as UTF-8? (This looks 
suspiciously like a tool misinterpreting a UTF-8 string as 
Latin-1.)
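
To illustrate what I mean, here is a minimal sketch (not taken from 
your program; the helper name misreadAsLatin1 is made up for this 
example). It dumps the UTF-8 bytes of "°F" and then decodes those 
same bytes as if they were Latin-1, which is roughly what a viewer 
without UTF-8 support would do:

```d
import std.stdio, std.encoding;

// Hypothetical helper: decode UTF-8 bytes as if they were Latin-1,
// then return the result as a normal (UTF-8) D string.
string misreadAsLatin1(string utf8)
{
    auto asLatin = cast(Latin1String) cast(immutable(ubyte)[]) utf8;
    string shown;
    transcode(asLatin, shown);
    return shown;
}

void main()
{
    string uniS = "\u00B0F";                      // "°F"
    auto bytes = cast(immutable(ubyte)[]) uniS;
    writefln("%(%02X %)", bytes);                 // C2 B0 46
    writeln(misreadAsLatin1(uniS));               // Â°F
}
```

The degree sign takes two bytes in UTF-8 (C2 B0), so a tool that 
reads the file as Latin-1 shows an extra character in front of it.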

If you're on a Unix system, what does "file in.txt out.txt" 
report?
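
As for the second line of your output: as far as I can tell, 
writeln formats a Latin1String as an array of Latin1Char values 
instead of writing its bytes. If what you want is a Latin-1 output 
file, one possible approach is to write the raw bytes with 
File.rawWrite. This is only a sketch under that assumption; the 
helper toLatin1Bytes and the file name "out-latin1.txt" are 
invented for the example:

```d
import std.stdio, std.encoding;

// Hypothetical helper: transcode a UTF-8 string to Latin-1 and
// return the encoded bytes.
immutable(ubyte)[] toLatin1Bytes(string utf8)
{
    Latin1String latin;
    transcode(utf8, latin);
    return cast(immutable(ubyte)[]) latin;
}

void main()
{
    auto fout = File("out-latin1.txt", "wb");
    // rawWrite emits the bytes unchanged, with no text formatting
    fout.rawWrite(toLatin1Bytes("\u00B0F"));      // bytes B0 46
    fout.rawWrite("\n");
}
```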

Graham


