Reading ASCII file with some codes above 127 (exten ascii)

Paul phshaffer at gmail.com
Wed May 23 11:43:03 PDT 2012


On Wednesday, 23 May 2012 at 18:04:56 UTC, Graham Fawcett wrote:
> On Wednesday, 23 May 2012 at 15:48:20 UTC, Paul wrote:
>> On Monday, 14 May 2012 at 12:58:20 UTC, Graham Fawcett wrote:
>>> On Sunday, 13 May 2012 at 21:03:45 UTC, Paul wrote:
>>>> I am reading a file that has a few extended ASCII codes 
>>>> (e.g. the degree symbol). Depending on how I read the file in 
>>>> and what I do with it, the error shows up at different 
>>>> points.  I'm pretty sure it all boils down to these 
>>>> extended ASCII codes.
>>>>
>>>> Can I just tell dmd that I'm reading a Latin1 or ISO 8859-1 
>>>> file?
>>>> I've messed with the std.encoding module but really can't 
>>>> figure out what I need to do.
>>>>
>>>> There must be a simple solution to this.
>>>
>>> This seems to work:
>>>
>>>
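>>> import std.stdio, std.file, std.encoding;
>>>
>>> void main()
>>> {
>>>   // Reinterpret the raw file bytes as Latin-1 (ISO 8859-1) text.
>>>   auto latin = cast(Latin1String) read("/tmp/hi.8859");
>>>   string s;
>>>   // Convert Latin-1 to D's native UTF-8 string type.
>>>   transcode(latin, s);
>>>   writeln(s);
>>> }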
>>>
>>>
>>> Graham
>>
>> I thought I was in good shape with your above suggestion.  It 
>> does help me read and process text.  But when I go to print it 
>> out, I have problems.
>>
>> Here is my input file:
>> °F
>>
>> Here is my code:
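>> import std.stdio;
>> import std.string;
>> import std.file;
>> import std.encoding;
>>
>> // Main function
>> void main(){
>>    auto fout = File("out.txt","w");
>>    // Read the raw bytes and reinterpret them as Latin-1.
>>    auto latinS = cast(Latin1String) read("in.txt");
>>    // Convert to UTF-8 so string functions can work on it.
>>    string uniS;
>>    transcode(latinS, uniS);
>>    foreach(line; uniS.splitLines()){
>>       // Convert each line back to Latin-1.
>>       transcode(line, latinS);
>>       fout.writeln(line);   // writes the UTF-8 bytes of the line
>>       fout.writeln(latinS); // formats the Latin1Char array, not raw bytes
>>    }
>> }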
>>
>> Here is the output:
>> °F
>> [cast(immutable(Latin1Char))176, cast(immutable(Latin1Char))70]
>>
>> If I print the Unicode string, I get an extra weird character.  
>> If I print the Unicode string retranslated to Latin-1, I get 
>> weird pseudo-code.
>> Can you help?
>
> I tried the program and it seemed to work for me.
>
> What program are you using to read "out.txt"? Are you sure it 
> supports UTF-8, and knows to open the file as UTF-8? (This 
> looks suspiciously like a tool's attempt to misinterpret a 
> UTF-8 string as Latin-1.)
>
> If you're on a Unix system, what does "file in.txt out.txt" 
> report?
>
> Graham

Hmmm.  I'm not communicating well.
I want to read and write ASCII.  The only reason I'm converting 
to Unicode is because D needs it (as I understand).

Yes, if I open out.txt in Notepad++ and tell Notepad++ that it is 
UTF-8, it shows °F.
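
For reference, a minimal D sketch of the byte-level difference, assuming the same std.encoding calls used earlier in this thread (Latin1String, transcode). In Latin-1, °F is the two bytes B0 46; after transcoding to D's UTF-8 string it becomes the three bytes C2 B0 46. An editor that reads the UTF-8 file as ANSI/Latin-1 shows the extra C2 byte as a separate character (Â), which would account for the "extra weird character". The bytes are hard-coded here so no file is needed.

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writefln;

void main()
{
    // "°F" as stored in a Latin-1 (ISO 8859-1) file: one byte per character.
    immutable(ubyte)[] raw = [0xB0, 0x46];
    auto latin = cast(Latin1String) raw;

    // The same text after transcoding to D's native UTF-8 string type.
    string utf8;
    transcode(latin, utf8);

    // Print both byte sequences in hex, separated by spaces.
    writefln("Latin-1: %(%02X %)", cast(const(ubyte)[]) latin);
    writefln("UTF-8:   %(%02X %)", cast(const(ubyte)[]) utf8);
}
```

Running it prints `Latin-1: B0 46` followed by `UTF-8:   C2 B0 46`.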

I want to:
1) Read an ASCII file that may have codes above 127.
2) Convert to Unicode so D functions like .splitLines() can work with it.
3) Convert back to ASCII so that stuff like °F is written out as it was read in.
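
The three steps above can be sketched as follows. This is a minimal sketch, not a tested drop-in: it assumes the text only contains characters Latin-1 can represent (anything else cannot survive the round trip), and because the output file is opened in binary mode ("wb") it writes LF line endings; write "\r\n" instead if CRLF is required. The main change from the program above is using File.rawWrite to emit the transcoded Latin-1 bytes unchanged, since writeln on a Latin1String prints an array of Latin1Char values rather than text.

```d
import std.encoding : Latin1String, transcode;
import std.file : read;
import std.stdio : File;
import std.string : splitLines;

void main()
{
    // Step 1: read the raw bytes and treat them as Latin-1 (file names from the post).
    auto latinIn = cast(Latin1String) read("in.txt");

    // Step 2: transcode to UTF-8 so string functions like splitLines work.
    string text;
    transcode(latinIn, text);

    // Step 3: transcode each line back to Latin-1 and write the bytes verbatim.
    // rawWrite sends the buffer unchanged; binary mode means "\n" stays LF.
    auto fout = File("out.txt", "wb");
    foreach (line; text.splitLines())
    {
        Latin1String latinOut;
        transcode(line, latinOut);
        fout.rawWrite(cast(const(ubyte)[]) latinOut);
        fout.rawWrite("\n");
    }
}
```

With an in.txt containing the Latin-1 bytes for °F, out.txt should contain the same two bytes (B0 46) plus a newline, so an ANSI editor shows °F in both files.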

If I open in.txt and out.txt in an ASCII editor, °F should look 
the same in both files, with the editor encoding the files as 
ANSI/ASCII.  I thought my program was doing just that.
Thanks for your assistance.


More information about the Digitalmars-d-learn mailing list