Reading ASCII file with some codes above 127 (exten ascii)

Fri May 25 02:05:29 PDT 2012

On Wed, 23 May 2012 22:02:25 +0100, Paul <phshaffer at gmail.com> wrote:
>> This works, though it's ugly:
>>
>>
>>     foreach(line; uniS.splitLines()) {
>>        transcode(line, latinS);
>>        fout.writeln((cast(char[]) latinS));
>>     }
>>
>> The Latin1String type, at the storage level, is a ubyte[]. By casting  
>> to char[], you can get a similar-to-string thing that writeln() can  
>> handle.
>>
>> Graham
>
> Awesome!  What a lesson! Thank you!
>
> So if anyone is following this thread, here's my code now.  This reads a
> text file (encoded in Latin-1, which is basic ASCII plus the extended
> codes above 127), allows D to work with it in Unicode, and then spits it
> back out as Latin-1.
>
> I wonder about the speed between this method and Era's home-spun  
> solution?
>
> import std.stdio;
> import std.string;
> import std.file;
> import std.encoding;
>
> // Main function
> void main(){
>      auto fout = File("out.txt","w");
>      auto latinS = cast(Latin1String) read("in.txt");
>      string uniS;
>      transcode(latinS, uniS);
>      foreach(line; uniS.splitLines()){
>         transcode(line, latinS);
>         fout.writeln((cast(char[]) latinS));
>      }
> }

The only thing that worries me about this code is the cast(char[]) in the
final writeln.  I know some parts of Phobos verify that char data is valid
UTF-8, and this line casts Latin-1 to char[], which can produce invalid
UTF-8, since a lone byte above 127 is not a valid UTF-8 sequence.  That
said, I had a really quick look at the Phobos code for File.writeln, and
I'm not sure whether that function does any UTF-8 validation.  I would be
happier if the Latin-1 were written as a stream of bytes with no assumed
interpretation.
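
As a rough, untested sketch of what I mean, something along these lines should write the bytes untouched by going through File.rawWrite instead of writeln. The helper names toLatin1Bytes and copyAsLatin1 are made up for illustration, and I'm assuming a cast from Latin1String to immutable(ubyte)[] is acceptable since the storage is one byte per character:

```d
import std.encoding : Latin1String, transcode;
import std.file : read;
import std.stdio : File;
import std.string : splitLines;

// Hypothetical helper: transcode a UTF-8 string to Latin-1 and expose
// the result as plain bytes, so nothing downstream treats it as UTF-8.
immutable(ubyte)[] toLatin1Bytes(string s)
{
    Latin1String latin;
    transcode(s, latin);
    return cast(immutable(ubyte)[]) latin;
}

// Hypothetical helper: read a Latin-1 file, work on it as Unicode,
// then write each line back out as raw Latin-1 bytes.
void copyAsLatin1(string inPath, string outPath)
{
    auto fout = File(outPath, "wb");
    auto latinS = cast(Latin1String) read(inPath);
    string uniS;
    transcode(latinS, uniS);
    foreach (line; uniS.splitLines())
    {
        fout.rawWrite(toLatin1Bytes(line));
        fout.rawWrite("\n"); // newline is plain ASCII, same byte in Latin-1
    }
}
```

Calling copyAsLatin1("in.txt", "out.txt") from main would then stand in for the loop in the quoted code.  One difference to be aware of: this always writes a bare "\n", whereas writeln on a text-mode file may write a platform-specific line ending.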

R
