Reading ASCII file with some codes above 127 (extended ASCII)
Regan Heath
regan at netmail.co.nz
Fri May 25 02:05:29 PDT 2012
On Wed, 23 May 2012 22:02:25 +0100, Paul <phshaffer at gmail.com> wrote:
>> This works, though it's ugly:
>>
>>
>> foreach(line; uniS.splitLines()) {
>>     transcode(line, latinS);
>>     fout.writeln((cast(char[]) latinS));
>> }
>>
>> The Latin1String type, at the storage level, is a ubyte[]. By casting
>> to char[], you can get a similar-to-string thing that writeln() can
>> handle.
>>
>> Graham
>
> Awesome! What a lesson! Thank you!
>
> So if anyone is following this thread, here's my code now. This reads a
> text file (encoded in Latin-1, which is basic ASCII plus extended codes
> above 127), lets D work with it as Unicode, and then writes it back out
> as Latin-1.
>
> I wonder about the speed between this method and Era's home-spun
> solution?
>
> import std.stdio;
> import std.string;
> import std.file;
> import std.encoding;
>
> // Main function
> void main(){
>     auto fout = File("out.txt","w");
>     auto latinS = cast(Latin1String) read("in.txt");
>     string uniS;
>     transcode(latinS, uniS);
>     foreach(line; uniS.splitLines()){
>         transcode(line, latinS);
>         fout.writeln((cast(char[]) latinS));
>     }
> }
The only thing that worries me about this code is the cast(char[]) in
the final writeln. I know some parts of Phobos verify that char data is
valid UTF-8, and this line casts Latin-1 to char[], which can produce
invalid UTF-8 data. That said, I had a really quick look at the
Phobos code for File.writeln and I'm not sure whether that function does
any UTF-8 validation. I would be happier if the Latin-1 was written as a
stream of bytes with no assumed interpretation, IMO.
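
A rough sketch of what I mean, in case it helps: File.rawWrite writes a
buffer to the file without interpreting it, so the Latin-1 data never has
to be viewed as char[]. Note that writeLatin1Lines is a hypothetical helper
name, and the sample string only stands in for the text transcoded from
in.txt; treat this as an illustration under those assumptions rather than
a drop-in replacement.

```d
import std.encoding : Latin1String, transcode;
import std.file : read;
import std.stdio : File;
import std.string : splitLines;

// Hypothetical helper: re-encodes each line of `text` as Latin-1 and
// writes it as raw bytes, so no char[] view of the data is created.
void writeLatin1Lines(File fout, string text)
{
    foreach (line; text.splitLines())
    {
        Latin1String latinLine;
        transcode(line, latinLine);
        fout.rawWrite(latinLine); // bytes are written uninterpreted
        fout.rawWrite("\n");      // the newline is plain ASCII
    }
}

void main()
{
    // Assumed sample input, standing in for the transcoded in.txt text.
    string uniS = "caf\u00E9\nna\u00EFve";

    auto fout = File("out.txt", "wb");
    writeLatin1Lines(fout, uniS);
    fout.close();

    // Each non-ASCII character should occupy exactly one byte on disk.
    auto bytes = cast(const(ubyte)[]) read("out.txt");
    immutable ubyte[] expected = [0x63, 0x61, 0x66, 0xE9, 0x0A,
                                  0x6E, 0x61, 0xEF, 0x76, 0x65, 0x0A];
    assert(bytes == expected);
}
```

In the original program the same loop body would replace the
fout.writeln(cast(char[]) latinS) line. The file is opened in "wb" mode
here so that the "\n" bytes are not translated on the way out.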
R