Files and UTF

Mike Surette mjsurette at gmail.com
Wed Aug 5 17:39:36 UTC 2020


In my efforts to learn D I am writing some code to read files in 
different UTF encodings with the aim of having them end up as 
UTF-8 internally. As a start I have the following code:

import std.stdio;
import std.file;

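void main(string[] args)
{
    // Expect exactly one argument: the path of the file to read.
    if (args.length == 2)
    {
        if (args[1].exists && args[1].isFile)
        {
            auto f = File(args[1]);
            writeln(args[1]);

            // Echo the first three lines exactly as readln returns them.
            for (auto i = 1; i <= 3; ++i)
                write(f.readln);
        }
    }
}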

It works well, printing the file name and the first three lines 
of the file properly regardless of the file's encoding. The 
exception is UTF-16: for both the LE and BE encodings, two 
characters representing the BOM are also printed.

I assume that write detects the encoding of the string returned 
by readln and prints it correctly, rather than readln converting 
its input to one consistent encoding. Is this correct?

Is there a way to remove the BOM from the input buffer and still 
know the encoding of the file?
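
To make the question concrete, here is a minimal sketch of one possible approach: read the raw bytes, compare the leading bytes against the known BOM values, and slice the BOM off while remembering which encoding it named. The names BomInfo and detectBom are hypothetical, not part of Phobos. The sketch assumes only UTF-8 and UTF-16 files matter, and it reports a file without a BOM as "unknown" rather than guessing.

```d
import std.file : read;
import std.stdio : writeln;

// Hypothetical result type: the encoding named by the BOM and the BOM's size.
struct BomInfo
{
    string encoding; // "UTF-8", "UTF-16LE", "UTF-16BE" or "unknown"
    size_t length;   // number of leading bytes to skip
}

// Hypothetical helper: inspects the first bytes of a buffer for a BOM.
// Assumption: UTF-32 is not checked, so a UTF-32LE file (FF FE 00 00)
// would be reported as UTF-16LE.
BomInfo detectBom(const(ubyte)[] data)
{
    static immutable ubyte[] utf8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] utf16le = [0xFF, 0xFE];
    static immutable ubyte[] utf16be = [0xFE, 0xFF];

    if (data.length >= 3 && data[0 .. 3] == utf8)
        return BomInfo("UTF-8", 3);
    if (data.length >= 2 && data[0 .. 2] == utf16le)
        return BomInfo("UTF-16LE", 2);
    if (data.length >= 2 && data[0 .. 2] == utf16be)
        return BomInfo("UTF-16BE", 2);
    return BomInfo("unknown", 0);
}

void main(string[] args)
{
    if (args.length != 2)
        return;

    // Read the whole file as raw bytes so that nothing is decoded yet.
    auto data = cast(const(ubyte)[]) read(args[1]);
    auto bom = detectBom(data);

    // The payload is the file content with the BOM sliced off.
    auto payload = data[bom.length .. $];
    writeln(args[1], ": ", bom.encoding, ", ",
            payload.length, " bytes after the BOM");
}
```

This only identifies and strips the BOM. Converting a UTF-16 payload to UTF-8 would be a separate step that is not shown here.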

Is there an idiomatic D way to do what I want to do?

Mike

