Files and UTF
Mike Surette
mjsurette at gmail.com
Wed Aug 5 17:39:36 UTC 2020
In my efforts to learn D, I am writing some code to read files in
different UTF encodings, with the aim of having them end up as
UTF-8 internally. As a start, I have the following code:
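import std.stdio;
import std.file;

void main(string[] args)
{
    if (args.length == 2)
    {
        // Only proceed if the argument names an existing regular file.
        if (args[1].exists && args[1].isFile)
        {
            auto f = File(args[1]);
            writeln(args[1]);
            // Print the first three lines; readln keeps each trailing newline.
            for (auto i = 1; i <= 3; ++i)
                write(f.readln);
        }
    }
}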
It works well: it outputs the file name and the first three lines
of the file properly, regardless of the file's encoding. The
exception is UTF-16: with both the LE and BE encodings, two
characters representing the BOM are printed.
I assume that write detects the encoding of the string returned
by readln and prints it correctly, rather than readln converting
the input to a consistent encoding. Is this correct?
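One way to test that assumption is to look at the raw code units readln hands back. If readln were converting everything to one encoding, a line from a UTF-16 file would show only UTF-8 bytes; if it passes the bytes through unchanged, the FF FE (LE) or FE FF (BE) BOM and the interleaved 00 bytes would show up. A minimal sketch, where hexBytes is a hypothetical helper name and only std.format.format is taken from Phobos:

```d
import std.format : format;

/// Hypothetical helper: renders the code units of `line` as hex bytes,
/// e.g. "68 69" for "hi", so the bytes readln returned can be inspected.
string hexBytes(const(char)[] line)
{
    // %( ... %) formats each element with %02X, separated by a space.
    return format("%(%02X %)", cast(const(ubyte)[]) line);
}
```

In the loop of the original program, `writeln(hexBytes(f.readln));` would print each line's bytes instead of the text itself.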
Is there a way to remove the BOM from the input buffer and still
know the encoding of the file?
Is there an idiomatic D way to do what I want?
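A minimal sketch of one possible approach, assuming the whole file fits in memory: read the raw bytes, look for a BOM, strip it while remembering the encoding, and transcode the rest to a UTF-8 string. From Phobos it uses only std.file.read, std.conv.to and std.utf.validate; the names Encoding, Decoded, unitsToUtf8, decodeBytes and readAsUtf8 are hypothetical. A file without a BOM is simply assumed to be UTF-8 and validated as such.

```d
import std.conv : to;
import std.file : read;
import std.utf : validate;

// Sketch only: Encoding, Decoded, unitsToUtf8, decodeBytes and readAsUtf8
// are hypothetical names chosen for this example.

/// Encodings this sketch can recognise from a byte order mark.
enum Encoding { utf8, utf16le, utf16be, utf32le, utf32be }

/// The detected encoding plus the contents as UTF-8, BOM removed.
struct Decoded
{
    Encoding encoding;
    string text;
}

immutable ubyte[] bomUtf8    = [0xEF, 0xBB, 0xBF];
immutable ubyte[] bomUtf16le = [0xFF, 0xFE];
immutable ubyte[] bomUtf16be = [0xFE, 0xFF];
immutable ubyte[] bomUtf32le = [0xFF, 0xFE, 0x00, 0x00];
immutable ubyte[] bomUtf32be = [0x00, 0x00, 0xFE, 0xFF];

/// Builds code units of type C from raw bytes in the given byte order
/// (independent of the host's endianness) and transcodes them to UTF-8.
string unitsToUtf8(C)(const(ubyte)[] bytes, bool bigEndian)
    if (is(C == wchar) || is(C == dchar))
{
    if (bytes.length % C.sizeof != 0)
        throw new Exception("input is not a whole number of code units");
    auto units = new C[bytes.length / C.sizeof];
    foreach (i, ref u; units)
    {
        uint v = 0;
        foreach (k; 0 .. C.sizeof)
        {
            // Pick the most significant byte first for either byte order.
            immutable b = bytes[i * C.sizeof + (bigEndian ? k : C.sizeof - 1 - k)];
            v = (v << 8) | b;
        }
        u = cast(C) v;
    }
    return units.to!string;
}

/// Detects and strips a BOM, then returns the contents as UTF-8.
/// With no BOM the bytes are assumed (not proven) to be UTF-8.
Decoded decodeBytes(const(ubyte)[] data)
{
    bool starts(const(ubyte)[] bom)
    {
        return data.length >= bom.length && data[0 .. bom.length] == bom;
    }

    // FF FE 00 00 (UTF-32 LE) begins with FF FE (UTF-16 LE),
    // so the UTF-32 marks are tested first.
    if (starts(bomUtf32le))
        return Decoded(Encoding.utf32le, unitsToUtf8!dchar(data[4 .. $], false));
    if (starts(bomUtf32be))
        return Decoded(Encoding.utf32be, unitsToUtf8!dchar(data[4 .. $], true));
    if (starts(bomUtf16le))
        return Decoded(Encoding.utf16le, unitsToUtf8!wchar(data[2 .. $], false));
    if (starts(bomUtf16be))
        return Decoded(Encoding.utf16be, unitsToUtf8!wchar(data[2 .. $], true));

    auto rest = starts(bomUtf8) ? data[3 .. $] : data;
    auto text = cast(string) rest.idup;
    validate(text); // throws UTFException if the bytes are not valid UTF-8
    return Decoded(Encoding.utf8, text);
}

/// Reads a whole file and decodes it with decodeBytes.
Decoded readAsUtf8(string path)
{
    return decodeBytes(cast(const(ubyte)[]) read(path));
}
```

The original program could then work on `readAsUtf8(args[1]).text`, for example splitting it with std.string.splitLines, and still report the encoding. One known ambiguity of BOM sniffing remains: a UTF-16 LE file whose first character is U+0000 is indistinguishable from UTF-32 LE by its first four bytes.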
Mike
More information about the Digitalmars-d-learn mailing list