[Issue 15949] New: Improve readtext handling of byte order mark (BOM)
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Thu Apr 21 13:02:45 PDT 2016
https://issues.dlang.org/show_bug.cgi?id=15949
Issue ID: 15949
Summary: Improve readtext handling of byte order mark (BOM)
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P1
Component: phobos
Assignee: nobody at puremagic.com
Reporter: Jesse.K.Phillips+D at gmail.com
Problem:
I've hit this many times on Windows. I try to read in a file with
std.file.readText and get: "Syntax error at line 0"
This is because some Microsoft program has decided to insert a UTF-8 byte order
mark (BOM) at the beginning of the file (0xEF 0xBB 0xBF), and readText passes
it through as part of the text. That said, readText really shouldn't
automatically convert a file's content based on the BOM it finds.
Suggested fix:
I think readText should validate and skip the BOM. It should check that the BOM
is not UTF-16LE (0xFF 0xFE), UTF-16BE (0xFE 0xFF), UTF-32LE (0xFF 0xFE 0x00
0x00), or UTF-32BE (0x00 0x00 0xFE 0xFF). If it is one of those, it should
throw an exception stating that the file uses that encoding and will not be
converted to a UTF-8 string.
The corresponding std.file.readText!wstring and std.file.readText!dstring
should perform equivalent validation. If the byte order can be changed at no
cost, then that should be done.
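
To make the suggestion concrete, here is a minimal sketch of the check for the
string (UTF-8) case only. The names readTextSkipBom and decodeUtf8SkipBom are
hypothetical and are not part of Phobos; the sketch only assumes std.file.read,
std.utf.validate, std.exception.enforce and std.algorithm.searching.startsWith.
The UTF-32LE test has to come before the UTF-16LE one, because the UTF-32LE BOM
begins with the same two bytes.

```d
import std.algorithm.searching : startsWith;
import std.exception : enforce;
import std.file : read;
import std.utf : validate;

/// Hypothetical helper (not in Phobos): strips a UTF-8 BOM, rejects
/// UTF-16/UTF-32 BOMs, then validates the remaining bytes as UTF-8.
string decodeUtf8SkipBom(immutable(ubyte)[] data)
{
    static immutable ubyte[] utf8Bom = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] utf32Le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] utf32Be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] utf16Le = [0xFF, 0xFE];
    static immutable ubyte[] utf16Be = [0xFE, 0xFF];

    // UTF-32 first: the UTF-32LE BOM starts with the UTF-16LE BOM.
    enforce(!data.startsWith(utf32Le), "file is UTF-32LE, not UTF-8");
    enforce(!data.startsWith(utf32Be), "file is UTF-32BE, not UTF-8");
    enforce(!data.startsWith(utf16Le), "file is UTF-16LE, not UTF-8");
    enforce(!data.startsWith(utf16Be), "file is UTF-16BE, not UTF-8");

    // Skip the UTF-8 BOM if present.
    if (data.startsWith(utf8Bom))
        data = data[utf8Bom.length .. $];

    auto text = cast(string) data;
    validate(text); // throws UTFException on malformed UTF-8
    return text;
}

/// Hypothetical file-level wrapper around the helper above.
string readTextSkipBom(string path)
{
    return decodeUtf8SkipBom(cast(immutable(ubyte)[]) read(path));
}
```

A caller would use readTextSkipBom("some.txt") where it uses readText today.
Whether readText itself should change, or a separate function should be added,
is left open here.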
1. https://en.wikipedia.org/wiki/Byte_order_mark