UTF-8 requested. BOM is for UTF-16
Jonathan M Davis
newsgroup.d at jmdavisprog.com
Mon Sep 16 19:09:58 UTC 2019
On Sunday, September 15, 2019 5:50:27 PM MDT solidstate1991 via Digitalmars-d wrote:
> This is what I get when I try to run a unittest on one of my
> projects.
>
> Here's the file that generates this error:
> https://github.com/ZILtoid1991/pixelperfectengine/blob/master/pixelperfectengine/src/PixelPerfectEngine/graphics/extensions.d
Well, without spending a fair bit of time digging into it, I can't say for
sure what's going on, but the file reading stuff in Phobos doesn't tend to
do much with BOMs. Rather, it tends to assume the encoding based on the
type. So, for instance, readText expects the file to be in UTF-8 if it's
told to provide an array of char, and it expects the file to be in UTF-16
with the native endianness of the machine (so, usually UTF-16LE) if it's told
to provide an array of wchar (and of course, UTF-32 if it's told to provide
an array of dchar). As such, if it's told to read UTF-8, and it finds that the
file is UTF-16, then you're going to get a UTFException, because the data is
invalid UTF-8.
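To illustrate the point about readText assuming the encoding from the requested type, here is a minimal sketch. The helper name rejectedAsUtf8 and the scratch file name are made up for this example; only readText, write, remove, tempDir, buildPath and UTFException come from Phobos. It writes some bytes to a temporary file and reports whether asking for a string (an array of char, so UTF-8) fails.

```d
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.utf : UTFException;

/// Hypothetical helper: writes `bytes` to a scratch file and returns true
/// if readText!string refuses to treat the file as UTF-8.
bool rejectedAsUtf8(const(ubyte)[] bytes)
{
    // The file name is an assumption made for this example only.
    auto path = buildPath(tempDir(), "readtext_encoding_demo.txt");
    write(path, bytes);
    scope (exit) remove(path);

    try
    {
        // Requesting string (immutable(char)[]) means requesting UTF-8.
        auto text = readText!string(path);
        return false;
    }
    catch (UTFException e)
    {
        return true;
    }
}

void main()
{
    // "hi" as UTF-16LE with a leading BOM (FF FE): not valid UTF-8.
    assert(rejectedAsUtf8([0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00]));

    // "hi" as plain UTF-8: accepted.
    assert(!rejectedAsUtf8([0x68, 0x69]));
}
```

The exact exception message may differ between Phobos releases, so the sketch only checks that a UTFException is thrown.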
I don't know for sure that that's what's happening here, because there
doesn't appear to be a call to read or readText in that module, and I don't
have time right now to go digging to see what it's actually doing. But odds
are that whatever is reading in the file is going to have to either cycle
through the possible UTF encodings until it finds one that doesn't throw (and
then convert that to UTF-8 if that's what's desired), or it's going to have
to look to see whether there's a BOM, and then read the file in based on the
BOM (or lack thereof). As things stand, Phobos does a good job when the
encoding is known ahead of time, but it's far more annoying to use when it
isn't.
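As a rough sketch of the BOM-based approach described above (not what the linked project actually does), something along these lines could work. The names decodeToUtf8 and fromUtf16 are hypothetical helpers invented for this example; getBOM and BOM come from std.encoding, and toUTF8 and validate come from std.utf. It assumes that a file without a BOM is UTF-8 and only handles UTF-8 and UTF-16.

```d
import std.encoding : BOM, getBOM;
import std.utf : toUTF8, validate;

/// Hypothetical helper: assembles UTF-16 code units from raw bytes with the
/// given byte order, independent of the machine's own endianness.
wchar[] fromUtf16(const(ubyte)[] bytes, bool bigEndian)
{
    if (bytes.length % 2 != 0)
        throw new Exception("odd number of bytes in UTF-16 data");

    auto units = new wchar[](bytes.length / 2);
    foreach (i, ref u; units)
    {
        immutable int first = bytes[2 * i];
        immutable int second = bytes[2 * i + 1];
        u = cast(wchar)(bigEndian ? (first << 8) | second
                                  : (second << 8) | first);
    }
    return units;
}

/// Hypothetical helper: looks at the BOM (if any) and returns the content
/// as a validated UTF-8 string. No BOM is assumed to mean UTF-8.
string decodeToUtf8(const(ubyte)[] data)
{
    immutable bom = getBOM(data);
    // Skip the BOM bytes themselves; the sequence is empty for BOM.none.
    auto payload = data[bom.sequence.length .. $];

    if (bom.schema == BOM.utf16le)
        return toUTF8(fromUtf16(payload, false));
    if (bom.schema == BOM.utf16be)
        return toUTF8(fromUtf16(payload, true));
    if (bom.schema == BOM.none || bom.schema == BOM.utf8)
    {
        auto text = cast(const(char)[]) payload;
        validate(text); // throws UTFException on invalid UTF-8
        return text.idup;
    }
    throw new Exception("BOM for an encoding this sketch does not handle");
}

void main()
{
    assert(decodeToUtf8([0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00]) == "hi"); // UTF-16LE
    assert(decodeToUtf8([0xFE, 0xFF, 0x00, 0x68, 0x00, 0x69]) == "hi"); // UTF-16BE
    assert(decodeToUtf8([0xEF, 0xBB, 0xBF, 0x68, 0x69]) == "hi");       // UTF-8 + BOM
    assert(decodeToUtf8([0x68, 0x69]) == "hi");                         // no BOM
}
```

To use it on a file, the raw bytes could come from std.file.read, e.g. decodeToUtf8(cast(const(ubyte)[]) read(path)). The other option mentioned above, trying each encoding until one validates, could be added as a fallback for the no-BOM case.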
- Jonathan M Davis