Parsing D files with non-unicode characters
Neia Neutuladh
neia at ikeran.org
Wed Nov 7 17:37:11 UTC 2018
On Wed, 07 Nov 2018 15:51:13 +0000, Jonathan Marler wrote:
> I hadn't seen that you provided a link to the file. After I found it, I
> played with it a bit. It looks like if you add a UTF-8 BOM in the
> beginning then DMD successfully parses it. However, from my quick scan
> of lexer.d, I didn't see anywhere in the code that actually changes how
> it decodes the file based on the presence of the BOM. Does anyone
> know if it does? Is DMD supposed to allow multi-byte UTF-8 characters
> if there is no BOM? If so, then this is a bug.
It can handle multibyte UTF-8 characters without a byte order mark. This
should be straightforward to test:
echo '/* ſ™🅻 */' > file.d
dmd -c file.d
On dmd 2.081.1, the byte order mark changes nothing for me.
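To compare the two cases side by side, something like the following should work. It is only a sketch: the file names nobom.d and bom.d are made up for illustration, and it assumes a POSIX-ish shell with printf, head and od available. The dmd step is skipped if dmd is not on the PATH.

```shell
# Write the same one-line source twice: once plain, once with a UTF-8 BOM.
# nobom.d and bom.d are hypothetical names chosen for this sketch.
printf '/* ſ™🅻 */\n' > nobom.d
printf '\357\273\277/* ſ™🅻 */\n' > bom.d   # \357\273\277 is EF BB BF in octal

# Dump the first three bytes of each file; bom.d should begin with ef bb bf.
head -c 3 nobom.d | od -An -tx1
head -c 3 bom.d | od -An -tx1

# Compile both files if dmd is installed, and report each result.
if command -v dmd >/dev/null 2>&1; then
    dmd -c nobom.d && echo "nobom.d: ok" || echo "nobom.d: failed"
    dmd -c bom.d && echo "bom.d: ok" || echo "bom.d: failed"
fi
```

If both files compile, the BOM is not what decides whether the multibyte characters are accepted.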