Parsing D files with non-unicode characters

Wed Nov 7 17:37:11 UTC 2018

On Wed, 07 Nov 2018 15:51:13 +0000, Jonathan Marler wrote:
> I hadn't seen that you provided a link to the file.  After I found it, I
> played with it a bit.  It looks like if you add a UTF-8 BOM in the
> beginning then DMD successfully parses it. However, from my quick scan
> of lexer.d, I didn't see anywhere in the code that actually changes how
> it decodes the file based on the presence of the BOM.  Does anyone
> know if it does?  Is DMD supposed to allow multi-byte UTF-8 characters
> if there is no BOM?  If so, then this is a bug.

DMD can handle multibyte UTF-8 characters without a byte order mark. That 
should be straightforward to test:

  echo '/* ſ™🅻 */' > file.d
  dmd -c file.d

On dmd 2.081.1, the byte order mark changes nothing for me.
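For a side-by-side check, you can write the same comment twice: once plain and once with the UTF-8 BOM prepended. This is a rough sketch that assumes a POSIX shell with printf, head and od available. The file names are arbitrary, and the dmd step only runs if dmd is on the PATH. The octal escapes spell out the UTF-8 bytes of ſ™🅻, so the result does not depend on the terminal's encoding.

```shell
# Plain UTF-8 source; the octal escapes encode ſ (2 bytes), ™ (3 bytes), 🅻 (4 bytes).
printf '/* \305\277\342\204\242\360\237\205\273 */\n' > file.d
# Same source with the UTF-8 byte order mark (EF BB BF) prepended.
{ printf '\357\273\277'; cat file.d; } > file_bom.d
# Dump the leading bytes of each file; only file_bom.d should start with ef bb bf.
head -c 8 file.d | od -An -tx1
head -c 8 file_bom.d | od -An -tx1
# Compile both if dmd is installed. The underscore keeps the inferred
# module name a valid D identifier (a hyphen in the file name would not).
if command -v dmd >/dev/null 2>&1; then
  dmd -c file.d && dmd -c file_bom.d && echo "both compiled"
fi
```

If the BOM actually changed how DMD decodes the file, only one of the two files would compile.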