Why is BOM required to use unicode in tokens?

Paul Backus snarwin at gmail.com
Tue Sep 15 02:23:31 UTC 2020


On Tuesday, 15 September 2020 at 01:49:13 UTC, James Blachly 
wrote:
> I wish to write a function including ∂x and ∂y (these are 
> trivial to type with appropriate keyboard shortcuts - alt+d on 
> Mac), but without a Unicode byte order mark at the beginning of 
> the file, the lexer rejects the tokens.
>
> It is apparently not easy to insert such marks (AFAICT no 
> common tool does this specifically), while other languages work 
> fine (i.e., accept Unicode in their source) without it.
>
> Is there a downside to at least presuming UTF-8?

According to the spec [1], this should just work: the BOM is 
listed there as optional, not required. I'd recommend filing a 
bug.

[1] https://dlang.org/spec/lex.html#source_text
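
For the bug report, it may help to state whether the failing file actually starts with a UTF-8 BOM and where its first non-ASCII byte sits. Below is a minimal Python sketch for checking that and for prepending a BOM as a workaround. The helper names (`has_utf8_bom`, `with_utf8_bom`, `first_non_ascii`) are made up for illustration and are not part of any D tooling; the only fact relied on is that the UTF-8 BOM is the byte sequence EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def with_utf8_bom(data: bytes) -> bytes:
    """Return the bytes with a UTF-8 BOM prepended, unless one is already there."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


def first_non_ascii(data: bytes) -> int:
    """Return the index of the first byte above 0x7F, or -1 if all bytes are ASCII."""
    for index, byte in enumerate(data):
        if byte > 0x7F:
            return index
    return -1


if __name__ == "__main__":
    # Hypothetical source line, encoded as UTF-8 without a BOM.
    source = "double ∂x;\n".encode("utf-8")
    print("has BOM:", has_utf8_bom(source))
    print("first non-ASCII byte at index:", first_non_ascii(source))
    print("has BOM after fix:", has_utf8_bom(with_utf8_bom(source)))
```

To apply this to a real file, read it with `open(path, "rb").read()` and write the result of `with_utf8_bom` back in binary mode. Working on raw bytes avoids any implicit re-encoding by the editor or the script.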

