Why is BOM required to use unicode in tokens?

wjoe invalid at example.com
Wed Sep 16 15:01:54 UTC 2020


On Tuesday, 15 September 2020 at 01:49:13 UTC, James Blachly 
wrote:
> I wish to write a function including ∂x and ∂y (these are 
> trivial to type with appropriate keyboard shortcuts - alt+d on 
> Mac), but without a unicode byte order mark at the beginning of 
> the file, the lexer rejects the tokens.
>
> It is not apparently easy to insert such marks (AFAICT no 
> common tool does this specifically), while other languages work 
> fine (i.e., accept unicode in their source) without it.
>
> Is there a downside to at least presuming UTF-8?

As you probably already know, BOM means byte order mark, so it is
only relevant for encodings whose code units span multiple bytes
(UTF-16, UTF-32). A BOM for UTF-8 isn't required, and in fact it's
discouraged.
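To illustrate why the BOM only matters for UTF-16/UTF-32, here is a
minimal sketch in Python of how a tool might sniff a leading BOM and
otherwise presume UTF-8. This is not how the D compiler's lexer is
implemented; the names sniff_bom and decode_source are hypothetical.
UTF-8 has no byte order, so its BOM (EF BB BF) can only act as an
encoding signature.

```python
import codecs

# Known BOM signatures. The UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the longer signatures are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_source(data: bytes) -> str:
    """Decode source bytes: honour a BOM if present, otherwise presume UTF-8."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or "utf-8")


# Plain UTF-8 carries no BOM and is decoded as UTF-8 by default.
print(decode_source("∂x".encode("utf-8")))
# UTF-16 BE with an explicit BOM: the BOM is detected and stripped.
print(decode_source(codecs.BOM_UTF16_BE + "∂x".encode("utf-16-be")))
```

One caveat of this ordering: a UTF-16 LE file whose first character
is U+0000 would be misread as UTF-32 LE, which is rarely a concern
for source code.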

Your editor should automatically insert a BOM, where appropriate,
when you save your file. You probably just need to select the
right encoding for your file; this is typically available in the
'Save As...' dialog or in the editor's settings.



More information about the Digitalmars-d-learn mailing list