Why is BOM required to use Unicode in tokens?

Tue Sep 15 14:59:03 UTC 2020

On 9/15/20 10:18 AM, James Blachly wrote:
> On 9/15/20 4:36 AM, Dominikus Dittes Scherkl wrote:
>> On Tuesday, 15 September 2020 at 06:49:08 UTC, Jon Degenhardt wrote:
>>> On Tuesday, 15 September 2020 at 02:23:31 UTC, Paul Backus wrote:
>>>> Identifiers start with a letter, _, or universal alpha, and are 
>>>> followed by any number of letters, _, digits, or universal alphas. 
>>>> Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D 
>>>> of the C99 Standard.
>>>
>>> I was unable to find the definition of a "universal alpha", or 
>>> whether that includes non-ascii alphabetic characters.
>>
>> ISO/IEC 9899:1999 (E)
>> Annex D
>>
>> Universal character names for identifiers
>> -----------------------------------------
> ....
>> -----------------------
>>
>> This is thoroughly outdated. It also doesn't allow letter-like 
>> symbols (which is debatable, but the mathematical ones in particular, 
>> such as double-struck letters, are intended for such use).
>> Instead of an old C standard, D would do better to rely directly on 
>> the properties in UnicodeData.txt, which is updated with every new 
>> Unicode version.
>>
> 
> Thanks to Paul, Jon, Dominikus and H.S. for thoughtful responses.
> 
> What will it take (i.e. order of difficulty) to get this fixed -- will 
> merely a bug report (and PR, not sure if I can tackle or not) do it, or 
> will this require more in-depth discussion with compiler maintainers?

I'm thinking your issue will not be fixed (just like we don't allow $abc 
to be an identifier). But the spec can be fixed to refer to the correct 
standards.

-Steve
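
For illustration only, here is a rough sketch of the identifier rule quoted above, classifying characters by Unicode general category (the data that ships in UnicodeData.txt) instead of by the C99 Annex D table. It is written in Python because its standard `unicodedata` module exposes those categories. This is not what the D compiler does. The function names are hypothetical, and treating "letter" as any `L*` category and "digit" as category `Nd` are assumptions made for the sketch.

```python
import unicodedata


def is_ident_start(ch: str) -> bool:
    # Assumption: "letter or universal alpha" is approximated by any
    # Unicode general category beginning with "L" (Lu, Ll, Lt, Lm, Lo).
    return ch == "_" or unicodedata.category(ch).startswith("L")


def is_ident_continue(ch: str) -> bool:
    # Assumption: "digit" is approximated by category Nd (decimal digit),
    # which is broader than the ASCII digits 0-9.
    return is_ident_start(ch) or unicodedata.category(ch) == "Nd"


def is_identifier(s: str) -> bool:
    # Non-empty, valid start character, then any number of
    # valid continuation characters.
    return (
        bool(s)
        and is_ident_start(s[0])
        and all(is_ident_continue(c) for c in s[1:])
    )
```

Under these assumptions a double-struck letter such as U+211D is accepted because its general category is Lu, while `$abc` is rejected because `$` is a currency symbol (category Sc), not a letter.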