Why is BOM required to use unicode in tokens?
Steven Schveighoffer
schveiguy at gmail.com
Tue Sep 15 14:59:03 UTC 2020
On 9/15/20 10:18 AM, James Blachly wrote:
> On 9/15/20 4:36 AM, Dominikus Dittes Scherkl wrote:
>> On Tuesday, 15 September 2020 at 06:49:08 UTC, Jon Degenhardt wrote:
>>> On Tuesday, 15 September 2020 at 02:23:31 UTC, Paul Backus wrote:
>>>> Identifiers start with a letter, _, or universal alpha, and are
>>>> followed by any number of letters, _, digits, or universal alphas.
>>>> Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D
>>>> of the C99 Standard.
>>>
>>> I was unable to find the definition of a "universal alpha", or
>>> whether that includes non-ASCII alphabetic characters.
>>
>> ISO/IEC 9899:1999 (E)
>> Annex D
>>
>> Universal character names for identifiers
>> -----------------------------------------
> ....
>> -----------------------
>>
>> This is thoroughly outdated. It also doesn't allow letter-like
>> symbols (which is debatable, but the mathematical ones in particular,
>> such as double-struck letters, are intended for such use).
>> Instead of an old C standard, D would do better to rely directly on
>> the properties from UnicodeData.txt, which is updated with every new
>> Unicode version.
>>
>
> Thanks to Paul, Jon, Dominikus and H.S. for thoughtful responses.
>
> What will it take (i.e. order of difficulty) to get this fixed -- will
> merely a bug report (and PR, not sure if I can tackle or not) do it, or
> will this require more in-depth discussion with compiler maintainers?
I'm thinking your issue will not be fixed (just like we don't allow $abc
to be an identifier). But the spec can be fixed to refer to the correct
standards.
-Steve
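
For readers following the thread, here is a rough sketch of the identifier rule Paul quoted (a start character that is a letter, _, or universal alpha, followed by any number of letters, _, digits, or universal alphas). It is written in Python only so it can be run quickly; it is not how the D lexer is implemented. The function names are made up for illustration. The `unicode_letter` predicate is an assumption that models Dominikus's suggestion of using Unicode properties (general category "L*"), and it is not the C99 Annex D table that the spec currently refers to.

```python
import unicodedata


def is_ascii_letter(ch):
    """True for a-z and A-Z only."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def no_universal_alpha(ch):
    """Baseline: pretend no non-ASCII character is a universal alpha."""
    return False


def unicode_letter(ch):
    """Assumption for illustration: any character whose Unicode general
    category starts with 'L' counts as a universal alpha. This is NOT
    the C99 Annex D table."""
    return unicodedata.category(ch).startswith("L")


def is_identifier(s, is_universal_alpha=no_universal_alpha):
    """Apply the quoted rule: start with letter, _, or universal alpha;
    continue with letters, _, digits, or universal alphas."""
    if not s:
        return False

    def start_ok(ch):
        return is_ascii_letter(ch) or ch == "_" or is_universal_alpha(ch)

    def rest_ok(ch):
        return start_ok(ch) or ("0" <= ch <= "9")

    return start_ok(s[0]) and all(rest_ok(ch) for ch in s[1:])


if __name__ == "__main__":
    for name in ["_abc1", "1abc", "$abc", "caf\u00e9", "\u211a"]:
        print(repr(name),
              is_identifier(name),
              is_identifier(name, unicode_letter))
```

Under either predicate `$abc` is rejected, since `$` is a currency symbol and not a letter. The double-struck Q (U+211A) is rejected by the ASCII-only baseline but accepted by the property-based predicate, which is the kind of difference being discussed above.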