Why is BOM required to use unicode in tokens?
James Blachly
james.blachly at gmail.com
Tue Sep 15 14:18:20 UTC 2020
On 9/15/20 4:36 AM, Dominikus Dittes Scherkl wrote:
> On Tuesday, 15 September 2020 at 06:49:08 UTC, Jon Degenhardt wrote:
>> On Tuesday, 15 September 2020 at 02:23:31 UTC, Paul Backus wrote:
>>> Identifiers start with a letter, _, or universal alpha, and are
>>> followed by any number of letters, _, digits, or universal alphas.
>>> Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D of
>>> the C99 Standard.
>>
>> I was unable to find the definition of a "universal alpha", or whether
>> that includes non-ASCII alphabetic characters.
>
> ISO/IEC 9899:1999 (E)
> Annex D
>
> Universal character names for identifiers
> -----------------------------------------
...
> -----------------------
>
> This is thoroughly outdated. It also doesn't allow letter-like
> symbols (which is debatable, but the mathematical ones in particular,
> such as double-struck letters, are intended for such use).
> Instead of relying on an old C standard, D should rely directly on the
> properties from UnicodeData.txt, which is updated with every new Unicode
> version.
>
Thanks to Paul, Jon, Dominikus, and H.S. for the thoughtful responses.
What will it take (i.e. how difficult would it be) to get this fixed? Would
a bug report (plus a PR, though I am not sure I can tackle that myself) be
enough, or will it require more in-depth discussion with the compiler
maintainers?
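
To make the proposal concrete, here is a rough sketch of the kind of
property-based check Dominikus describes. It is written in Python rather
than D only because Python's unicodedata module exposes the general
categories from UnicodeData.txt directly. The function names are made up
for illustration, and treating every "L*" general category as a letter is
my assumption, not what the D spec or the compiler currently does:

```python
import unicodedata

ASCII_DIGITS = "0123456789"


def is_letter_like(ch: str) -> bool:
    # Assumption: any code point whose general category starts with "L"
    # (Lu, Ll, Lt, Lm, Lo) counts as a letter for identifier purposes.
    return unicodedata.category(ch).startswith("L")


def is_identifier(s: str) -> bool:
    # Hypothetical rule mirroring the quoted grammar: the first character
    # is "_" or a letter; the rest are "_", letters, or ASCII digits.
    if not s:
        return False
    first, rest = s[0], s[1:]
    if not (first == "_" or is_letter_like(first)):
        return False
    return all(
        c == "_" or c in ASCII_DIGITS or is_letter_like(c) for c in rest
    )


if __name__ == "__main__":
    for name in ["foo_1", "1foo", "größe", "a-b"]:
        print(repr(name), is_identifier(name))
```

The point of the sketch is that the set of accepted characters follows
whatever Unicode version the property tables come from, instead of a
fixed list copied from a language standard.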
James
More information about the Digitalmars-d-learn mailing list