identifiers & "unialpha"

Fri Sep 22 11:55:36 PDT 2006

Thomas Kuehne wrote:
> http://www.digitalmars.com/d/lex.html#identifier
> # Identifiers start with a letter, _, or universal alpha, and are followed
> # by any number of letters, _, digits, or universal alphas. Universal
> # alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the
> # C99 Standard.)
> 
> Why is D referencing "ISO/IEC 9899:1999 (E) Appendix D" to define
> "universal alpha"? "ISO/IEC 9899:1999 (E) Appendix D" doesn't list
> "universal alpha".
> 
> Sample:
> \u00B7 (MIDDLE DOT, Other_Punctuation) isn't a "universal alpha" but
> is allowed in identifiers by Appendix D.
> 
> "ISO/IEC 9899:1999 (E) Appendix D" itself references
> "ISO/IEC TR 10176:1998" for the character data. I strongly suggest
> dropping the redirection via "Appendix D" and using
> "ISO/IEC TR 10176 (current)" instead of the dated version
> "ISO/IEC TR 10176:1998". The 1998 version didn't yet include a sizable
> chunk of the CJK and Math characters found in the current version.

I'd like to leave things as they are for 1.0. I don't think that 
anyone's code will be adversely affected by not having the latest alpha 
character additions to identifiers, and I also don't think math 
characters should be part of identifiers. What is CJK?

As it is now, it matches standard C's definition of identifiers, which 
is the intent of the reference. I haven't checked, but I think it 
matches Java's idea of an identifier character, too.
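
For illustration, the identifier rule quoted above can be sketched as a
small check. This is only a rough sketch under stated assumptions: the
table below is a tiny hand-picked excerpt of code points that, as far as
I recall, appear in C99 Annex D (including the \u00B7 example from the
quoted message), not the full table, and the names is_universal_alpha
and is_identifier are made up for this example. It is not how the D
lexer is actually written.

```python
# Hypothetical excerpt: a few code point ranges believed to be listed in
# C99 Annex D. The real table is far longer; verify against the standard.
UNIVERSAL_ALPHA_EXCERPT = [
    (0x00AA, 0x00AA),
    (0x00B5, 0x00B5),
    (0x00B7, 0x00B7),  # MIDDLE DOT, the example from the quoted message
    (0x00BA, 0x00BA),
    (0x00C0, 0x00D6),
    (0x00D8, 0x00F6),
]


def is_universal_alpha(ch):
    """True if ch falls inside one of the excerpted ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UNIVERSAL_ALPHA_EXCERPT)


def is_ascii_letter(ch):
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_ascii_digit(ch):
    return "0" <= ch <= "9"


def is_identifier(s):
    """Start: letter, _, or universal alpha. Rest: those plus digits."""
    if not s:
        return False
    first = s[0]
    if not (first == "_" or is_ascii_letter(first) or is_universal_alpha(first)):
        return False
    return all(
        c == "_" or is_ascii_letter(c) or is_ascii_digit(c) or is_universal_alpha(c)
        for c in s[1:]
    )
```

With this excerpt, "foo_1" and "caf\u00e9" pass, "1foo" fails because
it starts with a digit, and "a\u00d7b" fails because \u00D7
(MULTIPLICATION SIGN) sits in the gap between the two Latin ranges.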

P.S. It also bugs me that the Unicode people can't seem to make up their
minds. Do character sets really need to change every 2 or 3 years?