identifiers & "unialpha"
Thomas Kuehne
thomas-dloop at kuehne.cn
Fri Sep 22 09:40:35 PDT 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Sean Kelly wrote on 2006-09-22:
> Thomas Kuehne wrote:
>>
>> http://www.digitalmars.com/d/lex.html#identifier
>> # Identifiers start with a letter, _, or universal alpha, and are followed
>> # by any number of letters, _, digits, or universal alphas. Universal
>> # alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the
>> # C99 Standard.)
>>
>> Why does D reference "ISO/IEC 9899:1999 (E) Appendix D" to define
>> "universal alpha"? "ISO/IEC 9899:1999 (E) Appendix D" does not list any
>> "universal alpha".
>>
>> Sample:
>> \u00B7 (MIDDLE DOT, Other_Punctuation) isn't a "universal alpha" but
>> is allowed in identifiers by Appendix D.
>>
>> "ISO/IEC 9899:1999 (E) Appendix D" itself references
>> "ISO/IEC TR 10176:1998" for the character data. I strongly suggest
>> dropping the redirection via "Appendix D" and using
>> "ISO/IEC TR 10176 (current)" instead of the dated version
>> "ISO/IEC TR 10176:1998". The 1998 version did not yet include a sizable
>> chunk of the CJK and math characters found in the current version.
>
> Agreed. Incidentally, the 2003 revision to the C++ standard ("ISO/IEC
> 14882:2003(E)"), Appendix E, contains a revised copy of the character
> table (which is likely from "ISO/IEC TR 10176:2003") and appears to have
> done away with the "special characters" section entirely. So I suspect
> your suggestion would eliminate the problem you mention above as well?
Yes. How about this rewrite:
# Identifier:
#     IdentifierStart
#     IdentifierStart IdentifierChars
#
# IdentifierChars:
#     IdentifierChar
#     IdentifierChar IdentifierChars
#
# IdentifierStart:
#     _
#     Letter
#
# IdentifierChar:
#     IdentifierStart
#     Number
#     NonspacingMark
#
# Identifiers start with a letter or _, and are followed
# by any number of letters, _, numbers, or nonspacing marks. Letters,
# Numbers and NonspacingMarks are those defined in ISO/IEC TR 10176.
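
To make the proposed grammar concrete, here is a minimal Python sketch of a
recognizer for it. It is only an illustration: the function names are made
up, and it uses the general categories from Python's unicodedata module as a
stand-in for the ISO/IEC TR 10176 tables, which is an assumption and not an
exact match for the standard's character lists.

```python
import unicodedata


def is_identifier_start(ch: str) -> bool:
    # IdentifierStart: "_" or Letter.
    # Assumption: Letter is approximated by the Unicode L* categories.
    return ch == "_" or unicodedata.category(ch).startswith("L")


def is_identifier_char(ch: str) -> bool:
    # IdentifierChar: IdentifierStart, Number, or NonspacingMark.
    # Assumption: Number is approximated by N*, NonspacingMark by Mn.
    cat = unicodedata.category(ch)
    return is_identifier_start(ch) or cat.startswith("N") or cat == "Mn"


def is_identifier(text: str) -> bool:
    # Identifier: IdentifierStart followed by zero or more IdentifierChar.
    if not text:
        return False
    return is_identifier_start(text[0]) and all(
        is_identifier_char(ch) for ch in text[1:]
    )
```

Under these assumptions, is_identifier("_foo1") holds, while
is_identifier("1abc") and is_identifier("a\u00b7b") do not: a digit cannot
start an identifier, and MIDDLE DOT is neither a letter, a number, nor a
nonspacing mark.
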
Accessing ISO standards can be complicated. Here are the corresponding
general categories from Unicode's UnicodeData.txt. For the relation between
Unicode and ISO/IEC 10646 see
http://en.wikipedia.org/wiki/ISO/IEC_10646#Differences_between_ISO_10646_and_Unicode
Letters:
    Uppercase_Letter (Lu)
    Lowercase_Letter (Ll)
    Titlecase_Letter (Lt)
    Modifier_Letter (Lm)
    Other_Letter (Lo)

NonspacingMarks:
    Nonspacing_Mark (Mn)

Numbers:
    Decimal_Number (Nd)
    Letter_Number (Nl)
    Other_Number (No)
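
The cross-reference above can be written down as a small lookup table. The
following Python sketch is hypothetical (CLASS_BY_CATEGORY and classify are
names invented here) and relies on the Unicode data bundled with the Python
interpreter, so the result for recently added characters depends on the
Python version.

```python
import unicodedata

# General category abbreviation -> class name used in the proposed grammar.
CLASS_BY_CATEGORY = {
    "Lu": "Letter",
    "Ll": "Letter",
    "Lt": "Letter",
    "Lm": "Letter",
    "Lo": "Letter",
    "Mn": "NonspacingMark",
    "Nd": "Number",
    "Nl": "Number",
    "No": "Number",
}


def classify(ch: str):
    # Returns "Letter", "Number", "NonspacingMark", or None when the
    # character belongs to none of the three classes.
    return CLASS_BY_CATEGORY.get(unicodedata.category(ch))
```

For \u00B7 (MIDDLE DOT) unicodedata.category reports "Po", so classify
returns None and the character would no longer be allowed in identifiers.
The underscore also yields None here because the grammar lists it
separately instead of treating it as a Letter.
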
Thomas
-----BEGIN PGP SIGNATURE-----
iD8DBQFFFB8/LK5blCcjpWoRAnMPAJsEaehF35W70k8S+BXbSSHXOeum8wCfR1UU
XeNEnZrWU8TYWSfzikQPm/8=
=n9aW
-----END PGP SIGNATURE-----