identifiers & "unialpha"

Thomas Kuehne thomas-dloop at kuehne.cn
Fri Sep 22 09:40:35 PDT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sean Kelly wrote on 2006-09-22:
> Thomas Kuehne wrote:
>> 
>> http://www.digitalmars.com/d/lex.html#identifier
>> # Identifiers start with a letter, _, or universal alpha, and are followed
>> # by any number of letters, _, digits, or universal alphas. Universal
>> # alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the
>> # C99 Standard.)
>> 
>> Why is D referencing "ISO/IEC 9899:1999 (E) Appendix D" to define
>> "universal alpha"? "ISO/IEC 9899:1999 (E) Appendix D" doesn't list
>> any "universal alpha".
>> 
>> Sample:
>> \u00B7 (MIDDLE DOT, Other_Punctuation) isn't a "universal alpha" but is
>> allowed in identifiers by Appendix D.
>> 
>> "ISO/IEC 9899:1999 (E) Appendix D" itself references
>> "ISO/IEC TR 10176:1998" for the character data. I strongly suggest
>> dropping the indirection through "Appendix D" and referencing
>> "ISO/IEC TR 10176 (current)" instead of the dated version
>> "ISO/IEC TR 10176:1998". The 1998 version didn't yet include a sizable
>> chunk of the CJK and math characters found in the current version.
>
> Agreed.  Incidentally, the 2003 revision to the C++ standard ("ISO/IEC 
> 14882:2003(E)"), Appendix E, contains a revised copy of the character 
> table (which is likely from "ISO/IEC TR 10176:2003") and appears to have 
> done away with the "special characters" section entirely.  So I suspect 
> your suggestion would eliminate the problem you mention above as well?

Yes. How about this rewrite:

# Identifier:
#	IdentifierStart
#	IdentifierStart IdentifierChars
#
# IdentifierChars:
#	IdentifierChar
#	IdentifierChar IdentifierChars
#
# IdentifierStart:
#	_
#	Letter
#
# IdentifierChar:
#	IdentifierStart
#	Number
#	NonspacingMark
#
# Identifiers start with a letter or _, and are followed by any
# number of letters, _, numbers, or nonspacing marks. Letters,
# Numbers and NonspacingMarks are those defined in ISO/IEC TR 10176.

Accessing ISO standards can be complicated, so here are the corresponding
general categories from Unicode's UnicodeData.txt. For the relation between
Unicode and ISO/IEC 10646 (the character set TR 10176 draws on), see
http://en.wikipedia.org/wiki/ISO/IEC_10646#Differences_between_ISO_10646_and_Unicode

Letters:
	Uppercase_Letter (Lu)
	Lowercase_Letter (Ll)
	Titlecase_Letter (Lt)
	Modifier_Letter (Lm)
	Other_Letter (Lo)

NonspacingMarks:
	Nonspacing_Mark (Mn)

Numbers:
	Decimal_Number (Nd)
	Letter_Number (Nl)
	Other_Number (No)
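A rough way to try the proposed rule against real characters is a small
recognizer keyed on the Unicode General_Category values listed above. This
is a sketch in Python, not D, and it assumes that Python's
`unicodedata.category()` (which reflects whatever Unicode version ships with
the interpreter, not ISO/IEC TR 10176 itself) is an acceptable stand-in for
the TR 10176 tables. The function names are hypothetical. Note that "_" must
be special-cased, because its own category is Pc (Connector_Punctuation).

```python
import unicodedata

# General_Category values from UnicodeData.txt, per the mapping above.
# Assumption: these categories approximate the TR 10176 character classes.
LETTER = {"Lu", "Ll", "Lt", "Lm", "Lo"}
NUMBER = {"Nd", "Nl", "No"}
NONSPACING_MARK = {"Mn"}


def is_identifier_start(ch: str) -> bool:
    """IdentifierStart: '_' or a Letter ('_' itself is category Pc)."""
    return ch == "_" or unicodedata.category(ch) in LETTER


def is_identifier_char(ch: str) -> bool:
    """IdentifierChar: IdentifierStart, Number, or NonspacingMark."""
    return (is_identifier_start(ch)
            or unicodedata.category(ch) in NUMBER | NONSPACING_MARK)


def is_identifier(s: str) -> bool:
    """Identifier: one IdentifierStart followed by zero or more IdentifierChars."""
    return (len(s) > 0
            and is_identifier_start(s[0])
            and all(is_identifier_char(c) for c in s[1:]))
```

With this sketch, `is_identifier("a\u00B7b")` returns False because U+00B7
MIDDLE DOT has category Po, matching the sample discussed earlier, while
`is_identifier("e\u0301")` (e followed by COMBINING ACUTE ACCENT, category Mn)
returns True. A nonspacing mark cannot start an identifier, so
`is_identifier("\u0301e")` returns False.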

Thomas


-----BEGIN PGP SIGNATURE-----

iD8DBQFFFB8/LK5blCcjpWoRAnMPAJsEaehF35W70k8S+BXbSSHXOeum8wCfR1UU
XeNEnZrWU8TYWSfzikQPm/8=
=n9aW
-----END PGP SIGNATURE-----



More information about the Digitalmars-d mailing list