identifiers & "unialpha"

Fri Sep 22 14:53:35 PDT 2006

Thomas Kuehne wrote:
> Walter Bright schrieb am 2006-09-22:
>> What is CJK?
> 
> CJK: Chinese, Japanese & Korean
> 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
> 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS

Thank you.

>> As it is now, it matches standard C's definition of identifiers, which 
>> is the intent of the reference. I haven't checked, but I think it 
>> matches Java's idea of an identifier character, too.
> 
> ISO/IEC 9899:1999 (E) Appendix D
> # 1) This clause lists the hexadecimal code values that are valid in
> # universal character names in identifiers.
> 
> Whereas Appendix D defines valid characters in identifiers, D uses it
> as a source for "universal alpha". As a consequence std.uni.isUniAlpha
> claims that \u00B7 (MIDDLE DOT) is a letter...

I guess I don't see why C99 would say \u00B7 is a valid identifier character 
if it isn't an alpha. It's all confusing to me, and I think needlessly 
complicated. Is \u00B7 the only difference?
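
One rough way to probe that question is to walk a range table and ask the Unicode database for each code point's general category. The sketch below is only an illustration: ANNEX_D_SAMPLE is a hand-copied handful of the low entries from the C99 list, not the full table, and the helper names are made up for this example. It uses Python's unicodedata module rather than anything in std.uni.

```python
import unicodedata

# Assumption: a small hand-copied sample of the C99 identifier ranges,
# NOT the complete table from the standard.
ANNEX_D_SAMPLE = [
    (0x00AA, 0x00AA),  # FEMININE ORDINAL INDICATOR
    (0x00B5, 0x00B5),  # MICRO SIGN
    (0x00B7, 0x00B7),  # MIDDLE DOT
    (0x00BA, 0x00BA),  # MASCULINE ORDINAL INDICATOR
    (0x00C0, 0x00D6),  # Latin-1 letters, first run
    (0x00D8, 0x00F6),  # Latin-1 letters, second run
]


def in_sample(cp):
    """True if the code point falls inside one of the sample ranges."""
    return any(lo <= cp <= hi for lo, hi in ANNEX_D_SAMPLE)


def is_letter(cp):
    """True if the Unicode general category is one of the L* classes."""
    return unicodedata.category(chr(cp)).startswith("L")


# Code points the sample allows in identifiers but Unicode does not
# classify as letters.
non_letters = [
    cp
    for lo, hi in ANNEX_D_SAMPLE
    for cp in range(lo, hi + 1)
    if not is_letter(cp)
]

for cp in non_letters:
    print("U+%04X %s (%s)" % (cp, unicodedata.name(chr(cp)),
                              unicodedata.category(chr(cp))))
```

Within this sample, \u00B7 is the only entry that is not a letter. That does not settle the question for the whole table, though: the full C99 list also contains digit ranges and a few other special characters, so running the same check over the complete table should turn up more than one difference.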

> 
>> P.S. It also bugs me that the unicode people can't seem to make up their 
>> minds. Do character sets really need to change every 2 or 3 years?
> 
> Task at hand: Create a table of all characters used by humans all over
> the world and minimize friction due to political issues
> (e.g. characters' names). Except for bug fixes (typos...) the unicode people
> usually only extend previous versions of the standard.

Chinese, Japanese, and Korean are hardly obscure, so I don't see why the 
character sets for them seem to need large numbers of additions this 
late in the game.