identifiers & "unialpha"
Walter Bright
newshound at digitalmars.com
Fri Sep 22 14:53:35 PDT 2006
Thomas Kuehne wrote:
> Walter Bright schrieb am 2006-09-22:
>> What is CJK?
>
> CJK: Chinese, Japanese & Korean
> 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
> 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS
Thank you.
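
For illustration, the two ranges quoted above can be written as a simple
membership test. This is only a sketch in Python: the names
QUOTED_CJK_RANGES and in_quoted_cjk_range are made up for this example,
and the bounds are copied from the quote, not checked against any
particular Unicode version.

```python
# Inclusive code point ranges, copied from the quoted message.
QUOTED_CJK_RANGES = [
    (0x20000, 0x2A6D6),  # CJK Ideograph Extension B
    (0x2F800, 0x2FA1D),  # CJK COMPATIBILITY IDEOGRAPHS (label as quoted)
]


def in_quoted_cjk_range(ch: str) -> bool:
    """Return True if the single character ch lies in one of the quoted ranges.

    Hypothetical helper for illustration only.
    """
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in QUOTED_CJK_RANGES)


if __name__ == "__main__":
    print(in_quoted_cjk_range(chr(0x20000)))  # first code point of the first range
    print(in_quoted_cjk_range("A"))           # plain ASCII letter
```

Both ranges lie above U+FFFF, so a test like this only works where a
character can hold a full 21-bit code point.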
>> As it is now, it matches standard C's definition of identifiers, which
>> is the intent of the reference. I haven't checked, but I think it
>> matches Java's idea of an identifier character, too.
>
> ISO/IEC 9899:1999 (E) Appendix D
> # 1) This clause lists the hexadecimal code values that are valid in
> # universal character names in identifiers.
>
> Whereas Appendix D defines valid characters in identifiers, D uses it
> as a source for "universal alpha". As a consequence std.uni.isUniAlpha
> claims that \u00B7 (MIDDLE DOT) is a letter...
I guess I don't see why C99 would say \u00B7 is a valid identifier character
if it isn't an alpha. It's all confusing to me, and I think needlessly
complicated. Is \u00B7 the only difference?
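
One way to see the distinction being argued here is to ask the Unicode
character database directly. The sketch below uses Python's standard
unicodedata module, not std.uni. The helper is_letter is a made-up name,
and it assumes "letter" means a general category starting with "L",
which is only one possible definition of "alpha".

```python
import unicodedata

MIDDLE_DOT = "\u00b7"


def is_letter(ch: str) -> bool:
    """True if ch's Unicode general category is a letter category (Lu, Ll, Lt, Lm, Lo).

    Hypothetical helper; "letter" here means general category L*, an assumption.
    """
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    print(unicodedata.name(MIDDLE_DOT))      # MIDDLE DOT
    print(unicodedata.category(MIDDLE_DOT))  # Po (punctuation, other)
    print(is_letter(MIDDLE_DOT))             # False
    print(is_letter("a"))                    # True
```

By this measure \u00B7 is punctuation, not a letter. So a table of
characters allowed in identifiers and a table of letters are two
different things, even if they overlap heavily.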
>
>> P.S. It also bugs me that the unicode people can't seem to make up their
>> minds. Do character sets really need to change every 2 or 3 years?
>
> Task at hand: Create a table of all characters used by humans all over
> the world and minimize friction due to political issues
> (e.g. characters' names). Except for bug fixes (typos...) the unicode people
> usually only extend previous versions of the standard.
Chinese, Japanese, and Korean are hardly obscure so I don't see why the
character sets for them seem to need large numbers of additions this
late in the game.