identifiers & "unialpha"
Sean Kelly
sean at f4.ca
Fri Sep 22 15:38:31 PDT 2006
Walter Bright wrote:
> Thomas Kuehne wrote:
>>
>> ISO/IEC 9899:1999 (E) Appendix D
>> # 1) This clause lists the hexadecimal code values that are valid in
>> # universal character names in identifiers.
>>
>> Whereas Appendix D defines valid characters in identifiers, D uses it
>> as a source for "universal alpha". As a consequence std.uni.isUniAlpha
>> claims that \u00B7 (MIDDLE DOT) is a letter...
>
> I guess I don't see why C99 would say · is a valid identifier character
> if it isn't an alpha. It's all confusing to me, and I think needlessly
> complicated. Is \u00B7 the only difference?
No, there are other differences as well. I think C99 was simply
referring to the latest version of the document available in 1999, and
it has since been revised (in 2003, apparently). But I have no idea why
characters present in the 1999 doc are not present in the 2003 doc. To
pass the buck even further, "ISO/IEC TR 10176:2003" Annex A says the
following:
This list comprises the letters (combining or not), syllables, and
ideographs from ISO/IEC 10646-1, together with the modifier letters
and marks conventionally used as parts of words.
So their list of characters is copied from the Unicode standard (ISO/IEC
10646). I can only conclude that the Unicode standard changed between
1999 and 2003 and ISO/IEC TR 10176 simply incorporated the new list. But
who knows why the list was changed.
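
To make Thomas's point concrete, here is a minimal sketch of the
distinction between "letter" and "allowed in an identifier". It is in
Python rather than D, and it consults the Unicode character database
bundled with Python, not the C99 Appendix D or TR 10176 tables, so treat
it as an illustration only. The helper name describe() is made up for
this example.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, general category, is-letter flag)."""
    category = unicodedata.category(ch)
    # General categories starting with "L" (Lu, Ll, Lt, Lm, Lo) are letters.
    is_letter = category.startswith("L")
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), category, is_letter)

# MIDDLE DOT, the character std.uni.isUniAlpha is said to report as a letter.
middle_dot = describe("\u00b7")
# An ordinary letter, for comparison.
letter_a = describe("a")

print(middle_dot)
print(letter_a)
```

By the general category alone, MIDDLE DOT is punctuation and not a
letter, which is why a table of "characters valid in identifiers" makes
a poor source for an isalpha-style test.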
This does raise an interesting point, however. Since the C and C++
standards separately refer to ISO/IEC TR 10176 for their character
lists, the identifiers that compliant C99 and C++2003 compilers should
accept are different. This seems contrary to the usual C++ practice of
deferring to the C standard on semantic issues.
>>> P.S. It also bugs me that the unicode people can't seem to make up
>>> their minds. Do character sets really need to change every 2 or 3 years?
>>
>> Task at hand: Create a table of all characters used by humans all over
>> the world and minimize friction due to political issues
>> (e.g. characters' names). Except for bug fixes (typos...) the unicode
>> people
>> usually only extend previous versions of the standard.
>
> Chinese, Japanese, and Korean are hardly obscure so I don't see why the
> character sets for them seem to need large numbers of additions this
> late in the game.
Me neither. But then I'm not terribly inclined to read the Unicode
standards committee minutes to find out :-)
Sean