[Issue 3455] Some Unicode characters not allowed in identifiers
d-bugmail at puremagic.com
Fri Oct 30 11:40:07 PDT 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3455
--- Comment #2 from Andrei Alexandrescu <andrei at metalanguage.com> 2009-10-30 11:40:05 PDT ---
(In reply to comment #1)
> As http://www.digitalmars.com/d/1.0/lex.html#identifier very clearly states,
> the allowed characters in identifiers are those defined in the C99 standard,
> ISO/IEC 9899:1999(E) Annex D. Have a look at it:
> http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf
>
> ９ (U+FF19, FULLWIDTH DIGIT NINE) is not in that list. The highest code point listed is 0xD7A3, in fact.
> This is not a bug, this is an enhancement.
>
> However, rather than an arbitrary and frozen list, I /would/ prefer basing it
> simply on Unicode properties, such as Java's choice: identifiers may start with
> letters or numeric letters, and may contain, in addition to those, connecting
> punctuation, decimal digits, and combining and non-spacing marks. In other
> words:
>
> Identifiers may start with code points from the general categories Ll, Lm, Lo,
> Lt, Lu, Nl.
>
> Identifiers may contain code points from the general categories Ll, Lm, Lo, Lt,
> Lu, Mc, Mn, Nd, Nl, No, Pc.
>
> Java also allows Cc and Cf, of whose usefulness I'm not so convinced. These are
> control characters and things like "soft hyphen", which isn't even supposed to
> be displayed unless the word line-wraps. Too much potential for confusion IMHO.
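The category-based rule proposed in comment #1 can be sketched in Python with the standard `unicodedata` module. This is only an illustration of the proposal, not DMD's lexer. The function name `is_identifier` and the two category sets are assumptions, copied directly from the quoted lists (not from Java's actual API).

```python
import unicodedata

# Hypothetical category sets, taken verbatim from the proposal quoted above.
START_CATEGORIES = {"Ll", "Lm", "Lo", "Lt", "Lu", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mc", "Mn", "Nd", "No", "Pc"}


def is_identifier(name: str) -> bool:
    """Check `name` against the proposed Unicode-category identifier rule."""
    if not name:
        return False
    # The first code point must be a letter or a letter-like number (Nl).
    if unicodedata.category(name[0]) not in START_CATEGORIES:
        return False
    # Later code points may also be marks, decimal digits, other numbers,
    # or connector punctuation such as '_'.
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in name[1:])


if __name__ == "__main__":
    # ascii() keeps the output printable on terminals without Unicode support.
    for candidate in ["x\uff19", "\uff19x", "caf\u00e9", "a\u00adb"]:
        print(ascii(candidate), is_identifier(candidate))
```

Under this list, `x９` is accepted because U+FF19 is category Nd. `９x` is rejected because digits cannot start an identifier. The soft hyphen (U+00AD, category Cf) is rejected, as the comment intends. A leading `_` (category Pc) is also rejected by the list as written, so a real lexer would presumably keep D's existing ASCII rules alongside it.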
Oh, OK. Thanks, Matti. I'm leaving this as an enhancement request. Currently the
error message is:

    invalid UTF-8 sequence
    unsupported char 0x99

This is factually incorrect: the UTF-8 sequence is valid (U+FF19 encodes as
EF BC 99, so 0x99 is merely its last byte). I suggest instead:

    Unicode character 0xFF19 not allowed in a symbol
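A minimal sketch of how a lexer could produce the suggested message: decode the whole UTF-8 sequence starting at the offending lead byte and report the code point, not a single byte. The function `disallowed_char_message` is hypothetical and assumes the caller has already decided the character is not allowed; it only builds the diagnostic text.

```python
def disallowed_char_message(source: bytes, offset: int) -> str:
    """Build a diagnostic for the (disallowed) character starting at `offset`."""
    lead = source[offset]
    # The lead byte determines the length of a well-formed UTF-8 sequence.
    if lead < 0x80:
        length = 1
    elif 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        # Continuation bytes (0x80-0xBF) and other invalid lead bytes.
        return f"invalid UTF-8 sequence (lead byte 0x{lead:02X})"
    try:
        # Strict decoding rejects truncated, overlong, and surrogate sequences.
        ch = source[offset:offset + length].decode("utf-8")
    except UnicodeDecodeError:
        return "invalid UTF-8 sequence"
    # Only a well-formed character reaches this point, so name the code point.
    return f"Unicode character 0x{ord(ch):04X} not allowed in a symbol"


if __name__ == "__main__":
    src = "int x\uff19;".encode("utf-8")  # b"int x\xef\xbc\x99;"
    print(disallowed_char_message(src, src.index(b"\xef")))
```

Starting at the lead byte 0xEF yields the message suggested above. Starting at the trailing 0x99 instead would yield an "invalid UTF-8 sequence" report, because 0x99 on its own is only a continuation byte.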
Andrei
More information about the Digitalmars-d-bugs mailing list