Why UTF-8/16 character encodings?

H. S. Teoh hsteoh at quickfur.ath.cx
Mon May 27 18:15:51 PDT 2013


On Tue, May 28, 2013 at 02:54:30AM +0200, Torje Digernes wrote:
> On Tuesday, 28 May 2013 at 00:34:20 UTC, Manu wrote:
> >On 28 May 2013 09:05, Walter Bright <newshound2 at digitalmars.com>
> >wrote:
> >
> >>On 5/27/2013 3:18 PM, H. S. Teoh wrote:
> >>
> >>>Well, D *does* support non-English identifiers, y'know... for
> >>>example:
> >>>
> >>>        import std.stdio;
> >>>
> >>>        void main(string[] args) {
> >>>                int число = 1;
> >>>                foreach (и; 0..100)
> >>>                        число += и;
> >>>                writeln(число);
> >>>        }
> >>>
> >>>Of course, whether that's a good practice is a different
> >>>story. :)
> >>>
> >>
> >>I've recently come to the opinion that that's a bad idea, and D
> >>should not
> >>support it.
> >>
> >
> >Why? You said previously that you'd love to support extended
> >operators ;)
> 
> I find features such as support for uncommon symbols in variables a
> strength as it makes some physics formulas a bit easier to read in
> code form, which in my opinion is a good thing.

I think there's a difference between allowing math symbols (which
include things like (a subset of) Greek letters that mathematicians
love) in identifiers, and allowing full Unicode. What if you're assigned
to maintain code containing identifiers that have letters that don't
appear in any of your installed fonts?

I think it's OK to allow math symbols, but allowing the entire set of
Unicode characters is going a bit too far, IMO. For one thing, if some
code has identifiers written in Arabic, I wouldn't be able to understand
the code, simply because I'd have a hard time telling different
identifiers apart.  Besides, if the rest of the language (keywords,
Phobos, etc.) is in English, then I don't see any compelling reason to
use a different language in identifiers, other than to submit IODCC
entries. :-P

C89 doesn't support Unicode identifiers, for one thing (C99 did add
universal character names in identifiers), but I've seen working C code
written by people who barely understand any English -- it didn't stop
them at all. (The comments were of course in their native language --
the compiler ignores everything inside comments anyway, so 8-bit native
encodings or even UTF-8 can be sneaked in without provoking compiler
errors.)
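
To illustrate, here is a minimal sketch of that style, assuming the file
is saved as UTF-8. It is my own made-up example, not code from any real
project: the function name sum_plus_one is hypothetical, and the body
just mirrors the D loop quoted above, with ASCII identifiers and Russian
comments.

```c
/* Возвращает 1 плюс сумму чисел от 0 до 99.
 * (Returns 1 plus the sum of the numbers 0 through 99.)
 * Hypothetical example: identifiers stay ASCII, comments are UTF-8. */
int sum_plus_one(void)
{
        int total = 1;                  /* число (number) */
        int i;

        for (i = 0; i < 100; i++)       /* и (and): the loop counter */
                total += i;
        return total;
}
```

The compiler only ever sees the ASCII identifiers, so this should build
with any C89-or-later compiler; calling sum_plus_one() returns 4951, the
same value the D snippet prints.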


T

-- 
WINDOWS = Will Install Needless Data On Whole System -- CompuMan

