Why UTF-8/16 character encodings?

Wed May 29 03:10:52 PDT 2013

On Tuesday, 28 May 2013 at 01:17:37 UTC, H. S. Teoh wrote:
> On Tue, May 28, 2013 at 02:54:30AM +0200, Torje Digernes wrote:
>> On Tuesday, 28 May 2013 at 00:34:20 UTC, Manu wrote:
>> >On 28 May 2013 09:05, Walter Bright <newshound2 at digitalmars.com> wrote:
>> >
>> >>On 5/27/2013 3:18 PM, H. S. Teoh wrote:
>> >>
>> >>>Well, D *does* support non-English identifiers, y'know... for
>> >>>example:
>> >>>
>> >>>        import std.stdio; // needed for writeln
>> >>>
>> >>>        void main(string[] args) {
>> >>>                int число = 1;
>> >>>                foreach (и; 0..100)
>> >>>                        число += и;
>> >>>                writeln(число);
>> >>>        }
>> >>>
>> >>>Of course, whether that's a good practice is a different
>> >>>story. :)
>> >>>
>> >>
>> >>I've recently come to the opinion that that's a bad idea, and D
>> >>should not support it.
>> >>
>> >
>> >Why? You said previously that you'd love to support extended
>> >operators ;)
>> 
>> I find features such as support for uncommon symbols in variables a
>> strength as it makes some physics formulas a bit easier to read in
>> code form, which in my opinion is a good thing.
>
> I think there's a difference between allowing math symbols (which
> includes things like (a subset of) Greek letters that mathematicians
> love) in identifiers, and allowing full Unicode. What if you're assigned
> to maintain code containing identifiers that have letters that don't
> appear in any of your installed fonts?
>
> I think it's OK to allow math symbols, but allowing the entire set of
> Unicode characters is going a bit too far, IMO. For one thing, if some
> code has identifiers written in Arabic, I wouldn't be able to understand
> the code, simply because I'd have a hard time telling different
> identifiers apart.  Besides, if the rest of the language (keywords,
> Phobos, etc.) is in English, then I don't see any compelling reason to
> use a different language in identifiers, other than to submit IODCC
> entries. :-P
>
> C doesn't support Unicode identifiers, for one thing, but I've seen
> working C code written by people who barely understand any English -- it
> didn't stop them at all. (The comments were of course in their native
> language -- the compiler ignores everything inside anyway so 8-bit
> native encodings or even UTF-8 can be sneaked in without provoking
> compiler errors.)
>
>
> T

I think there is very little difference: both cases artificially
limit the allowable symbols. What about symbols relevant to other
fields, ones that do not primarily use Greek letters? Are they to
be treated differently?

What you propose is a built-in coding standard for D, based on
your feelings about the topic.

If what you fear is that Unicode will suddenly make cooperation
impossible, I doubt you are right. After all, there are all kinds
of ways to make terrible variable names (q, w, e, r ... qq, qw).
If any such identifiers show up in a project, I assume they get
cleaned up, so why wouldn't the same happen to Unicode
identifiers if they cause problems? Think about it: it should
happen even faster, because the symbol might not be accessible to
everyone, whereas a single- or double-letter gibberish name is
perfectly reproducible and might spread through the project,
confusing every new reader. Are you going to argue for
disallowing variable names that are not an English dictionary
word or a compound of such words?