Updating D beyond Unicode 2.0

Sun Sep 23 01:01:12 UTC 2018

On Saturday, September 22, 2018 10:07:38 AM MDT Neia Neutuladh via 
Digitalmars-d wrote:
> On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis
> wrote:
> > Unicode identifiers may make sense in a code base that is going
> > to be used solely by a group of developers who speak a
> > particular language that uses a number of non-ASCII
> > characters (especially languages like Chinese or Japanese), but
> > it has no business in any code that's intended for
> > international use. It just causes problems.
>
> You have a problem when you need to share a codebase between two
> organizations using different languages. "Just use ASCII" is not
> the solution. "Use a language that most developers in both
> organizations can use" is. That's *usually* going to be English,
> but not always. For instance, a Belorussian company doing
> outsourcing work for a Russian company might reasonably write
> code in Russian.
>
> If you're writing for a global audience, as most open source code
> is, you're usually going to use the most widely spoken language.

My point is that if your code base is definitely only going to be used
within a group of people who are using a keyboard that supports a Unicode
character that you want to use, then it's not necessarily a problem to use
it, but if you're writing code that may be seen or used by a general
audience (especially if it's going to be open source), then it needs to be
in ASCII, or it's a serious problem. Even if it's a character like lambda
that most everyone is going to understand, many, many programmers are not
going to be able to type it on their keyboards, and that's going to cause
nothing but problems.

For better or worse, English is the international language of science and
engineering, and that includes programming. So, any programs that are
intended to be seen and used by the world at large need to be in ASCII. And
the biggest practical issue with that is whether a character is even on a
typical keyboard. Using a Unicode character in a program makes it so that
many programmers cannot type it. And given the sheer breadth of Unicode
characters, you could have a keyboard that supports a number of Unicode
characters and still not have the Unicode character in question. So, open
source programs need to be in ASCII.

Now, I don't know that it's a problem to support a wide range of Unicode
characters in identifiers when you consider the issues of folks whose native
language is not English (especially when it's a language like Chinese or
Japanese), but open source programs should only be using ASCII identifiers.
And unfortunately, sometimes, the fact that a language supports Unicode
identifiers has led English speakers to do stupid things like use the
lambda character in identifiers. So, I can understand Walter's reluctance to
go further with supporting Unicode identifiers, but on the other hand, when
you consider how many people there are on the planet who use a language that
doesn't even use the Latin alphabet, it's arguably a good idea to fully
support Unicode identifiers.

- Jonathan M Davis