Updating D beyond Unicode 2.0

Mon Sep 24 13:49:46 UTC 2018

On 9/22/18 12:56 PM, Neia Neutuladh wrote:
> On Saturday, 22 September 2018 at 12:35:27 UTC, Steven Schveighoffer wrote:
>> But aren't we arguing about the wrong thing here? D already accepts 
>> non-ASCII identifiers.
> 
> Walter was doing that thing that people in the US who only speak English 
> tend to do: forgetting that other people speak other languages, and that 
> people who speak English can learn other languages to work with people 
> who don't speak English.

I don't think he was doing that. I think what he was saying is that D 
tried to accommodate users who don't normally speak English, and they 
still use English (for the most part) for coding.

I'm actually surprised there isn't much code out there written with 
non-ASCII identifiers, given that C99 supported them. I assumed it was 
because they weren't supported. Now I learn that they are supported, 
yet almost all C code I've ever seen is written in English. Perhaps 
that's just because I don't frequent foreign-language sites, though :) 
But many people here speak English as a second language, and they vouch 
that their cultures still use English to write code.

> He was saying it's inevitably a mistake to use 
> non-ASCII characters in identifiers and that nobody does use them in 
> practice.

I would expect that people do try to use them in practice; it's just 
that the problems they run into (tool/environment support) aren't worth 
the effort. But I have no first- or even second-hand experience with 
this. Walter does seem to have a lot of experience with it, though.

> Walter talking like that sounds like he'd like to remove support for 
> non-ASCII identifiers from the language. I've gotten by without 
> maintaining a set of personal patches on top of DMD so far, and I'd like 
> it if I didn't have to start.

I don't think he was saying that. I think he was against expanding 
support for further Unicode identifiers because the first effort did 
not produce any measurable benefit. Given the recent positions of 
Walter and Andrei, I'd be shocked if they decided to remove the 
non-ASCII identifiers that are currently supported, thereby breaking 
existing code.

>> What languages need an upgrade to unicode symbol names? In other 
>> words, what symbols aren't possible with the current support?
> 
> Chinese and Japanese have gained about eleven thousand symbols since 
> Unicode 2.
> 
> Unicode 2 covers 25 writing systems, while Unicode 11 covers 146. Just 
> updating to Unicode 3 would give us Cherokee, Ge'ez (multiple 
> languages), Khmer (Cambodian), Mongolian, Burmese, Sinhala (Sri Lanka), 
> Thaana (Maldivian), Canadian aboriginal syllabics, and Yi (Nuosu).

Very interesting! I agree that we should at least add support for the 
Unicode symbols used by spoken languages, especially since we already 
support non-ASCII symbols. I don't see the downside, especially if you 
can already use Unicode 2.0 symbols for identifiers (the ship has 
already sailed).

It could be a good incentive to get kids in countries where English 
isn't commonly spoken to try D out as a first programming language ;) 
Using your native language to show example code could be a huge benefit 
for teaching coding.

My recommendation is to put up for review the PR you said you had 
ready, and see what happens. Having an actual patch to talk about could 
change minds. At the very least, it keeps the effort you have already 
spent from going to waste. Even if it does need a DIP, the PR shows 
that there is one less piece of work needed to get it implemented.

-Steve