Updating D beyond Unicode 2.0
Steven Schveighoffer
schveiguy at gmail.com
Mon Sep 24 13:49:46 UTC 2018
On 9/22/18 12:56 PM, Neia Neutuladh wrote:
> On Saturday, 22 September 2018 at 12:35:27 UTC, Steven Schveighoffer wrote:
>> But aren't we arguing about the wrong thing here? D already accepts
>> non-ASCII identifiers.
>
> Walter was doing that thing that people in the US who only speak English
> tend to do: forgetting that other people speak other languages, and that
> people who speak English can learn other languages to work with people
> who don't speak English.
I don't think he was doing that. I think what he was saying was, D tried
to accommodate users who don't normally speak English, and they still
use English (for the most part) for coding.
I'm actually surprised there isn't much code out there that is written
with non-ASCII identifiers, given that C99 supported them. I assumed it
was because they weren't supported. Now I learn that they are
supported, yet almost all C code I've ever seen is written in English.
Perhaps that's just because I don't frequent foreign language sites
though :) But many people here speak English as a second language, and
vouch for their cultures still using English to write code.
> He was saying it's inevitably a mistake to use
> non-ASCII characters in identifiers and that nobody does use them in
> practice.
I would expect that people do try to use them in practice; it's just
that the problems they run into (tool/environment support) make it not
worth the effort. But I have no first or even second hand
experience with this. It does seem like Walter has a lot of experience
with it though.
> Walter talking like that sounds like he'd like to remove support for
> non-ASCII identifiers from the language. I've gotten by without
> maintaining a set of personal patches on top of DMD so far, and I'd like
> it if I didn't have to start.
I don't think he was saying that. I think he was against expanding
support for further Unicode identifiers because the first effort did
not produce any measurable benefit. Given the recent positions of
Walter and Andrei, I'd be shocked if they decided to remove non-ASCII
identifiers that are currently supported, thereby breaking any existing
code.
>> What languages need an upgrade to unicode symbol names? In other
>> words, what symbols aren't possible with the current support?
>
> Chinese and Japanese have gained about eleven thousand symbols since
> Unicode 2.
>
> Unicode 2 covers 25 writing systems, while Unicode 11 covers 146. Just
> updating to Unicode 3 would give us Cherokee, Ge'ez (multiple
> languages), Khmer (Cambodian), Mongolian, Burmese, Sinhala (Sri Lanka),
> Thaana (Maldivian), Canadian aboriginal syllabics, and Yi (Nuosu).
Very interesting! I would agree that we should at least add support for
Unicode symbols that are used in spoken languages, especially since we
already have support for symbols that aren't ASCII. I don't see
the downside, especially if you can already use Unicode 2.0 symbols for
identifiers (the ship has already sailed).
It could be a good incentive to get kids in countries where English
isn't commonly spoken to try D out as a first programming language ;)
Using your native language to show example code could be a huge benefit
for teaching coding.
My recommendation is to put the PR that you said you had ready up for
review and see what happens. Having an actual patch to talk about could
change minds. At the very least, it's worth not wasting the effort
that you have already spent. Even if it does need a DIP, the PR can show
that one less piece of effort is needed to get it implemented.
-Steve