Updating D beyond Unicode 2.0

Sun Sep 23 19:06:26 UTC 2018

On Saturday, 22 September 2018 at 08:52:32 UTC, Jonathan M Davis 
wrote:

>
> Honestly, I was horrified to find out that emojis were even in 
> Unicode. It makes no sense whatsoever. Emojis are supposed to be 
> sequences of characters that can be interpreted as images. 
> Treating them like Unicode symbols is like treating entire 
> words like Unicode symbols. It's just plain stupid and a clear 
> sign that Unicode has gone completely off the rails (if it was 
> ever on them). Unfortunately, it's the best tool that we have 
> for the job.

According to the Unicode website, 
http://unicode.org/standard/WhatIsUnicode.html,

"""
Support of Unicode forms the foundation for the representation of 
languages and symbols in all major operating systems, search 
engines, browsers, laptops, and smart phones—plus the Internet 
and World Wide Web (URLs, HTML, XML, CSS, JSON, etc.)"""

Note that Unicode supports symbols, not just characters.

The smiley face symbol predates its ':-)' usage in ASCII text; see 
https://www.smithsonianmag.com/arts-culture/who-really-invented-the-smiley-face-2058483/. 
It is fundamentally a symbol, not a sequence of characters, so it 
is not unreasonable for it to be assigned a Unicode code point. I 
do agree, of course, that it would seem bizarre to use an emoji 
as a D identifier.

The early history of computer science is completely dominated by 
cultures that use Latin-script characters, and hence, quite 
reasonably, text encoding and its automated visual representation 
by computing devices are dominated by the requirements of 
Latin-script languages. However, the world keeps turning and, 
despite DT's best efforts, China et al. look set to become 
dominant. Even if not China, the chances are that eventually a 
language with a non-Latin script will become very important. 
Parochial views like "all open source code should be in ASCII" 
will then look silly.

However, until that time D developers have to spend their time 
where it can be most useful. Hence the decision whether to apply 
Neia's patch / ideas mainly depends on how much downstream effort 
it will require (debuggers etc., as Walter pointed out) and how 
much the gain is. As Unicode 2.0 is already supported, I would 
guess that the vast majority of people with access to a computer 
can already enter identifiers in D that are rich enough for them. 
As Adam said though, it would be a good idea to at least ask!