OT (Was: Re: is(ElementType!(char[2]) == dchar - why?)

Tue Dec 11 22:08:33 UTC 2018

On Tuesday, December 11, 2018 2:11:49 PM MST H. S. Teoh via
Digitalmars-d-learn wrote:
> On Tue, Dec 11, 2018 at 09:02:41PM +0000, bauss via Digitalmars-d-learn
> wrote:
> > On Tuesday, 11 December 2018 at 18:10:48 UTC, H. S. Teoh wrote:
> [...]
>
> > > Autodecoding raises its ugly head again. :-/
>
> [...]
>
> > Has it ever had anything else?
>
> LOL... well, we (or some of us) were deceived by its pretty tail for a
> while, until we realized that it was just a façade, and Unicode really
> didn't work the way we thought it did.

Yeah. Auto-decoding came about because Andrei misunderstood Unicode and
thought that code points were complete characters (likely because the
Unicode standard weirdly likes to refer to them as characters), and he
didn't know about graphemes. At the time, many of us were just as clueless
as he was (in many cases, more so), and auto-decoding made sense. You
supposedly got full correctness by default and could work around it for
increased performance when you needed to (and the standard library did that
for you where it mattered, reducing how much you had to care). Walter knew
better, but he wasn't involved enough with Phobos development to catch on
until it was too late. It was only later, when more of the folks involved
came to a fuller understanding of Unicode, that auto-decoding started to be
panned.
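
As a rough illustration of the code unit / code point / grapheme distinction
(a minimal sketch, not taken from the original thread; the helper names
countCodeUnits, countCodePoints, and countGraphemes are made up for this
example), consider "e" followed by a combining acute accent, which a reader
sees as one character:

```d
import std.range : ElementType, walkLength;
import std.uni : byGrapheme;

// Auto-decoding: the range primitives present a string as a range of dchar
// (code points), even though it is stored as an array of UTF-8 code units.
static assert(is(ElementType!string == dchar));

// Hypothetical helpers, named only for this illustration.
size_t countCodeUnits(string s)  { return s.length; }                // UTF-8 code units
size_t countCodePoints(string s) { return s.walkLength; }            // auto-decoded dchars
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    // 'e' followed by U+0301 (combining acute accent).
    string s = "e\u0301";

    assert(countCodeUnits(s) == 3);  // 1 byte for 'e' + 2 bytes for U+0301
    assert(countCodePoints(s) == 2); // what auto-decoding iterates over
    assert(countGraphemes(s) == 1);  // what a reader would call one character
}
```

So iterating by code point, which is what auto-decoding does, can still split
something that a user would consider to be a single character.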

For instance, I very much doubt that you would find much from the D
community talking about how horrible auto-decoding is back in 2010, whereas
you probably could find plenty by 2015, and every time it comes up now,
folks complain about it. Previously, folks would get annoyed about the
restrictions, but the restrictions made sense with the understanding that
code points were the actual characters, and you didn't want code to be
chopping them up. But once it became more widely understood that code points
were also potentially pieces of characters, you no longer had the same
defense against the annoyances caused by how narrow strings are treated, and
so it just became annoying. We went from newcomers getting annoyed while
those who understood the reasons behind auto-decoding were fine with it
(because it supposedly made their code correct and prevented bugs) to almost
everyone involved being annoyed about it. The newcomers who don't understand
it still get annoyed by it, but instead of the ones who do understand it
telling them about how it's helping keep Unicode handling correct, the folks
who understand what's going on now tell everyone how terrible auto-decoding
is.

So, the narrative that auto-decoding is terrible has now become the status
quo, whereas before, it was actually considered to be a good thing by the D
community at large, because it supposedly ensured Unicode correctness. It
was still annoying, but that was because Unicode is annoying. Now, Unicode
is still annoying, but auto-decoding is understood to make it even more so
without actually helping.

The one bright side in all of this, and the reason that I don't think that
auto-decoding is entirely bad, is that it shoves the issue in everyone's
faces so that everyone is forced to learn at least the basics about Unicode,
whereas if we didn't have it, many folks would likely just treat char as a
complete character and merrily write code that can't handle Unicode, since
that's what usually happens in most programs in most languages (some
languages do use the equivalent of wchar for their char, but most code still
treats their char type as if it were a complete character). The fact that we
have char, wchar, and dchar _does_ help raise the issue on its own, but
auto-decoding makes it very hard to ignore. Now, that doesn't mean that I
think that we should have auto-decoding (ideally, we'd figure out how to
remove it), but the issues that it's caused have resulted in a lot of
developers becoming much more knowledgeable about Unicode and therefore more
likely to write code that handles Unicode correctly.

- Jonathan M Davis