First Impressions!

Jonathan M Davis newsgroup.d at jmdavisprog.com
Thu Nov 30 17:56:58 UTC 2017


On Thursday, November 30, 2017 03:37:37 Walter Bright via Digitalmars-d 
wrote:
> On 11/30/2017 2:39 AM, Joakim wrote:
> > Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some
> > starting off with the earlier UCS-2:
> >
> > https://en.m.wikipedia.org/wiki/UTF-16#Usage
> >
> > Not saying either is better, each has their flaws, just pointing out
> > it's more than just Windows.
>
> I stand corrected.

I get the impression that the stuff that uses UTF-16 is mostly stuff that
picked an encoding early on in the Unicode game and thought that they picked
one that guaranteed that a code unit would be an entire character. Many of
them picked UCS-2 and then switched later to UTF-16, but once they picked a
16-bit encoding, they were kind of stuck.
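
As a rough illustration of why a 16-bit code unit is not always an entire character, here is a minimal Python sketch. Python and the helper name utf16_code_units are choices made for this illustration, not anything from the thread. It counts how many UTF-16 code units a single code point needs:

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" is used so that no byte order mark is prepended.
    return len(text.encode("utf-16-le")) // 2


# U+0041 and U+20AC are in the Basic Multilingual Plane; U+1F600 is not.
for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf16_code_units(ch)} UTF-16 code unit(s)")
```

Code points above U+FFFF need a surrogate pair, i.e. two code units, which is exactly the case that UCS-2 could not represent.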

Others - most notably C/C++ and the *nix world - picked UTF-8 for backwards
compatibility, and once it became clear that a UCS-2 / UTF-16 code unit
could not always represent an entire character, most stuff that went
Unicode went UTF-8.

Language-wise, I think that most of the UTF-16 is driven by the fact that
Java went with UCS-2 / UTF-16, and C# followed them (both because they were
copying Java and because the Win32 API had gone with UCS-2 / UTF-16). So,
that's had a lot of influence on folks, though most others have gone with
UTF-8 for backwards compatibility and because it typically takes up less
space for non-Asian text. But the use of UTF-16 in Windows, Java, and C#
does seem to have resulted in some folks thinking that wide characters mean
Unicode and narrow characters mean ASCII.
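
The space claim can be checked with a small Python sketch. The helper name encoded_sizes and the two sample strings are assumptions picked for illustration; real text will vary:

```python
def encoded_sizes(text: str):
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


english = "hello"
japanese = "\u3053\u3093\u306b\u3061\u306f"  # five hiragana characters

print("english :", encoded_sizes(english))
print("japanese:", encoded_sizes(japanese))
```

ASCII text takes one byte per character in UTF-8 and two in UTF-16, while these hiragana take three bytes each in UTF-8 and two in UTF-16, which is why the advantage depends on the script.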

I really wish that everything would just go to UTF-8 and that UTF-16 would
die, but that would just break too much code. And if we were willing to do
that, I'm sure that we could come up with a better encoding than UTF-8 (e.g.
getting rid of Unicode normalization as being a thing and never having
multiple encodings for the same character), but _that_'s never going to
happen.
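
For readers unfamiliar with the normalization problem, here is a minimal Python sketch of what "multiple encodings for the same character" presumably refers to. It uses the standard unicodedata module; the helper name same_after_nfc is hypothetical:

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE as one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                # raw comparison
print(same_after_nfc(precomposed, decomposed))  # comparison after NFC
```

Both strings render as the same accented letter, yet they compare unequal until they are normalized, regardless of whether they are stored as UTF-8 or UTF-16.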

- Jonathan M Davis


