The Case Against Autodecode

Jonathan M Davis via Digitalmars-d digitalmars-d at puremagic.com
Tue May 31 11:34:54 PDT 2016


On Tuesday, May 31, 2016 16:29:33 Joakim via Digitalmars-d wrote:
> UTF-8 is an antiquated hack that needs to be eradicated.  It
> forces all other languages than English to be twice as long, for
> no good reason, have fun with that when you're downloading text
> on a 2G connection in the developing world.  It is unnecessarily
> inefficient, which is precisely why auto-decoding is a problem.
> It is only a matter of time till UTF-8 is ditched.

Considering that *nix land uses UTF-8 almost exclusively, and many C
libraries do even on Windows, I very much doubt that UTF-8 is going anywhere
anytime soon - if ever. The Win32 API does use UTF-16, and Java and C# do,
but the vast sea of code that is C or C++ generally uses UTF-8, as do plenty of
other programming languages.

And even aside from English, most European languages are going to be more
efficient with UTF-8, because they're still primarily ASCII even if they
contain characters that are not. Stuff like Chinese is definitely worse in
UTF-8 than it would be in UTF-16, but there are a lot of languages other
than English which are going to encode better with UTF-8 than UTF-16 - let
alone UTF-32.
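
To make the size comparison concrete, here is a rough sketch that counts the bytes a few words need in each encoding. The helper name encodedSizes and the sample words are made up for illustration; the non-ASCII characters are written as \u escapes so the result does not depend on how the source file is saved.

```d
import std.conv : to;
import std.stdio : writefln;

/// Bytes needed to store `text` as UTF-8, UTF-16 and UTF-32.
/// Hypothetical helper, named here only for this illustration.
size_t[3] encodedSizes(string text)
{
    return [
        text.length * char.sizeof,              // UTF-8: 1 byte per code unit
        text.to!wstring.length * wchar.sizeof,  // UTF-16: 2 bytes per code unit
        text.to!dstring.length * dchar.sizeof,  // UTF-32: 4 bytes per code unit
    ];
}

void main()
{
    // "hello", "Größe", "español" and a two-character Chinese word.
    foreach (word; ["hello", "Gr\u00F6\u00DFe", "espa\u00F1ol", "\u4E2D\u6587"])
    {
        auto n = encodedSizes(word);
        writefln("%s: UTF-8 %s, UTF-16 %s, UTF-32 %s bytes",
                 word, n[0], n[1], n[2]);
    }
}
```

With these samples the German and Spanish words come out smaller in UTF-8 (7 vs. 10 bytes and 8 vs. 14 bytes), while the Chinese word is larger (6 vs. 4 bytes). Single words are only indicative; real text also carries spaces, digits and punctuation, which are ASCII.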

Regardless, UTF-8 isn't going anywhere anytime soon. _Way_ too much uses it
for it to be going anywhere, and most folks have no problem with that. Any
attempt to get rid of it would be a huge, uphill battle.

But D supports UTF-8, UTF-16, _and_ UTF-32 natively - even without involving
the standard library - so anyone who wants to avoid UTF-8 is free to do so.
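
As a minimal sketch of what that looks like in practice, the snippet below uses only language features and no Phobos imports. The word "naïve" and the helper name countCodePoints are picked purely for illustration.

```d
/// Counts code points by letting foreach decode the UTF-8 string to dchar.
/// Hypothetical helper; the decoding is done by the language, not Phobos.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // The same text ("naïve") in each of the three built-in string types.
    string  s8  = "na\u00EFve";   // UTF-8,  array of immutable char
    wstring s16 = "na\u00EFve"w;  // UTF-16, array of immutable wchar
    dstring s32 = "na\u00EFve"d;  // UTF-32, array of immutable dchar

    // Code unit sizes are fixed by the language.
    static assert(char.sizeof == 1 && wchar.sizeof == 2 && dchar.sizeof == 4);

    // .length counts code units, not characters.
    assert(s8.length == 6);   // the i-diaeresis takes two UTF-8 code units
    assert(s16.length == 5);
    assert(s32.length == 5);

    assert(countCodePoints(s8) == 5);
}
```

Code that prefers UTF-16 or UTF-32 can therefore use wstring or dstring throughout and never touch UTF-8 code units directly.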

- Jonathan M Davis


