First Impressions!

Jonathan M Davis newsgroup.d at jmdavisprog.com
Sat Dec 2 04:08:54 UTC 2017


On Friday, December 01, 2017 15:54:31 Walter Bright via Digitalmars-d wrote:
> On 11/30/2017 9:56 AM, Jonathan M Davis wrote:
> > I'm sure that we could come up with a better encoding than UTF-8 (e.g.
> > getting rid of Unicode normalization as being a thing and never having
> > multiple encodings for the same character), but _that_'s never going to
> > happen.
>
> UTF-8 is not the cause of that particular problem, it's caused by the
> Unicode committee being a committee. Other Unicode problems are caused by
> the committee trying to add semantic information to code points, which
> causes nothing but problems. I.e. the committee forgot that Unicode is a
> character set, and nothing more.

Oh, definitely. UTF-8 is arguably the best that Unicode has, but Unicode in
general is what's broken, because the folks designing it made poor choices.
And personally, I think that their worst decisions tend to be at the code
point level (e.g. allowing the same character to be represented by different
combinations of code points).
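
To make that concrete, here is a minimal sketch in Python (not D, and not
something from the original thread), assuming only the standard unicodedata
module. The helper name same_text is hypothetical and only there for
illustration. It shows "é" written both as a single precomposed code point
and as "e" followed by a combining accent, which compare unequal until both
are normalized:

```python
import unicodedata

# The same visible character, spelled two different ways.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def same_text(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison works on code points, so the two spellings differ.
print(precomposed == decomposed)

# After normalizing both sides to the same form, they compare equal.
print(same_text(precomposed, decomposed))

# The difference carries through to the UTF-8 bytes as well.
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

UTF-8 encodes each spelling faithfully, so the ambiguity is already there
before any encoding gets involved.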

Quite possibly the most depressing thing that I've run into with Unicode
though was finding out that emojis had their own code points. Emojis are
specifically representable by a sequence of existing characters (usually
ASCII), because they came from folks trying to represent pictures with text.
The fact that they're then trying to put those pictures into the Unicode
standard just blatantly shows that the Unicode folks have lost sight of what
they're up to. It's like if they started trying to add Unicode characters
for words. It makes no sense. But unfortunately, we just have to live with
it... :(

- Jonathan M Davis


