The Case Against Autodecode

Thu Jun 2 16:25:15 PDT 2016

On Thursday, June 02, 2016 22:27:16 John Colvin via Digitalmars-d wrote:
> On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
> > I wonder what rationale there is for Unicode to have two
> > different sequences of codepoints be treated as the same. It's
> > madness.
>
> There are languages that make heavy use of diacritics, often
> several on a single "character". Hebrew is a good example. Should
> there be only one valid ordering of any given set of diacritics
> on any given character? It's an interesting idea, but it's not
> how things are.

Yeah. I'm inclined to think that having multiple normalization forms was a
huge mistake on the part of the Unicode folks. As horrible as it is for most
cases, maybe it _does_ ultimately make sense because of certain use cases; I
don't know. But bad idea or not, we're stuck dealing with it. :(

- Jonathan M Davis