The Case Against Autodecode

Thu Jun 2 14:44:50 PDT 2016

On 6/2/16 5:38 PM, cym13 wrote:
> Allow me to try another angle:
>
> - There are different levels of unicode support and you don't want to
> support them all transparently. That's understandable.

Cool.

> - The level you choose to support is the code point level. There are
> many good arguments about why this isn't a good default but you won't
> change your mind. I don't like that at all and I'm not alone but let's
> forget the entirety of the vocal D community for a moment.

You mean all 35 of them?

It's not about changing my mind! A massive factor is that code point
level handling is the incumbent, and changing it would need to mark
an absolutely Earth-shattering improvement to be worth it!

> - A huge part of unicode chars can be normalized to fit your
> definition. That way not everything works (far from it) but a
> sufficiently big subset works.

Cool.

> - On the other hand without normalization it just doesn't make any
> sense from a user perspective. The ö example has clearly shown that
> much (you even admitted it yourself by stating that many counter
> arguments would have worked had the string been normalized).

Yah, operating at code point level does not come free of caveats. It is 
vastly superior to operating on code units, and did I mention it's the 
incumbent.

> - The most prominent problem is with graphemes that can have different
> representations, as those that can't be normalized can't be searched as
> dchars either.

Yah, I'd say if the program needs graphemes the option is there. Phobos
by default deals with code points, which are not perfect but are
independent of representation and produce meaningful and consistent
results with std.algorithm etc.
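
To make the three levels concrete, here is a minimal sketch, assuming a current Phobos that provides std.utf.byCodeUnit and std.uni.byGrapheme. The helper names are made up for illustration and are not part of Phobos.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical helpers, named here only for illustration.
size_t countCodeUnits(string s)  { return s.byCodeUnit.walkLength; }  // raw UTF-8 units
size_t countCodePoints(string s) { return s.walkLength; }             // autodecodes to dchar
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; }  // opt-in grapheme level

void main()
{
    string precomposed = "\u00F6";   // ö as a single code point
    string decomposed  = "o\u0308";  // o followed by a combining diaeresis

    // Code units depend on the encoding and on the representation.
    assert(countCodeUnits(precomposed) == 2 && countCodeUnits(decomposed) == 3);

    // Code points (the default) still differ between the two spellings.
    assert(countCodePoints(precomposed) == 1 && countCodePoints(decomposed) == 2);

    // Graphemes agree, but only when explicitly requested.
    assert(countGraphemes(precomposed) == 1 && countGraphemes(decomposed) == 1);
}
```

The code point counts stay the same if the literals are written as wstring or dstring; the code unit counts do not.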

> - If autodecoding to code points is to stay and in an effort to find a
> compromise then normalizing should be done by default. Sure it would
> take some more time but it wouldn't break any code (I think) and would
> actually make things more correct. They still wouldn't be correct but
> I feel that something as crazy as unicode cannot be tackled
> generically anyway.

Some more work on normalization at strategic points in Phobos would be 
interesting!
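
As a rough sketch of what normalizing at one such point could look like (an illustration under assumptions, not a proposal for where Phobos should do it), the hypothetical helper below normalizes to NFC with std.uni.normalize before a code point search:

```d
import std.algorithm.searching : canFind;
import std.uni : normalize, NFC;

// Hypothetical helper: normalize the haystack to NFC, then search by code point.
bool canFindNormalized(string haystack, dchar needle)
{
    return normalize!NFC(haystack).canFind(needle);
}

void main()
{
    string decomposed = "o\u0308";  // o followed by a combining diaeresis
    dchar needle = '\u00F6';        // precomposed ö

    // A plain code point search misses the decomposed spelling.
    assert(!decomposed.canFind(needle));

    // After NFC the pair composes to U+00F6 and the search succeeds.
    assert(canFindNormalized(decomposed, needle));
}
```

This only helps for graphemes that have a precomposed form. Those that cannot be normalized to a single code point still cannot be found as a dchar.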

Andrei