The Case Against Autodecode

Tue May 31 13:05:20 PDT 2016

On Tuesday, May 31, 2016 21:48:36 Timon Gehr via Digitalmars-d wrote:
> On 31.05.2016 21:40, Wyatt wrote:
> > On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:
> >> The 'length' of a character is not one in all contexts.
> >> The following text takes six columns in my terminal:
> >>
> >> 日本語
> >> 123456
> >
> > That's a property of your font and font rendering engine, not Unicode.
>
> Sure. Hence "context". If you are trying to manually underline some
> text in console output, for example in a compiler error message,
> counting the number of characters will not actually give you what you
> want, even though it works reliably for ASCII text.
>
> > (Also, it's probably not quite six columns; most fonts I've tested
> > render 漢字 as something like 1.5 characters wide, assuming your
> > terminal doesn't overlap them.)
> >
> > -Wyatt
>
> It's precisely six columns in my terminal (also in emacs and in gedit).
>
> My point was, how can std.algorithm ever guess correctly what you
> /actually/ intended to do?
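To make the underlining example concrete, here is a minimal Python sketch (not Phobos code). It assumes a terminal that draws East Asian Wide ("W") and Fullwidth ("F") characters two columns wide, which, as Wyatt points out, depends on the font and terminal. It also ignores combining marks, zero-width characters, and emoji sequences. `display_columns` and `underline` are hypothetical helpers.

```python
import unicodedata

def display_columns(text: str) -> int:
    # Heuristic (assumption): East Asian Wide/Fullwidth characters take two
    # columns, everything else takes one. Combining and zero-width characters
    # are not handled, so this only approximates what a terminal does.
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

def underline(text: str) -> str:
    # Size the underline by display columns, not by character count.
    return text + "\n" + "^" * display_columns(text)

print(len("日本語"))              # 3 code points
print(display_columns("日本語"))  # 6 columns under the assumption above
print(underline("日本語"))
```

Counting characters gives 3, while the underline needs 6 carets. For ASCII input the two numbers coincide, which is why the naive approach appears to work.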

It can't, which is precisely why having it choose for you was a bad design
decision. The programmer needs to make that decision. Because Phobos
currently makes it for you, it often does the wrong thing, and because it
decodes to code points by default, it often burns unnecessary cycles to
boot.

- Jonathan M Davis
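The argument that the programmer, not the library, should pick the unit can be illustrated with a short Python sketch (Python rather than D; the helper names are hypothetical). The same string has a different "length" in UTF-8 code units, UTF-16 code units, and code points. Meanwhile, an operation such as counting an ASCII delimiter gives the same answer whether or not the UTF-8 input is decoded first, so decoding is wasted work there. Grapheme clusters (e.g. "e\u0301" is one user-perceived character but two code points) are left out because Python's standard library has no grapheme segmentation. In D's Phobos, explicit choices of unit are available through `std.utf.byCodeUnit`, `std.utf.byDchar`, and `std.uni.byGrapheme`.

```python
def lengths(text: str) -> dict:
    """Report the same string's size in three different units."""
    return {
        "utf8_code_units": len(text.encode("utf-8")),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
    }

def count_ascii_decoded(data: bytes, ascii_char: str) -> int:
    # Decode all of the UTF-8 input to code points first, then count.
    return data.decode("utf-8").count(ascii_char)

def count_ascii_raw(data: bytes, ascii_char: str) -> int:
    # Count directly on the UTF-8 code units, with no decoding. This is safe
    # for ASCII because every byte of a multi-byte UTF-8 sequence is >= 0x80,
    # so an ASCII byte can only ever be that ASCII character itself.
    if ord(ascii_char) >= 0x80:
        raise ValueError("only ASCII characters can be counted on raw bytes")
    return data.count(ascii_char.encode("ascii"))

print(lengths("日本語"))
# {'utf8_code_units': 9, 'utf16_code_units': 3, 'code_points': 3}
print(lengths("e\u0301"))
# {'utf8_code_units': 3, 'utf16_code_units': 2, 'code_points': 2}

data = "日本語,abc,e\u0301".encode("utf-8")
print(count_ascii_decoded(data, ","), count_ascii_raw(data, ","))  # 2 2
```

Which "length" is correct depends entirely on what the caller is about to do with it, and only the caller knows that.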