The Case Against Autodecode

Tue May 31 09:12:28 PDT 2016

On Sunday, May 29, 2016 13:47:32 H. S. Teoh via Digitalmars-d wrote:
> On Sun, May 29, 2016 at 03:55:22PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> > So now code points are good? -- Andrei
>
> It depends on what you're trying to accomplish. That's the point we're
> trying to get at.  For some operations, working with code points makes
> the most sense. But for other operations, it does not.  There is no one
> representation that is best for all situations; it needs to be decided
> on a case-by-case basis.  Which is why forcing everything to decode to
> code points eventually leads to problems.

Exactly. And even a given function can't necessarily always be defined to
use a specific level of Unicode, because whether that's correct or not
depends on what the programmer is actually trying to do with the function.
And then there are cases where the programmer knows enough about the data
they're dealing with to operate at a different level of Unicode than would
normally be correct. The most obvious example is when you know that your
strings are pure ASCII, but it's not the only case.
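
To make the "levels" concrete, here is a minimal sketch of how the same
string gives three different answers depending on whether you count code
units, code points, or graphemes. The helper names are made up for
illustration; only walkLength and byGrapheme come from Phobos.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helpers, named for illustration only.

// UTF-8 code units: the raw array length, no decoding involved.
size_t countCodeUnits(string s) { return s.length; }

// Code points: walkLength treats a string as a range of dchar
// (this is the auto-decoding behavior under discussion).
size_t countCodePoints(string s) { return s.walkLength; }

// Graphemes: user-perceived characters, via std.uni.byGrapheme.
size_t countGraphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    string s = "e\u0301";

    assert(countCodeUnits(s) == 3);   // 1 byte for 'e', 2 for U+0301
    assert(countCodePoints(s) == 2);  // 'e' and the combining accent
    assert(countGraphemes(s) == 1);   // displayed as a single character
}
```

Which of the three is "the length" depends on whether you are sizing a
buffer, comparing against another encoding, or counting what the user sees.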

We should strive to make Phobos operate correctly on strings by default
where we can, but there are cases where the programmer needs to know enough
to specify the behavior that they want, and deciding for them is just going
to lead to behavior that happens to be right some of the time while making
it hard for code using Phobos to have the correct behavior the rest of the
time. And the default behavior that we currently have is inefficient to
boot.
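
As a rough sketch of what specifying the behavior looks like today, the
programmer can opt out of decoding explicitly with std.utf.byCodeUnit. The
function name countCommas is hypothetical, and the input is assumed to be
text where matching a single ASCII character is what is wanted.

```d
import std.algorithm.searching : count;
import std.range.primitives : ElementType;
import std.utf : byCodeUnit;

// By default, range primitives present a string as a range of dchar,
// so generic algorithms decode every element.
static assert(is(ElementType!string == dchar));

// Hypothetical helper: count commas without decoding anything.
// byCodeUnit iterates the UTF-8 code units as they are stored.
size_t countCommas(string text)
{
    return text.byCodeUnit.count(',');
}

void main()
{
    assert(countCommas("a,b,c") == 2);
}
```

Note that this particular search also happens to be safe on non-ASCII UTF-8
input, since the byte for ',' never occurs inside a multi-byte sequence.
That is one instance of knowing enough about the data to work at a lower
level than the default.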

- Jonathan M Davis