The Case Against Autodecode
Marc Schütz via Digitalmars-d
digitalmars-d at puremagic.com
Fri May 13 04:00:19 PDT 2016
On Friday, 13 May 2016 at 10:38:09 UTC, Jonathan M Davis wrote:
> Ideally, algorithms would be Unicode aware as appropriate, but
> the default would be to operate on code units with wrappers to
> handle decoding by code point or grapheme. Then it's easy to
> write fast code while still allowing for full correctness.
> Granted, it's not necessarily easy to get correct code that
> way, but anyone who wants full correctness without caring
> about efficiency can just use ranges of graphemes. Ranges of
> code points are rare regardless.
char[], wchar[], etc. can simply be made non-ranges, so that the
user has to choose between .byCodePoint, .byCodeUnit (or
.representation, as it already exists), .byGrapheme, or even
higher-level units like .byLine or .byWord. Ranges of char or
wchar other than the built-in arrays, however, stay as they are
today. That way it's harder to accidentally get it wrong.
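For illustration, here is a small D sketch of how those three levels differ on the same string, using byCodeUnit/byCodePoint from std.utf and byGrapheme from std.uni (all of which exist in Phobos today):

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byCodePoint;

void main()
{
    // "noël", with 'ë' written as 'e' + U+0308 (combining diaeresis)
    string s = "noe\u0308l";

    // UTF-8 code units: the combining mark takes 2 bytes
    assert(s.byCodeUnit.walkLength == 6);

    // Unicode code points: the combining mark counts separately
    assert(s.byCodePoint.walkLength == 5);

    // Graphemes: 'e' + combining mark form one perceived character
    assert(s.byGrapheme.walkLength == 4);
}
```

The point of making the choice explicit is exactly that these three counts disagree, so no single default is "correct" for all callers.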
>
> Based on what I've seen in previous conversations on
> auto-decoding over the past few years (be it in the newsgroup,
> on github, or at dconf), most of the core devs think that
> auto-decoding was a major blunder that we continue to pay for.
> But unfortunately, even if we all agree that it was a huge
> mistake and want to fix it, the question remains of how to do
> that without breaking tons of code - though since AFAIK, Andrei
> is still in favor of auto-decoding, we'd have a hard time going
> forward with plans to get rid of it even if we had come up with
> a good way of doing so. But I would love it if we could get rid
> of auto-decoding and clean up string handling in D.
There is a simple deprecation path that has already been
suggested: `isInputRange` and friends can emit a helpful
deprecation warning when they are instantiated with a type that
currently triggers auto-decoding.
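A minimal sketch of that idea follows. The name checkedIsInputRange is hypothetical, and pragma(msg) stands in for a real deprecation mechanism; Phobos's actual isInputRange is used unchanged underneath:

```d
import std.range.primitives : isInputRange;
import std.traits : isNarrowString;

// Hypothetical trait that flags narrow strings (char[]/wchar[]),
// whose range primitives currently auto-decode to dchar.
template checkedIsInputRange(R)
{
    static if (isNarrowString!R)
        // A real deprecation path would use the `deprecated`
        // attribute; pragma(msg) is just the simplest illustration.
        pragma(msg, "note: " ~ R.stringof ~ " auto-decodes; " ~
            "consider .byCodeUnit, .byCodePoint or .byGrapheme");
    enum checkedIsInputRange = isInputRange!R;
}

void main()
{
    // Emits the compile-time note for string, stays silent for int[].
    static assert(checkedIsInputRange!string);
    static assert(checkedIsInputRange!(int[]));
}
```

Because the warning fires at instantiation time, every generic algorithm constrained on the trait would surface it automatically, which is what makes this a gradual migration path rather than an abrupt break.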