The Case Against Autodecode

Jonathan M Davis via Digitalmars-d digitalmars-d at puremagic.com
Tue May 31 18:56:57 PDT 2016


On Tuesday, May 31, 2016 20:38:14 Nick Sabalausky via Digitalmars-d wrote:
> On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
> > On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
> >> On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:
> >>> Let's put the question this way. Given the following string, what do
> >>> *you*  think walkLength should return?
> >>>
> >>>     şŭt̥ḛ́k̠
> >>
> >> The number of code units in the string. That's the contract promised and
> >> honored by Phobos. -- Andrei
> >
> > Code points I mean. -- Andrei
>
> Yes, we know it's the contract. ***That's the problem.*** As everybody
> is saying, it *SHOULDN'T* be the contract.
>
> Why shouldn't it be the contract? Because it's proven itself, both
> logically (as presented by pretty much everybody other than you in both
> this and other threads) and empirically (in phobos, warp, and other user
> code) to be both the least useful and most PITA option.

Exactly. Operating at the code point level rarely makes sense. What sorts of
algorithms purposefully do that in a typical program? Unless you're doing
very specific Unicode stuff or somehow know that your strings don't contain
any graphemes that are made up of multiple code points, operating at the
code point level is just bug-prone, and unless you're using dchar[]
everywhere, it's slow to boot, because your strings have to be decoded
whether the algorithm needs to or not.
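
To make the three levels concrete, here is a minimal Python sketch (not D, and
not anything from Phobos) that counts the same text in code units, code points,
and approximate graphemes. The helper name count_levels and the sample string
are made up for illustration, and the grapheme count is only an approximation:
it skips code points with a nonzero canonical combining class, which is not
full Unicode grapheme segmentation.

```python
import unicodedata


def count_levels(text):
    """Return (UTF-8 code units, UTF-16 code units, code points, ~graphemes).

    The grapheme figure is an approximation: it counts every code point
    that is not a combining mark (canonical combining class 0). Real
    grapheme segmentation follows the rules in Unicode UAX #29 and
    handles more cases than this does.
    """
    utf8_units = len(text.encode("utf-8"))
    utf16_units = len(text.encode("utf-16-le")) // 2
    code_points = len(text)
    approx_graphemes = sum(
        1 for ch in text if unicodedata.combining(ch) == 0
    )
    return utf8_units, utf16_units, code_points, approx_graphemes


# Hypothetical sample: 'e' followed by U+0301 COMBINING ACUTE ACCENT,
# which a reader sees as the single character "é".
sample = "e\u0301"
print(count_levels(sample))
```

For this sample the code point count is 2 even though a reader sees one
character, which is the kind of mismatch being described here.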

I think that it's very safe to say that the vast majority of string
algorithms are either able to operate at the code unit level without
decoding (though possibly encoding another string to match - e.g. with a
string comparison or search), or they have to operate at the grapheme level
in order to deal with full characters. A code point is borderline useless on
its own. It's just a step above the different UTF encodings without actually
getting to proper characters.
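
As a rough illustration of the code unit case, the Python sketch below searches
UTF-8 bytes directly: the needle is encoded once to match the haystack, and the
haystack itself is never decoded. The function name find_code_units and the
sample text are assumptions for the example. It also assumes both strings use
the same normalization form; it will not match a precomposed character against
its decomposed equivalent.

```python
def find_code_units(haystack: bytes, needle: str) -> int:
    """Find needle in UTF-8 encoded haystack without decoding the haystack.

    Returns the offset in code units (bytes), or -1 if not found.
    Encoding the needle is the only conversion performed. This is safe
    for UTF-8 because a valid encoded sequence cannot match starting in
    the middle of another character's encoding.
    """
    return haystack.find(needle.encode("utf-8"))


# Hypothetical data: "naïve café" with precomposed U+00EF and U+00E9.
data = "na\u00efve caf\u00e9".encode("utf-8")
offset = find_code_units(data, "caf\u00e9")
print(offset)
```

The offset that comes back is in code units, not code points, so it can be used
to slice the original bytes directly.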

- Jonathan M Davis



