The Case Against Autodecode

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Fri May 27 06:32:14 PDT 2016


On 5/27/16 7:19 AM, Chris wrote:
> On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote:
> [snip]
>>
>> I would agree only with the amendment "...if used naively", which is
>> important. Knowledge of how autodecoding works is a prerequisite for
>> writing fast string code in D. Also, little code should deal with one
>> code unit or code point at a time; instead, it should use standard
>> library algorithms for searching, matching etc. When needed, iterating
>> every code unit is trivially done through indexing.
>
> I disagree.

Misunderstanding.

> "if used naively" shouldn't be the default. A user (naively)
> expects string algorithms to work as efficiently as possible without
> overheads.

That's what happens with autodecoding.

>> Also allow me to point out that much of the slowdown can be addressed
>> tactically. The test c < 0x80 is highly predictable (in ASCII-heavy
>> text) and therefore easily speculated. We can and we should arrange
>> code to minimize impact.
>
> And what if you deal with non-ASCII heavy text? Does the user have to
> guess and micro-optimize for simple use cases?

Misunderstanding.
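
For illustration, the "c < 0x80" test mentioned in the quoted text could be arranged roughly as below. This is only a sketch under assumptions: countCodePoints is a hypothetical helper, not a Phobos function, and it is not claimed to be how the standard library is implemented. It relies on std.utf.decode, which advances the index past one encoded code point.

```d
import std.utf : decode;

// Hypothetical helper (not part of Phobos): counts code points in UTF-8 text,
// taking a cheap branch for ASCII code units and decoding only when needed.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    size_t i = 0;
    while (i < s.length)
    {
        if (s[i] < 0x80)
        {
            // ASCII: one code unit is exactly one code point.
            ++i;
        }
        else
        {
            // Multi-byte sequence: decode advances i past it
            // (and throws UTFException on invalid input).
            decode(s, i);
        }
        ++n;
    }
    return n;
}
```

In ASCII-heavy text the first branch is taken almost every time, which is what makes it easy to predict; text that is mostly non-ASCII would take the decoding branch instead.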

>>> 5. Very few algorithms require decoding.
>>
>> The key here is leaving it to the standard library to do the right
>> thing instead of having the user wonder separately for each case.
>> These uses don't need decoding, and the standard library correctly
>> doesn't involve it (or if it currently does it has a bug):
>>
>> s.find("abc")
>> s.findSplit("abc")
>> s.findSplit('a')
>> s.count!(c => "!()-;:,.?".canFind(c)) // punctuation
>>
>> However the following do require autodecoding:
>>
>> s.walkLength
>> s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation
>> s.count!(c => c >= 32) // non-control characters
>>
>> Currently the standard library operates at code point level even
>> though inside it may choose to use code units when admissible. Leaving
>> such a decision to the library seems like a wise thing to do.
>
> But how is the user supposed to know without being a core contributor to
> Phobos?

Misunderstanding. All examples work properly today because of 
autodecoding. -- Andrei
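
A small sketch of the code unit versus code point distinction behind the examples quoted above. The helper names are hypothetical and exist only for illustration; the Phobos calls used (walkLength, count, find, canFind, byCodeUnit) are real, and the non-ASCII characters are written as escapes so the result does not depend on how the source file is encoded.

```d
import std.algorithm.searching : canFind, count, find;
import std.range : walkLength;
import std.utf : byCodeUnit;

// Hypothetical helpers for illustration; none of these are Phobos names.
size_t codeUnitCount(string s) { return s.byCodeUnit.walkLength; }
size_t codePointCount(string s) { return s.walkLength; } // autodecodes
size_t punctuationCount(string s)
{
    return s.count!(c => "!()-;:,.?".canFind(c));
}
size_t nonPunctuationCount(string s)
{
    return s.count!(c => !"!()-;:,.?".canFind(c));
}

unittest
{
    // "héllo, wörld!" with the two accented letters as escapes.
    string s = "h\u00E9llo, w\u00F6rld!";

    assert(s.length == 15);               // length is in UTF-8 code units
    assert(codeUnitCount(s) == 15);       // same count via byCodeUnit
    assert(codePointCount(s) == 13);      // range primitives see code points
    assert(punctuationCount(s) == 2);     // ',' and '!'
    assert(nonPunctuationCount(s) == 11); // counts each accented letter once
    assert(s.find("w\u00F6rld") == "w\u00F6rld!");
    assert(s[1] == 0xC3);                 // indexing yields a single code unit
}
```

If the non-punctuation count were taken over code units instead, the two accented letters would each count twice and the result would be 13 rather than 11.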


