The Case Against Autodecode
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Fri May 27 06:32:14 PDT 2016
On 5/27/16 7:19 AM, Chris wrote:
> On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote:
> [snip]
>>
>> I would agree only with the amendment "...if used naively", which is
>> important. Knowledge of how autodecoding works is a prerequisite for
>> writing fast string code in D. Also, little code should deal with one
>> code unit or code point at a time; instead, it should use standard
>> library algorithms for searching, matching etc. When needed, iterating
>> every code unit is trivially done through indexing.
>
> I disagree.
Misunderstanding.
> "if used naively" shouldn't be the default. A user (naively)
> expects string algorithms to work as efficiently as possible without
> overheads.
That's what happens with autodecoding.
>> Also allow me to point out that much of the slowdown can be addressed
>> tactically. The test c < 0x80 is highly predictable (in ASCII-heavy
>> text) and therefore easily speculated. We can and we should arrange
>> code to minimize impact.
>
> And what if you deal with non-ASCII heavy text? Does the user have to
> guess and micro-optimize for simple use cases?
Misunderstanding.
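
For readers following along, the predictable `c < 0x80` test mentioned in the quoted text can be sketched roughly as below. This is only an illustration: `decodeNext` is a hypothetical helper, not a Phobos function and not Phobos's actual code. It assumes valid UTF-8 input and relies only on `std.utf.decode`.

```d
import std.utf : decode;

/// Hypothetical helper (not part of Phobos): returns the code point that
/// starts at s[i] and advances i past it.
dchar decodeNext(string s, ref size_t i)
{
    immutable char c = s[i];
    if (c < 0x80)
    {
        // Fast path: an ASCII code unit is already a complete code point.
        ++i;
        return c;
    }
    // Slow path: multi-byte sequence, defer to the library decoder.
    return decode(s, i);
}

void main()
{
    string s = "naïve";          // 'ï' occupies two UTF-8 code units
    size_t i = 0, n = 0;
    while (i < s.length)
    {
        decodeNext(s, i);
        ++n;
    }
    assert(n == 5);              // five code points
    assert(i == 6);              // six code units consumed
}
```

In ASCII-heavy text the branch is almost always taken the same way, which is what makes it cheap to predict; text dominated by multi-byte sequences would hit the slow path on most iterations.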
>>> 5. Very few algorithms require decoding.
>>
>> The key here is leaving it to the standard library to do the right
>> thing instead of having the user wonder separately for each case.
>> These uses don't need decoding, and the standard library correctly
>> doesn't involve it (or if it currently does it has a bug):
>>
>> s.find("abc")
>> s.findSplit("abc")
>> s.findSplit('a')
>> s.count!(c => "!()-;:,.?".canFind(c)) // punctuation
>>
>> However the following do require autodecoding:
>>
>> s.walkLength
>> s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation
>> s.count!(c => c >= 32) // non-control characters
>>
>> Currently the standard library operates at code point level even
>> though inside it may choose to use code units when admissible. Leaving
>> such a decision to the library seems like a wise thing to do.
>
> But how is the user supposed to know without being a core contributor to
> Phobos?
Misunderstanding. All examples work properly today because of
autodecoding. -- Andrei
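
As a rough illustration of the distinction drawn in the quoted examples, the sketch below runs a few of them on a string containing non-ASCII characters. The sample string is an assumption made for the example. `byCodeUnit` (from `std.utf`) is used here only to show what the same calls yield when iterating code units instead of the default autodecoded code points.

```d
import std.algorithm.searching : canFind, count, find;
import std.range : walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "héllo, wörld!";  // 'é' and 'ö' each take two UTF-8 code units

    // .length and indexing operate on code units.
    assert(s.length == 15);

    // walkLength iterates the autodecoded range, so it counts code points.
    assert(s.walkLength == 13);

    // The same call over code units gives a different answer.
    assert(s.byCodeUnit.walkLength == 15);

    // Substring search: the result does not depend on decoding.
    assert(s.find("wörld") == "wörld!");

    // Counting ASCII punctuation: same result at either level.
    assert(s.count!(c => "!()-;:,.?".canFind(c)) == 2);
    assert(s.byCodeUnit.count!(c => "!()-;:,.?".canFind(c)) == 2);

    // Counting non-punctuation: the answer depends on the iteration level.
    assert(s.count!(c => !"!()-;:,.?".canFind(c)) == 11);            // code points
    assert(s.byCodeUnit.count!(c => !"!()-;:,.?".canFind(c)) == 13); // code units
}
```

The first group of calls gives the same result whether or not decoding takes place, while the second group gives the expected character counts only at code point level.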