Major performance problem with std.array.front()

Nick Sabalausky SeeWebsiteToContactMe at semitwist.com
Sun Mar 9 01:24:03 PST 2014


On 3/7/2014 6:33 PM, H. S. Teoh wrote:
> On Fri, Mar 07, 2014 at 11:13:50PM +0000, Sarath Kodali wrote:
>> On Friday, 7 March 2014 at 22:35:47 UTC, Sarath Kodali wrote:
>>>
>>> +1
>>> In Indian languages, a character consists of one or more UNICODE
>>> code points. For example, in Sanskrit "ddhrya"
>>> http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg
>>> consists of 7 UNICODE code points. So to search for this char I
>>> have to use string search.
>>>
>>> - Sarath
>>
>> Oops, incomplete reply ...
>>
>> Since a single "alphabet" in Indian languages can contain multiple
>> code-points, iterating over single code-points is like iterating
>> over char[] for non English European languages. So decode is of no
>> use other than decreasing the performance. A raw char[] comparison
>> is much faster.
>
> Yes. The more I think about it, the more auto-decoding sounds like a
> wrong decision. The question, though, is whether it's worth the massive
> code breakage needed to undo it. :-(
>

I'm leaning the same way. But I also think Andrei is right that, at 
this point in time, it'd be a terrible move to change things so that "by 
code unit" is the default. For better or worse, that ship has sailed.

Perhaps we *can* deal with the auto-decoding problem not by killing 
auto-decoding, but by marginalizing it in an additive way:

Convincing arguments have been made that any string-processing code 
which *isn't* done entirely with the official Unicode algorithms is 
likely wrong *regardless* of whether std.algorithm defaults to 
per-code-unit or per-code-point.
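
A rough sketch of that point, written in Python rather than D purely for brevity. The helper names (count_units, naive_equal, canonical_equal) are made up for illustration and are not anything in Phobos. It uses Sarath's "ddhrya" conjunct plus the two common spellings of "é", and assumes NFC normalization (Unicode Standard Annex #15) as the example of an "official Unicode algorithm":

```python
import unicodedata

# Sanskrit conjunct "ddhrya": da, virama, dha, virama, ra, virama, ya.
# One user-perceived character, spelled with seven code points.
DDHRYA = "\u0926\u094D\u0927\u094D\u0930\u094D\u092F"

# Two spellings of the same user-perceived character "é".
PRECOMPOSED = "\u00E9"   # single code point
DECOMPOSED = "e\u0301"   # "e" followed by a combining acute accent


def count_units(s):
    """Return (UTF-8 code units, code points) for s."""
    return len(s.encode("utf-8")), len(s)


def naive_equal(a, b):
    """Compare code point by code point, with no Unicode algorithm applied."""
    return a == b


def canonical_equal(a, b):
    """Compare after NFC normalization, so canonically equivalent text matches."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(count_units(DDHRYA))
    print(naive_equal(PRECOMPOSED, DECOMPOSED))
    print(canonical_equal(PRECOMPOSED, DECOMPOSED))
```

Neither count for the conjunct is 1, so iterating by code unit or by code point both split a single "character", and the plain comparison misses two strings that a reader would call identical until normalization is applied.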

So, how's this? We add any of these Unicode algorithms we may be 
missing, encourage their use for strings, discourage use of 
std.algorithm for string processing, and in the meantime, just do our 
best to reduce unnecessary decoding wherever possible. Then we call it a 
day and everyone's happy :)


