Major performance problem with std.array.front()

Fri Mar 7 14:35:46 PST 2014

On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote:
> On Friday, 7 March 2014 at 19:57:38 UTC, Andrei Alexandrescu 
> wrote:
>> Allow me to enumerate the functions of std.algorithm and how 
>> they work today and how they'd work with the proposed change. 
>> Let s be a variable of some string type.
>
>> s.canFind('é') currently works as expected.
>
> No, it doesn't.
>
> import std.algorithm;
>
> void main()
> {
>     auto s = "cassé";
>     assert(s.canFind('é'));
> }
>
> That's the whole problem - all this hot steam and it still does 
> not work properly. Because it can't - not without pulling in 
> all of the Unicode algorithms implicitly, and that would be 
> much worse.
>
>> I went down std.algorithm in the order listed in its 
>> documentation and found pernicious issues with almost every 
>> single algorithm.
>
> All of your examples are variations of one and the same case: 
> searching for a non-ASCII dchar or dchar literal.
>
> How often does this pattern occur in real programs? I think the 
> only real metric is to try the change and find out.
>
>> Clearly one might argue that their app has no business dealing 
>> with diacriticals or Asian characters. But that's the typical 
>> provincial view that marred many languages' approach to UTF 
>> and internationalization.
>
> So is yours, if you think that making everything magically a 
> dchar is going to solve all problems.
>
> The TDPL example only showcases the problem. Yes, it works with 
> Swedish. Now try it again with Sanskrit.

+1
In Indian languages, a character consists of one or more Unicode 
code points. For example, in Sanskrit "ddhrya" 
http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg 
consists of 7 Unicode code points. No single dchar can represent 
it, so to search for this character I have to use a string search.
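
A rough sketch of what I mean, not a definitive test. The code point 
sequence for "ddhrya" is my own transcription (DA, VIRAMA, DHA, VIRAMA, 
RA, VIRAMA, YA), and the sample text is made up for illustration:

```d
import std.algorithm : canFind;
import std.range : walkLength;

// Assumed spelling of "ddhrya" as code points:
// DA, VIRAMA, DHA, VIRAMA, RA, VIRAMA, YA
enum ddhrya = "\u0926\u094D\u0927\u094D\u0930\u094D\u092F";

void main()
{
    // Hypothetical sample text that ends with the conjunct.
    string text = "abc " ~ ddhrya;

    // Iterating a string yields code points, so this one
    // character counts as 7 elements.
    assert(ddhrya.walkLength == 7);

    // A dchar needle can only match one code point. DA alone is
    // found in any text containing a plain DA, conjunct or not.
    dchar da = '\u0926';
    assert("\u0926".canFind(da));
    assert(text.canFind(da));

    // To look for the whole character, the needle has to be a string.
    assert(text.canFind(ddhrya));
    assert(!"\u0926".canFind(ddhrya));
}
```

So for text like this, the dchar overloads never get used for whole 
characters; only the string-needle overloads are useful.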

- Sarath