Major performance problem with std.array.front()

Sat Mar 8 15:59:20 PST 2014

On 3/8/14, 1:13 PM, Vladimir Panteleev wrote:
> On Saturday, 8 March 2014 at 20:50:49 UTC, Andrei Alexandrescu wrote:
>> On 3/8/14, 12:38 PM, Vladimir Panteleev wrote:
>>> On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu wrote:
>>>> That sounds quite like C++ plus ICU. It doesn't strike me as the
>>>> gold standard for Unicode integration.
>>>
>>> Why not? Because it sounds like D needs exactly that. Plus its amazing
>>> slicing and range capabilities, of course.
>>
>> Pretty much everyone using ICU hates it.
>
> I admit I never used it personally.

Time to do due diligence :o).

> I just thought that what you meant
> implied "D implementations of relevant Unicode algorithms, adapted to D
> style (range interface)". Is there more to this than the limitations of
> C++ or the implementers' design choices?
>
>>> Have you or anyone you personally know tried to process text in D
>>> containing a writing system such as Sanskrit's?
>>
>> No. Point being?
>
> Point being, we don't have solid data to conclude whether D's current
> approach is actually good enough for such cases as you claim.

My only claim is that recognizing and iterating strings by code point is 
better than doing things by the octet.

> We do have one post in this thread:
> http://forum.dlang.org/post/jlgfkxlrhlzdpwkpsrot@forum.dlang.org
>
>> I think there are too large risks for that,
>
> For what? We have not discussed a possible plan yet. Are you referring
> to Walter Bright's proposal?

Any plan to inflict a large breaking change for strings incurs a risk. 
To add insult to injury, the improvement brought about by the change is 
debatable.

>> and it's quite unclear this is solving a problem. "Slightly better
>> Unicode support" is hardly a good justification.
>
> What this will solve:
>
> 1. Eliminating dangerous constructs, such as s.countUntil and s.indexOf
> both returning integers, yet possibly having different values in
> circumstances that the developer may not foresee.

I disagree there's any danger. They deal in code points, end of story.
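
For reference, the two functions named in point 1 can be compared in a minimal sketch. It assumes current Phobos behavior: countUntil walks a string as a range of dchar and so counts code points, while indexOf returns an offset in code units that can be used for slicing. The sample string is made up for illustration.

```d
import std.algorithm : countUntil;
import std.string : indexOf;

void main()
{
    // Hypothetical sample: three code points, nine UTF-8 code units.
    string s = "日本語";

    // countUntil sees a range of dchar, so the result is a code point count.
    assert(s.countUntil('語') == 2);

    // indexOf yields a code unit offset, which is what slicing expects.
    assert(s.indexOf('語') == 6);
    assert(s[6 .. $] == "語");
}
```

Both results are plain integers, so nothing in the type system stops one from being used where the other is expected; for ASCII-only input the two values coincide.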

> 2. Very high complexity of implementations (the ElementEncodingType
> problem previously mentioned).

I disagree with "very high". Besides, if you want to do Unicode you gotta 
crack some eggs.

> 3. Hidden, difficult-to-detect performance problems. The reason why this
> thread was started. I've had to deal with them in several places myself.

I disagree with "hidden, difficult to detect". Also I'd add that I'd 
rather not have hidden, difficult to detect correctness problems.

> 4. Encourage D programmers to write Unicode-capable code that is correct
> in the full sense of the word.

I disagree we are presently discouraging them. I do agree a change would 
make certain things clearer. But not nearly enough to make up for the 
breakage.

> I think the above list has enough weight to merit at least considering
> *some* breaking changes.

I think a better approach is to figure out what to add.

Andrei