Major performance problem with std.array.front()

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sun Mar 9 11:34:16 PDT 2014


On 3/9/14, 11:19 AM, Peter Alexander wrote:
> On Sunday, 9 March 2014 at 17:48:47 UTC, Andrei Alexandrescu wrote:
>> On 3/9/14, 10:34 AM, Peter Alexander wrote:
>>> If we assume strings are normalized then substring search, equality
>>> testing, sorting all work the same with either code units or code
>>> points.
>>
>> But others such as edit distance or equal(some_string, some_wstring)
>> will not.
>
> equal(string, wstring) should either not compile, or would be overloaded
> to do the right thing.

These would be possible designs, each with its pros and cons. The
current design works out of the box across all encodings, and it has
its own pros and cons. That puts in perspective what should and
shouldn't be.
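
To make "works out of the box across all encodings" concrete, here is a
minimal sketch. The helper names sameCodePoints and sameCodeUnits are
made up for illustration, and the second one assumes a Phobos release
that provides std.utf.byCodeUnit, which may be newer than the release
current for this thread.

```d
import std.algorithm.comparison : equal;
import std.utf : byCodeUnit;

/// True when both strings hold the same sequence of code points,
/// whatever their encodings (hypothetical helper).
bool sameCodePoints(string a, wstring b)
{
    // Both ranges are decoded to dchar as equal() walks them.
    return equal(a, b);
}

/// True when the raw code units match pairwise, with no decoding
/// (hypothetical helper).
bool sameCodeUnits(string a, wstring b)
{
    return equal(a.byCodeUnit, b.byCodeUnit);
}

void main()
{
    string  s = "caf\u00E9";   // 5 UTF-8 code units
    wstring w = "caf\u00E9"w;  // 4 UTF-16 code units

    assert(s.length == 5 && w.length == 4);
    assert(sameCodePoints(s, w));   // same text, different encodings
    assert(!sameCodeUnits(s, w));   // the code units differ
}
```

The two helpers agree on pure ASCII input and part ways as soon as a
code point needs more than one code unit in either encoding.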

> In an ideal world, char, wchar, and dchar should
> not be comparable.

Probably. But that has nothing to do with equal() working.

> Edit distance on code points is of questionable utility. Like Vladimir
> says, its meaning is pretty philosophical, even in ASCII (is "\r\n"
> really two "edits"? What is an "edit"?)

Nothing philosophical - it's as cut and dried as it gets. An edit is as 
defined by the Levenshtein algorithm using code points as the unit of 
comparison.
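
As a rough illustration of how the unit of comparison changes the
result, the sketch below runs std.algorithm's levenshteinDistance once
over decoded code points and once over raw UTF-8 code units. The
wrapper names editsByCodePoint and editsByCodeUnit are hypothetical.

```d
import std.algorithm.comparison : levenshteinDistance;
import std.string : representation;

/// Edit distance with code points as the unit (strings are decoded
/// to dchar during iteration). Hypothetical wrapper.
size_t editsByCodePoint(string a, string b)
{
    return levenshteinDistance(a, b);
}

/// Edit distance with UTF-8 code units as the unit.
/// Hypothetical wrapper.
size_t editsByCodeUnit(string a, string b)
{
    // representation() views the string as immutable(ubyte)[],
    // so no decoding takes place.
    return levenshteinDistance(a.representation, b.representation);
}

void main()
{
    // U+00E9 is one code point but two UTF-8 code units.
    assert(editsByCodePoint("caf\u00E9", "cafe") == 1);
    assert(editsByCodeUnit("caf\u00E9", "cafe") == 2);
}
```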

>>> I can't think of any case where you would want to count characters.
>>
>> wc
>
> % echo € | wc -c
> 4
>
> :-)

Noice.

>> (Generally: I've always been very very very doubtful about arguments
>> that start with "I can't think of..." because I've historically tried
>> them so many times, and with terrible results.)
>
> Fair point... but it's not as if we would be removing the ability (you
> could always do s.byCodePoint.count); we are talking about defaults. The
> argument that we shouldn't iterate by code unit by default because
> people might want to count code points is without substance. Also, with
> the proposal, string.count(dchar) would encode the dchar to a string
> first for performance, so it would still work.

That's a good enhancement for the current design as well - care to 
submit a request for it?
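
One way such an enhancement could look, purely as a sketch and not as
an existing Phobos overload: encode the dchar to UTF-8 once, then count
occurrences of that short code-unit sequence without decoding the
haystack. The name countEncoded is an assumption.

```d
import std.algorithm.searching : count;
import std.string : representation;
import std.utf : encode;

/// Counts occurrences of needle in haystack by searching for its
/// UTF-8 encoding. Hypothetical helper, not part of Phobos.
size_t countEncoded(string haystack, dchar needle)
{
    char[4] buf;
    immutable len = encode(buf, needle);  // encode the needle once

    // Search the raw code units. In valid UTF-8 an encoded code
    // point can only match at a code point boundary.
    return haystack.representation
                   .count(buf[0 .. len].representation);
}

void main()
{
    assert(countEncoded("\u20ACuro \u20AC", '\u20AC') == 2);
    assert(countEncoded("hello", 'l') == 2);
}
```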

> Anyway, I think this discussion isn't really going anywhere so I think
> I'll agree to disagree and retire.

The part that advocates a breaking change will indeed not lead anywhere.
The parts where we improve Unicode support for D are very fertile.


Andrei


