Major performance problem with std.array.front()

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Fri Mar 7 11:57:23 PST 2014


On 3/6/14, 6:37 PM, Walter Bright wrote:
> In "Lots of low hanging fruit in Phobos" the issue came up about the
> automatic encoding and decoding of char ranges.
[snip]
> Is there any hope of fixing this?

There's nothing to fix.

Allow me to enumerate the functions of std.algorithm and how they work 
today and how they'd work with the proposed change. Let s be a variable 
of some string type.

1.

s.all!(x => x == 'é') currently works as expected. Proposed: fails silently.

2.

s.any!(x => x == 'é') currently works as expected. Proposed: fails silently.

3.

s.canFind!(x => x == 'é') currently works as expected. Proposed: fails 
silently.

4.

s.canFind('é') currently works as expected. Proposed: fails silently.

5.

s.count() currently works as expected. Proposed: fails silently.

6.

s.count!((a, b) => std.uni.toLower(a) == std.uni.toLower(b))("é") 
currently works as expected (with the known issues of lowercase 
conversion). Proposed: fails silently.

7.

s.count('é') currently works as expected. Proposed: fails silently.

8.

s.countUntil("a") currently works as expected. Proposed: fails silently. 
This applies to all variations of countUntil.

9.

s.endsWith('é') currently works as expected. Proposed: fails silently.

10.

s.find('é') currently works as expected. Proposed: fails silently. This 
applies to other variations of find that include custom predicates.

11.

...

I went down std.algorithm in the order listed in its documentation and 
found pernicious issues with almost every single algorithm.
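
To make "fails silently" concrete, here is a minimal sketch of item 3. It assumes the proposed change amounts to iterating raw UTF-8 code units, which is modeled here with std.string.representation; the helper names findsDecoded and findsInCodeUnits are made up for illustration and are not part of Phobos.

```d
import std.algorithm : canFind, count;
import std.range : walkLength;
import std.string : representation;

// Today: the predicate sees decoded code points (dchar).
bool findsDecoded(string s)
{
    return s.canFind!(x => x == 'é');
}

// Assumed model of the proposal: the predicate sees UTF-8 code units.
bool findsInCodeUnits(string s)
{
    return s.representation.canFind!(x => x == 'é');
}

void main()
{
    string s = "café";             // 'é' is U+00E9, encoded as 0xC3 0xA9

    assert(s.length == 5);         // code units
    assert(s.walkLength == 4);     // code points

    assert(findsDecoded(s));       // finds the 'é'
    assert(!findsInCodeUnits(s));  // compiles, runs, finds nothing

    assert(s.count('é') == 1);
    assert(s.representation.count!(x => x == 'é') == 0);
}
```

The code-unit versions compile without complaint and simply return the wrong answer, because no single byte of the encoded string equals 0xE9.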

I designed the range behavior of strings after much thinking and 
consideration back in the day when I designed std.algorithm. It was 
painfully obvious (but it seems to have been forgotten now that it's 
working so well) that approaching strings as arrays of char would 
break almost every single algorithm leaving us essentially in the 
pre-UTF C++aveman era.

Making strings bidirectional ranges has been a very good choice within 
the constraints. There was already a string type, and that was 
immutable(char)[], and a bunch of code depended on that definition.
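
As a rough illustration of what "bidirectional range" means for immutable(char)[], the sketch below checks the range traits and shows that front and back decode while indexing and .length still operate on code units. The helper firstCodePoint is a hypothetical name used only for this example.

```d
import std.range;

// string is a bidirectional range of dchar, not a random-access range of char.
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);
static assert(is(ElementType!string == dchar));

// front decodes the first code point of the string.
dchar firstCodePoint(string s)
{
    return s.front;
}

void main()
{
    string s = "été";

    assert(firstCodePoint(s) == 'é');  // decoded from two code units
    assert(s.back == 'é');             // decoding also works from the end

    assert(s[0] == 0xC3);              // indexing still yields a code unit
    assert(s.length == 5);             // array length, in code units
    assert(s.walkLength == 3);         // range length, in code points
}
```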

Clearly one might argue that their app has no business dealing with 
diacriticals or Asian characters. But that's the typical provincial view 
that marred many languages' approach to UTF and internationalization. If 
you know your string is ASCII, the remedy is simple - don't use char[] 
and friends. From day 1, the type "char" was meant to mean "code unit of 
UTF characters".
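
The paragraph above does not name a replacement type. One option, assumed here, is to view known-ASCII data as immutable(ubyte)[] through std.string.representation, so that the algorithms see plain bytes and nothing is decoded. The function name countEquals is hypothetical.

```d
import std.algorithm : count;
import std.range : isRandomAccessRange;
import std.string : representation;

// A byte array is an ordinary random-access range; no decoding takes place.
static assert(isRandomAccessRange!(immutable(ubyte)[]));

// Counts '=' in text that the caller knows to be pure ASCII.
size_t countEquals(string asciiText)
{
    immutable(ubyte)[] bytes = asciiText.representation;
    return bytes.count('=');
}

void main()
{
    assert(countEquals("key=value;key=other") == 2);
}
```

This keeps the choice explicit at the call site: char[] means UTF-8 text, ubyte[] means bytes.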

So please ponder the above before performing surgery that is going to 
kill the patient.


Andrei


