Major performance problem with std.array.front()

Sat Mar 8 19:53:32 PST 2014

On Sunday, 9 March 2014 at 03:26:40 UTC, Andrei Alexandrescu 
wrote:
> And it's not like people aren't talking. In contrast, D has 
> been (and often rightly) criticized in the past for things like 
> floating point performance and garbage collection. No evidence 
> we are having an acute performance problem with UTF strings.

The size of this thread is one factor. But I see your point - I 
agree that this is evidently not one of D's more glaring current 
problems. I hope I never implied otherwise. That doesn't mean the 
problem doesn't exist at all, though.

>> If UTF decoding was explicit, the problem would stand out. I 
>> don't think
>> this is a valid argument.
>
> Yours? Indeed isn't, if what you want is iterate by code unit 
> (= meaningless for all but ASCII strings) by default.

I don't understand this argument. Iterating by code unit is not 
meaningless if you don't need to extract meaning from each 
individual unit. For example, if you're parsing JSON or XML, you 
only care about the syntax characters, which are all ASCII. And 
there is no confusion about "what exactly are we counting here".
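
To make that concrete, here is a minimal sketch (the function name 
and the sample input are made up for illustration). It relies on the 
fact that every byte of a multi-byte UTF-8 sequence is 0x80 or above, 
so an ASCII syntax character can never match in the middle of an 
encoded code point:

```d
/// Hypothetical helper: counts '{' by looking at code units only.
/// It deliberately ignores JSON string literals, so a brace inside
/// a quoted value would be counted too; a real parser tracks that.
size_t countOpenBraces(string text)
{
    size_t n = 0;
    foreach (char c; text) // char = one UTF-8 code unit, no decoding
    {
        if (c == '{')
            ++n;
    }
    return n;
}

void main()
{
    // Non-ASCII payload, ASCII-only syntax.
    string doc = "{\"name\": \"Zoë\", \"tags\": {\"lang\": \"日本語\"}}";
    assert(countOpenBraces(doc) == 2);
}
```

The loop never decodes anything, and the non-ASCII text passes 
through untouched.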

>> This was debated... people should not be looking at individual 
>> code
>> points, unless they really know what they're doing.
>
> Should they be looking at code units instead?

No. They should only be looking at substrings.

Unless they're e.g. parsing a computer language (regardless of 
whether it contains international text data), as above.
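
For reference, the three levels this subthread keeps distinguishing 
give three different answers for the same text. A small sketch, 
using the decomposed spelling of "noël" as an arbitrary sample input:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // "noël" spelled as e followed by U+0308 COMBINING DIAERESIS
    string s = "noe\u0308l";

    assert(s.length == 6);                // UTF-8 code units
    assert(s.walkLength == 5);            // code points (auto-decoded)
    assert(s.byGrapheme.walkLength == 4); // user-perceived characters
}
```

Only the last number matches what a reader would call the length of 
the word, and the code point count sits in between without 
corresponding to anything a user sees.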

>> We are going in circles. People should have very good reasons 
>> for
>> looking at individual graphemes as well.
>
> And it's good we have increasing support for graphemes. I don't 
> think they should be the default.

I don't think so either. Did I somehow imply that?

> What is an objective summary? Those who want to inflict massive 
> breakage are not even done arguing we have a better design.

From my POV, I could say I see consensus, with just you defending 
a decision you made a while ago :) But I'd prefer a constructive 
discussion.

Anyway, I don't want to "inflict massive breakage" either. I want 
the amount of breakage to be a justified cost of fixing a mistake 
and permanently improving the language's design going forward.

Here's what I have so far, BTW:
http://wiki.dlang.org/Element_type_of_string_ranges
I'll have to review it in the morning. Or rather, afternoon, 
given that it's 6 AM here.

> I'm afraid burden of proof is on you.

Why? I'm not saying that if you can't produce an example of 
breakage then your arguments are invalid. Rather, concrete 
examples give us a concrete problem to work with. I'm not trying 
to put any "burden of proof" on anyone.

> That's great. Yes, we're exchanging jabs right now which is not 
> our best use of time. Also in the interest of time, please 
> understand you'd need to show the second coming if you want to 
> break backward compatibility. Additions are a much better path.

Even a teensy-weensy breakage? :)

> Far as I'm concerned every breakage of string processing is 
> unacceptable or at least very undesirable.

In all seriousness, at this point I'm worried that you will 
defend the status quo even if the breakage turns out minimal. 
Instead of dealing with absolutes, advantages and disadvantages 
should be weighed against one another (even with the 
breaking-backwards-compatibility penalty being very high).

> Unit. s.byChar.front is a (possibly ref, possibly qualified) 
> char.

So... does byChar for wstrings do the same thing as byWchar? And 
what if you want to iterate a wstring by char? Wouldn't it be 
better to have byChar/byWchar/byDchar be a range of 
char/wchar/dchar regardless of the string type, and have 
byCodeUnit which iterates by the code unit type?
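
To spell out the distinction I mean, here is a sketch using 
std.utf.byCodeUnit and std.utf.byChar as found in current Phobos. 
That the byChar under discussion here would behave the same way is 
an assumption, not something settled in this thread:

```d
import std.range : ElementType, walkLength;
import std.traits : Unqual;
import std.utf : byChar, byCodeUnit;

void main()
{
    wstring ws = "h\u00E9llo"w; // "héllo" as UTF-16

    // byCodeUnit: the element type follows the string type (wchar here).
    auto units = ws.byCodeUnit;
    static assert(is(Unqual!(ElementType!(typeof(units))) == wchar));
    assert(units.walkLength == 5); // 5 UTF-16 code units

    // byChar: always yields UTF-8 code units, transcoding if needed.
    auto chars = ws.byChar;
    static assert(is(Unqual!(ElementType!(typeof(chars))) == char));
    assert(chars.walkLength == 6); // é needs 2 UTF-8 code units
}
```

With this split, byChar on a wstring is a well-defined transcoding 
operation, and byCodeUnit covers the "give me whatever the string is 
made of" case.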