Major performance problem with std.array.front()
Vladimir Panteleev
vladimir at thecybershadow.net
Sat Mar 8 19:53:32 PST 2014
On Sunday, 9 March 2014 at 03:26:40 UTC, Andrei Alexandrescu
wrote:
> And it's not like people aren't talking. In contrast, D has
> been (and often rightly) criticized in the past for things like
> floating point performance and garbage collection. No evidence
> we are having an acute performance problem with UTF strings.
The size of this thread is one factor. But I see your point - I
agree that it is evidently not one of D's more glaring current
problems. I hope I never implied otherwise. That doesn't mean the
problem doesn't exist at all, though.
>> If UTF decoding was explicit, the problem would stand out. I
>> don't think this is a valid argument.
>
> Yours? Indeed isn't, if what you want is iterate by code unit
> (= meaningless for all but ASCII strings) by default.
I don't understand this argument. Iterating by code unit is not
meaningless if you don't need to extract meaning from every unit
you iterate over. For example, if you're parsing JSON or XML, you only
care about the syntax characters, which are all ASCII. And there
is no confusion of "what exactly are we counting here".
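To make the parsing case concrete, here is a minimal sketch of scanning by code unit. The helper name countStructural is hypothetical, and it deliberately ignores whether a character sits inside a string literal. It relies only on the fact that foreach with an explicit char loop variable walks a string's UTF-8 code units without decoding:

```d
/// Hypothetical helper: counts JSON structural characters by looking at
/// raw UTF-8 code units. It does not track string-literal context, so
/// it is an illustration rather than a real tokenizer.
size_t countStructural(string json)
{
    size_t n;
    // An explicit `char` loop variable makes foreach iterate code units;
    // no UTF decoding is performed.
    foreach (char c; json)
    {
        switch (c)
        {
            case '{', '}', '[', ']', ':', ',':
                ++n;
                break;
            default:
                // Every code unit of a non-ASCII character is >= 0x80,
                // so it can never be mistaken for a syntax character.
                break;
        }
    }
    return n;
}
```

The non-ASCII text in the input is simply skipped over, one code unit at a time, and never needs to be understood.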
>> This was debated... people should not be looking at individual
>> code points, unless they really know what they're doing.
>
> Should they be looking at code units instead?
No. They should only be looking at substrings.
Unless they're e.g. parsing a computer language (regardless of
whether it contains international text data), as above.
>> We are going in circles. People should have very good reasons
>> for looking at individual graphemes as well.
>
> And it's good we have increasing support for graphemes. I don't
> think they should be the default.
I don't think so either. Did I somehow imply that?
> What is an objective summary? Those who want to inflict massive
> breakage are not even done arguing we have a better design.
From my POV, I could say I see consensus, with just you defending
a decision you made a while ago :) But I'd prefer a constructive
discussion.
Anyway, I don't want to "inflict massive breakage" either. I want
the amount of breakage to be a justified cost of fixing a mistake
and permanently improving the language's design going forward.
Here's what I have so far, BTW:
http://wiki.dlang.org/Element_type_of_string_ranges
I'll have to review it in the morning. Or rather, afternoon,
given that it's 6 AM here.
> I'm afraid burden of proof is on you.
Why? I'm not saying that if you can't produce an example of
breakage then your arguments are invalid. Rather, concrete
examples give us a concrete problem to work with. I'm not trying
to put any "burden of proof" on anyone.
> That's great. Yes, we're exchanging jabs right now which is not
> our best use of time. Also in the interest of time, please
> understand you'd need to show the second coming if you want to
> break backward compatibility. Additions are a much better path.
Even a teensy-weensy breakage? :)
> Far as I'm concerned every breakage of string processing is
> unacceptable or at least very undesirable.
In all seriousness, at this point I'm worried that you will
defend the status quo even if the breakage turns out minimal.
Instead of dealing with absolutes, advantages and disadvantages
should be weighed against one another (even with the
breaking-backwards-compatibility penalty being very high).
> Unit. s.byChar.front is a (possibly ref, possibly qualified)
> char.
So... does byChar for wstrings do the same thing as byWchar? And
what if you want to iterate a wstring by char? Wouldn't it be
better to have byChar/byWchar/byDchar be a range of
char/wchar/dchar regardless of the string type, and have
byCodeUnit which iterates by the code unit type?
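A sketch of the semantics suggested here, assuming a Phobos release whose std.utf provides byChar and byCodeUnit with this meaning (later releases do; nothing in this thread guarantees it). The helper names unitCount and utf8Count are hypothetical:

```d
import std.range : walkLength;
import std.utf : byChar, byCodeUnit;

/// Hypothetical helper: number of elements when iterating `s` in its
/// own code unit type (char for string, wchar for wstring, and so on).
size_t unitCount(S)(S s)
{
    return s.byCodeUnit.walkLength;
}

/// Hypothetical helper: number of elements when iterating `s` as char,
/// i.e. as UTF-8 code units, regardless of how `s` itself is encoded.
size_t utf8Count(S)(S s)
{
    return s.byChar.walkLength;
}
```

Under these semantics "é"w is one element by code unit (a single wchar) but two elements by char, since byChar transcodes to UTF-8 instead of behaving like byWchar.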