Major performance problem with std.array.front()

Tue Mar 18 07:06:18 PDT 2014

On Mon, 10 Mar 2014 17:44:22 -0400,
Nick Sabalausky <SeeWebsiteToContactMe at semitwist.com> wrote:

> On 3/7/2014 8:40 AM, Michel Fortin wrote:
> > On 2014-03-07 03:59:55 +0000, "bearophile" <bearophileHUGS at lycos.com> said:
> >
> >> Walter Bright:
> >>
> >>> I understand this all too well. (Note that we currently have a
> >>> different silent problem: unnoticed large performance problems.)
> >>
> >> On the other hand your change could introduce Unicode-related bugs in
> >> future code (that the current Phobos avoids) (and here I am not
> >> talking about code breakage).
> >
> > The way Phobos works isn't any more correct than dealing with code
> > units. Many graphemes span multiple code points -- because of
> > combining diacritics or character variant modifiers -- and decoding at
> > the code-point level is thus often insufficient for correctness.
> >
> 
> Well, it is *more* correct, as text in many western languages is more
> likely to "just work" with current Phobos in most cases. It's just that
> things still aren't completely correct overall.
> 
> > From my experience, I'd suggest these basic operations for a "string
> > range" instead of the regular range interface:
> >
> > .empty
> > .frontCodeUnit
> > .frontCodePoint
> > .frontGrapheme
> > .popFrontCodeUnit
> > .popFrontCodePoint
> > .popFrontGrapheme
> > .codeUnitLength (aka length)
> > .codePointLength (for dchar[] only)
> > .codePointLengthLinear
> > .graphemeLengthLinear
> >
> > Someone should be able to mix all three 'front' and 'pop' function
> > variants above in any code dealing with a string type. In my XML parser
> > for instance I regularly use frontCodeUnit to avoid the decoding penalty
> > when matching the next character with an ASCII one such as '<' or '&'.
> > An API like the one above forces you to be aware of the level you're
> > working on, making bugs and inefficiencies stand out (as long as you're
> > familiar with each representation).
> >
> > If someone wants to use a generic array/range algorithm with a string,
> > my opinion is that he should have to wrap it in a range type that maps
> > front and popFront to one of the above variants. Having to do that should
> > make it obvious that there's an inefficiency there, as you're using an
> > algorithm that wasn't tailored to work with strings and that more
> > decoding than strictly necessary is being done.
> >
> 
> I actually like this suggestion quite a bit.

+1 Reminds me of my proposal for Rust
(https://github.com/mozilla/rust/issues/7043#issuecomment-19187984)
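
To make the quoted interface concrete, here is a minimal sketch in D
for UTF-8 strings only. The names StringRange and GraphemeView are
hypothetical (they are not Phobos types), and .codePointLength for
dchar[] is left out. It only assumes std.utf.decode, std.utf.stride,
std.utf.count and std.uni.graphemeStride, so treat it as an
illustration of the idea rather than a proposed implementation.

```d
import std.range : walkLength;
import std.uni : graphemeStride;
import std.utf : count, decode, stride;

// Hypothetical wrapper (not a Phobos type) that exposes the three
// levels of a UTF-8 string side by side.
struct StringRange
{
    string s;

    @property bool empty() { return s.length == 0; }

    // Code-unit level: no decoding at all.
    @property char frontCodeUnit() { return s[0]; }
    void popFrontCodeUnit() { s = s[1 .. $]; }

    // Code-point level: decode one UTF-8 sequence.
    @property dchar frontCodePoint()
    {
        size_t i = 0;
        return decode(s, i);
    }
    void popFrontCodePoint() { s = s[stride(s, 0) .. $]; }

    // Grapheme level: a base character plus its combining marks.
    @property string frontGrapheme() { return s[0 .. graphemeStride(s, 0)]; }
    void popFrontGrapheme() { s = s[graphemeStride(s, 0) .. $]; }

    // Constant-time length in code units.
    @property size_t codeUnitLength() { return s.length; }

    // Linear-time lengths: the whole string has to be scanned.
    @property size_t codePointLengthLinear() { return count(s); }
    @property size_t graphemeLengthLinear()
    {
        size_t n = 0;
        for (size_t i = 0; i < s.length; i += graphemeStride(s, i))
            ++n;
        return n;
    }
}

// Hypothetical adapter: maps the regular front/popFront onto the
// grapheme variants so generic range algorithms can be used.
struct GraphemeView
{
    StringRange r;

    @property bool empty() { return r.empty; }
    @property string front() { return r.frontGrapheme; }
    void popFront() { r.popFrontGrapheme(); }
}

void main()
{
    // '<', 'e', U+0301 (combining acute accent), '>'
    auto r = StringRange("<e\u0301>");
    assert(r.codeUnitLength == 5);
    assert(r.codePointLengthLinear == 4);
    assert(r.graphemeLengthLinear == 3);

    // Match an ASCII character without paying for decoding.
    if (r.frontCodeUnit == '<')
        r.popFrontCodeUnit();

    assert(r.frontCodePoint == 'e');       // base letter only
    assert(r.frontGrapheme == "e\u0301");  // letter plus accent

    // A generic algorithm needs the explicit wrapper.
    assert(GraphemeView(r).walkLength == 2);
}
```

The explicit GraphemeView wrapper is the point of the last paragraph
quoted above: the decoding cost shows up in the code instead of being
hidden behind front.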

-- 
Marco